Chapter9LanguageandComputer2.ppt

Uploaded by: 豆****  Document ID: 26049023  Upload date: 2022-07-15  Format: PPT  Pages: 32  Size: 103 KB

9.2.3 MT quality

It has to be admitted that there are still faults in all the translations that present-day systems actually produce. One can still find errors that no human translator would ever commit, such as wrong pronouns, wrong prepositions, garbled syntax, incorrect choice of terms, plurals instead of singulars, wrong tenses, etc.

Secondly, translation is not an operation that preserves meaning. Of course, this will not pose a problem if systems remain research prototypes and poor quality has little public impact. But when it comes to commercial systems, the whole MT industry will suffer from poor-quality translation.

9.2.4 MT and the Internet

The impact of the Internet has been significant in recent years. One hears very often that the 21st century is the Internet era. Naturally, we are already seeing an accelerating growth of real-time on-line translation on the Internet itself. For instance, in recent years we have seen many systems designed specifically for the translation of Web pages and of electronic mail.

It is generally agreed that the Internet is having further profound impacts that will surely change the future prospects for MT. One of the predictions is that the stand-alone PC, with its array of software for word-processing, databases, games, etc., will be replaced by NETWORK COMPUTERS which would download systems and programs from the Internet as and when required.

Another profound impact of the Internet will concern the nature of the software itself. It is probable that in future years there will be fewer “pure” MT systems but many more computer-based tools and applications in which automatic translation is just one component.

9.2.5 Spoken language translation

The most widely anticipated development in the new century must be that of speech translation. When research projects were begun in the late 1980s and early 1990s, it was known that practical applications were unlikely. It was assumed that once basic principles and methods had been successfully demonstrated on small-scale research systems, it would be merely a question of finance and engineering to create large practical systems. As a matter of fact, large-scale MT systems have to be designed as such from the beginning, and that requires many man-years of effort.

It is more likely that there will be numerous applications of spoken language translation as components of small-domain natural language applications, e.g. interrogation of databases (particularly financial and stock market data), interactions in business negotiations, intra-company communication, etc.

9.2.6 MT and human translation

At the beginning of the new century, it is already apparent that MT and human translation can and will co-exist in relative harmony. Those skills which the human translator can contribute will always be in demand.

When translation has to be of “publishable” quality, both human translation and MT have their roles. For the translation of texts where the quality of output is much less important, machine translation is often an ideal solution.

For the one-to-one interchange of information, there will probably always be a role for the human translator, e.g. for the translation of business correspondence (particularly if the content is sensitive or legally binding). But for the translation of personal letters, MT systems are likely to be increasingly used; and for electronic mail and for the extraction of information from Web pages and computer-based information services, MT is the only feasible solution.

As for spoken translation, there must surely always be a market for the human translator.

9.3.1 Definition

There are various definitions of “corpus” and “corpus linguistics”. The following are two representative ones, which appeared in the same year:

Corpus, plural corpora: A collection of linguistic data, either compiled as written texts or as a transcription of recorded speech. Corpus linguistics deals with the principles and practice of using corpora in language study.

CORPUS: (1) A collection of texts, especially if complete and self-contained: the corpus of Anglo-Saxon verse. (2) Plural also corpuses. Corpus linguistics studies data in any such corpus.

9.3.2 Criticisms and the revival of corpus linguistics

Although corpus linguistics contributed to the development of American structuralism, Chomsky changed the direction of linguistics away from empiricism and towards rationalism in a remarkably short space of time.

Chomsky suggested, first, that the corpus could never be a useful tool for the linguist, as the linguist must seek to model language competence rather than performance. Secondly, the only way to account for the grammar of a language is by description of its rules, not by enumeration of its sentences. Thirdly, even if language is a finite construct, would corpus methodology still be the best method of studying language?

Although Chomsky's criticisms did discredit corpus linguistics, they did not stop all corpus-based work.

9.3.3 Concordance

It was the computer that heralded the revival of corpus linguistics. The computer has the ability to search for a particular word, a sequence of words, or perhaps even a part of speech in a text. It can also retrieve all examples of a particular word, usually in context, which is a further aid to the linguist. It can also calculate the number of occurrences of the word, so that information on the frequency of the word may be gathered. We may then be interested in sorting the data in some way, for example alphabetically on the words occurring in the immediate context of the word. The resulting sorted list of occurrences in context is usually referred to as a CONCORDANCE.

9.3.4 Text encoding and annotation

A corpus is said to be unannotated if it appears in its existing raw state of plain text, whereas an annotated corpus has been enhanced with various types of linguistic information. Unsurprisingly, the utility of the corpus is increased when it has been annotated, making it no longer a body of text where linguistic information is implicitly present, but one which may be considered a repository of linguistic information. The implicit information has been made explicit through the process of concrete annotation.

9.3.5 The roles of corpus data

The importance of corpora to language study is aligned with the importance of empirical data. Empirical data enable the linguist to make objective statements, rather than statements which are subjective or based upon the individual's own internalized cognitive perception of language. Starting from this point, we find that corpora can play important roles in a number of different fields of study related to language, such as speech research, lexical studies, grammar, semantics, pragmatics, discourse analysis, sociolinguistics, stylistics, historical linguistics, dialectology, variation studies, psycholinguistics, social psychology, cultural studies, etc.

9.4 Information retrieval

9.4.1 Scope defined

A perfectly straightforward definition is given by Lancaster (1986): “Information retrieval is the term conventionally, though somewhat inaccurately, applied to the type of activity discussed in this volume. An information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request.”

This excludes Question-Answering systems. It also excludes data retrieval systems such as those used by, say, the stock exchange for on-line quotations. To make clear the difference between data retrieval (DR) and information retrieval (IR), some of their distinguishing properties are listed in Table 9-1.

Table 9-1 Data retrieval vs information retrieval

                       Data Retrieval (DR)    Information Retrieval (IR)
Matching               Exact match            Partial match, best match
Inference              Deduction              Induction
Model                  Deterministic          Probabilistic
Classification         Monothetic             Polythetic
Query language         Artificial             Natural
Query specification    Complete               Incomplete
Items wanted           Matching               Relevant
Error response         Sensitive              Insensitive

9.4.2 An information retrieval system

A typical IR system can be illustrated by the following diagram, which shows three components: input, processor and output.

[Diagram: a typical IR system, with boxes labelled Input (Queries, Documents), Processor and Output, and a Feedback link]

Starting with the input, the main problem here is to obtain a representation of each document and query suitable for a computer to use. It should be pointed out that most computer-based retrieval systems store only a representation of the document (or query), which means that the text of a document is lost once it has been processed for the purpose of generating its representation. A document representative could be a list of extracted words considered to be significant. Rather than have the computer process the natural language, an alternative approach is to have an artificial language within which all queries and documents can be formulated.

When the retrieval system is on-line, it is possible for the user to change his request during one search session in the light of a sample retrieval, thereby improving the subsequent retrieval run.

Secondly, the processor is that part of the retrieval system concerned with the retrieval process. The process may involve structuring the information in some appropriate way, such as classifying it. It will also involve performing the actual retrieval function, that is, executing the search strategy in response to a query. In the diagram, the documents have been placed in a separate box to emphasize the fact that they are not just input but can be used during the retrieval process in such a way that their structure is more correctly seen as part of the retrieval process.

Finally, we come to the output, which is usually a set of citations or document numbers. In an operational system the story ends here.

9.4.3 Three main areas of research

There are many ways to subdivide information retrieval, but there are three main areas of research which between them make up a considerable portion of the subject: content analysis, information structures, and evaluation.

(1) Content analysis

Content analysis is concerned with describing the contents of documents in a form suitable for computer processing. The approach pioneered by Luhn (1957) is typical: it used frequency counts of words in the document text to determine which words were sufficiently significant to represent or characterize the document in the computer. Thus a list of what might be called KEYWORDS (or TERMS) was prepared for each document. In addition, the frequency of occurrence of these words in the body of the text could also be used to indicate a degree of significance.

(2) Information structure

Information structure is concerned with exploiting relationships between documents to improve the efficiency and effectiveness of retrieval strategies. It covers specifically a logical organization of information, such as document representatives, for the purpose of information retrieval. The development of information structures has been fairly recent. The main reason for the slowness of development in this area of information retrieval is that for a long time no one realized that computers would not give an acceptable retrieval time with a large document set unless some logical structure was imposed on it.

(3) Evaluation

Evaluation is concerned with the measurement of the effectiveness of retrieval. Evaluation of retrieval systems has proved extremely difficult. In the past there has been much debate about the validity of evaluation based on relevance judgements provided by erring human beings. A dichotomous scale on which a document is either relevant or non-relevant, even when subject to a certain probability of error, did not invalidate the results obtained for evaluation in terms of PRECISION (the proportion of retrieved documents which are relevant) and RECALL (the proportion of relevant documents retrieved). Today the effectiveness of retrieval is still mostly measured in terms of precision and recall, or by measures based thereon.

9.5 Mail and news

Once you enter Netscape or Internet Explorer, there are two main choices for your browsing: search/navigation or the messenger mailbox. The former concerns information retrieval, discussed in the previous section; the latter deals with the receipt and delivery of mail and news.

In the past, if we wrote to a friend or relative, the letter would take two or three days to reach its destination, and maybe one or two weeks to reach someone in another country, to say nothing of the cost of postage. A long-distance phone call might save time, but it is charged minute by minute, and you may have to make several phone calls if you want to pass on the same message to different people. By means of e-mail, one can send the same message to a number of correspondents, or send files and graphs as attachments.

Apart from this, the messenger mailbox, with the help of listserv or majordomo, can also help the user with academic activities. All the user has to do is subscribe to an electronic forum, society or journal. When his subscription is confirmed, he may receive information about calls for papers for conferences, the publication of new books or journals, and even job opportunities; he may raise queries for help, or take part in discussions about academic matters and read a review when the discussion is over (Hu, 1997). Take The Linguist, for example: the present writer received the following messages from the mailbox on the morning of April 19, 2001.

Series No.    Subject                    Time received
12.1085       FYI: Summer School         01-4-19 4:41
12.1084       Review: Corrections        01-4-19 3:54
12.1083       Review: Verbal Complexes   01-4-18 23:51
12.1082       Confs: Modality in.        01-4-18 3:30
12.1081       Qs: English speakers.      01-4-18 23:24
12.1080       Qs: DESS ou un DEA         01-4-18 22:16
12.1079       Books: Syntax/Semantics    01-4-18 21:43

Here, FYI stands for “for your information”, so item 12.1085 is a posting about a summer school of linguistics. Items 12.1084 and 12.1083 are two review articles written after several weeks' discussion through the mailbox, one concerning how to deal with WWW corrections, the other concerning the discussion about verbal complexes. Item 12.1082 informs people of an international conference on modality. Then we have members asking questions in items 12.1081 and 12.1080 and seeking answers or help from other members. The last item concerns the launching of new books about syntax and semantics.

As people discuss academic matters by means of e-mail or WWW keyboard chat, one will notice that this kind of communication has produced immense washback effects on language use. Users tend to use less punctuation and to use letters and digits to stand for words, e.g. “u” for “you”, “4” for “for”, “r” for “are”, “brb” for “be right back”, etc. Such changes can also be found in structures, in terms of an increase in short sentences and informal expressions, the avoidance of direct address, etc. (Zhang Delu, 1998; Dong Qimin and Liu Yumei, 2001).

Questions and Exercises

1. Define the following terms: c
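
The concordance procedure described in Section 9.3.3 (search for a word, retrieve every occurrence in context, count the occurrences, then sort on the neighbouring words) can be illustrated with a minimal Python sketch. The sample text, the helper names `tokenize` and `concordance`, and the context width of three words are all assumptions made for illustration; real concordancers work on far larger corpora and usually handle punctuation and annotation more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


# Hypothetical three-sentence sample standing in for a corpus.
sample = ("The corpus is a collection of texts. A corpus may be annotated. "
          "Linguists search the corpus for words in context.")

tokens = tokenize(sample)
frequency = Counter(tokens)  # number of occurrences of each word

# Sort the hits alphabetically on the words that follow the keyword.
lines = sorted(concordance(tokens, "corpus"), key=lambda hit: hit[2])

print("occurrences of 'corpus':", frequency["corpus"])
for left, kw, right in lines:
    print(f"{left:>25} [{kw}] {right}")
```

Sorting on the right-hand context groups together lines in which the keyword is followed by similar words; sorting on `hit[0]` instead would group by the preceding words.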

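The two evaluation measures defined in Section 9.4.3 can be worked through on a small example. The sketch below assumes a dichotomous relevance judgement (each document is either relevant or not); the function name `precision_recall` and the document identifiers are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall from two collections of document ids.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search result: 4 documents retrieved, 5 relevant in the collection.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d5", "d6", "d7"]

p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

Here two of the four retrieved documents are relevant, so precision is 2/4, and two of the five relevant documents were retrieved, so recall is 2/5.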