Speech Recognition Systems: Chinese-English Parallel Translation of Foreign Literature.docx

Uploaded by: 暗伤   Document ID: 24286211   Upload date: 2022-07-04   Format: DOCX   Pages: 14   Size: 53KB

Chinese-English Parallel Foreign-Language Translation Material

Speech Recognition

1 Defining the Problem

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final results, as for applications such as commands & control, data entry, and document preparation. They also serve as the input to further linguistic processing in order to achieve speech understanding, a subject covered in a separate section.

Speech recognition systems can be characterized by many parameters, some of the more important of which are summarized in the table of typical parameters. An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies, and is much more difficult to recognize than speech read from script. Some systems require speaker enrollment: a user must provide samples of his or her speech before using them, whereas other systems are said to be speaker-independent, in that no enrollment is necessary. Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or have many similar-sounding words. When speech is produced in a sequence of words,
language models or artificial grammars are used to restrict the combination of words.

The simplest language model can be specified as a finite-state network, where the permissible words following each word are given explicitly. More general language models approximating natural language are specified in terms of
a context-sensitive grammar.

One popular measure of the difficulty of the task, combining the vocabulary size and the language model, is perplexity, loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applied (see the corresponding section for a discussion of language modeling in general and perplexity in particular). Finally, there are some external parameters that can affect speech recognition system performance, including the characteristics of the environmental noise and the type and the placement of the microphone.

Parameters       Range
Speaking Mode    Isolated words to continuous speech
Speaking Style   Read speech to spontaneous speech
Enrollment       Speaker-dependent to speaker-independent
Vocabulary       Small to large (over 20,000 words)
Language Model   Finite-state to context-sensitive
Perplexity       Small to large (over 100)
SNR              High (30 dB) to low (10 dB)
Transducer       Voice-cancelling microphone to telephone

Table: Typical parameters used to characterize the capability of speech recognition systems.

Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear. These phonetic variabilities are exemplified by the acoustic differences among realizations of the same phoneme. At word boundaries, contextual variations can be quite dramatic, making gas shortage sound like gash shortage in American English, and devo andare sound like devandare in Italian.

Second, acoustic variabilities can result from changes in the environment as well as in the position and characteristics of the transducer. Third, within-speaker variabilities can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract size and shape can contribute to across-speaker variabilities.

The figure of system components shows the major parts of a typical speech recognition system. The digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, typically once every 10-20 msec (signal representation and digital signal processing are treated in separate sections, the latter in section 11.3). These measurements are then used to search for the most likely word candidate, making use of constraints imposed by the acoustic, lexical, and language models. Throughout this process, training data are used to determine the values of the model parameters.

Figure: Components of a typical speech recognition system.

Speech recognition systems attempt to model the sources of variability described above in several ways. At the level of signal representation, researchers have developed representations that emphasize perceptually important speaker-independent features of the signal, and de-emphasize speaker-dependent characteristics. At the acoustic phonetic level, speaker variability is typically modeled using statistical techniques applied to large amounts of data. Speaker adaptation algorithms have also been developed that adapt speaker-independent models to those of the current speaker during system use (covered in a separate section). Effects of linguistic context at the acoustic phonetic level are typically handled by training separate models for phonemes in different contexts; this is called context dependent acoustic modeling.

Word level variability can be handled by allowing alternate pronunciations of words in representations known as pronunciation networks. Common alternate pronunciations of words, as well as effects of dialect and accent, are handled by allowing search algorithms to find alternate paths of phonemes through these networks. Statistical language models, based on estimates of the frequency of occurrence of word sequences, are often used to guide the search through the most probable sequence of words.

The dominant recognition paradigm in the past fifteen years is known as hidden Markov models (HMM). An HMM is a doubly stochastic model, in which the generation of the underlying phoneme string and the frame-by-frame, surface acoustic realizations are both represented probabilistically as Markov processes, as discussed in other sections, including section 11.2. Neural networks have also been used to estimate the frame based scores; these scores are then integrated into HMM-based system architectures, in what has come to be known as hybrid systems, as described in section 11.5.

An interesting feature of frame-based HMM systems is that speech segments are identified during the search process, rather than explicitly. An alternate approach is to first identify speech segments, then classify the segments and use the segment scores to recognize words. This approach has produced competitive recognition performance in several tasks.

2 State of the Art

Comments about the state-of-the-art need to be made in the context of specific applications which reflect the constraints on the task. Moreover, different technologies are sometimes appropriate for different tasks. For example, when the vocabulary is small, the
entire word can be modeled as a single unit. Such an approach is not practical for large vocabularies, where word models must be built up from subword units.

Performance of speech recognition systems is typically described in terms of word error rate, E, defined as:

$$E = \frac{S + I + D}{N} \times 100\%$$

where N is the total number of words in the test set, and S, I, and D are the total number of substitutions, insertions, and deletions, respectively.

The past decade has witnessed significant progress in speech recognition technology. Word error rates continue to drop by a factor of 2 every two years. Substantial progress has been made in the basic technology, leading to the lowering
of barriers to speaker independence, continuous speech, and large vocabularies. There are several factors that have contributed to this rapid progress. First, there is the coming of age of the HMM. The HMM is powerful in that, with the availability of training data, the parameters of the model can be trained automatically to give optimal
performance.

Second, much effort has gone into the development of large speech corpora for system development, training, and testing. Some of these corpora are designed for acoustic phonetic research, while others are highly task specific. Nowadays, it is not uncommon to have tens of thousands of sentences available for
system training and testing. These corpora permit researchers to quantify the acoustic cues important for phonetic contrasts and to determine parameters of the recognizers in a statistically meaningful way. While many of these corpora (e.g., TIMIT, RM, ATIS, and WSJ; see section 12.3) were originally collected under the sponsorship of the U.S. Defense Advanced Research Projects Agency (ARPA) to spur human language technology development among its contractors, they have nevertheless gained world-wide acceptance (e.g., in Canada, France, Germany, Japan, and the U.K.) as standards on which to evaluate speech recognition.

Third,
progress has been brought about by the establishment of standards for performance evaluation. Only a decade ago, researchers trained and tested their systems using locally collected data, and had not been very careful in delineating training and testing sets. As a result, it was very difficult to compare performance across systems, and a system's performance typically degraded when it was presented with previously unseen data. The recent availability of a large body of data in the public domain, coupled with the specification of evaluation standards, has resulted in uniform documentation of test results, thus contributing to greater reliability in monitoring progress (corpus development activities and evaluation methodologies are summarized in chapters 12 and 13 respectively).

Finally, advances in computer technology have also indirectly influenced our progress. The availability of fast computers with inexpensive mass storage capabilities has enabled researchers to
run many large scale experiments in a short amount of time. This means that the elapsed time between an idea and its implementation and evaluation is greatly reduced. In fact, speech recognition systems with reasonable performance can now run in real time using high-end workstations without additional hardware, a feat unimaginable only a few years ago.

One of the most popular, and potentially most useful, tasks with low perplexity is the recognition of digits. For American English, speaker-independent recognition of digit strings spoken continuously and restricted to telephone bandwidth can achieve an error rate of 0.3% when the string length is known.

One of the best known moderate-perplexity tasks is the 1,000-word so-called Resource Management (RM) task, in which inquiries can be made concerning various naval vessels in the Pacific ocean. The best speaker-independent performance on the RM task is a word error rate of less than 4%, using a word-pair language model that constrains the possible words following a given word. More recently, researchers have begun to address the issue of recognizing spontaneously generated speech. For example, in the Air Travel Information Service (ATIS) domain, word error rates of less than 3% have been reported for a vocabulary of nearly 2,000 words and a bigram language model with a perplexity of around 15.

High perplexity tasks with a vocabulary of thousands of words are intended primarily for the dictation application. After working on isolated-word, speaker-dependent systems for years, the community has since 1992 moved towards very-large-vocabulary (20,000 words or more), high-perplexity (PP ≈ 200), speaker-independent, continuous speech recognition. The best system in 1994 achieved an error rate of 7.2% on read sentences drawn from North American business news.

With the steady improvements in speech recognition performance, systems are now being
deployed within telephone and cellular networks in many countries. Within the next few years, speech recognition will be pervasive in telephone networks around the world. There are tremendous forces driving the development of the technology; in many countries, touch tone penetration is low, and voice is the only option for controlling automated services. In voice dialing, for example, users can dial 10-20 telephone numbers by voice (e.g., call home) after having enrolled their voices by saying the words associated with telephone numbers. AT&T, on the other hand, has installed a call routing system using speaker-independent word-spotting technology that can detect a few key phrases (e.g., person to person, calling card) in sentences such as: I want to charge it to my calling card.

At present, several very large vocabulary dictation systems are available for document generation. These systems generally require speakers to pause between words. Their performance can be further enhanced if one can apply constraints of the specific domain, such as dictating medical reports.

Even though much progress is being made, machines are a long way from recognizing conversational speech. Word recognition rates on telephone conversations in the Switchboard corpus are around 50%. It will be many years before unlimited vocabulary, speaker-independent continuous dictation capability is realized.

3 Future Directions

In 1992, the U.S. National Science Foundation sponsored a workshop to identify the key research challenges in the area of human language technology, and the infrastructure needed to support the work. The key research challenges were summarized, and research in the following areas was identified for speech recognition:

Robustness: In a robust system, performance degrades gracefully (rather than catastrophically) as conditions become more different from those under which it was trained. Differences in characteristics and acoustic environment should receive particular attention.

Portability: Portability refers to the goal of rapidly designing, developing and deploying systems for new applications. At present, systems tend to suffer significant degradation when moved to a new task. In order to return to peak performance, they must be trained on examples specific to the new task, which is time consuming and expensive.

Adaptation: How can systems continuously adapt to changing conditions (new speakers, microphone, task, etc.) and improve through use? Such adaptation can occur at many levels in systems: subword models, word pronunciations, language models, etc.

Language Modeling: Current systems use statistical language models to help reduce the search space and resolve acoustic ambiguity. As vocabulary size grows and other constraints are relaxed to create more habitable systems, it will be increasingly important to get as much constraint as possible from language models, perhaps incorporating syntactic and semantic constraints that cannot be captured by purely statistical models.

Confidence Measures: Most speech recognition systems assign scores to hypotheses for the purpose of rank ordering them. These scores do not provide a good indication of whether a hypothesis is correct or not, just that it is better than the other hypotheses. As we move to tasks that require it, we need better methods to evaluate the absolute correctness of hypotheses.

Out-of-Vocabulary Words: Systems are designed for use with a particular set of words, but system users may not know exactly which words are in the system vocabulary. This leads to a certain percentage of out-of-vocabulary words in natural conditions. Systems must have some method of detecting such out-of-vocabulary words, or they will end up mapping a word from the vocabulary onto the unknown word, causing an error.

Spontaneous Speech: Systems that are deployed for real use must deal with a variety of spontaneous speech phenomena, such as filled pauses, false starts, hesitations, ungrammatical constructions and other common behaviors not found in read speech. Development on the ATIS task has resulted in progress in this area, but much work remains to be done.
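
The Confidence Measures item in the list above observes that raw hypothesis scores only rank hypotheses against each other. One simple way to get a more interpretable number, given here purely as an illustrative sketch and not as a method described in this text, is to normalize the scores of an N-best list into posterior-like values and to reject the top hypothesis when its share is small. The function names, the assumption that scores are natural-log likelihoods, and the 0.7 threshold are all hypothetical choices made for the example.

```python
import math


def nbest_posteriors(log_scores):
    """Normalize log-domain hypothesis scores so that they sum to 1.

    Assumption: each score is a natural-log likelihood; subtracting the
    maximum before exponentiating avoids numerical underflow.
    """
    peak = max(log_scores)
    weights = [math.exp(score - peak) for score in log_scores]
    total = sum(weights)
    return [weight / total for weight in weights]


def accept_top(hypotheses, log_scores, threshold=0.7):
    """Return (best hypothesis, its normalized score, accept flag).

    The threshold is a hypothetical tuning parameter, not a value
    taken from the text.
    """
    posteriors = nbest_posteriors(log_scores)
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    confidence = posteriors[best]
    return hypotheses[best], confidence, confidence >= threshold


if __name__ == "__main__":
    candidates = ["call home", "call Rome", "tall home"]
    # Two close competitors: the winner should be treated with suspicion.
    print(accept_top(candidates, [-10.0, -10.2, -14.0]))
    # One clear winner: the same hypothesis is now accepted.
    print(accept_top(candidates, [-10.0, -15.0, -16.0]))
```

Note that this normalization is only relative to the hypotheses in the list. If the spoken word is out of vocabulary, every listed hypothesis is wrong and the top one can still receive a high share, which is why the text treats absolute correctness and out-of-vocabulary detection as open problems.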

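As a worked companion to the word error rate defined in Section 2 (substitutions, insertions, and deletions divided by the number of reference words), the sketch below counts the three error types with a minimum-edit-distance alignment between a reference transcript and a recognizer output. The alignment procedure is a common convention assumed here, not something the text specifies, and the function names are hypothetical.

```python
def word_error_counts(reference, hypothesis):
    """Return (S, I, D) for a minimum-edit alignment of two word lists."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # cost[i][j] = (total errors, S, I, D) for reference[:i] vs hypothesis[:j]
    cost = [[None] * cols for _ in range(rows)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        cost[i][0] = (i, 0, 0, i)  # reference words with no output: deletions
    for j in range(1, cols):
        cost[0][j] = (j, 0, j, 0)  # output words with no reference: insertions
    for i in range(1, rows):
        for j in range(1, cols):
            total, subs, ins, dels = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (total, subs, ins, dels)          # correct word
            else:
                diagonal = (total + 1, subs + 1, ins, dels)  # substitution
            total, subs, ins, dels = cost[i - 1][j]
            deletion = (total + 1, subs, ins, dels + 1)
            total, subs, ins, dels = cost[i][j - 1]
            insertion = (total + 1, subs, ins + 1, dels)
            # Tuples compare on total errors first, so the minimum is optimal.
            cost[i][j] = min(diagonal, deletion, insertion)
    _, subs, ins, dels = cost[-1][-1]
    return subs, ins, dels


def word_error_rate(reference, hypothesis):
    """E = (S + I + D) / N * 100, with N the number of reference words.

    Assumes a non-empty reference.
    """
    subs, ins, dels = word_error_counts(reference, hypothesis)
    return 100.0 * (subs + ins + dels) / len(reference)


if __name__ == "__main__":
    ref = "call home now".split()
    hyp = "call hum now please".split()
    print(word_error_counts(ref, hyp))
    print(word_error_rate(ref, hyp))
```

Because insertions count as errors while N counts only reference words, the rate computed this way can exceed 100% when the recognizer outputs many spurious words.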