Applications of Matlab in Speech Recognition (50 pages)


1. GUI-based audio acquisition and processing system

Note: this experiment recognizes the isolated characters “东、北、大、学、中、荷、学、院”.

First build the GUI: drag the required controls onto the layout, double-click each control, and edit its properties. The main ones are String and Tag (the Tag determines the name of the callback function). Properties such as Value and Style also matter and should not be overlooked in practice.

One point needs explaining: the buttons shown in the figure all sit inside one button group and are child controls of that group. The callback is therefore added on the button group: right-click the border around the three buttons and choose View Callbacks, then SelectionChangeFcn. The callback then appears in the main function file:

function uipanel1_SelectionChangeFcn(hObject, eventdata, handles)

The code is explained using the first button, "Record", as the example: [code figure]
The code for "Play" and "Save" follows: [code figure]
That is all the code for speech acquisition. Running the program brings up this interface: [figure]
Click the Record button, and the waveform appears once recording ends: [figure]
Click Save to store the sound as a .wav file. This completes the acquisition.

2. Sound processing and recognition

2.1 Opening a file
Speech processing starts by opening a .wav file. This part uses independent buttons rather than a button group. The callback of the "Open" button is:

function pushbutton1_Callback(hObject, eventdata, handles)

where pushbutton1 is the Tag of the "Open" button. The following code is added under the callback: [code figure]
The result is shown in the figure: [figure]

2.2 Preprocessing
The callback is:

function pushbutton2_Callback(hObject, eventdata, handles)

The result is shown in the figure: [figure]

2.3 Short-time energy
The callback for short-time energy is:

function pushbutton3_Callback(hObject, eventdata, handles)

The code under the callback is: [code figure]

2.4 Endpoint detection
One thing should be stated first: so that later callbacks are never left without the variables computed by earlier ones, each later function repeats the earlier steps. This obviously makes the program long-winded and is worth revising later.

function pushbutton4_Callback(hObject, eventdata, handles)

2.5 Generating templates
The parts of this function that repeat the code above are omitted; only the added code is shown: [code figure]

2.6 Speech recognition
The opened utterance is matched against a pre-recorded speech library using the DTW algorithm. When recognition finishes, the recognized character is shown in the corresponding text box. The code is: [code figure]
Comparison before and after running the program: [figure]
Overall view of the GUI: [figure]

Summary
The experiment recognizes the characters “东、北、大、学、中、荷、学、院” provided the template recordings themselves are used as test samples against the speech library. In that setting the accuracy is already assured, which shows that the algorithm is correct and only needs optimizing. When a live recording is matched against the templates, a high accuracy cannot be guaranteed, which shows that the feature-parameter extraction is not yet good enough. The guiding principle for feature extraction is to keep the within-class distance as small as possible and the between-class distance as large as possible; this is what needs to be improved. The program flow should also be optimized: generate the template library once, then recognize each test utterance against it, so that the library stands alone and need not be rebuilt for every test, which speeds up the computation. Given the opportunity, continuous speech recognition could be implemented later.

Appendix
These are all the code files. mfcc.mat is generated while the program runs; the test folder holds the recorded templates: [figure]
There are six .m files, listed below.

1 WienerScalart96.m

function output=WienerScalart96(signal,fs,IS)
% output=WIENERSCALART96(signal,fs,IS)
% Wiener filter based on tracking a priori SNR using the Decision-Directed
% method, proposed by Scalart et al 96. In this method it is assumed that
% SNRpost=SNRprior+1. Based on this the Wiener filter can be adapted to a
% model like Ephraim's model in which we have a gain function which is a
% function of a priori SNR, and a priori SNR is being tracked using the
% Decision-Directed method.
% Author: Esfandiar Zavarehei
% Created: MAR-05

if (nargin < 3 || isstruct(IS))
    IS=.25; % Initial silence (noise only) part in seconds
end
W=fix(.025*fs); % Window length is 25 ms
SP=.4; % Shift percentage is 40% (10 ms)
wnd=hamming(W);

% IGNORE FROM HERE ...
if (nargin >= 3 && isstruct(IS)) % This option is for compatibility with another programme
    W=IS.windowsize;
    SP=IS.shiftsize/W;
    %nfft=IS.nfft;
    wnd=IS.window;
    if isfield(IS,'IS')
        IS=IS.IS;
    else
        IS=.25;
    end
end
% ... UP TO HERE

pre_emph=0;
signal=filter([1 -pre_emph],1,signal);

NIS=fix((IS*fs-W)/(SP*W) +1); % number of initial silence segments

y=segment(signal,W,SP,wnd); % This function chops the signal into frames
Y=fft(y);
YPhase=angle(Y(1:fix(end/2)+1,:)); % Noisy speech phase
Y=abs(Y(1:fix(end/2)+1,:)); % Spectrogram
numberOfFrames=size(Y,2);
FreqResol=size(Y,1);

N=mean(Y(:,1:NIS),2); % initial noise power spectrum mean
LambdaD=mean(Y(:,1:NIS).^2,2); % initial noise power spectrum variance
alpha=.99; % used in smoothing xi (Decision-Directed estimation of a priori SNR)
NoiseCounter=0;
NoiseLength=9; % This is a smoothing factor for the noise updating
G=ones(size(N)); % Initial gain used in calculation of the new xi
Gamma=G;

X=zeros(size(Y)); % Initialize X (memory allocation)

h=waitbar(0,'Wait...');

for i=1:numberOfFrames
    % VAD and noise estimation START
    if i <= NIS % If initial silence, ignore VAD
        SpeechFlag=0;
        NoiseCounter=100;
    else % Else do VAD
        [NoiseFlag, SpeechFlag, NoiseCounter, Dist]=vad(Y(:,i),N,NoiseCounter); % Magnitude spectrum distance VAD
    end

    if SpeechFlag==0 % If not speech, update noise parameters
        N=(NoiseLength*N+Y(:,i))/(NoiseLength+1); % Update and smooth noise mean
        LambdaD=(NoiseLength*LambdaD+(Y(:,i).^2))./(1+NoiseLength); % Update and smooth noise variance
    end
    % VAD and noise estimation END

    gammaNew=(Y(:,i).^2)./LambdaD; % A posteriori SNR
    xi=alpha*(G.^2).*Gamma+(1-alpha).*max(gammaNew-1,0); % Decision-Directed method for a priori SNR
    Gamma=gammaNew;
    G=(xi./(xi+1));
    X(:,i)=G.*Y(:,i); % Obtain the new cleaned value

    waitbar(i/numberOfFrames,h,num2str(fix(100*i/numberOfFrames)));
end

close(h);
output=OverlapAdd2(X,YPhase,W,SP*W); % Overlap-add synthesis of speech
output=filter(1,[1 -pre_emph],output); % Undo the effect of pre-emphasis

function ReconstructedSignal=OverlapAdd2(XNEW,yphase,windowLen,ShiftLen)
%Y=OverlapAdd(X,A,W,S);
%Y is the signal reconstructed from its spectrogram. X is a matrix
%with each column being the fft of a segment of signal. A is the phase
%angle of the spectrum which should have the same dimension as X. If it is
%not given, the phase angle of X is used, which in the case of real values is
%zero (assuming that it is the magnitude). W is the window length of time
%domain segments; if not given, the length is assumed to be twice as long as
%the fft window length. S is the shift length of the segmentation process (for
%example in the case of non-overlapping signals it is equal to W and in the
%case of 50% overlap it is equal to W/2). If not given, W/2 is used. Y is the
%reconstructed time domain signal.
%Sep-04
%Esfandiar Zavarehei

if nargin < 2
    yphase=angle(XNEW);
end
if nargin < 3
    windowLen=size(XNEW,1)*2;
end
if nargin < 4
    ShiftLen=windowLen/2;
end
if fix(ShiftLen)~=ShiftLen
    ShiftLen=fix(ShiftLen);
    disp('The shift length has to be an integer as it is the number of samples.')
    disp(['shift length is fixed to ' num2str(ShiftLen)])
end

[FreqRes, FrameNum]=size(XNEW);

Spec=XNEW.*exp(1j*yphase);

if mod(windowLen,2) % if FreqResol is odd
    Spec=[Spec;flipud(conj(Spec(2:end,:)))];
else
    Spec=[Spec;flipud(conj(Spec(2:end-1,:)))];
end
sig=zeros((FrameNum-1)*ShiftLen+windowLen,1);
weight=sig;
for i=1:FrameNum
    start=(i-1)*ShiftLen+1;
    spec=Spec(:,i);
    sig(start:start+windowLen-1)=sig(start:start+windowLen-1)+real(ifft(spec,windowLen));
end
ReconstructedSignal=sig;

function Seg=segment(signal,W,SP,Window)
% SEGMENT chops a signal to overlapping windowed segments
% A=SEGMENT(X,W,SP,WIN) returns a matrix whose columns are segmented
% and windowed frames of the input one-dimensional signal, X. W is the
% number of samples per window, default value W=256. SP is the shift
% percentage, default value SP=0.4. WIN is the window that is multiplied by
% each segment and its length should be W. The default window is a Hamming
% window.
% 06-Sep-04
% Esfandiar Zavarehei

if nargin < 3
    SP=.4;
end
if nargin < 2
    W=256;
end
if nargin < 4
    Window=hamming(W);
end
Window=Window(:); % make it a column vector

L=length(signal);
SP=fix(W.*SP);
N=fix((L-W)/SP +1); % number of segments

Index=(repmat(1:W,N,1)+repmat((0:(N-1))'*SP,1,W))';
hw=repmat(Window,1,N);
Seg=signal(Index).*hw;

function [NoiseFlag, SpeechFlag, NoiseCounter, Dist]=vad(signal,noise,NoiseCounter,NoiseMargin,Hangover)
%[NOISEFLAG, SPEECHFLAG, NOISECOUNTER, DIST]=vad(SIGNAL,NOISE,NOISECOUNTER,NOISEMARGIN,HANGOVER)
%Spectral Distance Voice Activity Detector
%SIGNAL is the current frame's magnitude spectrum which is to be labelled as
%noise or speech, NOISE is the noise magnitude spectrum template (estimation),
%NOISECOUNTER is the number of immediately preceding noise frames, NOISEMARGIN
%(default 3) is the spectral distance threshold. HANGOVER (default 8) is
%the number of noise segments after which the SPEECHFLAG is reset (goes to
%zero). NOISEFLAG is set to one if the segment is labelled as noise.
%NOISECOUNTER returns the number of previous noise segments; this value is
%reset (to zero) whenever a speech segment is detected. DIST is the
%spectral distance.
%Saeed Vaseghi
%edited by Esfandiar Zavarehei
%Sep-04

if nargin < 4
    NoiseMargin=3;
end
if nargin < 5
    Hangover=8;
end
if nargin < 3
    NoiseCounter=0;
end

FreqResol=length(signal);

SpectralDist=20*(log10(signal)-log10(noise));
SpectralDist(SpectralDist < 0)=0;

Dist=mean(SpectralDist);
if (Dist < NoiseMargin)
    NoiseFlag=1;
    NoiseCounter=NoiseCounter+1;
else
    NoiseFlag=0;
    NoiseCounter=0;
end

% Detect noise-only periods and reset the speech flag
if (NoiseCounter > Hangover)
    SpeechFlag=0;
else
    SpeechFlag=1;
end

2 mfcc.m

function cc=mfcc(k)
% cc=mfcc(k) computes the MFCC coefficients of the speech signal k
% M is the number of filters, N is the number of samples per frame
M=24;
N=256;
% Normalized mel filter bank coefficients
bank=melbankm(M,N,22050,0,0.5,'m');
figure;
plot(linspace(0,N/2,129),bank);
title('Mel-Spaced Filterbank');
xlabel('Frequency [Hz]');
bank=full(bank);
bank=bank/max(bank(:));
% DCT coefficients, 12*24
for i=1:12
    j=0:23;
    dctcoef(i,:)=cos((2*j+1)*i*pi/(2*24));
end
% Normalized cepstral liftering window
w=1+6*sin(pi*[1:12]./12);
w=w/max(w);
% Pre-emphasis
AggrK=double(k);
AggrK=filter([1,-0.9375],1,AggrK);
% Framing
FrameK=enframe(AggrK,N,80);
% Windowing
for i=1:size(FrameK,1)
    FrameK(i,:)=FrameK(i,:).*hamming(N)';
end
FrameK=FrameK';
% Power spectrum
S=(abs(fft(FrameK))).^2;
disp('Showing the power spectrum')
figure;
plot(S);
axis([1,size(S,1),0,2]);
title('Power Spectrum (M=24, N=256)');
xlabel('Frame');
ylabel('Frequency [Hz]');
colorbar;
% Pass the power spectrum through the filter bank
P=bank*S(1:129,:);
% Take the logarithm, then the discrete cosine transform
D=dctcoef*log(P);
% Cepstral liftering
for i=1:size(D,2)
    m(i,:)=(D(:,i).*w')';
end
% Delta coefficients
dtm=zeros(size(m));
for i=3:size(m,1)-2
    dtm(i,:)=-2*m(i-2,:)-m(i-1,:)+m(i+1,:)+2*m(i+2,:);
end
dtm=dtm/3;
% Concatenate the MFCC parameters and the first-order delta MFCC parameters
cc=[m,dtm];
% Drop the first and last two frames, whose first-order deltas are 0
cc=cc(3:size(m,1)-2,:);

3 getpoint.m

function [StartPoint,EndPoint]=getpoint(k,fs)
%GETPOINT Detect the start and end frames of speech in k (sampling rate fs).
signal=WienerScalart96(k,fs);
sigLength=length(signal); % signal length
t=(0:sigLength-1)/fs;
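% t is the time coordinate of each sample
FrameLen = round(0.012/max(t)*sigLength); % frame length (about 12 ms of samples)
FrameInc = round(FrameLen/3); % frame shift, chosen between 1/3 and 1/2 of the frame length
tmp=enframe(signal(1:end), FrameLen, FrameInc);
signal=signal/max(abs(signal));
signal=double(signal);
signal=filter([1,-0.9735],1,signal);
tmp1=enframe(signal(1:end-1), FrameLen, FrameInc);
tmp2=enframe(signal(2:end), FrameLen, FrameInc); % call the framing function
Framesize=size(tmp1);
window(1:Framesize(1),1:Framesize(2))=0;
a=hamming(Framesize(2)); % window the signal, here with a Hamming window
for i=1:Framesize(1)
    window(i,1:Framesize(2))=a;
end
tmp1=tmp1.*window; % windowed signals tmp1, tmp2 and tmp
tmp2=tmp2.*window;
tmp=tmp.*window;
signs = (tmp1.*tmp2) < 0;
diffs = (tmp1-tmp2) > 0.02;
zcr = sum(signs.*diffs,2)/FrameLen; % zcr holds the zero-crossing rate
FrameNB=Framesize(1); % number of frames
clear tmp1 tmp2 signs diffs a window Framesize; % clear variables no longer needed

% Short-time amplitude of the speech signal
amp=sum(abs(tmp), 2);
% Endpoint detection starts here
% Define variables
amp1 = 6;
amp2 = 2; % upper and lower amplitude thresholds
maxsilence=5; % maximum number of silent frames: 5, i.e. 5*12 ms = 60 ms
minlen =15; % minimum speech length: 15*12 ms = 180 ms
status =0; % initial state (0: silence, 1: speech, 2: finished; the transition state is ignored here)
count=0; % speech length in frames

% Mean amplitude and mean zero-crossing rate of the first and last 5 frames,
% which are assumed not to contain speech
a=mean(amp(1:5))+mean(amp(FrameNB-4:FrameNB));
b=mean(zcr(1:5))+mean(zcr(FrameNB-4:FrameNB));
% Correct the zero-crossing rate and the amplitude
amp=abs(amp-a);
zcr=abs(zcr-b);
% Set the thresholds: amp1 is the high energy threshold, amp2 the low one
amp1=min(amp1,max(amp)/4);
amp2=min(amp2,max(amp)/8);
zcr1 =0.001; % zero-crossing rate threshold

for i=6:maxsilence:FrameNB
    switch status
        case 0 % currently in silence
            if (amp(i) > amp1) % frame energy above the high threshold: speech has certainly started
                x1=i;
                count=count+1;
                for j=i-1:-1:6 % search backwards for the exact start point
                    if (zcr(j) > zcr1) & (amp(i) > amp2) % low energy threshold and zero-crossing threshold
                        x1=j;
                        count=count+1;
                    else
                        break
                    end
                end
                status=1;
            end
        case 1 % currently in speech
            if (zcr(i) > zcr1) & (amp(i) > amp2) % search forwards for the end of speech
                x2=i;
                count=count+5;
            else
                for j=i:-1:i-4 % search backwards for the exact end point
                    if (zcr(j) > zcr1) & (amp(i) > amp2) % low energy threshold and zero-crossing threshold
                        x2=j;
                        count=count+1;
                    else
                        break
                    end
                end
                if count < minlen % shorter than the minimum speech length: treat as noise and restart the search
                    status=0;
                    count=0;
                    x1=0;
                    x2=0;
                else
                    status=2; % valid speech: move to the finished state
                end
            end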
        case 2
            break
    end
end
StartPoint=x1;
EndPoint=x2;

4 dtw.m

function dist = dtw(test, ref)
global x y_min y_max
global t r
global D d
global m n

t = test;
r = ref;
n = size(t,1);
m = size(r,1);

d = zeros(m,1);
D = ones(m,1) * realmax;
D(1) = 0;

% If the two templates differ too much in length, matching fails
if (2*m-n < 3) | (2*n-m < 2)
    dist = realmax;
    return
end

% Compute the matching region
xa = round((2*m-n)/3);
xb = round((2*n-m)*2/3);

if xb > xa
    % xb > xa: match over these three regions
    %   1   :xa
    %   xa+1:xb
    %   xb+1:N
    for x = 1:xa
        y_max = 2*x;
        y_min = round(0.5*x);
        warp
    end
    for x = (xa+1):xb
        y_max = round(0.5*(x-n)+m);
        y_min = round(0.5*x);
        warp
    end
    for x = (xb+1):n
        y_max = round(0.5*(x-n)+m);
        y_min = round(2*(x-n)+m);
        warp
    end
elseif xa > xb
    % xa > xb: match over these three regions
    %   0   :xb
    %   xb+1:xa
    %   xa+1:N
    for x = 1:xb
        y_max = 2*x;
        y_min = round(0.5*x);
        warp
    end
    for x = (xb+1):xa
        y_max = 2*x;
        y_min = round(2*(x-n)+m);
        warp
    end
    for x = (xa+1):n
        y_max = round(0.5*(x-n)+m);
        y_min = round(2*(x-n)+m);
        warp
    end
elseif xa == xb
    % xa == xb: match over these two regions
    %   0   :xa
    %   xa+1:N
    for x = 1:xa
        y_max = 2*x;
        y_min = round(0.5*x);
        warp
    end
    for x = (xa+1):n
        y_max = round(0.5*(x-n)+m);
        y_min = round(2*(x-n)+m);
        warp
    end
end

% Return the matching score
dist = D(m);

function warp
global x y_min y_max
global t r
global D d
global m n

d = D;
for y = y_min:y_max
    D1 = D(y);
    if y > 1
        D2 = D(y-1);
    else
        D2 = realmax;
    end
    if y > 2
        D3 = D(y-2);
    else
        D3 = realmax;
    end
    d(y) = sum((t(x,:)-r(y,:)).^2) + min([D1,D2,D3]);
end

D = d;

5 record.m

function varargout = record(varargin)
% RECORD MATLAB code for record.fig
%      RECORD, by itself, creates a new RECORD or raises the existing
%      singleton*.
%
%      H = RECORD returns the handle to a new RECORD or the handle to
%      the existing singleton*.
%
%      RECORD('CALLBACK',hObject,eventData,handles,...) calls the local
%      function named CALLBACK in RECORD.M with the given input arguments.
%
%      RECORD('Property','Value',...) creates a new RECORD or raises the
%      existing singleton*. Starting from the left, property value pairs are
%      applied to the GUI before record_OpeningFcn gets called. An
%      unrecognized property name or invalid value makes property application
%      stop. All inputs are passed to record_OpeningFcn via varargin.
%
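Section 2.6 matches an utterance against stored templates with DTW. As a rough illustration of the recurrence that warp applies column by column, the sketch below fills a full cost matrix instead of the restricted matching region and global variables used in dtw.m, and it anchors the path at the first frame of both sequences. The data and all variable names here are made up for the example; it is a simplified sketch, not the report's implementation.

```matlab
% Toy feature sequences (hypothetical data): one frame per row.
test = [0; 1; 3];        % n frames of the utterance to recognize
ref  = [0; 1; 2; 3];     % m frames of a stored template

n = size(test, 1);
m = size(ref, 1);

% D(x, y) is the cheapest accumulated cost of a path ending at
% test frame x and template frame y.
D = inf(n, m);
D(1, 1) = sum((test(1, :) - ref(1, :)).^2);

for x = 2:n
    for y = 1:m
        % Allowed predecessors: the same, the previous, or the
        % second-previous template frame at test frame x-1.
        best = D(x-1, y);
        if y > 1, best = min(best, D(x-1, y-1)); end
        if y > 2, best = min(best, D(x-1, y-2)); end
        D(x, y) = sum((test(x, :) - ref(y, :)).^2) + best;
    end
end

dist = D(n, m);   % matching score: smaller means more similar
```

For recognition, the same score would be computed against every template and the template with the smallest dist taken as the result.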

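Sections 2.3 and 2.4 rely on short-time energy and zero-crossing rate to find where speech starts and ends. The sketch below computes both per frame on a synthetic signal and picks the endpoints with a single energy threshold. It is deliberately simpler than getpoint.m, which uses two energy thresholds plus the zero-crossing rate; the sampling rate, frame sizes, threshold and test signal are all assumptions chosen for illustration.

```matlab
fs = 8000;                                  % assumed sampling rate in Hz
t = (0:fs-1)' / fs;                         % one second of sample times
% Synthetic signal: silence, a 200 Hz tone, then silence again.
x = [zeros(2000, 1); sin(2*pi*200*t(1:4000)); zeros(2000, 1)];

frameLen = 200;                             % assumed frame length in samples
frameInc = 100;                             % assumed frame shift in samples
numFrames = fix((length(x) - frameLen) / frameInc) + 1;

energy = zeros(numFrames, 1);
zcr = zeros(numFrames, 1);
for i = 1:numFrames
    seg = x((i-1)*frameInc + (1:frameLen));
    energy(i) = sum(seg.^2);                                  % short-time energy
    zcr(i) = sum(seg(1:end-1) .* seg(2:end) < 0) / frameLen;  % sign changes per sample
end

threshold = 0.1 * max(energy);              % assumed single energy threshold
active = find(energy > threshold);
startFrame = active(1);                     % first frame above the threshold
endFrame = active(end);                     % last frame above the threshold
```

On real recordings a single threshold is usually too crude, which is why the report's code refines the endpoints with a second, lower threshold and the zero-crossing rate.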
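The summary states that good features keep the within-class distance small and the between-class distance large. One simple way to put a number on that idea is the ratio of the two distances, sketched below for two classes of made-up two-dimensional feature vectors. The data, the variable names and the choice of this particular ratio are assumptions for illustration only; the report does not specify a measure.

```matlab
% Hypothetical 2-D features: each row is one utterance of a word class.
classA = [1.0 1.1; 0.9 1.0; 1.1 0.9];
classB = [3.0 3.1; 2.9 3.0; 3.1 2.9];

muA = mean(classA, 1);   % class centres
muB = mean(classB, 1);

% Mean squared distance of each utterance to its own class centre.
withinA = mean(sum((classA - repmat(muA, size(classA, 1), 1)).^2, 2));
withinB = mean(sum((classB - repmat(muB, size(classB, 1), 1)).^2, 2));
within = (withinA + withinB) / 2;

% Squared distance between the two class centres.
between = sum((muA - muB).^2);

% Larger values mean the classes are easier to tell apart.
separability = between / within;
```

Computing such a ratio on the MFCC vectors of template recordings versus live recordings would be one way to check whether a change to the feature extraction actually helps.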