Tests in Econometrics (计量经济学的各种检验.pptx)


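Multicollinearity

Multicollinearity arises when the model includes too many variables that measure the same thing. As the degree of multicollinearity increases, the coefficient estimates of the regression model become unstable and their standard errors can be wildly inflated. Measures: VIF, TOL = 1/VIF, condition index, etc.

Consequences of multicollinearity

1. Under perfect multicollinearity the parameter estimates cannot be determined, and their variances become infinite.
2. Under imperfect multicollinearity the parameters can be estimated, but the estimates are unstable and have large variances.
3. Multicollinearity lowers the precision of predictions, possibly to the point of making them useless, and raises the chance of failing to reject the null hypothesis (t values shrink).

Detecting multicollinearity (1): the coefficient of determination method

If the sample R-square is fairly large while almost none of the regression coefficients are statistically significant, multicollinearity can be suspected. Theil proposed an index, the multicollinearity effect coefficient:

$$\text{Theil} = R^2 - \sum_{i=1}^{k} \left(R^2 - R_{-i}^2\right),$$

where $R_{-i}^2$ is the R-square of the regression of y on all explanatory variables except $x_i$.

Theil test results

SAS results: the outcome indicates multicollinearity.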
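As a cross-check of Theil's index, the following Python sketch (not part of the original deck, whose programs are in SAS) recomputes it from the four R-square values quoted in the deck's `data theil` step. The function name `theil_effect` is an illustrative choice.

```python
def theil_effect(r2_full, r2_without):
    """Theil's multicollinearity effect.

    r2_full    : R-square of the regression on all explanatory variables
    r2_without : R-squares of the regressions that each omit one variable
    """
    # Incremental contribution of each variable to the full R-square
    increments = [r2_full - r2 for r2 in r2_without]
    # What is left of R-square after removing the individual contributions
    return r2_full - sum(increments)


# R-square values quoted in the deck's SAS data step
r2_full = 0.9919
r2_without = [0.9913, 0.9473, 0.9828]

theil = theil_effect(r2_full, r2_without)
print(f"Theil index = {theil:.4f}")
```

A value near zero would suggest little overlap among the explanatory variables; here the index stays close to the full R-square, which is in line with the deck's conclusion that multicollinearity is present.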

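Detecting multicollinearity (2): the auxiliary regression test

If multicollinearity is present, at least one explanatory variable can be expressed, exactly or approximately, as a linear combination of the remaining explanatory variables. Each explanatory variable is therefore regressed on the others. With k explanatory variables, n observations and $R_i^2$ the R-square of the i-th auxiliary regression, the corresponding test statistic is

$$F_i = \frac{R_i^2/(k-1)}{(1-R_i^2)/(n-k)} \sim F(k-1,\ n-k).$$

Auxiliary regression results

SAS results: by Klein's rule of thumb, multicollinearity is severe if there is some i with $R_i^2 > R^2$. In this example x1 and x3 are collinear.

Detecting multicollinearity (3): the sample correlation coefficient test

With n observations, p explanatory variables and R the correlation matrix of the explanatory variables, the FG statistic is

$$FG = -\left[n - 1 - \tfrac{1}{6}(2p+5)\right]\ln\lvert R\rvert \sim \chi^2\!\left(\tfrac{p(p-1)}{2}\right).$$

FG test results

fg = 20.488013401, p = 0.0001344625. The null hypothesis is rejected, so multicollinearity is present. To find out which variables are collinear, one can use, besides the auxiliary regressions above, the condition number test and the variance inflation factor described next.

Detecting multicollinearity (4): eigenvalue analysis

The test indicators are:

Variance inflation factor: $VIF_k = 1/(1 - R_k^2)$, where $R_k^2$ is the R-square from regressing the k-th explanatory variable on the remaining ones. VIF > 10 indicates multicollinearity.
Tolerance: TOL = 1/VIF.
Condition index: $CI_i = \sqrt{\lambda_{\max}/\lambda_i}$; condition number: $C = \sqrt{\lambda_{\max}/\lambda_{\min}}$, where the $\lambda_i$ are the eigenvalues of the (scaled) $X'X$ matrix. C > 20 indicates severe collinearity.

Detecting and remedying multicollinearity

Example 1: regression of total imports on three explanatory variables. The SAS output is as follows.

Pearson Correlation Coefficients, N = 11
Prob > |r| under H0: Rho = 0

                               x1         x2         x3
x1 (GDP)                  1.00000    0.02585    0.99726
                                      0.9399     <.0001
x2 (stock)                0.02585    1.00000    0.03567
                           0.9399                0.9171
x3 (total consumption)    0.99726    0.03567    1.00000
                           <.0001     0.9171

Parameter Estimates

Variable     DF    Estimate    Std Error    t Value    Pr > |t|    Variance Inflation
Intercept     1   -10.12799      1.21216      -8.36      <.0001                     0
x1            1    -0.05140      0.07028      -0.73      0.4883             185.99747
x2            1     0.58695      0.09462       6.20      0.0004               1.01891
x3            1     0.28685      0.10221       2.81      0.0263             186.11002

The coefficient of x1 is negative, which contradicts economic reasoning. The cause is the linear correlation between x1 and x3.

Remedies

1. Enlarge the sample.
2. Use ridge regression or principal component regression.
3. Drop at least one of the collinear variables.
4. Transform the collinear variables. One option is a lagged-difference transformation of all variables (usually first differences); its drawbacks are the loss of observations and possible autocorrelation. Another option is to use per-capita variables (for example when estimating a production function).
5. When no other useful information is available, impose restrictions on the coefficients, which gives a restricted regression (Klein, Goldberger, 1955). This can reduce the sampling variance and the standard errors of the estimated coefficients, but the estimates are not necessarily unbiased (unless the restrictions are correct).
6. Try to identify the causal relationship among the collinear variables, model it, and combine that model with the original equation into a simultaneous-equation system.

Ridge regression

Ridge estimator: $b(k) = (X'X + kI)^{-1}X'y$. With k = 0, b(k) = b is the OLS estimator. Choice of k: pick k so that the mean squared error of b(k) is smaller than that of b.

Ridge trace plot

[Figure: ridge trace]

Ridge regression results

Rows with _TYPE_ = RIDGEVIF hold the variance inflation factors; a dot marks an empty cell.

Obs  _MODEL_  _TYPE_    _DEPVAR_  _RIDGE_ (k)  _PCOMIT_  _RMSE_    Intercept   x1        x2        x3        y
  1  MODEL1   PARMS     y         .            .         0.48887   -10.1280    -0.051    0.58695   0.287     -1
  2  MODEL1   RIDGEVIF  y         0.00         .         .         .           185.997   1.01891   186.110   -1
  3  MODEL1   RIDGE     y         0.00         .         0.48887   -10.1280    -0.051    0.58695   0.287     -1
  4  MODEL1   RIDGEVIF  y         0.01         .         .         .           8.599     0.98192   8.604     -1
  5  MODEL1   RIDGE     y         0.01         .         0.55323   -9.1805     0.046     0.59886   0.144     -1
  6  MODEL1   RIDGEVIF  y         0.02         .         .         .           2.858     0.96219   2.859     -1
  7  MODEL1   RIDGE     y         0.02         .         0.57016   -8.9277     0.057     0.59542   0.127     -1
  8  MODEL1   RIDGEVIF  y         0.03         .         .         .           1.502     0.94345   1.502     -1
  9  MODEL1   RIDGE     y         0.03         .         0.57959   -8.7337     0.061     0.59080   0.120     -1
 10  MODEL1   RIDGEVIF  y         0.04         .         .         .           0.979     0.92532   0.979     -1
 11  MODEL1   RIDGE     y         0.04         .         0.58745   -8.5583     0.064     0.58591   0.116     -1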
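The FG statistic reported earlier (fg = 20.488013401, p = 0.0001344625) can be reproduced with a short Python sketch. It is an illustration rather than the deck's own program: the function names are hypothetical, and the chi-square tail probability uses the closed form that holds for 3 degrees of freedom only, which is the case here because p = 3 variables give p(p-1)/2 = 3. The determinant d = 0.081371 is taken as given from the deck.

```python
import math


def farrar_glauber(n, p, det_r):
    """FG statistic -[n - 1 - (2p + 5)/6] * ln|R| and its degrees of freedom."""
    stat = -(n - 1 - (2 * p + 5) / 6) * math.log(det_r)
    df = p * (p - 1) // 2
    return stat, df


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 d.f."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# n = 11 observations, p = 3 explanatory variables, determinant as quoted in the deck
fg, df = farrar_glauber(n=11, p=3, det_r=0.081371)
p_value = chi2_sf_df3(fg)
print(f"fg = {fg:.6f}, df = {df}, p = {p_value:.10f}")
```

Because the statistic depends on the data only through the determinant of the correlation matrix, any change in that determinant moves the statistic directly, so the matrix entered into the calculation is worth checking against the PROC CORR output.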

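Principal component regression

Principal component regression condenses a set of mutually correlated variables into a few uncorrelated principal components. It has two steps:

(1) Find the principal components of the explanatory variables and regress y on the first few (mutually uncorrelated) components.
(2) Transform the fitted equation back into the original explanatory variables.

For details, see Fang Kaitai (方开泰).

Principal component regression results

Rows with _TYPE_ = IPCVIF hold the variance inflation factors; a dot marks an empty cell.

Obs  _MODEL_  _TYPE_   _DEPVAR_  _PCOMIT_  _RMSE_    Intercept   x1         x2        x3        y
  1  MODEL1   PARMS    y         .         0.48887   -10.1280    -0.05140   0.58695   0.28685   -1
  2  MODEL1   IPCVIF   y         1         .         .           0.25083    1.00085   0.25038   -1
  3  MODEL1   IPC      y         1         0.55001   -9.1301     0.07278    0.60922   0.10626   -1
  4  MODEL1   IPCVIF   y         2         .         .           0.24956    0.00095   0.24971   -1
  5  MODEL1   IPC      y         2         1.05206   -7.7458     0.07381    0.08269   0.10735   -1

The output shows that, after deleting the third principal component (pcomit=1), the principal component regression equation is

Y = -9.1301 + 0.07278 x1 + 0.60922 x2 + 0.10626 x3.

All coefficients of this equation have sensible signs, and the variance inflation factors of the coefficients are all below 1.1. The root mean squared error of the principal component regression (_RMSE_ = 0.55) is somewhat larger than that of ordinary OLS (_RMSE_ = 0.48887), but not by much.

SAS program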

    data ex01;
      input x1 x2 x3 y;
      label x1='GDP' x2='Stock' x3='Consumption' y='Total imports';
      cards;
    149.3 4.2 108.1 15.9
    161.2 4.1 114.8 16.4
    171.5 3.1 123.2 19.0
    175.5 3.1 126.9 19.1
    180.8 1.1 132.1 18.8
    190.7 2.2 137.7 20.4
    202.1 2.1 146 22.7
    212.4 5.6 154.1 26.5
    226.1 5.0 162.3 28.1
    231.9 5.1 164.3 27.6
    239.0 0.7 167.6 26.3
    ;
    run;

    proc corr data=ex01;
      var x1-x3;
    run;

    /* ridge regression */
    proc reg data=ex01 outest=ex012 graphics outvif;
      model y=x1-x3 / ridge=0.0 to 0.1 by 0.01;
      plot / ridgeplot;
    run;
    proc print data=ex012;
    run;

    /* principal component regression */
    proc reg data=ex01 outest=ex103;
      /* pcomit drops the last 1 or 2 principal components, so the
         regression uses the first m-1 or m-2 components */
      model y=x1-x3 / pcomit=1,2 outvif;
    run;
    proc print data=ex103;
    run;

SAS program (continued)

    /* Theil test */
    proc reg data=ex01;
      equation3: model y=x1 x2;
      equation2: model y=x1 x3;
      equation1: model y=x2 x3;
    run;
    /* R-squares: r1s=0.9913; r2s=0.9473; r3s=0.9828 */
    data theil;
      rsq=0.9919; r1s=0.9913; r2s=0.9473; r3s=0.9828;
      theil=rsq-(3*rsq-(r1s+r2s+r3s));
      put theil=;
    run;

    /* auxiliary regression test */
    proc reg data=ex01;
      equation3: model x3=x1 x2;
      equation2: model x2=x1 x3;
      equation1: model x1=x2 x3;
    run;

    /* FG test */
    proc corr data=ex01 outp=corr nosimple;
      var x1-x3;
    run;
    proc print data=corr;
    run;

    title 'Determinant of the correlation matrix';
    proc iml;
      /* third row as given; it is not the symmetric counterpart
         (0.997 0.036 1) of the PROC CORR output */
      R={1.000 0.026 0.997, 0.026 1 0.036, 0.9152 0.6306 1};
      d=det(R);
      print d;
    quit;
    /* d=0.081371 */

    title 'Test statistic and its p-value';
    data fg;
      n=11; p=3; d=0.081371;
      fg=-(n-1-1/6*(2*p+5))*log(d);
      df=p*(p-1)/2;
      p=1-probchi(fg,df);
      put fg= p=;
    run;
    /* fg=20.488013401 p=0.0001344625: reject the null hypothesis */

Heteroscedasticity: detection and remedies

Under heteroscedasticity the OLS estimator is unbiased but inefficient, the t and F tests are invalid, and forecast accuracy decreases. If the model is well fitted, there should be no pattern in the residuals plotted against the fitted values. If the variance of the residuals is non-constant, the residual variance is said to be heteroscedastic.

Detecting heteroscedasticity

There are graphical and non-graphical methods for detecting heteroscedasticity. A commonly used graphical method is to plot the residuals against the fitted (predicted) values.

Example: grade = years of education; potexp = years of work experience; exp2 = potexp^2; union = dummy variable.

Regression results for the wage equation

Dependent Variable: LNWAGE

Analysis of Variance

Source    DF    Sum of Squares    Mean Square    F Value    Pr > F
Model      4          12.42236        3.10559      14.06    <.0001
...

Parameter Estimates

Variable     DF    Estimate    Std Error    t Value    Pr > |t|
Intercept     1     0.59511      0.28349       2.10      0.0384
GRADE         1     0.08354      0.02009       4.16      <.0001
...

White test: auxiliary regression

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              12           1.18881        0.09907       0.88    0.5731
Error              87           9.83078        0.11300
Corrected Total    99          11.01958

Root MSE          0.33615      R-Square    0.1079
Dependent Mean    0.20989      Adj R-Sq   -0.0152
Coeff Var       160.15281

Parameter Estimates

Variable     DF       Estimate     Std Error    t Value    Pr > |t|
Intercept     1       -0.07767       0.98580      -0.08      0.9374
GRADE         1       -0.01220       0.12502      -0.10      0.9225
POTEXP        1        0.07784       0.07188       1.08      0.2819
EXP2          1       -0.00399       0.00409      -0.97      0.3325
UNION         1        0.64879       0.86160       0.75      0.4535
grade2        1        0.00220       0.00425       0.52      0.6065
exp4          1    -3.34378E-7    0.00000151      -0.22      0.8256
exp3          1     0.00006170    0.00014192       0.43      0.6648
gx2           1     0.00011683    0.00011102       1.05      0.2955
gp            1       -0.00375       0.00494      -0.76      0.4498
gu            1       -0.05137       0.04430      -1.16      0.2494
pu            1        0.00193       0.06061       0.03      0.9746
eu            1    -0.00022185       0.00126      -0.18      0.8605

The squared residuals are regressed on all first-order terms, second-order terms and cross products.

1. From the results above, $nR^2 = 100 \times 0.1079 = 10.79$, which is below the 5% critical value of $\chi^2(12)$, 21.03, so the hypothesis of homoscedasticity is not rejected.
2. The same result can be obtained with:

    proc reg data=aa;
      model y=x / spec;
    run;

Breusch-Pagan/Godfrey test

This is a special case of the White test.

(1) Compute the OLS residuals $e_t$ and the estimated disturbance variance $\hat\sigma^2 = \sum e_t^2 / n$.
(2) Regress $e_t^2/\hat\sigma^2$ on the selected explanatory variables by OLS and compute the explained sum of squares (ESS).
(3) Under the null hypothesis, $\tfrac{1}{2}ESS \sim \chi^2(p)$ asymptotically, where p is the number of selected variables.
(4) A simpler, asymptotically equivalent procedure regresses the squared residuals directly on the selected explanatory variables. Under the null hypothesis (homoscedasticity), $nR^2 \sim \chi^2(p)$.

BPG test results (1)

Dependent Variable: rsq

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              12           1.18881        0.09907       0.88    0.5731
Error              87           9.83078        0.11300
Corrected Total    99          11.01958

Root MSE          0.33615      R-Square    0.1079
Dependent Mean    0.20989      Adj R-Sq   -0.0152

BPG test results (2)

Dependent Variable: rsqadjust

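Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3          10.70415        3.56805       1.43    0.2386
Error              96         239.41116        2.49387
Corrected Total    99         250.11531

Root MSE          1.57920      R-Square    0.0428
Dependent Mean    0.99997      Adj R-Sq    0.0129
Coeff Var       157.92443

ESS = 10.70415

BPG test results (3)

(1/2) * ESS = 5.35, which is below the 5% critical value of $\chi^2(3)$, 7.81, so homoscedasticity is not rejected. Regressing the squared residuals directly on the same variables gives:

Source    DF    Sum of Squares    Mean Square    F Value    Pr > F
Model      3           0.47160        0.15720       1.43    0.2386
Error     96          10.54798        0.10987

Root MSE    0.33147      R-Square    0.0428

Goldfeld-Quandt test

Sort the data by potexp in ascending order. Take the first 35 and the last 35 observations and run a separate regression on each; c = 30 middle observations are dropped. Main results: RSS1 = 6.39573; RSS2 = 7.2517; RSS2/RSS1 = 1.13. Compared with the F critical value, this ratio is not significant, so the null hypothesis of homoscedasticity cannot be rejected. The number of middle observations dropped should be moderate, otherwise the power of the test falls; usually about 1/3 of the observations are dropped.

Remedies when the form of the variance is known

1. Generalized least squares (GLS); see the example in the lecture notes.
2. Model transformation, suitable for heteroscedasticity of a known functional form.
3. Weighted least squares (WLS), which is in essence a model transformation; see the example in the lecture notes.

Using panel data also adds information.

When the form of the variance is unknown

Furnival (1961) proposed an index of fit that is revised repeatedly until the best weights are found (those that minimize the index).

Handling blind spots: robust regression

1. Iteratively reweighted least squares (IRLS). Neter proposed two weighting functions, Huber and bisquare, but they are not easy to apply. In SAS v8, Proc NLIN is commonly used for the iteration.
2. Nonparametric regression: Proc Loess.
3. SAS v9.0 has the procedure Proc robustreg.

Stata has a good command, rreg, which performs robust regression directly through an iterative procedure.

Serial correlation

Under serial correlation the OLS estimator is unbiased but inefficient, and its standard error estimators are invalid. The BLUE property of the Gauss-Markov theorem no longer holds, and the variance formulas for the least squares estimators are incorrect. Serial correlation can take AR, MA or ARMA forms. Take AR(1) for instance:

$$u_t = \rho u_{t-1} + e_t, \qquad \lvert\rho\rvert < 1.$$

Points to note about the DW test

1. It assumes that the residuals are normally distributed and homoscedastic.
2. The explanatory variables must be exogenous. If a lagged endogenous variable is included, the modified Durbin h test is needed (proc autoreg).
3. It applies only to first-order autocorrelation, not to higher-order or nonlinear autocorrelation.
4. The sample size should be at least 15.

Criteria for the autocorrelation test

For a given significance level, n and k, Durbin and Watson determined two critical values, du (upper bound) and dl (lower bound), against which d is compared:

(1) d < dl: reject the null hypothesis;
(2) d > du: do not reject the null hypothesis;
(3) dl < d < du: inconclusive.

Intuition: $d \approx 2(1 - \hat\rho)$, so d < 2 points to positive autocorrelation, d > 2 to negative autocorrelation, and d = 2 to no autocorrelation.

Example: ice cream demand (Hildreth, Lu (1960))

Cons: consumption of ice cream per head (pints);
Income: average family income per week ($);
Price: price of ice cream (per pint);
Temp: average temperature (in Fahrenheit);
Data: 30 four-weekly observations from March 1951 to 11 July 1953.

Residual scatter plot

[Figure: scatter plot of the residuals]

Regression results

Parameter Estimates

Variable     DF    Estimate     Std Error    t Value    Pr > |t|
Intercept     1     0.19732       0.27022       0.73      0.4718
price         1    -1.04441       0.83436      -1.25      0.2218
income        1     0.00331       0.00117       2.82      0.0090
temp          1     0.00346    0.00044555       7.76      <.0001

Durbin-Watson D               1.021
Number of Observations           30
1st Order Autocorrelation     0.330

1. DW test

From the table, at the 0.05 significance level, dl = 1.21 (N = 30, k = 3) and du = 1.65. The statistic is obtained by adding the dw option to the regression statement. DW = 1.021 < dl, so the null hypothesis is rejected: autocorrelation is present, and it is significant first-order positive autocorrelation.

Parameter Estimates

Variable    DF    Estimate    Std Error    t Value    Pr > |t|
resid        1     0.38454      0.17029       2.26      0.0319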
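The Durbin-Watson statistic and its decision rule can be sketched in a few lines of Python. This is an illustration, not the deck's SAS program: the function names are hypothetical, the decision rule covers only the one-sided test against positive first-order autocorrelation, and the short residual list is made-up toy data rather than the ice cream residuals.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def dw_decision(d, dl, du):
    """One-sided test against positive first-order autocorrelation."""
    if d < dl:
        return "reject H0: positive autocorrelation"
    if d > du:
        return "do not reject H0"
    return "inconclusive"


# Values quoted in the deck for the ice cream regression (N = 30, k = 3)
decision = dw_decision(1.021, dl=1.21, du=1.65)
print(decision)

# Toy residuals (assumed, for illustration only): slowly drifting values give d well below 2
toy = [0.5, 0.4, 0.45, 0.1, -0.2, -0.3, -0.25, 0.05]
d_toy = durbin_watson(toy)
print(round(d_toy, 3))
```

Residuals that alternate in sign push d above 2, while residuals that drift slowly, as in the toy list, pull it towards 0.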

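Remedies

1. When rho is known, apply the generalized difference transformation.
2. When rho is unknown, first estimate the autocorrelation coefficient and then apply generalized differencing. Methods for estimating it: (1) Cochrane-Orcutt iteration; (2) Hildreth-Lu; (3) Durbin two-step.

Correcting for serial correlation with strictly exogenous regressors

AR(1) model: feasible generalized least squares (FGLS). The regression equation uses the estimated autocorrelation coefficient:

$$\tilde y_t = \beta_0 \tilde x_{t0} + \beta_1 \tilde x_{t1} + \dots + \beta_k \tilde x_{tk} + \text{error}_t,$$

where, for $t \ge 2$, $\tilde y_t = y_t - \hat\rho y_{t-1}$, $\tilde x_{tj} = x_{tj} - \hat\rho x_{t-1,j}$ and $\tilde x_{t0} = 1 - \hat\rho$, while for $t = 1$ every variable, including the intercept column, is multiplied by $\sqrt{1 - \hat\rho^2}$.

FGLS steps:

1. Regress $y_t$ on $x_{t1}, x_{t2}, \dots, x_{tk}$ and obtain the residuals $\hat u_t$.
2. Estimate $\hat u_t = \rho \hat u_{t-1} + e_t$ to obtain $\hat\rho$.
3. Run the regression equation above.

The usual standard errors, t statistics and F statistics are asymptotically valid. The price of using an estimated autocorrelation coefficient is that FGLS has poor finite-sample properties: it may not be unbiased, but it is still consistent (when the data are weakly dependent). Although FGLS is neither unbiased nor BLUE, it is asymptotically more efficient than OLS when the AR(1) model of serial correlation holds.

Cochrane-Orcutt (CO) versus Prais-Winsten (PW) estimation

The CO estimator omits the first observation and uses the estimated coefficient on the lag in $\hat u_t = \rho \hat u_{t-1} + e_t$, whereas the PW estimator keeps the first observation; see the regression equation above. In general, whether the first observation is used makes little difference, but when the time series sample is small the difference can be large in practice. Note that the estimates below are not transformed back to the original equation; take care to do this correctly. Higher-order serial correlation is corrected in the same way as first-order correlation, by generalized differencing.

SAS program

    data ice;
      input cons income price temp time;
      cards;
    .
    ;
    proc reg data=ice;
      model cons=price income temp / dw;
      output out=ice1 p=consp r=resid;
    run;

    symbol1 i=none v=dot c=blue h=.5;
    proc gplot data=ice1;
      plot resid*time=1 / vref=0;
    run;

    /* BG test */
    data tt1;
      set ice1;
      resid1=lag(resid);
    run;
    proc reg data=tt1;
      model resid=resid1 / noint;
    run;
    /* rho=0.40063, R-square=0.1541 */
    data bgt;
      bg=29*0.1541;
      chisq=cinv(0.95,1);
      if bg>chisq then t=1; else t=0;
      put t=;
    run;
    /* bg=4.4689 > chisq=3.8415, so t=1 */

SAS program: higher-order BG test

    /* higher-order BG test, p=3 */
    data tt2;
      set ice1;
      resid1=lag(resid);
      resid2=lag(resid1);
      resid3=lag(resid2);
    run;
    proc reg data=tt2;
      model resid=resid1 resid2 resid3 / noint;
    run;
    /* R-square=0.1792 */
    data bgt2;
      bg=(29-3)*0.1792;
      chisq=cinv(0.95,3);
      if bg>chisq then t=1; else t=0;
      put t= chisq= bg=;
    run;
    /* t=0: no higher-order autocorrelation */

SAS program: Yule-Walker and Cochrane-Orcutt

    /* Yule-Walker estimates */
    proc autoreg data=ice;
      model cons=price income temp / nlag=1 method=yw;
    run;

    /* Cochrane-Orcutt */
    proc reg data=ice;
      model cons=price income temp / dw;
      output out=tt p=chat r=res;
    run;
    proc print data=tt;
    run;
    data tt;
      set tt;
      relag=lag(res);
    run;
    proc print data=tt;
    run;
    proc reg data=tt outest=b1;
      model res=relag / noint;
    run;
    /* gives rho=0.40063 */
    data pp;
      set tt;
      c1=lag(cons); t1=lag(temp); i1=lag(income); p1=lag(price);
    run;
    proc print data=pp;
    run;
    data pp1;
      set pp;
      if _n_=1 then delete;
      c2=cons-0.40063*c1;
      t2=temp-0.40063*t1;
      i2=income-0.40063*i1;
      p2=price-0.40063*p1;
    run;
    proc print data=pp1;
    run;
    proc reg data=pp1;
      model c2=t2 i2 p2 / dw;
    run;
    /* dw=1.54 against 1.65: the stationarity hypothesis is therefore not rejected */

SAS program: FGLS

The Cochrane-Orcutt program above uses only one iteration. In small samples the procedure is hard to bring to convergence even with several iterations; three iterations were tried here and it still had not converged. In that sense several iterations give much the same result as one. In theory the two have the same asymptotic properties, and in large samples convergence takes only a few steps.

    /* FGLS correction */
    data fgls;
      set tt1;
      rho=0.40063;
      /* lag() has to run on every observation, so take the lags before branching */
      lcons=lag(cons); lprice=lag(price); lincome=lag(income); ltemp=lag(temp);
      if _n_=1 then do;
        /* first observation: scale by sqrt(1-rho**2) */
        w=sqrt(1-rho**2);
        int=w; cons1=cons*w; price1=price*w; income1=income*w; temp1=temp*w;
      end;
      else do;
        /* later observations: quasi-difference with the previous value */
        int=1-rho;
        cons1=cons-rho*lcons;
        price1=price-rho*lprice;
        income1=income-rho*lincome;
        temp1=temp-rho*ltemp;
      end;
    run;
    proc reg data=fgls;
      model cons1=int price1 income1 temp1 / noint;
    run;

SAS program: proc autoreg

    proc autoreg data=ice;
      model cons=price income temp / nlag=1 dwprob archtest;
    run;

The default estimation method is Yule-Walker, also called the two-step full transform method; it is GLS estimation given the autoregressive parameter. Other methods are selected in the model statement with method=ML, ULS or ITYW: maximum likelihood, unconditional least squares and iterated Yule-Walker, respectively. When the autoregressive parameter is large, the ML and ULS (also called NLS) methods work better. For details, see the autoreg procedure in SAS/ETS.

Yule-Walker estimates

The AUTOREG Procedure
Dependent Variable: cons

Ordinary Least Squares Estimates

SSE                0.03527284      DFE                      26
MSE                   0.00136      Root MSE            0.03683
SBC                -103.63408      AIC              -109.23887
Regress R-Square       0.7190      Total R-Square       0.7190
Durbin-Watson          1.0212      Pr > DW              0.9997

NOTE: Pr>DW is the p-value for testing negative autocorrelation.

Variable     DF    Estimate    Std Error    t Value    Approx Pr > |t|
Intercept     1      0.1973       0.2702       0.73             0.4718
price         1     -1.0444       0.8344      -1.25             0.2218
income        1    0.003308     0.001171       2.82             0.0090
temp          1    0.003458     0.000446       7.76             <.0001
...

Regression on the quasi-differenced data (regressors t2, i2, p2)

Source    DF    Sum of Squares    Mean Square    F Value    Pr > F
Model      3           0.04707        0.01569      15.41    <.0001
...

Variable     DF    Estimate     Std Error    t Value    Pr > |t|
Intercept     1     0.09409       0.17358       0.54      0.5926
t2            1     0.00356    0.00055454       6.42      <.0001
...

FGLS regression (regressors int, price1, income1, temp1)

Source    DF    Sum of Squares    Mean Square    F Value    Pr > F
Model      4           1.44032        0.36008     836.01    <.0001
...

Variable    DF    Estimate     Std Error    t Value    Pr > |t|
int          1     0.03411       0.26299       0.13      0.8978
price1       1    -0.66901       0.78886      -0.85      0.4044
income1      1     0.00388       0.00113       3.43      0.0021
temp1        1     0.00365    0.00042687       8.56      <.0001
...

ARCH test (archtest option)

Order          Q    Pr > Q         LM    Pr > LM
    1     0.4425    0.5059     0.1797     0.6716
    2     0.8322    0.6596     1.2446     0.5367
    3     1.2725    0.7357     1.6346     0.6516
    4     3.5292    0.4735     4.3974     0.3549
    5     3.7247    0.5897     4.4229     0.4903
    6     3.9320    0.6859     4.4893     0.6108
    7     4.2288    0.7531     4.5093     0.7196
    8     5.8344    0.6658     9.4542     0.3054
    9     6.7441    0.6637    10.3272     0.3246
   10     7.7561    0.6526    10.5957     0.3899
   11     7.8443    0.7272    10.9131     0.4506
   12     7.9322    0.7904    12.4910     0.4071

The p-values above show that there is no conditional heteroscedasticity.

Other procedures for time series

Distributed lag models: Proc Pdlreg.
Vector autoregression: Proc varmax.
Time series modelling: Proc Arima.
Time series forecasting: Proc forecast.

Stata commands: rreg (robust regression); reg, robust gives robust t values; newey and newey2 give heteroscedasticity- and autocorrelation-consistent (HAC) estimates under various conditions (including panel data, endogenous variables, etc.).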
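The quasi-differencing behind the Cochrane-Orcutt and Prais-Winsten corrections discussed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the deck's SAS program: the function names are hypothetical and the short series are made-up toy numbers, not the ice cream data.

```python
import math


def estimate_rho(residuals):
    """Slope of the no-intercept regression of e_t on e_(t-1)."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(residuals[t - 1] ** 2 for t in range(1, len(residuals)))
    return num / den


def prais_winsten(series, rho):
    """Quasi-difference a series, rescaling (not dropping) the first observation."""
    first = math.sqrt(1 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


def cochrane_orcutt(series, rho):
    """Quasi-difference a series and drop the first observation."""
    return prais_winsten(series, rho)[1:]


# Toy residuals (assumed): each value is half the previous one, so rho = 0.5
rho_hat = estimate_rho([1.0, 0.5, 0.25])

# Toy series (assumed) transformed both ways
y = [1.0, 2.0, 3.0]
y_pw = prais_winsten(y, rho_hat)
y_co = cochrane_orcutt(y, rho_hat)

# Applying the transform to a column of ones yields the intercept column
intercept_col = prais_winsten([1.0, 1.0, 1.0], rho_hat)
print(rho_hat, y_pw, y_co, intercept_col)
```

Transforming a column of ones gives sqrt(1 - rho^2) for the first row and 1 - rho afterwards, which is the `int` variable built in the FGLS data step, and is why that regression is run without an automatic intercept.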
