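Zhejiang University of Finance and Economics (浙江财经学院), Ni Weicai (倪伟才)

Chapter 3: The Multiple Linear Regression Model

Examples:
Wage income $y$; education $x_1$, work experience $x_2$.
Sales volume of a product $y$; its own price $x_1$, the price of substitutes $x_2$, the price of complements $x_3$.
Sales revenue of a mobile phone brand $y$; advertising spending $x_1$, price $x_2$, disposable income $x_3$, R&D investment $x_4$.
Speed of a car $y$; engine power $x_1$, weight $x_2$.
Blood sugar $y$; insulin $x_1$, growth hormone $x_2$.

What these examples have in common: there is exactly one explained variable and there are two or more explanatory variables. Such a model is called a multiple linear regression model.
Main topics of this chapter: the multiple linear regression model, the basic assumptions, estimation of the unknown parameters and the properties of the estimators, tests of the regression coefficients and of the regression equation, and prediction.
Feature of this chapter: the calculations are carried out with matrices.

4. Perfect collinearity

Example: suppose we want to estimate the effect of campaign spending on election outcomes, and that every election has two candidates. Let voteA be candidate A's share of the vote, expendA candidate A's campaign spending, expendB candidate B's campaign spending, and totexpend total campaign spending. To separate the effect of each candidate's spending from the effect of total spending, consider the model

$$voteA = \beta_0 + \beta_1\, expendA + \beta_2\, expendB + \beta_3\, totexpend + \varepsilon .$$

Because expendA + expendB = totexpend, the three regressors are perfectly collinear. Trying to interpret $\beta_1$ exposes the problem. $\beta_1$ is supposed to measure the effect of candidate A's spending on A's vote share while candidate B's spending and total spending are held fixed. But if expendB and totexpend are both held fixed, expendA cannot be increased, so the parameter is meaningless.
Remedy for perfect collinearity: drop one of the three regressors.

Estimating the regression parameters in matrix form

$$Q(\beta) = e'e = (Y - X\beta)'(Y - X\beta) = Y'Y - \beta'X'Y - Y'X\beta + \beta'X'X\beta = Y'Y - 2\beta'X'Y + \beta'X'X\beta ,$$

where $e$ is an $n$-dimensional column vector. Justify the last equality!

Derivatives with respect to a vector

Let $a = (a_1, \dots, a_n)'$ and $b = (b_1, \dots, b_n)'$, so that $a'b = b'a = \sum_{i=1}^{n} a_i b_i$. Then

$$\frac{\partial (a'b)}{\partial b} = \left(\frac{\partial (a'b)}{\partial b_1}, \dots, \frac{\partial (a'b)}{\partial b_n}\right)' = (a_1, \dots, a_n)' = a, \qquad \frac{\partial (b'a)}{\partial b} = a .$$

Exercise (fill in the blanks): ___, ___, ___.

Derivatives of a quadratic form

Let $x = (x_1, \dots, x_n)'$ and let $A = (a_{ij})$ be a symmetric $n \times n$ matrix. Then

$$x'Ax = \sum_{i=1}^{n} a_{ii} x_i^2 + 2 \sum_{i<j} a_{ij} x_i x_j .$$

Differentiating with respect to each component,

$$\frac{\partial (x'Ax)}{\partial x_k} = 2a_{k1}x_1 + 2a_{k2}x_2 + \dots + 2a_{kn}x_n, \qquad k = 1, \dots, n,$$

and stacking these $n$ equations gives

$$\frac{\partial (x'Ax)}{\partial x} = 2Ax .$$

The normal equations

$$\frac{\partial Q}{\partial \beta} = -2X'Y + 2X'X\beta = 0 \;\Rightarrow\; X'X\hat\beta = X'Y .$$

If $(X'X)^{-1}$ exists, then $\hat\beta = (X'X)^{-1}X'Y$. Moreover, $\mathrm{rank}(X'X) = \mathrm{rank}(X)$. Under the assumption $\mathrm{rank}(X) = p+1$ we have $\mathrm{rank}(X'X) = p+1$, so the $(p+1)\times(p+1)$ matrix $X'X$ is nonsingular and $(X'X)^{-1}$ exists.

Properties of the residuals

From $X'X\hat\beta = X'Y$ we get $X'(Y - X\hat\beta) = 0$, that is, $X'e = 0$. Written out component by component:

$$\sum_{i=1}^{n} e_i = 0, \quad \sum_{i=1}^{n} x_{i1} e_i = 0, \quad \dots, \quad \sum_{i=1}^{n} x_{ip} e_i = 0 .$$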
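The residual properties above can be checked numerically. The following is a minimal sketch in plain Python for the one-regressor case; the data values are hypothetical and chosen only for illustration, and the closed-form solution is simply the 2 x 2 inverse of $X'X$ applied to $X'Y$.

```python
# Hypothetical data, chosen only for illustration (not taken from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Normal equations X'X b = X'Y for X = [1, x], solved with the 2x2 inverse.
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))
det = n * sxx - sx * sx              # determinant of X'X
b0 = (sxx * sy - sx * sxy) / det     # intercept
b1 = (n * sxy - sx * sy) / det       # slope

# Residuals e = Y - Xb and the two components of X'e.
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sum_e = sum(e)                                   # first row of X'e
sum_xe = sum(xi * ei for xi, ei in zip(x, e))    # second row of X'e
print(b0, b1, sum_e, sum_xe)
```

Both sums should be zero up to floating-point rounding, which is exactly the statement $X'e = 0$.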
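$\mathrm{rank}(X) = p+1 \;\Rightarrow\; \mathrm{rank}(X'X) = p+1$

Here $X'X$ is a $(p+1)\times(p+1)$ matrix, so it is enough to show that $X'Xc = 0$ implies $c = 0$.
Proof: $X'Xc = 0$ holds if and only if $Xc = 0$.
($\Leftarrow$) If $Xc = 0$, then $X'Xc = 0$.
($\Rightarrow$) If $X'Xc = 0$, then $c'X'Xc = (Xc)'(Xc) = 0$, so $Xc = 0$.
Finally, $Xc = 0$ implies $c = 0$ because $X$ has rank $p+1$.

Exercise (use Stata!)

Data: ch05pr04.dta
Use matrices to compute the coefficient estimates of the linear regression model:
1: compute $X'X$
2: compute $(X'X)^{-1}$
3: compute $X'Y$
4: compute the coefficient estimates $(X'X)^{-1}X'Y$
5: compare the estimates obtained by matrix calculation with the result of running the regression directly in the software!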
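For readers without Stata, the five steps can be sketched in Python with NumPy. This is only an illustrative sketch: the data below are hypothetical stand-ins for ch05pr04.dta, and `np.linalg.lstsq` plays the role of the software's built-in regression in step 5.

```python
import numpy as np

# Hypothetical data standing in for ch05pr04.dta (values are illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

XtX = X.T @ X                    # step 1: X'X
XtX_inv = np.linalg.inv(XtX)     # step 2: (X'X)^{-1}
XtY = X.T @ y                    # step 3: X'Y
b = XtX_inv @ XtY                # step 4: b = (X'X)^{-1} X'Y

# Step 5: compare with a library least-squares solver.
b_check, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b, b_check)
```

Forming the explicit inverse mirrors the slide's formula; in practice a solver such as `lstsq` is usually preferred for numerical stability.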
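Stata commands

Data: ch05pr04.dta

    gen one=1
    mkmat y, mat(y)
    mkmat one x, mat(x)
    mat list x
    mat list y
    mat b=inv(x'*x)*x'*y
    mat list b
    reg y x

Maximum likelihood estimation

$y \sim N(X\beta, \sigma^2 I_n)$, so the likelihood and log-likelihood are

$$L = (2\pi)^{-n/2} (\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right),$$

$$\ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$

Maximizing $\ln L$ over $\beta$ is equivalent to minimizing $(y - X\beta)'(y - X\beta)$, which is exactly the OLS problem.

Review: the joint density of an $n$-dimensional normal distribution. If $x = (x_1, \dots, x_n)' \sim N(\mu, \Sigma)$, then

$$f(x) = (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right).$$

Example 3.1 (data: 12元回归.sav)

International tourism foreign-exchange earnings are an important part of national economic development. The factors that affect the tourism earnings of a country or region include natural, cultural, social, economic, transport and other factors. This example studies the effect of the tertiary industry on tourism foreign-exchange earnings. The China Statistical Yearbook divides the tertiary industry into 12 components:
$x_1$ services for farming, forestry, animal husbandry and fishery; $x_2$ geological prospecting and water conservancy management; $x_3$ transport, storage, post and telecommunications; $x_4$ wholesale and retail trade and catering; $x_5$ finance and insurance; $x_6$ real estate; $x_7$ social services; $x_8$ health care, sports and social welfare; $x_9$ education, culture, arts and broadcasting; $x_{10}$ scientific research and comprehensive technical services; $x_{11}$ party and government organs; $x_{12}$ other sectors.
Using 1998 data for China's 31 provinces, municipalities and autonomous regions, we take international tourism foreign-exchange earnings (million US dollars) as the dependent variable $y$ and the 12 sectors above as regressors in a multiple linear regression. The data are given in Table 3.1; the regressors are measured in units of 100 million yuan (RMB).

Residuals (abstracted from Greene, Econometric Analysis, chapter 3)

$$e = Y - X\hat\beta = Y - X(X'X)^{-1}X'Y = \left(I - X(X'X)^{-1}X'\right)Y = MY, \qquad \text{where } M = I - X(X'X)^{-1}X' .$$

Residual maker: M

The $n \times n$ matrix $M$ defined above is fundamental in regression analysis. You can easily show that $M$ is both symmetric ($M = M'$) and idempotent ($M = M^2$). We can interpret $M$ as a matrix that produces the vector of least squares residuals in the regression of $y$ on $X$ when it premultiplies any vector $y$. It is convenient to refer to this matrix as a "residual maker." It follows that $MX = 0$. One way to interpret this result is that if $X$ is regressed on $X$, a perfect fit will result and the residuals will be zero.

Fitted values

$$\hat Y = X\hat\beta = X(X'X)^{-1}X'Y = PY, \qquad \text{where } P = X(X'X)^{-1}X',$$

$$e = MY = (I - P)Y = Y - \hat Y, \qquad M = I - P .$$

Projection (or hat) matrix: P

The matrix $P$, which is also symmetric and idempotent, is a projection matrix. It is the matrix formed from $X$ such that when a vector $y$ is premultiplied by $P$, the result is the fitted values in the least squares regression of $y$ on $X$. This is also the projection of the vector $y$ into the column space of $X$.

Properties of the projection (or hat) matrix P

1: $P$ is symmetric.
2: $P$ is idempotent.
3: $\mathrm{tr}(P) = p+1$. Proof: $\mathrm{tr}(P) = \mathrm{tr}\left(X(X'X)^{-1}X'\right) = \mathrm{tr}\left((X'X)^{-1}X'X\right) = \mathrm{tr}(I_{p+1}) = p+1$.
4: $\mathrm{tr}(M) = \mathrm{tr}(I_n - P) = n - p - 1$.

Variance-covariance matrix of the residuals

Method 1: since $e = MY = M(X\beta + \varepsilon) = M\varepsilon$ and $E(e) = 0$,

$$D(e) = E\left[(e - Ee)(e - Ee)'\right] = E(ee') = E(M\varepsilon\varepsilon'M') = M\,E(\varepsilon\varepsilon')\,M' = \sigma^2 MM' = \sigma^2 M .$$

Method 2:

$$D(e) = \mathrm{Cov}(e, e) = \mathrm{Cov}(MY, MY) = M\,\mathrm{Cov}(Y, Y)\,M' = M(\sigma^2 I)M' = \sigma^2 M .$$

An unbiased estimator of the error variance $\sigma^2$

$$\hat\sigma^2 = \frac{1}{n-p-1}\, e'e = \frac{1}{n-p-1} \sum_{i=1}^{n} e_i^2 .$$

Proof:
1) $e'e = (MY)'(MY) = Y'MY = \varepsilon' M \varepsilon$. (Give the reason!)
2) $E(e'e) = E(\varepsilon'M\varepsilon) = E\left[\mathrm{tr}(\varepsilon'M\varepsilon)\right] = E\left[\mathrm{tr}(M\varepsilon\varepsilon')\right] = \mathrm{tr}\left(M\,E(\varepsilon\varepsilon')\right) = \sigma^2\,\mathrm{tr}(M) = \sigma^2\left(\mathrm{tr}(I_n) - \mathrm{tr}(P)\right) = \sigma^2 (n - p - 1)$. (Give the reason!)

Class exercise

Data: TableF2.2.dta. The problem is taken from Greene, Notes 3.
Variables: y = G, x = (one, Pg, Y).
Use Stata to compute:
1: $x'x$, $x'y$, $(x'x)^{-1}$, $b$
2: $M$
3: $x'e = x'My$, where $e$ is the vector of residuals
4: $MX$

Stata commands

    egen one=fill(1,1)
    mkmat G, mat(y)
    mkmat one Pg Y, mat(x)
    mat b=inv(x'*x)*x'*y
    mat m=I(36)-x*inv(x'*x)*x'
    mat e=m*y
    mat xte=x'*m*y
    mat list xte
    mat mx=m*x
    mat list mx

Supplementary material (matrix calculation)

Applied Linear Regression Models (Fourth Edition), chapter 5, simple linear regression.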
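As a companion to the matrix calculations above, the stated properties of the residual maker $M$ and the projection matrix $P$ can be verified numerically. The sketch below uses NumPy and a small hypothetical design matrix (six observations, a constant and two regressors); none of the numbers come from the course data sets.

```python
import numpy as np

# Hypothetical design matrix: n = 6 observations, constant plus p = 2 regressors.
X = np.array([[1, 2.0, 1.0],
              [1, 3.0, 0.0],
              [1, 5.0, 2.0],
              [1, 7.0, 1.0],
              [1, 8.0, 3.0],
              [1, 9.0, 5.0]])
y = np.array([3.0, 4.0, 8.0, 9.0, 13.0, 17.0])
n, k = X.shape                          # k = p + 1 columns

P = X @ np.linalg.inv(X.T @ X) @ X.T    # projection (hat) matrix
M = np.eye(n) - P                       # residual maker

y_hat = P @ y                           # fitted values
e = M @ y                               # residuals

# Each entry is True when the property holds up to rounding error.
checks = {
    "P symmetric": np.allclose(P, P.T),
    "P idempotent": np.allclose(P @ P, P),
    "M idempotent": np.allclose(M @ M, M),
    "MX = 0": np.allclose(M @ X, 0),
    "X'e = 0": np.allclose(X.T @ e, 0),
    "tr(P) = p + 1": np.isclose(np.trace(P), k),
    "tr(M) = n - p - 1": np.isclose(np.trace(M), n - k),
}
for name, ok in checks.items():
    print(name, ok)
```

Building the full $n \times n$ matrices is fine for small teaching examples like this one; for large $n$ it is cheaper to compute fitted values and residuals directly from the coefficient vector.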
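Problems 5.23 and 5.25. Worked example: Problem 5.23. Student exercise: Problem 5.25.
For details see the Word document 回归模型的矩阵计算(stata).doc.

Homework: prove
(1) $e'e = e'y$
(2) $e'e = y'y - y'Xb$
(3) $MX = 0$
(4) $Me = e$
(5) $X'e = 0$
(6) $\hat y'e = 0$

3.3 Properties of the parameter estimator (BLUE)

Property 1: $\hat\beta$ is a linear transformation of the random vector $Y$.
Analysis: write $A = (X'X)^{-1}X'$; then $\hat\beta = AY$. Compare $A$ with $P$: $P = X(X'X)^{-1}X' = XA$.

Property 2: $\hat\beta$ is an unbiased estimator of $\beta$, i.e. $E(\hat\beta) = \beta$.
Analysis: $\hat\beta = AY = A(X\beta + \varepsilon) = \beta + A\varepsilon$, so $E(\hat\beta) = \beta + A\,E(\varepsilon) = \beta$ because $E(\varepsilon) = 0$.
We notice that $A\varepsilon = (X'X)^{-1}X'\varepsilon$ represents the regression of $\varepsilon$ on $X$. As long as the effects of missing variables are randomly distributed independently of $X$ and have 0 mean, the least squares parameter estimator will be unbiased.

Property 3: the variance-covariance matrix of $\hat\beta$ is $D(\hat\beta) = \sigma^2 (X'X)^{-1}$.
Proof:

$$\mathrm{Var}(\hat\beta) = \mathrm{Var}(AY) = A\,\mathrm{Var}(Y)\,A' = A(\sigma^2 I)A' = \sigma^2 AA' = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1} .$$

Exercise: use this property to find the variance-covariance matrix of $\hat\beta_0, \hat\beta_1$ in the simple linear regression model. First recall $\mathrm{Var}(\hat\beta_0)$, $\mathrm{Var}(\hat\beta_1)$ and $\mathrm{Cov}(\hat\beta_0, \hat\beta_1)$, then derive them from Property 3. Do the answers agree?

Special case (simple linear regression model)

When $p = 1$:

$$X'X = \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}, \qquad (X'X)^{-1} = \frac{1}{n L_{xx}} \begin{pmatrix} \sum_{i=1}^{n} x_i^2 & -\sum_{i=1}^{n} x_i \\ -\sum_{i=1}^{n} x_i & n \end{pmatrix},$$

where $L_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2 = \sum_{i=1}^{n} x_i^2 - n\bar x^2$.

The meaning of Property 3, $D(\hat\beta) = \sigma^2 (X'X)^{-1}$

The main-diagonal elements of $D(\hat\beta)$ are the variances of the components of $\hat\beta$ and reflect how much each component fluctuates. The off-diagonal elements are the covariances and reflect the degree of correlation between the corresponding components.
$D(\hat\beta)$ depends not only on $\sigma^2$ but also on the design matrix $X$. Specifically: the smaller the variance of the random error term the better; the sample should be representative; and the data should not be too concentrated.
(In SPSS: covariance matrix.)

Calculating Parameter and Standard Error Estimates for Multiple Regression Models

Example: The following model with $k = 3$ is estimated over 15 observations:

$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u ,$$

and the following data have been calculated from the original $X$'s:

$$(X'X)^{-1} = \begin{pmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{pmatrix}, \qquad X'y = \begin{pmatrix} -3.0 \\ 2.2 \\ 0.6 \end{pmatrix}, \qquad \hat u'\hat u = 10.96 .$$

Calculate the coefficient estimates and their standard errors.
To calculate the coefficients, just multiply the matrix by the vector to obtain $\hat\beta = (X'X)^{-1}X'y$.
To calculate the standard errors, we need an estimate of $\sigma^2$:

$$s^2 = \frac{SSR}{n - k} = \frac{10.96}{15 - 3} = 0.91 .$$

(cont'd)

The variance-covariance matrix of $\hat\beta$ is given by

$$s^2 (X'X)^{-1} = 0.91\,(X'X)^{-1} = \begin{pmatrix} 1.83 & 3.20 & -0.91 \\ 3.20 & 0.91 & 5.94 \\ -0.91 & 5.94 & 3.93 \end{pmatrix} .$$

The variances are on the leading diagonal:

$\mathrm{Var}(\hat\beta_1) = 1.83$, $SE(\hat\beta_1) = 1.35$
$\mathrm{Var}(\hat\beta_2) = 0.91$, $SE(\hat\beta_2) = 0.96$
$\mathrm{Var}(\hat\beta_3) = 3.93$, $SE(\hat\beta_3) = 1.98$

We write, with standard errors in parentheses below the coefficients:

$$\hat y = \underset{(1.35)}{1.10} - \underset{(0.96)}{4.40}\,x_2 + \underset{(1.98)}{19.88}\,x_3 .$$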
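The arithmetic of the worked example above can be reproduced in a few lines of plain Python. This is a sketch that assumes the inputs of the example are $(X'X)^{-1}$ with rows (2.0, 3.5, -1.0), (3.5, 1.0, 6.5), (-1.0, 6.5, 4.3), $X'y = (-3.0, 2.2, 0.6)'$ and a residual sum of squares of 10.96 over 15 observations; the variable names are illustrative.

```python
# Inputs of the worked example (k = 3 coefficients, 15 observations).
xtx_inv = [[2.0, 3.5, -1.0],
           [3.5, 1.0, 6.5],
           [-1.0, 6.5, 4.3]]
xty = [-3.0, 2.2, 0.6]
ssr = 10.96              # residual sum of squares u'u
n_obs, k = 15, 3

# Coefficient estimates: (X'X)^{-1} X'y, one row at a time.
beta = [sum(a * b for a, b in zip(row, xty)) for row in xtx_inv]

# Error-variance estimate and standard errors from the diagonal of s^2 (X'X)^{-1}.
s2 = ssr / (n_obs - k)
se = [(s2 * xtx_inv[j][j]) ** 0.5 for j in range(k)]

print([round(v, 2) for v in beta])
print(round(s2, 2), [round(v, 2) for v in se])
```

Rounded to two decimals, the coefficients and standard errors agree with the values quoted in the example.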
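Property 4 (Gauss-Markov theorem): under the Gauss-Markov conditions, i.e. $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \sigma^2 I$, among all linear unbiased estimators of $\beta$ the least squares estimator $\hat\beta$ has the smallest variance (that is, it is BLUE).
Notes:
There may exist a nonlinear function (of $y_1, y_2, \dots, y_n$) that is an unbiased estimator and has a smaller variance than the least squares estimator $\hat\beta$.
There may exist a biased estimator with a smaller variance than the least squares estimator $\hat\beta$.
A premise of the theorem is that we restrict attention to linear, unbiased estimators of $\beta$.
The theorem is proved in matrix form. For a detailed proof see Pindyck, Econometric Models and Economic Forecasts, Appendix 4.3, The Multiple Regression Model in Matrix Form, pages 110 and 111. That proof is rather cumbersome! The approach in Greene, Econometric Analysis, is recommended instead!

Greene's approach (must be mastered!)

Write $\hat\beta = AY$ with $A = (X'X)^{-1}X'$, so that $\mathrm{Var}(\hat\beta) = \sigma^2 AA' = \sigma^2 (X'X)^{-1}$.
Let $b_0 = Cy$ be another linear unbiased estimator of $\beta$, where $C$ is a $(p+1) \times n$ matrix.
Unbiasedness: $E(Cy) = E(CX\beta + C\varepsilon) = CX\beta = \beta$ for all $\beta$, so $CX = I$.
Variance: $\mathrm{Var}(b_0) = \sigma^2 CC'$.
Let $D = C - A$. Then $DX = CX - AX = I - I = 0$, and therefore $DA' = DX(X'X)^{-1} = 0$.
Proof:

$$\mathrm{Var}(b_0) = \sigma^2 (D + A)(D + A)' = \sigma^2 \left(DD' + DA' + AD' + AA'\right) = \sigma^2 (X'X)^{-1} + \sigma^2 DD' = \mathrm{Var}(\hat\beta) + \sigma^2 DD' .$$

(cont'd)

Since a quadratic form in $DD'$ is $q'DD'q = z'z \ge 0$ with $z = D'q$, the covariance matrix of $b_0$ minus that of $\hat\beta$ equals a nonnegative definite matrix. Therefore every quadratic form in $\mathrm{Var}(b_0)$ is at least as large as the corresponding quadratic form in $\mathrm{Var}(\hat\beta)$, which implies a very important property of the least squares coefficient vector.

Note: for another proof of the Gauss-Markov theorem see James H. Stock and Mark W. Watson, Introduction to Econometrics, Appendix 16.5.

Properties of the parameter estimator

Property 5: $\mathrm{Cov}(\hat\beta, e) = 0$.
This property says that $\hat\beta$ and $e$ are uncorrelated. Under the normality assumption this is equivalent to $\hat\beta$ and $e$ being independent, and hence $\hat\beta$ is independent of $SSR = e'e$.

Property 6: under the normality assumption $y \sim N(X\beta, \sigma^2 I_n)$,
(1) $\hat\beta \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$
(2) $SSR/\sigma^2 \sim \chi^2(n - p - 1)$
Note: $SSR/\sigma^2 = e'e/\sigma^2 = (\varepsilon/\sigma)'M(\varepsilon/\sigma)$, where $M$ is symmetric and idempotent with $\mathrm{rank}(M) = \mathrm{tr}(M) = n - p - 1$.

Sums of squares

1: Total sum of squares: $SST = \sum (y_i - \bar y)^2$. Explained sum of squares: $SSE = \sum (\hat y_i - \bar y)^2$. Residual sum of squares: $SSR = \sum (y_i - \hat y_i)^2$.
2: Proof of $SST = SSE + SSR$:

$$SST = \sum (y_i - \bar y)^2 = \sum \left[(y_i - \hat y_i) + (\hat y_i - \bar y)\right]^2 = SSR + SSE + 2\sum e_i(\hat y_i - \bar y) = SSR + SSE ,$$

where the cross term vanishes by the properties of the residuals.
3: $R^2 = \dfrac{SSE}{SST} = 1 - \dfrac{SSR}{SST}$.

$M^0$: a very convenient notation!

Let $i = (1, 1, \dots, 1)'$. Then

$$M^0 = I - i(i'i)^{-1}i' = I - \frac{1}{n}\, i i' ,$$

which is the residual maker $M = I - X(X'X)^{-1}X'$ with $X = i$. It turns a vector into deviations from its mean:

$$M^0 y = y - \bar y\, i, \qquad \sum (y_i - \bar y)^2 = (M^0 y)'(M^0 y) = y'M^0 y .$$

Properties of $M^0$

$M^0 i = 0$.
$M^0 e = e - \frac{1}{n}\, i\,(i'e) = e$, because $i'e = \sum e_i = 0$, which follows from $X'e = 0$.

SST, SSE, SSR

$$SST = \sum (y_i - \bar y)^2 = y'M^0 y, \qquad SSE = \hat y' M^0 \hat y, \qquad SSR = e'e .$$

SST = SSE + SSR

$$y = X\hat\beta + e, \qquad M^0 y = M^0 X\hat\beta + M^0 e = M^0 X\hat\beta + e ,$$

$$SST = y'M^0 y = (M^0 X\hat\beta + e)'(M^0 X\hat\beta + e) = \hat\beta' X'M^0 X\hat\beta + e'e + 2\hat\beta' X'M^0 e = \hat\beta' X'M^0 X\hat\beta + e'e = SSE + SSR ,$$

since $X'M^0 e = X'e = 0$.

Compute with matrices (the focus is on the three sums of squares)

I. (1) $\hat\beta = (X'X)^{-1}X'y$; (2) $P = X(X'X)^{-1}X'$; (3) $M = I - P$.
II. (4) $\hat y = Py$; (5) $e = My$.
III. (6) $i = (1, 1, \dots, 1)'$, $M^0 = I - i(i'i)^{-1}i'$; (7) $SST = y'M^0 y$; (8) $SSE = \hat y'M^0 \hat y$, $SSR = e'e$.

Stata example

Data: chap05pr04.dta

    gen one=1
    mkmat one x, mat(x)
    mkmat y, mat(y)
    mat b=inv(x'*x)*x'*y
    mat p=x*inv(x'*x)*x'
    mat m=I(5)-p
    mat yhat=p*y
    mat e=m*y
    mat i=J(5,1,1)
    mat list i
    mat m0=I(5)-i*inv(i'*i)*i'
    mat list m0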
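The matrices built above ($P$, $M$, $M^0$) are all that is needed for the three sums of squares. The following NumPy sketch mirrors the same construction on a hypothetical five-observation data set (not the values in chap05pr04.dta) and checks the decomposition $SST = SSE + SSR$ with the convention used in these notes, where SSE is the explained and SSR the residual sum of squares.

```python
import numpy as np

# Hypothetical five-observation data set (not the values in chap05pr04.dta).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(y)

X = np.column_stack([np.ones(n), x])
P = X @ np.linalg.inv(X.T @ X) @ X.T    # projection matrix
M = np.eye(n) - P                       # residual maker
i = np.ones((n, 1))
M0 = np.eye(n) - i @ i.T / n            # centering matrix I - i(i'i)^{-1}i'

y_hat = P @ y                           # fitted values
e = M @ y                               # residuals

sst = y @ M0 @ y                        # total sum of squares
sse = y_hat @ M0 @ y_hat                # explained sum of squares
ssr = e @ e                             # residual sum of squares
r2 = sse / sst
print(sst, sse, ssr, r2)
```

Because `y`, `y_hat` and `e` are one-dimensional arrays, each quadratic form evaluates to a scalar.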
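(cont'd)

    mat ssr=e'*e
    mat sse=yhat'*m0*yhat
    mat sst=y'*m0*y
    mat list sse
    mat list ssr
    mat list sst
    reg y x

For the detailed output see: 3种平方和的矩阵计算.doc
Exercise: Applied Linear Regression Models, chapter 5, Problem 5.24. Data: chap05pr21.dta

Sample coefficient of determination

$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$$

$R^2$ measures the proportion of variation in $Y$ which is explained by the multiple regression equation. $R^2$ is often used informally as a goodness-of-fit statistic and to compare the validity of regression results under alternative specifications of the independent variables in the model. However, there are several problems with the use of $R^2$. First, all our statistical results follow from the initial assumption that the model is correct; we have no procedure that compares alternative specifications. Second, $R^2$ is sensitive to the number of independent variables included in the regression model. The addition of more independent variables to the regression equation can never lower $R^2$ and is likely to raise it. (The addition of a new explanatory variable does not alter SST but is likely to increase SSE.) Thus, one could simply add more variables to an equation if one wished only to maximize $R^2$.

Adjusted $R^2$ (must be mastered!)

The difficulty with $R^2$ as a measure of goodness of fit is that $R^2$ pertains only to explained and unexplained variation in $Y$ and therefore does not account for the number of degrees of freedom. A natural solution is to use variances, not variations, thus eliminating the dependence of goodness of fit on the number of independent variables in the model.

Relation between $\bar R^2$ and $R^2$:

$$\bar R^2 = 1 - \frac{SSR/(n-p-1)}{SST/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1} .$$

Properties of adjusted $R^2$

(1) $\bar R^2 \le R^2$. As the number of regressors increases, $\bar R^2$ increases more slowly than $R^2$, which to some extent removes the influence of the number of regressors.
(2) $\bar R^2$ may be negative when $R^2$ is small and the number of regressors is large. For example, when $n = 10$, $p = 3$, $R^2 = 0.10$:

$$\bar R^2 = 1 - (1 - 0.10)\,\frac{10 - 1}{10 - 3 - 1} = -0.35 .$$

A negative goodness of fit has no meaning; in that case take $\bar R^2 = 0$.

3. The F statistic: testing the overall significance of the regression equation

The F statistic calculated by most regression programs can be used in the multiple regression model to test the significance of the $R^2$ statistic. The F statistic with $p$ and $n-p-1$ degrees of freedom allows us to test the hypothesis that none of the explanatory variables helps explain the variation of $Y$ about its mean. In other words, the F statistic tests the joint hypothesis that $\beta_1 = \beta_2 = \dots = \beta_p = 0$.

$$F_{p,\,n-p-1} = \frac{SSE/p}{SSR/(n-p-1)} = \frac{R^2/p}{(1 - R^2)/(n-p-1)} = \frac{R^2}{1 - R^2} \cdot \frac{n-p-1}{p}$$

(cont'd)

If the null hypothesis is true, then we would expect SSE, $R^2$, and therefore $F$ to be close to 0. Thus a high value of the F statistic is a rationale for rejecting the null hypothesis. An F statistic not significantly different from 0 lets us conclude that the explanatory variables do little to explain the variation of $Y$ about its mean.

F test for joint exclusion restrictions (very important, be sure to master it)

Unrestricted model ($p+1$ parameters):

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + u .$$

Suppose there are $q$ exclusion restrictions to test, i.e. we want to test that the coefficients of $q$ of the variables are 0:

$$H_0: \beta_{p-q+1} = 0, \dots, \beta_p = 0 .$$

$H_0$ imposes $q$ exclusion restrictions on the model. If the null hypothesis holds, the $q$ variables are dropped from the model, which gives the restricted model:

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_{p-q} x_{p-q} + u .$$

When we move from the unrestricted model to the restricted model, it is the relative increase in the residual sum of squares that is meaningful for judging the null hypothesis!

Formula for the F test of joint exclusion restrictions (memorize it!)

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-p-1)} \qquad \text{or} \qquad F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-p-1)}$$

$SSR_r - SSR_{ur}$ measures the increase in the residual sum of squares when moving from the unrestricted model to the restricted model.
$q$ is the number of restrictions, which equals the difference between the degrees of freedom of the restricted and unrestricted models.
$n - p - 1$ is the number of observations minus the number of estimated parameters in the unrestricted model ($df_{ur}$).
Questions: 1. Why are the numerator degrees of freedom equal to $q$? 2. Why is the statistic constructed an F statistic? Please explain!
Proof: see Wooldridge, page 147.
Class exercise: Wooldridge, question 4.5.

Relation between the F test for joint exclusion restrictions and the general F test

The general F test is in fact a special case of the F test for joint exclusion restrictions! When the number of restrictions $q$ equals the number of explanatory variables $p$, the restricted model is $y = \beta_0 + u$, so $SSR_r = SST$ and $SSR_r - SSR_{ur} = SST - SSR = SSE$. Hence

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-p-1)} = \frac{SSE/p}{SSR/(n-p-1)} .$$

An interesting problem: Wooldridge, question 4.5

Exercise

Consider Patient satisfaction, chap06pr15.dta
1: Test whether $X_3$ can be dropped from the regression model given that $X_1$ and $X_2$ are retained. Use the F test statistic and level of significance 0.05. State the alternatives, decision rule, and conclusion. What is the P-value of the test?
2: Test whether $\beta_1 = -1$ and $\beta_2 = 0$. State the alternatives, full model and reduced model, decision rule, and conclusion. What is the P-value of the test?
Abstracted from Applied Linear Regression Models, Problems 7.5 and 7.9.

Stata, chap06pr15.dta

1:
    reg y x1 x2 x3
    test x3
    di 3.60^0.5
    1.8973666
2:
    qui reg y x1 x2 x3
    test x1=-1
    F( 1, 42) = 0.43    Prob > F = 0.5133
    test x2=0, accumulate
    F( 2, 42) = 0.88    Prob > F = 0.4208

4. The t statistic: testing the significance of an individual regression coefficient

$\hat\beta \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$
Test: whether an individual regressor $x_j$ has a significant effect on $y$.
Null hypothesis: $H_0: \beta_j = 0$.
t statistic:

$$t_j = \frac{\hat\beta_j}{se(\hat\beta_j)}, \qquad \text{where } se(\hat\beta_j) = \sqrt{\hat\sigma^2 \left[(X'X)^{-1}\right]_{jj}} .$$

5. Discussion of textbook Example 3.1 (12元.dta)

Note: taken together, the 12 regressors have a significant linear relationship with $y$, but no single regressor has a significant linear relationship with $y$.
In multiple regression, the F test of the overall significance of the regression equation and the t tests of the individual significance of the regression coefficients are different tests; the reason they can disagree here is multicollinearity.
How can we arrive at a model in which every variable has a significant effect on $y$?
Method: remove redundant variables one at a time. First remove the variable with the largest p-value, re-run the tests, and continue in this way until every remaining variable has a significant effect on $y$ (that is, every p-value is smaller than $\alpha$).

Stata for textbook Example 3.1

Data: 12元.dta

    reg y x*
    reg y x2-x12
    reg y x3-x12
    reg y x3-x11
    reg y x3 x5-x11
    reg y x3 x5 x6 x8-x11
    reg y x3 x5 x8-x11
    reg y x3 x8-x11
    reg y x3 x8 x9 x11

6. An intuitive explanation of a significant F with an insignificant t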
$y$: the total travel time; $x_1$: the number of miles traveled; $x_2$: the number of gallons of gasoline consumed.
Assume that we obtain the equation

$$\hat y = b_0 + b_1 x_1 + b_2 x_2$$

and find that the F test shows the relationship to be significant. Then suppose we conduct a test on $\beta_1$ to determine whether $\beta_1 \ne 0$, and we cannot reject $H_0: \beta_1 = 0$. Does this mean that travel time $y$ is not related to miles traveled $x_1$?

(cont'd)

Not necessarily. What it probably means is that with $x_2$ already in the model, $x_1$ does not make a significant contribution to determining the value of $y$. This interpretation makes sense in our example: if we know the amount of gasoline consumed, we do not gain much additional information useful in predicting $y$ by knowing the miles traveled. Similarly, a t test might lead us to conclude $\beta_2 = 0$ on the grounds that, with $x_1$ in the model, knowledge of the amount of gasoline consumed does not add much.
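As a numerical companion to the goodness-of-fit and F-test formulas used in this chapter, the plain-Python sketch below evaluates adjusted $R^2$, the overall F statistic and the exclusion-restriction F statistic. The function names are illustrative, and the sums of squares in the second part are hypothetical numbers chosen only to show that the overall F is the exclusion F with $q = p$ and $SSR_r = SST$.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2, n, p):
    """Overall F = (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def exclusion_f(ssr_r, ssr_ur, q, n, p):
    """F for q exclusion restrictions; p regressors in the unrestricted model."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - p - 1))


# The chapter's own adjusted R^2 example: n = 10, p = 3, R^2 = 0.10.
adj = adjusted_r2(0.10, 10, 3)
print(round(adj, 2))

# Hypothetical sums of squares (not from any data set in these notes).
sst, ssr_ur, n, p = 100.0, 40.0, 24, 3
r2 = 1.0 - ssr_ur / sst
f_overall = overall_f(r2, n, p)
f_exclusion = exclusion_f(sst, ssr_ur, p, n, p)   # q = p, SSR_r = SST
print(f_overall, f_exclusion)
```

The two F values coincide up to rounding, which is the special-case relationship between the two tests; the adjusted $R^2$ comes out negative, as in the chapter's example.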
See Applied Linear Regression Models, chapter 7, section 7.5.

Standardized Multiple Regression Model

Purpose: A standardized form of the general multiple regression model is employed to control round-off errors in normal equations calculations and to permit comparisons of the estimated regression coefficients in common units.

Round-off Errors in Calculations

The results from normal equations calculations can be sensitive to rounding of data in intermediate stages of calculations. When the number of X variables is small (say, three or fewer), round-off effects can be controlled by carrying a sufficient number of digits in intermediate calculations. Indeed, most computer regression programs use double-precision arithmetic in all computations to control round-off effects. Still, with a large number of X variables, serious round-off effects can arise despite the use of many digits in intermediate calculations.

(cont'd)

Round-off errors tend to enter calculations primarily when the inverse of $X'X$ is taken. The danger of serious round-off errors in $(X'X)^{-1}$ is particularly great when (1) $X'X$ has a determinant that is close to zero and/or (2) the elements of $X'X$ differ substantiall