Analysis of Cross Section and Panel Data
Yan Zhang
School of Economics, Fudan University; CCER, Fudan University
Textbook: Introductory Econometrics: A Modern Approach

Part 1. Regression Analysis on Cross-Sectional Data
Chap 2. The Simple Regression Model
Practice for learning multiple regression.
- Bivariate linear regression model: y = β0 + β1 x + u.
- β1: the slope parameter in the relationship between y and x, holding the other factors in u fixed; it is of primary interest in applied economics.
- β0: the intercept parameter; it also has its uses, although it is rarely central to an analysis.

More Discussion
- β1: a one-unit change in x has the same effect on y, regardless of the initial value of x. Increasing returns (e.g. wage-education) call for a different functional form.
- Can we draw ceteris paribus conclusions about how x affects y from a random sample of data, when we are ignoring all the other factors? Only if we make an assumption restricting how the unobservable random variable u is related to the explanatory variable x.

Classical Regression Assumptions
- E(u) = 0: a feasible normalization as long as the intercept term is included.
- Zero conditional mean, E(u|x) = 0, is stronger than mere zero linear correlation between u and x.
- When E(u|x) = 0 fails, x is endogenous (endogeneity).
The PRF (Population Regression Function), E(y|x) = β0 + β1 x, is something fixed but unknown in the population.

OLS
- OLS minimizes the sum of squared residuals, Σ (yi − β̂0 − β̂1 xi)².
- The fitted line is the sample regression function (SRF): ŷ = β̂0 + β̂1 x, with β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and β̂0 = ȳ − β̂1 x̄.
- The point (x̄, ȳ) is always on the OLS regression line.
- Fitted values and residuals: ŷi = β̂0 + β̂1 xi and ûi = yi − ŷi; the SRF is our estimate of the PRF.
- Coefficient of determination R² = SSE/SST = 1 − SSR/SST: the fraction of the sample variation in y that is explained by x; it equals the square of the sample correlation coefficient between yi and ŷi (a numerical sketch follows).
- Low R-squareds: a low R² is common in cross-sectional applications and does not by itself invalidate the regression.
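A minimal numpy sketch (illustrative, not from the slides; the data-generating values β0 = 1.0, β1 = 0.5 are made up) showing the OLS slope, intercept and R² computed by hand, and that R² equals the squared correlation between y and the fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(10, 3, n)            # illustrative regressor
u = rng.normal(0, 2, n)             # unobserved error
y = 1.0 + 0.5 * x + u               # true beta0 = 1.0, beta1 = 0.5 (assumed for the demo)

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

y_hat = beta0_hat + beta1_hat * x
resid = y - y_hat
r_squared = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

print(beta0_hat, beta1_hat, r_squared)
print(np.corrcoef(y, y_hat)[0, 1] ** 2)   # same number: squared corr(y, y_hat)
```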
Units of Measurement
- If the dependent variable is multiplied by a constant c (each value in the sample is multiplied by c), then the OLS intercept and slope estimates are also multiplied by c.
- If an independent variable is divided or multiplied by some nonzero constant c, then its OLS slope coefficient is correspondingly multiplied or divided by c (see the check below).
- The goodness-of-fit of the model, R², does not depend on the units of measurement of our variables.
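A quick numpy check (illustrative, not from the slides; the constants and data are made up) of the three claims above: rescaling y rescales both estimates, rescaling x rescales only the slope, and R² is unchanged either way.

```python
import numpy as np

def ols(x, y):
    # simple-regression OLS plus R-squared, for the rescaling comparison
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    r2 = np.corrcoef(y, b0 + b1 * x)[0, 1] ** 2
    return b0, b1, r2

rng = np.random.default_rng(1)
x = rng.normal(5, 2, 500)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 500)

print(ols(x, y))            # baseline estimates
print(ols(x, 10 * y))       # y scaled by c=10: intercept and slope both x10, same R2
print(ols(x / 10, y))       # x divided by c=10: slope x10, intercept unchanged, same R2
```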
Functional Form
- Linear vs. nonlinear relationships.
- Logarithmic dependent variable, log(y) = β0 + β1 x + u: β1 gives the (approximate) percentage change in y per unit of x, a semi-elasticity; this captures an increasing return to education in the wage equation. Other nonlinearity: diploma effects.
- Bi-logarithmic model, log(y) = β0 + β1 log(x) + u: β1 is a constant elasticity.
- Change of units of measurement in the log-log model: rescaling y by c1 and x by c2 leaves the slope unchanged and shifts the intercept to β̂0* = β̂0 + log(c1) − β̂1 log(c2) (note the misprint on p. 45 of the text; verified in the sketch below).
- Be proficient at interpreting the coefficients.
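A small numpy sketch (illustrative, not from the slides; the elasticity 0.8 and the constants c1, c2 are made up) showing that in the log-log model the slope is a constant elasticity and that rescaling the variables only shifts the intercept to β̂0 + log(c1) − β̂1 log(c2).

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1000
x = np.exp(rng.normal(0, 0.5, n))
y = np.exp(0.5 + 0.8 * np.log(x) + rng.normal(0, 0.1, n))   # true elasticity 0.8 (assumed)

def ols(lx, ly):
    b1 = np.sum((lx - lx.mean()) * (ly - ly.mean())) / np.sum((lx - lx.mean()) ** 2)
    return ly.mean() - b1 * lx.mean(), b1

c1, c2 = 10.0, 100.0
b0, b1 = ols(np.log(x), np.log(y))
b0s, b1s = ols(np.log(c2 * x), np.log(c1 * y))
print(b0, b1)
print(b0s, b1s)                               # same slope after rescaling
print(b0 + np.log(c1) - b1 * np.log(c2))      # equals the new intercept b0s
```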
Unbiasedness of OLS Estimators
- Statistical properties of OLS: the distribution of the OLS estimates across different random samples drawn from the population.
- Assumptions:
  1. Linear in parameters (otherwise: richer functional forms; more advanced methods).
  2. Random sampling (contrast: time series data; nonrandom sampling).
  3. Zero conditional mean (its failure turns an unbiased estimator into a biased one; spurious correlation).
  4. Sample variation in the independent variable (its failure: collinearity).
- Theorem (Unbiasedness): under the four assumptions above, E(β̂0) = β0 and E(β̂1) = β1 (illustrated by the simulation below).
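A Monte Carlo sketch (illustrative, not from the slides; the true parameters and sample sizes are made up) showing that the OLS slope estimates from repeated random samples are centered at the true β1, as the unbiasedness theorem states.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 1.0, 0.5, 2.0   # assumed population values for the demo
n, reps = 100, 5000

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(10, 3, n)
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(slopes.mean())   # close to 0.5: the sampling distribution is centered at beta1
print(slopes.std())    # spread of the sampling distribution
```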
Variance of OLS Estimators
- The sampling distribution of β̂1 is centered at β1; the question is how far β̂1 is from β1 on average.
- Additional assumption, homoskedasticity: Var(u|x) = σ². A larger σ² means that the distribution of the unobservables affecting y is more spread out.
- Theorem (Sampling variance of OLS estimators): under the five assumptions above,
  Var(β̂1) = σ² / SSTx, where SSTx = Σ(xi − x̄)², and
  Var(β̂0) = σ² (n⁻¹ Σ xi²) / SSTx.

Variance of y Given x
- Conditional mean and variance of y: E(y|x) = β0 + β1 x and Var(y|x) = σ².
- Heteroskedasticity: Var(u|x) varies with x instead of being constant.

What does Var(β̂1) depend on?
- More variation in the unobservables affecting y (a larger σ²) makes it more difficult to estimate β1 precisely.
- The more spread out the sample of the xi (a larger SSTx), the easier it is to pin down the relationship between E(y|x) and x.
- As the sample size increases, so does the total variation in the xi; therefore, a larger sample size results in a smaller variance of the estimator (see the sketch below).
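A short numpy sketch (illustrative, not from the slides; σ² and the designs for x are made up) evaluating Var(β̂1) = σ² / SSTx for three designs: more spread in x and a larger sample both shrink the sampling variance.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 4.0   # assumed error variance

for n, x_sd in [(100, 1.0), (100, 3.0), (1000, 1.0)]:
    x = rng.normal(10, x_sd, n)
    sst_x = np.sum((x - x.mean()) ** 2)
    print(n, x_sd, sigma2 / sst_x)   # theoretical Var(beta1_hat) for this design
```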
Estimating Error Variance
- Errors (disturbances) vs. residuals: the error ui = yi − β0 − β1 xi is defined with the population parameters; the residual ûi = yi − β̂0 − β̂1 xi with the estimated ones.
- Theorem (unbiased estimation of σ²): under the five assumptions above, σ̂² = SSR / (n − 2) satisfies E(σ̂²) = σ².
- Standard error of the regression (SER): σ̂ = sqrt(σ̂²), an estimate of the standard deviation in y after the effect of x has been taken out.
- Standard error of β̂1: se(β̂1) = σ̂ / sqrt(SSTx) (computed in the sketch below).
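A numpy sketch (illustrative, not from the slides; the simulated data are made up) computing σ̂² = SSR/(n − 2), the SER, and the standard error of the slope for a simple regression.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.normal(10, 3, n)
y = 1.0 + 0.5 * x + rng.normal(0, 2, n)   # assumed data-generating process

sst_x = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sst_x
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

sigma2_hat = np.sum(resid ** 2) / (n - 2)   # unbiased estimator of the error variance
ser = np.sqrt(sigma2_hat)                   # standard error of the regression
se_b1 = ser / np.sqrt(sst_x)                # standard error of the slope estimate
print(sigma2_hat, ser, se_b1)
```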
Regression through the Origin
- Regression through the origin forces the fitted line to pass through (0, 0), e.g. income tax revenue as a function of income.
- The OLS estimator is β̃1 = Σ xi yi / Σ xi²; it coincides with the usual β̂1 when x̄ = 0.
- If the population intercept β0 ≠ 0, then β̃1 is a biased estimator of β1.

Chap 3. Multiple Regression Analysis: Estimation
- Advantages of multiple regression analysis:
  It builds better models for predicting the dependent variable, e.g. by generalizing the functional form (allowing the marginal propensity to consume to vary with income).
  It is more amenable to ceteris paribus analysis.
- Key assumption (Chap 3.2): E(u | x1, ..., xk) = 0. Implication in the wage example: other factors affecting wage are not related, on average, to educ and exper.
- Multiple linear regression model: y = β0 + β1 x1 + ... + βk xk + u, where βj is the ceteris paribus effect of xj on y.
Ordinary Least Squares Estimator
- SRF: ŷ = β̂0 + β̂1 x1 + ... + β̂k xk.
- OLS: minimize the sum of squared residuals Σ (yi − β̂0 − β̂1 xi1 − ... − β̂k xik)²; the first-order conditions give k + 1 linear equations in the k + 1 unknowns.
- Ceteris paribus interpretation: holding x2, ..., xk fixed, Δŷ = β̂1 Δx1. Thus we have controlled for x2, ..., xk when estimating the effect of x1 on y.

Holding Other Factors Fixed
- The power of multiple regression analysis is that it provides this ceteris paribus interpretation even though the data have not been collected in a ceteris paribus fashion.
- It allows us to do in non-experimental environments what natural scientists are able to do in a controlled laboratory setting: keep other factors fixed.

OLS and Ceteris Paribus Effects ("partialling out")
- Step (1): obtain the OLS residuals r̂1 from a multiple regression of x1 on x2, ..., xk.
- Step (2): β̂1 is the OLS slope from a simple regression of y on r̂1.
- β̂1 therefore measures the effect of x1 on y after x2, ..., xk have been partialled or netted out (illustrated in the sketch below).
- There are two special cases in which the simple regression of y on x1 produces the same OLS estimate on x1 as the regression of y on x1 and x2: when β̂2 = 0, and when x1 and x2 are uncorrelated in the sample.
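A numpy sketch (illustrative, not from the slides; coefficients and the correlation between x1 and x2 are made up) of the partialling-out result: regressing y on the residuals of x1 after removing x2 reproduces the multiple-regression coefficient on x1.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x2 = rng.normal(0, 1, n)
x1 = 0.6 * x2 + rng.normal(0, 1, n)          # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0, 1, n)

# Full multiple regression on [1, x1, x2]
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Partialling out: residuals of x1 on (1, x2), then simple regression of y on those residuals
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
beta1_partial = np.sum(r1 * y) / np.sum(r1 ** 2)

print(beta_full[1], beta1_partial)   # identical up to floating-point error
```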
Goodness-of-Fit
- R² also equals the squared correlation coefficient between the actual and the fitted values of y.
- R² never decreases, and it usually increases, when another independent variable is added to a regression (see the sketch below).
- The factor that should determine whether an explanatory variable belongs in a model is whether it has a nonzero partial effect on y in the population, not whether it raises R².
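A short numpy sketch (illustrative, not from the slides; the data and the irrelevant regressor are made up) showing that adding a regressor never lowers R², even when the added variable is pure noise.

```python
import numpy as np

def r_squared(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(6)
n = 300
x1 = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x1 + rng.normal(0, 1, n)
noise = rng.normal(0, 1, n)               # irrelevant regressor

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, noise])
print(r_squared(X_small, y), r_squared(X_big, y))   # the second value is (weakly) larger
```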
Regression through the Origin
- The properties of OLS derived earlier no longer hold for regression through the origin.
- The OLS residuals no longer have a zero sample average, and R² computed as 1 − SSR/SST can actually be negative; one remedy is to calculate it as the squared correlation coefficient between the actual and fitted values of y.
- If the intercept in the population model is different from zero, then the OLS estimators of the slope parameters will be biased.
The Expectation of the OLS Estimators
- Assumptions (direct extensions of the simple regression assumptions; compare them):
  MLR.1 Linear in parameters.
  MLR.2 Random sampling.
  MLR.3 Zero conditional mean: E(u | x1, ..., xk) = 0.
  MLR.4 No perfect collinearity: none of the independent variables is constant, and there are no exact linear relationships among the independent variables.
- Theorem (Unbiasedness): under the four assumptions above, E(β̂j) = βj for j = 0, 1, ..., k. In matrix terms the no-perfect-collinearity condition is rank(X) = K.

Notice 1: Zero Conditional Mean
- Exogenous vs. endogenous explanatory variables. Typical sources of endogeneity:
  Misspecification of the functional form (Chap 9), e.g. omitting a quadratic term, or using the level rather than the log of a variable.
  Omitting important factors that are correlated with any independent variable: if the omitted variable is correlated with the explanatory variables, the zero conditional mean assumption fails and the regression results are biased.
  Measurement error (Chap 15, IV).
  Simultaneously determining one or more of the x-s together with y (Chap 16, simultaneous equations).
Omitted Variable Bias: The Simple Case
- Problem: excluding a relevant variable, i.e. under-specifying the model (omitting a variable that belongs in the true population model).
- Omitted variable bias (misspecification analysis):
  True population model: y = β0 + β1 x1 + β2 x2 + u.
  Underspecified OLS line: ỹ = β̃0 + β̃1 x1.
  Expectation: E(β̃1) = β1 + β2 δ̃1, where δ̃1 is the slope from regressing x2 on x1 (note that in the partialling-out argument of Section 3.2 it was x1 that was regressed on the other regressors).
  Omitted variable bias: E(β̃1) − β1 = β2 δ̃1.

Omitted Variable Bias: Nonexistence
- δ̃1 is the sample covariance between x1 and x2 over the sample variance of x1.
- Two cases in which β̃1 is unbiased: β2 = 0, or x1 and x2 are uncorrelated in the sample (δ̃1 = 0).
- In these cases the unbiasedness of β̃1 has nothing to do with x2: only the intercept needs adjusting, and leaving x2 in the error term does not violate the zero conditional mean assumption.
- Summary: E(β̃1) = β1 + β2 δ̃1, so the omitted variable bias is β2 δ̃1 (simulated below).
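A Monte Carlo sketch (illustrative, not from the slides; the coefficients and the x1-x2 link are made up) showing that the short-regression slope is centered at β1 + β2 δ1, where δ1 is the slope from regressing x2 on x1.

```python
import numpy as np

rng = np.random.default_rng(7)
beta1, beta2 = 2.0, 1.5   # assumed true coefficients
n, reps = 200, 3000

short_slopes = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(0, 1, n)
    x2 = 0.8 * x1 + rng.normal(0, 1, n)       # x2 correlated with x1 (delta1 is about 0.8)
    y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(0, 1, n)
    short_slopes[r] = np.sum((x1 - x1.mean()) * (y - y.mean())) / np.sum((x1 - x1.mean()) ** 2)

print(short_slopes.mean())          # roughly beta1 + beta2*0.8 = 3.2, not 2.0
print(beta1 + beta2 * 0.8)          # the omitted-variable-bias prediction
```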
The Size of Omitted Variable Bias
- Both the direction and the size of the bias matter; a small bias of either sign need not be a cause for concern.
- The size is unknown, but we usually have some idea about the direction:
  We usually have a pretty good idea about the direction of the partial effect of x2 on y, that is, the sign of β2.
  In many cases we can make an educated guess about whether x1 and x2 are positively or negatively correlated.
- Terminology: upward bias, downward bias, and "biased toward zero" (the bias pushes the estimate toward zero). The bias β2 δ̃1 is positive when β2 and corr(x1, x2) have the same sign, and negative when their signs differ.
- E.g. omitting ability in a wage-education regression: β2 > 0 and corr(educ, ability) > 0, so the return to education is overestimated.

Omitted Variable Bias: More General Cases
- True model: y = β0 + β1 x1 + β2 x2 + β3 x3 + u, with x3 omitted from the estimated regression.
- Suppose x2 and x3 are uncorrelated, but x1 is correlated with x3.
- Both β̃1 and β̃2 will normally be biased; the only exception is when x1 and x2 are also uncorrelated.
- In general it is difficult to obtain the direction of the bias in β̃1 and β̃2.
- Approximation: if x1 and x2 are also uncorrelated, then E(β̃1) ≈ β1 + β3 δ̃1, where δ̃1 is the slope from regressing x3 on x1.
Notice 2: No Perfect Collinearity
- This is an assumption only about the x-s; it says nothing about the relationship between u and the x-s.
- Assumption MLR.4 does allow the independent variables to be correlated; they just cannot be perfectly correlated. Correlation among regressors is exactly what makes the ceteris paribus interpretation valuable: if we did not allow for any correlation among the independent variables, multiple regression would not be very useful for econometric analysis.

Cases of Perfect Collinearity
- When can independent variables be perfectly collinear? Regression software then reports a "singular" matrix.
- Nonlinear functions of the same variable (e.g. x and x²) are not exact linear functions of each other, so they may appear together.
- Do not include the same explanatory variable measured in different units in the same regression equation (see the sketch below).
- More subtle cases arise when one independent variable can be expressed as an exact linear function of some or all of the other independent variables; the remedy is to drop one of them.
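A numpy sketch (illustrative, not from the slides; the income variable and units are made up) showing that including the same variable in two units makes the design matrix rank-deficient, which is what the "singular" message reflects.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
income_dollars = rng.normal(50_000, 10_000, n)
income_thousands = income_dollars / 1000          # exact linear function of the first column

X = np.column_stack([np.ones(n), income_dollars, income_thousands])
print(np.linalg.matrix_rank(X))                   # 2, not 3: the columns are perfectly collinear
print(np.linalg.svd(X, compute_uv=False))         # one singular value is (numerically) zero
```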
Notice 3: Unbiasedness
- The meaning of unbiasedness: an estimate cannot be unbiased, because an estimate is a fixed number obtained from a particular sample, which usually is not equal to the population parameter.
- When we say that OLS is unbiased under Assumptions MLR.1 through MLR.4, we mean that the procedure by which the OLS estimates are obtained is unbiased when we view that procedure as being applied across all possible random samples.

Notice 4: Over-Specification
- Inclusion of an irrelevant variable, i.e. over-specifying the model, does not affect the unbiasedness of the OLS estimators.
- Including irrelevant variables can, however, have undesirable effects on the variances of the OLS estimators.
Variance of the OLS Estimators
- Additional assumption, homoskedasticity: Var(u | x1, ..., xk) = σ². A larger σ² means that the distribution of the unobservables affecting y is more spread out.
- Assumptions MLR.1-MLR.5 are the Gauss-Markov assumptions (for cross-sectional regression).
- Theorem (Sampling variance of OLS estimators): under the five assumptions above,
  Var(β̂j) = σ² / [SSTj (1 − Rj²)],
  where SSTj = Σ(xij − x̄j)² is the total sample variation in xj and Rj² is the R-squared from regressing xj on all the other independent variables (including an intercept).

More about Var(β̂j)
- The error variance σ²: the only way to reduce it is to add more explanatory variables, which is not always possible or desirable.
- The total sample variation SSTj: it grows with the sample size, so more data helps.
- Rj², the linear relationship among the independent variables: the proportion of the total variation in xj that can be explained by the other independent variables. If k = 2, Rj² is simply the squared sample correlation between x1 and x2.

Multi-collinearity
- High (but not perfect) correlation between two or more of the independent variables is called multicollinearity (Rj² close to 1).
- Micro-numerosity is the analogous problem of a small sample size (a small SSTj).
- A high Rj² or a low SSTj both inflate Var(β̂j) (see the sketch below). One thing is clear: everything else being equal, for estimating βj it is better to have less correlation between xj and the other x-s.
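A numpy sketch (illustrative, not from the slides; σ², the sample size and the correlations are made up) evaluating Var(β̂j) = σ² / [SSTj (1 − Rj²)]: a higher Rj² inflates the variance by the factor 1/(1 − Rj²), often called the variance inflation factor.

```python
import numpy as np

rng = np.random.default_rng(9)
n, sigma2 = 500, 1.0

for rho in [0.0, 0.5, 0.95]:
    x2 = rng.normal(0, 1, n)
    x1 = rho * x2 + np.sqrt(1 - rho ** 2) * rng.normal(0, 1, n)

    # R_1^2 from regressing x1 on a constant and x2
    Z = np.column_stack([np.ones(n), x2])
    fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
    r1_sq = 1 - np.sum((x1 - fit) ** 2) / np.sum((x1 - x1.mean()) ** 2)

    sst1 = np.sum((x1 - x1.mean()) ** 2)
    var_b1 = sigma2 / (sst1 * (1 - r1_sq))
    print(rho, round(r1_sq, 3), 1 / (1 - r1_sq), var_b1)   # R_1^2, VIF and Var(beta1_hat) grow with rho
```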
How to "solve" multicollinearity?
- Increase the sample size.
- Drop some of the variables? If the dropped variable actually belongs in the population model, this creates omitted variable bias.

Notice: The Influence of Multicollinearity
- A high degree of correlation between certain independent variables can be irrelevant to how well we can estimate other parameters in the model: e.g. if x2 and x3 are highly correlated with each other but both are uncorrelated with x1, Var(β̂1) is not affected by their correlation.
- Importance for economists: such highly correlated variables often enter only as controls, while the ceteris paribus effect of the variable of interest can still be estimated precisely.

Variances in Misspecified Models: Whether or Not to Include x2
- The choice of whether or not to include a particular variable in a regression model can be made by analyzing the tradeoff between bias and variance.
- However, when β2 ≠ 0, there are two favorable reasons for including x2 in the model:
  Any bias in β̃1 does not shrink as the sample size grows.
  The variances of both estimators shrink to zero as n increases, so the multicollinearity induced by adding x2 becomes less important as the sample size grows. In large samples we would prefer β̂1 from the model that includes x2.

Estimating σ: Standard Errors of the OLS Estimators
- σ̂² = SSR / (n − k − 1) is unbiased for σ² under the Gauss-Markov assumptions, and se(β̂j) = σ̂ / sqrt(SSTj (1 − Rj²)) (details in the lecture notes; computed below).
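A numpy sketch (illustrative, not from the slides; the data-generating process is made up) computing multiple-regression standard errors from σ̂² (X'X)⁻¹, which is the matrix form of se(β̂j) = σ̂ / sqrt(SSTj (1 − Rj²)).

```python
import numpy as np

rng = np.random.default_rng(10)
n, k = 400, 2
x1 = rng.normal(0, 1, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

sigma2_hat = np.sum(resid ** 2) / (n - k - 1)          # SSR / (n - k - 1)
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)         # estimated covariance matrix
print(beta_hat)
print(np.sqrt(np.diag(cov_beta)))                      # standard errors of beta0, beta1, beta2
```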
Efficiency of OLS: The Gauss-Markov Theorem
- OLS is BLUE: "best" = smallest variance; "linear" = linear in the yi; "unbiased" = E(β̂j) = βj.
- Meaning of the theorem: (1) there is no need to look for another unbiased estimator that is a linear combination of the yi, since none has a smaller variance; (2) if any one of the Gauss-Markov assumptions fails, the BLUE property fails. For example, failure of the zero conditional mean assumption (endogeneity) makes OLS biased; heteroskedasticity does not cause bias, but OLS no longer has the smallest variance.

Next: the Classical Linear Model assumptions and inference.

References for this part of the course
- Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach, Chap. 2-3.