Electronic copy available at: http:/

Common Errors: How to (and Not to) Control for Unobserved Heterogeneity*

Todd A. Gormley and David A. Matsa

October 19, 2012

Abstract

Controlling for unobserved heterogeneity (or common errors), such as industry-specific shocks, is a fundamental challenge in empirical research, as failing to do so can introduce omitted variables biases and preclude causal inference. This paper discusses limitations of two approaches commonly used to control for unobserved group-level heterogeneity in finance research: demeaning the dependent variable with respect to the group (e.g., industry-adjusting) and adding the mean of the group's dependent variable as a control. We show that these techniques, which are used widely in both asset pricing and corporate finance research, typically provide inconsistent coefficients and can lead researchers to incorrect inferences. In contrast, the fixed effects estimator is consistent and should be used instead. We also explain how to estimate the fixed effects model when traditional methods are computationally infeasible.

(JEL G12, G2, G3, C01, C13)

Keywords: unobserved heterogeneity, group fixed effects, industry-adjust, bias

* For helpful comments, we thank an anonymous referee, Michael Anderson, Joshua Angrist, Christian Hansen, Dirk Jenter, Sandy Klasa, Alexander Ljungqvist, Mitchell Petersen, Nagpurnanand R. Prabhala, Michael Roberts, Nick Souleles, Michael R. Wagner, and Jeffrey Wooldridge, as well as the seminar participants at Wharton and MIT. Matthew Denes, Christine Dobridge, Jingling Guan, and Kanis Saengchote provided helpful research assistance. Gormley thanks the Rodney L. White Center for Financial Research Brandywine Global Investment Management Research Fellowship and the Cynthia and Bennett Golub Endowed Faculty Scholar Award for financial support.

The Wharton School, University of Pennsylvania, 3620 Locust Walk, Suite 2400, Philadelphia, PA, 19104. Phone: (215) 746-0496. Fax: (215) 898-6200. E-mail: tgormley@wharton.upenn.edu

Kellogg School of Management, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208. Phone: (847) 491-8337. Fax: (847) 491-5719. E-mail: dmatsa@kellogg.northwestern.edu

Controlling for unobserved heterogeneity is a fundamental challenge in empirical finance research because most corporate policies, including financing and
investment, depend on factors that are unobservable to the econometrician. If these factors are correlated with the variables of interest, then without proper treatment, omitted variables bias infects the estimated parameters and precludes causal inference. In many settings, important sources of unobserved heterogeneity are common within groups of observations. For example, unobserved factors, like demand shocks, are often common across firms in an industry and affect many corporate decisions. Potential unobserved factors abound: unobserved differences in local economic environments, management quality, and the cost of capital, to name a few.

Although the empirical finance literature uses various estimation strategies to control for unobserved group-level heterogeneity, there is little understanding of how these approaches differ and under which circumstances each provides consistent estimates. Our paper examines this question and shows that some commonly used approaches typically lead to inconsistent estimates and can distort inference.

We focus on two popular estimation strategies that are applied when there are a large number of groups of observations and the number of observations per group is small relative to the number of groups; for example, firm-panel data that is grouped into industry-years. The first estimation strategy, which we refer to as adjusted-Y (AdjY), demeans the dependent variable with respect to the group before estimating the model with ordinary least squares (OLS). A common example is when researchers industry-adjust their dependent variable so as to remove common industry factors in a firm-level analysis. A second approach, which we refer to as average effects (AvgE), uses the mean of the group's dependent variable as a control in the OLS specification.
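The two definitions above can be made concrete with a small simulation. The following is a minimal sketch, not the authors' code: the data-generating process (a regressor that loads on an unobserved group effect, with a true slope of 1), the sample sizes, and all function and variable names are assumptions chosen for illustration. It computes the slope on the regressor under plain OLS, AdjY, and AvgE, and, for comparison, under the fixed effects estimator named in the abstract, implemented here by demeaning both the dependent and the independent variable within each group.

```python
import numpy as np

# Hypothetical data-generating process (an assumption for illustration):
#   y = beta * x + f_group + noise, with x correlated with the group effect f.
rng = np.random.default_rng(0)
n_groups, per_group, beta = 2000, 5, 1.0

g = np.repeat(np.arange(n_groups), per_group)    # group id of each observation
f = rng.normal(size=n_groups)                    # unobserved group heterogeneity
x = 0.5 * f[g] + rng.normal(size=g.size)         # regressor loads on f
y = beta * x + f[g] + rng.normal(size=g.size)    # outcome


def group_mean(v, groups):
    """Return each observation's group mean of v."""
    sums = np.bincount(groups, weights=v)
    counts = np.bincount(groups)
    return (sums / counts)[groups]


def ols(dep, regressors):
    """OLS with an intercept; returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(dep.size)] + list(regressors))
    return np.linalg.lstsq(X, dep, rcond=None)[0]


y_bar = group_mean(y, g)
x_bar = group_mean(x, g)

b_ols = ols(y, [x])[1]                   # ignores the group effect entirely
b_adjy = ols(y - y_bar, [x])[1]          # AdjY: demean only the dependent variable
b_avge = ols(y, [x, y_bar])[1]           # AvgE: add the group mean of y as a control
b_fe = ols(y - y_bar, [x - x_bar])[1]    # FE: demean both y and x within group

for name, b in [("OLS", b_ols), ("AdjY", b_adjy), ("AvgE", b_avge), ("FE", b_fe)]:
    print(f"{name:5s} slope estimate: {b:.3f}   (true beta = {beta})")
```

Under this assumed design, only the within-group (FE) estimate should land close to the true slope; the direction and size of the other estimators' errors depend on the correlations built into the simulated data and should not be read as general results.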
A common implementation of AvgE uses the state-year mean of the dependent variable to control for time-varying differences in local economic environments.

Both AdjY and AvgE are widely used in empirical finance research. Articles published in top finance journals, including the Journal of Finance, Journal of Financial Economics, and Review of Financial Studies, have used both approaches since at least the late 1980s, and they continue to be used today.¹ Among articles published in these three journals in 2008-2010, we found over 60 articles, split about evenly between corporate finance and asset pricing, that employed at least one of the two techniques. The techniques are used to study a variety of finance topics, including banking, capital structure, corporate boards, governance, executive compensation, and corporate control. Articles using these estimation methods have also been published in economics, including the American Economic Review, Journal of Political Economy, and Quarterly Journal of Economics, and in accounting, including the Accounting Review, Journal of Accounting and Economics, and Journal of Accounting Research.

¹ The exact origin of the two estimators in finance is unclear; we suspect they were adapted from the event studies literature, in which stock returns are regressed on market-average returns. AdjY may have been inspired by analyses of market-adjusted returns, and AvgE by estimations of the market model.

Our paper shows that, despite their popularity, the AdjY and AvgE estimators rarely provide consistent estimates of models with unobserved group-level heterogeneity; both estimators can exhibit severe biases (where bias here refers to the difference between the probability limit of the estimate and the true parameter). The AdjY estimator suffers from an omitted variable bias because it fails to control for the group average of the independent variables. This omission is problematic when the observations for any explanatory variable are not independent within groups, which is likely in practice. The AvgE estimator suffers
from a measurement error bias because the sample mean of the group's dependent variable measures the true unobserved heterogeneity with error. AvgE is inconsistent when AdjY is inconsistent and also when any independent variable is correlated with the unobserved heterogeneity. Even when the underlying data structure exactly matches the AdjY or AvgE specifications (implying there are peer effects), both estimators are inconsistent. For both estimators, the bias can be large and complicated; trying to predict even the sign of the bias is typically impractical because it depends on numerous correlations.

The shortcomings of the AdjY and AvgE estimators stand in stark contrast to the fixed effects (FE) estimator, another approach available to control for unobserved group-level heterogeneity. The FE estimator, which instead adds group indicator variables to the OLS estimation, is consistent in
the presence of unobserved group-level heterogeneity. When there is only one source of unobserved group-level heterogeneity, the FE estimator is equivalent to demeaning all of the dependent and independent variables with respect to the group and then estimating using OLS.

The differences between the estimators are important because the AdjY and AvgE estimators can lead researchers to make incorrect inferences. We show that AdjY and AvgE estimates can be more biased than OLS and even yield estimates with the opposite sign of the true coefficient. AdjY and AvgE can also be inconsistent even in circumstances in which the original OLS estimates would be consistent.

When estimating a few textbook finance models using each of the different techniques to control for unobserved heterogeneity, we find large differences between the AdjY, AvgE, and FE estimates and confirm that AdjY and AvgE can
exhibit larger biases than OLS and can yield coefficients with the opposite sign of the FE estimates. These differences confirm the presence of unobserved group-level heterogeneity in these settings and of correlations within these commonly used data structures that cause the AdjY and AvgE estimators to be inconsistent and potentially quite misleading in practice.

Based on these findings, we argue that AdjY and AvgE and related estimators should not be used to control for unobserved group-level heterogeneity. Any estimation that transforms the dependent variable but not the independent variables will typically yield inconsistent estimates. For example, subtracting the group median or the mean or median of a comparable set of firms from the dependent variable will yield inconsistent estimates by failing to account for how the corresponding group median or mean of the independent variables affects the adjusted dependent variable.

Our findings apply to a diverse set of estimations found in the literature. The practice of industry-adjusting dependent variables is common in many corporate finance papers. Even a simple comparison of industry- or benchmark-adjusted outcomes before and after events, as in
many analyses of corporate control transactions, stock issues, and other sets of 0/1 events, does not reveal the true effect of the events. Corporate governance analyses of the effects of business combination laws across U.S. states while controlling for industry-year and state-year averages of the dependent variable are not properly specified. Our criticism also applies to estimators in some asset pricing studies. The method of characteristically adjusting stock returns in asset pricing (which subtracts the return of a benchmark portfolio containing stocks with similar characteristics) before sorting and comparing these stock returns across subsamples is problematic, because it does not control for how the variable used to sort the adjusted stock returns varies across the different benchmark portfolios.

Our findings also shed light on why other estimators provide incorrect inferences. The omitted variable problem of AdjY can be generalized to any dependent variable that is constructed using multiple observations, even if this is not done to control for an unobserved heterogeneity. For example, regression analysis of conglomerates' diversification discount suffers from this bias when the
researcher does not control for the independent variables of the single-segment firms used to construct the dependent variable. Our analysis also highlights potential problems with instrumental variables (IV) estimators that instrument for an endogenous independent variable using its group average. The exclusion restriction for this IV estimator is violated whenever an unobserved group heterogeneity is correlated with the regressor.

FE estimators should be used instead of AdjY or AvgE to control for an unobserved heterogeneity. FE estimators are consistent because they are equivalent to transforming both the dependent and independent variables so as to remove the unobserved heterogeneity. For any AdjY or AvgE estimator, there is a corresponding FE estimator that properly accounts for correlations in both the dependent and independent variables. For example, rather than industry-adjusting a dependent variable or controlling for the industry mean of the dependent variable, researchers should instead estimate a model with industry fixed effects. Likewise, rather than correlating benchmark-portfolio-adjusted stock returns with an explanatory variable of interest, a researcher should instead estimate a model with fixed effects for each benchmark portfolio. FE estimation still transforms the stock returns using the average returns for the benchmark portfolios but also controls for how the explanatory variable of interest varies across the benchmark portfolios.²

The FE estimator, however, also has limitations. Although the FE estimator controls for unobserved group-level heterogeneities, it is unable to control for unobserved within-group heterogeneities. FE estimation also cannot identify the effect of independent variables that do not vary within groups and is subject to
attenuation bias in the presence of measurement error. We discuss these limitations and provide guidance on when FE estimation is appropriate. We also describe how researchers can potentially address these limitations of the FE estimator.

We also address another limitation of FE that has motivated some researchers to use AdjY or AvgE rather than FE: computational difficulties that arise when trying to estimate FE models with multiple types of unobserved heterogeneity. As the size and detail of datasets have increased, researchers are increasingly interested in controlling for multiple sources of
unobserved heterogeneity. For example, executive compensation may be affected by unobserved managerial skill and by unobserved firm quality (Graham, Li, and Qiu 2012; Coles and Li 2011a). Likewise, researchers who use firm-level data are increasingly concerned about both unobserved firm-level characteristics and time-varying heterogeneity across industries, such as industry-level shocks to demand.

² To help interested researchers, we have also posted code and additional resources on our website to show how common implementations of AdjY and AvgE can be transformed into consistent FE estimators.

When there are multiple sources of unobserved group heterogeneity in an unbalanced panel, demeaning the data multiple times is not equivalent to fixed effects. FE estimation of such models requires a large number of indicator variables, which can pose computational problems. The computer memory
required to estimate these models can exceed the resources available to most researchers.

We discuss techniques that provide consistent estimates for models with multiple, high-dimensional group effects, while avoiding the computational constraints of a standard FE estimator. One approach is to interact all values of the multiple group effects to create a large set of fixed effects in one dimension that can be removed by transforming the data. A second approach, which helps to avoid potential attenuation biases and allows the researcher to estimate a larger set of parameters, is to maintain the multidimensional structure but to make estimation feasible by reducing the amount of information that needs to be stored in memory. This can be accomplished by using the properties of sparse matrices and/or by employing iterative algorithms. We discuss the relative advantages of each approach and how these techniques can be implemented easily in the widely used statistical