Biometrika (2007), 94, 1, pp. 87-99
doi:10.1093/biomet/asm013
© 2007 Biometrika Trust
Printed in Great Britain

The unobserved heterogeneity distribution in duration analysis

BY JAAP H. ABBRING AND GERARD J. VAN DEN BERG
Department of Economics, Free University Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
jabbring@feweb.vu.nl gberg@feweb.vu.nl

SUMMARY

In a large class of hazard models with proportional unobserved heterogeneity, the distribution of the heterogeneity among survivors converges to a gamma distribution. This convergence is often rapid. We derive this result as a general result for exponential mixtures and explore its implications for the specification and empirical analysis of univariate and multivariate duration models.

Some key words: Duration analysis; Exponential mixture; Gamma distribution; Limit distribution; Mixed proportional hazard.

1. INTRODUCTION

It is well known that duration analysis produces incorrect results if unobserved heterogeneity is ignored (Lancaster, 1990). On average, subjects with relatively high hazard rates for unobserved reasons leave the state of interest first, so that samples of survivors are selected. Differences between such samples at different times reflect behavioural differences as well as this selection effect.

Lancaster (1979) specified and estimated a proportional hazard model with multiplicative unobserved heterogeneity. This is called a mixed proportional hazard model and has subsequently become by far the most popular duration model in econometrics. Van den Berg (2001) presents a survey.

The model is typically estimated using methods that require parametric functional form assumptions on the heterogeneity distribution. Lancaster (1979) assumes a gamma distribution, as do Vaupel et al. (1979), who introduced the model in demography. Nickell (1979) assumes a discrete distribution, and others have made other choices (Van den Berg, 2001). Unfortunately, estimators of the mixed proportional hazard model are usually biased if the functional form of the heterogeneity distribution is misspecified. Extensive simulation evidence is provided by, for example, Baker & Melino (2000) and Bretagnolle & Huber-Carol (1988). Also, many empirical studies report that the estimates are sensitive to the functional form of the distribution (Heckman & Singer, 1984; Trussell & Richards, 1985; Hougaard et al., 1994;
Keiding et al., 1997). As a result, studies in which mixed proportional hazard models are estimated have wrestled with the choice of a functional form for the heterogeneity distribution; see for example Heckman & Singer (1984). In general, there is no argument in favour of one choice over the other. Also, formal results in the methodological studies by Heckman & Taber (1994), Kortram et al. (1995) and Horowitz (1999) indicate that duration data are rather uninformative about the shape of this distribution.

In practice, researchers often choose a gamma mixing distribution for computational and expositional reasons; all functions of interest have simple explicit expressions in this case (Lancaster, 1990). The mixed proportional hazard model with gamma heterogeneity is a preferred option in popular statistical packages like STATA, SAS, S-Plus and SPSS. Recently developed semiparametric estimators for the model also assume gamma heterogeneity; for examples see Clayton (1978), Meyer (1990), Nielsen et al. (1992), Murphy (1994, 1995), Petersen et al. (1996) and references in Andersen et al. (1993). The results in this paper rationalize this
preference for the gamma distribution, and connect the many results that have been derived for the gamma case to a wider class of models.

2. A LIMIT RESULT FOR EXPONENTIAL MIXTURES

2.1. Exponential mixtures

Let $Z$ and $V$ be nonnegative random variables such that
$$\mathrm{pr}(Z \ge z \mid V) = \exp(-Vz). \tag{1}$$
The marginal distribution of $Z$ is therefore a mixture of exponential distributions with respect to the marginal distribution $F$ of $V$:
$$\mathrm{pr}(Z \ge z) = \int_0^\infty \exp(-vz)\,dF(v).$$
We examine the limiting behaviour of the distribution of $V$ conditional on $Z \ge z$ as $z \to \infty$. In particular, we examine the limiting behaviour of
$$G_z(v) = \mathrm{pr}(zV \le v \mid Z \ge z).$$

2.2. Main result

We adopt the definitions of Feller (1971, §VIII.8) of slow variation and regular variation at 0.

DEFINITION 1. A positive function $L$ defined on $(0, \infty)$ is slowly varying at 0 if $\lim_{y \downarrow 0} L(\tau y)/L(y) = 1$ for every fixed $\tau > 0$.

DEFINITION 2. A positive function $k$ defined on $(0, \infty)$ is regularly varying with exponent $\rho \in (-\infty, \infty)$ at 0 if $\lim_{y \downarrow 0} k(\tau y)/k(y) = \tau^{\rho}$ for every fixed $\tau > 0$.

Let $\Gamma_{\alpha,\rho}$ denote the gamma distribution with parameters $\alpha > 0$ and $\rho > 0$, with density $\gamma_{\alpha,\rho}(v) = \alpha^{\rho} v^{\rho-1} \exp(-\alpha v)/\Gamma(\rho)$ at $v$. We define the standard gamma distribution as $\Gamma_\rho := \Gamma_{1,\rho}$, with density denoted by $\gamma_\rho$. Finally, we define the limiting case $\Gamma_0$ such that $\Gamma_0(v) = 1$ for all $v \in [0, \infty)$. This is a degenerate distribution with all probability mass at zero. We now state the main result.

PROPOSITION 1. If $G_z \to G$ as $z \to \infty$, with $G$ a proper distribution function, then $G = \Gamma_\rho$ for some $\rho \ge 0$. A necessary and sufficient condition for $G_z \to \Gamma_\rho$ $(\rho \ge 0)$ is that $F$ is regularly varying with exponent $\rho$ at 0.

Proof. The Laplace transform $\mathcal{L}_{G_z}$ of $G_z$ is given by
$$\mathcal{L}_{G_z}(s) = \int_0^\infty \exp(-sv)\,dG_z(v) = \frac{\mathcal{L}_F\{z(s+1)\}}{\mathcal{L}_F(z)}.$$
First, suppose that $G_z \to G$ as $z \to \infty$, with $G$ a proper distribution function, and denote the Laplace transform of $G$ by $\mathcal{L}_G$. Then $\mathcal{L}_{G_z} \to \mathcal{L}_G$ as $z \to \infty$ by the continuity of the Laplace transform. Thus,
$$\lim_{z \to \infty} \mathcal{L}_{G_z}(s) = \lim_{z \to \infty} \frac{\mathcal{L}_F\{z(s+1)\}}{\mathcal{L}_F(z)}$$
exists and is positive and nonincreasing on $(0, \infty)$. By
Feller (1971, §VIII.8, Lemma 1), the latter limit then necessarily equals $(s+1)^{-\rho}$ for some $\rho \ge 0$. In turn, this implies that $G = \Gamma_\rho$ for some $\rho \ge 0$.

Secondly, again by continuity of the Laplace transform,
$$G_z \to \Gamma_\rho \iff \lim_{z \to \infty} \frac{\mathcal{L}_F\{z(s+1)\}}{\mathcal{L}_F(z)} = (s+1)^{-\rho},$$
so that $G_z \to \Gamma_\rho$ if and only if $\mathcal{L}_F$ is regularly varying with exponent $-\rho$ at infinity. In turn, it follows from an Abelian/Tauberian theorem, like Theorem 3 of Feller (1971, §XIII.5), that this is true if and only if $F$ varies regularly with exponent $\rho$ at 0.

Examples of continuous distributions that are regularly varying at 0 with exponent $\rho > 0$ are all distributions with densities that have finite positive limits at 0, such as the exponential, uniform and truncated normal distributions, and all gamma and beta distributions. Examples with $\rho > 0$ also include some discrete distributions with dense support near 0. The case $\rho = 0$ includes all distributions, including finitely discrete distributions, with a point mass at 0.

An obvious example of a distribution that is not regularly varying at 0 is a distribution without support near 0. Let $v_0 := \inf\{v : F(v) > 0\}$ be the largest lower bound on the support of $F$. Let $F^0$ be the distribution of $V - v_0$ and $G^0_z$ the distribution of $z(V - v_0)$ conditional on $Z \ge z$. Then Proposition 1 applies without change with $F$ replaced by $F^0$ and $G_z$ replaced by $G^0_z$.

2.3. Speed of convergence

In statistical applications results about the rate of convergence of $G_z$ to $G$ would be useful. The following example shows that no general result about this rate can be derived under the conditions of Proposition 1, notably under regular variation of $F$ with exponent $\rho \ge 0$. Let $F(v) = v^k$ on $(0, 1)$, for some $k > 0$. Then $\rho = k$ and $G_z \to \Gamma_k$ by Proposition 1. Note that this convergence is uniform. It is easy to show that
$$\lim_{z \to \infty} \frac{\sup_v |G_z(v) - G(v)|}{z^{k-1} \exp(-z)/\Gamma(k)} = 1.$$
This result does not generalize to all distributions that
are regularly varying with exponent $\rho$. For example, let $F(v) = v\{1 - \log(v)\}$ on $(0, 1)$. Then $F$ is regularly varying with exponent 1 at 0, but convergence is much slower than for the linear case $k = 1$ above. In particular, it can be shown that
$$\lim_{z \to \infty} \frac{\sup_v |G_z(v) - G(v)|}{c/\log(z)} = 1,$$
for some constant $0 < c < \infty$.

Consider next the case in which $V$ has a beta distribution on $(0, \bar v)$ with density $\beta_{\rho,\sigma;\bar v}$ and parameters $\rho, \sigma > 0$. The density is increasing if $\rho > 1$ and $\sigma \le 1$, U-shaped if $\rho < 1$ and $\sigma < 1$. It includes the uniform density on $(0, \bar v)$ for $\rho = \sigma = 1$. The corresponding cumulative distribution function is regularly varying with exponent $\rho > 0$, which implies that $G_z \to \Gamma_\rho$ according to Proposition 1. The parameter $\bar v$ is a scale parameter: we can write $V = \bar v V_1$, with $V_1$ distributed with density $\beta_{\rho,\sigma;1}$. We ensure that the examples are mutually comparable by fixing the value of $\bar v$ for each given $\rho$ and $\sigma$ such that $E(\log V) = 0$. Figure 1 displays the densities $g_z$ of $(z+1)V \mid Z \ge z$ corresponding to $\beta_{\rho,\sigma;\bar v}$
for values of $\rho$ and $\sigma$ that generate the various density shapes mentioned above, and for $z = 0, 0.5, 1, 2, 5$. Obviously, in each case, $g_0 = \beta_{\rho,\sigma;\bar v}$. The figures also display the limiting density $g_\infty = \gamma_\rho$. In all cases, we observe convergence to the gamma density.

To assess whether or not convergence is rapid we
need to obtain some insight into what constitutes a large or a small value of $z$. By equation (1), $ZV$ has a standard exponential distribution, so the normalization $E(\log V) = 0$ implies that $E(\log Z) = -0.577$, minus Euler's constant. In addition, note that $x \mapsto \exp(-x)$ is convex, so that $E(1/V) = E[\exp\{-\log(V)\}] \ge \exp[-E\{\log(V)\}] = 1$ by Jensen's inequality, and as a result $E(Z) = E(1/V) \ge 1$. To be more precise, if $\rho \le 1$ then $E(Z) = \infty$, whereas otherwise $E(Z) = (1/\bar v)(\rho + \sigma - 1)/(\rho - 1)$. Given all this, it is fair to state that the convergence is rapid: in most cases depicted $g_z$ is close to the density of its limiting distribution for $z$ as small as 0.5 or 1.

3. SINGLE-SPELL DURATION ANALYSIS

3.1. The mixed proportional hazard model

We first discuss the implications of Proposition 1 for the mixed proportional hazard model as popularized by Lancaster (1979) and Vaupel et al. (1979). The mixed proportional hazard model is a model for the distribution of a continuous random duration $T$ conditional
on a vector $X$ of observed covariates. Under some regularity conditions, it is straightforward to extend the analysis to the case of time-varying explanatory variables, but for ease of exposition we do not take this up here. The model specifies the distribution of $T \mid X$ as a mixture of the distribution of $T \mid (X, V)$ over the marginal distribution $F$ of $V$. Here, $V$ is a nonnegative random unobserved heterogeneity factor that is independent of $X$.

Fig. 1. Densities $g_z$ of $(z+1)V \mid Z \ge z$ for (a) $V \sim \mathrm{Be}(1, 1, e^1)$, i.e. $\mathrm{Un}(0, e)$, with limiting density $\gamma_1$, (b) $V \sim \mathrm{Be}(1/2, 1/2, 4)$, with limiting density $\gamma_{1/2}$, (c) $V \sim \mathrm{Be}(2, 2, e^{5/6})$, with limiting density $\gamma_2$, (d) $V \sim \mathrm{Be}(2, 1/2, e^{5/3}/4)$, with limiting density $\gamma_2$. Each panel shows the curves for $z = 0, 0.5, 1, 2, 5$ together with the limiting gamma density.

The distribution of $T \mid (X, V)$ is specified in terms of its hazard rate, which is defined by $\mathrm{pr}(t \le T < t + dt \mid T \ge t, X = x, V)/dt$ in the limit as $dt \downarrow 0$. The mixed proportional hazard model specifies this hazard rate as $\lambda(t)\phi(x)V$, with baseline hazard $\lambda$ and regressor function $\phi$; we denote the integrated baseline hazard by $\Lambda(t) := \int_0^t \lambda(u)\,du$. Suppose that $F$ is regularly varying with exponent $\rho > 0$ at 0. Then Proposition 1 implies that the distribution of $\{c_0 + c_1 \Lambda(t)\phi(x)\} V \mid (T \ge t, X = x)$ converges to a $\Gamma_{1/c_1,\rho}$ distribution as $\Lambda(t)\phi(x) \to \infty$, for any $c_0 \in \mathbb{R}$ and $c_1 > 0$. This in turn implies that the distribution of $V \mid (T \ge t, X = x)$ can be approximated by a gamma distribution with parameters $(c_0/c_1) + \Lambda(t)\phi(x)$ and $\rho$. Here, the value of $c_0/c_1$ is arbitrary, apart from the requirement that $c_0/c_1 > -\Lambda(t)\phi(x)$: it is
not determined by the limit result nor by properties of $F$. For $t$ small we require $c_0 \ge 0$, however, so that $c_0/c_1 \ge 0$.
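
As a rough numerical illustration of the gamma approximation described above, the following Python sketch treats the uniform case Un(0, e) of Fig. 1(a), for which the exponent is 1 and the approximating gamma distribution with shape 1 is exponential. It rests on several assumptions that are not taken from the paper: the first gamma parameter is read as a rate (inverse scale), `z` stands for the product of the integrated baseline and the regressor function at which survival is evaluated, `c_ratio` stands for c0/c1 and defaults to 0, and all function names are hypothetical.

```python
import math

V_BAR = math.e  # upper support point of the uniform distribution Un(0, e)


def exact_cdf(v, z, v_bar=V_BAR):
    """Exact cdf of V given survival, pr(V <= v | Z >= z), for V ~ Un(0, v_bar).

    The conditional density is proportional to exp(-z * v) on (0, v_bar).
    """
    if v <= 0.0:
        return 0.0
    if v >= v_bar:
        return 1.0
    if z == 0.0:
        return v / v_bar  # no selection yet: V is still uniform
    return math.expm1(-z * v) / math.expm1(-z * v_bar)


def gamma_approx_cdf(v, z, c_ratio=0.0):
    """Cdf of the approximating gamma law with rate c_ratio + z and shape 1.

    With shape 1 the gamma distribution is exponential. c_ratio plays the
    role of c0/c1 (an assumption of this sketch) and must keep the rate positive.
    """
    rate = c_ratio + z
    if rate <= 0.0:
        raise ValueError("c_ratio + z must be positive")
    return -math.expm1(-rate * v) if v > 0.0 else 0.0


def sup_distance(z, c_ratio=0.0, n_grid=2000):
    """Largest cdf gap on an evenly spaced grid over [0, V_BAR]."""
    grid = (V_BAR * i / n_grid for i in range(n_grid + 1))
    return max(abs(exact_cdf(v, z) - gamma_approx_cdf(v, z, c_ratio)) for v in grid)


if __name__ == "__main__":
    for z in (0.5, 1.0, 2.0, 5.0):
        print(f"z = {z}: sup distance = {sup_distance(z):.6f}")
```

With `c_ratio=0` the largest gap in this uniform example occurs at the upper support point and equals exp(-z e), so it shrinks exponentially in `z`. Passing `c_ratio=1.0` corresponds to the normalization (z + 1)V used in Fig. 1.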
Exactly the same distribution for $V \mid (T \ge t, X = x)$ is also obtained if a gamma mixing distribution $F = \Gamma_{c_0/c_1,\rho}$ is adopted.

Note that we can achieve $\Lambda(t)\phi(x) \to \infty$ by letting $t \to \infty$ for given $x \in \mathcal{X}$. However, our result is not only a large-$t$ result. If $\{\phi(x); x \in \mathcal{X}\}$ includes a sequence that diverges to $\infty$, we can also achieve $\Lambda(t)\phi(x) \to \infty$ along the corresponding sequence of covariate values for fixed $t$ such that $\Lambda(t) > 0$.

3.2. Estimation of the baseline with left-truncated data

These results can be applied to the empirical analysis of mixed proportional hazard models with left-truncated data. Duration data are left-truncated if a spell only enters the sample if its duration exceeds some $t_0 > 0$. Left-truncation frequently arises in economic applications and poses some hard and mostly unresolved problems. In general, mixed proportional hazard models that are identified from complete data will not be identified from left-truncated data. However, under the assumption that $V$ has a gamma distribution, some interesting features of the model can still be identified.

Consider the two-sample case in which $X$ is binary. Let $S_x(t) := \mathrm{pr}(T \ge t_0 + t \mid X = x, T \ge t_0)$. Note that $S_0$ and $S_1$ can be estimated from data that are left-truncated at $t_0$. If $F = \Gamma_{\alpha,\rho}$, then
$$S_x(t) = \{1 + \tilde\Lambda(t)\tilde\phi(x)\}^{-\rho}, \tag{2}$$
with $\tilde\Lambda(t) := \Lambda(t_0 + t) - \Lambda(t_0)$ and $\tilde\phi(x) := \phi(x)/\{\alpha + \Lambda(t_0)\phi(x)\}$. Thus, the model for $(S_0, S_1)$ reduces to a mixed proportional hazard model with integrated baseline $\tilde\Lambda$, regressor effects $\tilde\phi$ and $\Gamma_{1,\rho}$-distributed heterogeneity. Elbers & Ridder's (1982) identification result implies that $\rho$ is identified from $(S_0, S_1)$, and that $\tilde\Lambda$ and $\tilde\phi$ are identified up to a scale normalization, provided that $\tilde\phi(0) \ne \tilde\phi(1)$. This, in turn, identifies $\lambda$ up to scale almost everywhere on $(t_0, \infty)$. The regressor effects $\tilde\phi$ confound dynamic selection effects and the structural covariate effects embodied in $\phi$. Therefore, we cannot separately identify $\phi$. However, we can identify the sign of $\phi(1) - \phi(0)$, because it equals the sign of $\tilde\phi(1) - \tilde\phi(0)$. We return to this in §3.3.

Our limit result implies that (2) holds approximately in a much wider class of models. This suggests that we adopt the gamma specification (2) and use estimates of $\tilde\Lambda$ to estimate $\lambda$ with truncated data. We expect this estimator often to outperform alternative estimators such as those based on a flexible discrete approximation of the heterogeneity distribution in the truncated sample.

We illustrate this point with some Monte Carlo analysis. We generate data from two-sample
mixed proportional hazard models with linear $\Lambda$, and compare baseline estimates of the models with respectively gamma and two-point heterogeneity. For expositional convenience, we exploit our knowledge that the baseline is in the Weibull class and specify $\tilde\Lambda(t) = (t_0 + t)^{\exp(\kappa)} - t_0^{\exp(\kappa)}$.
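
The gamma specification (2) with a Weibull baseline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the integrated baseline to be `t ** exp(kappa)`, reads the first gamma parameter `alpha` as a rate and `rho` as the shape, and uses hypothetical names (`kappa`, `alpha`, `rho`, `phi` and all function names) that are not taken from the paper. It evaluates the truncated-sample survivor function in the form (2) and compares it with the ratio of marginal survivor functions under gamma mixing.

```python
import math


def integrated_baseline(t, kappa):
    """Weibull integrated baseline, t ** exp(kappa); linear when kappa = 0."""
    return t ** math.exp(kappa)


def marginal_survivor(t, phi, kappa, alpha, rho):
    """pr(T >= t | X = x) when V is gamma with rate alpha and shape rho.

    This is the Laplace transform of that gamma law, evaluated at
    integrated_baseline(t) * phi.
    """
    return (1.0 + integrated_baseline(t, kappa) * phi / alpha) ** (-rho)


def truncated_survivor(t, t0, phi, kappa, alpha, rho):
    """Survivor function of T - t0 given T >= t0, written as in (2)."""
    lam_tilde = integrated_baseline(t0 + t, kappa) - integrated_baseline(t0, kappa)
    phi_tilde = phi / (alpha + integrated_baseline(t0, kappa) * phi)
    return (1.0 + lam_tilde * phi_tilde) ** (-rho)


if __name__ == "__main__":
    t0, kappa, alpha, rho = 1.0, 0.0, 1.0, 2.0  # illustrative values only
    for phi in (1.0, 2.0):  # two samples, the second with twice the regressor effect
        for t in (0.5, 1.0, 2.0):
            direct = (marginal_survivor(t0 + t, phi, kappa, alpha, rho)
                      / marginal_survivor(t0, phi, kappa, alpha, rho))
            via_eq2 = truncated_survivor(t, t0, phi, kappa, alpha, rho)
            print(f"phi={phi}, t={t}: eq. (2) = {via_eq2:.6f}, ratio = {direct:.6f}")
```

Setting `kappa = 0` gives the linear integrated baseline of the Monte Carlo design described in the text; other values of `kappa` stay within the Weibull class.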
Table 1 reports simulated root mean squared errors of the maximum likelihood estimator of $\kappa$ for three data-generating processes differing only in the heterogeneity distribution used. Each row in the table corresponds to a different data-generating process. They are all mixed proportional hazard models with linear $\Lambda$, $\mathrm{pr}(X = 0) = \mathrm{pr}(X = 1) = 0.5$, $\phi(1) = 2\phi(0)$, and t