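Statistical Computing Problems

Problem 4.2. Epidemiologists are interested in studying the sexual behavior of individuals at risk for HIV infection. Suppose 1500 gay men were surveyed and each was asked how many risky sexual encounters he had in the previous 30 days. Let $n_i$ denote the number of respondents reporting $i$ encounters, for $i = 0, \ldots, 16$. Table 4.2 summarizes the responses. These data are poorly fitted by a Poisson model. It is more realistic to assume that the respondents comprise three groups. First, there is a group of people who, for whatever reason, report zero risky encounters even if this is not true. Suppose a respondent has probability $\alpha$ of belonging to this group. With probability $\beta$, a respondent belongs to a second group representing typical behavior. Such people respond truthfully, and their numbers of risky encounters are assumed to follow a Poisson($\mu$) distribution. Finally, with probability $1 - \alpha - \beta$, a respondent belongs to a high-risk group. Such people respond truthfully, and their numbers of risky encounters are assumed to follow a Poisson($\lambda$) distribution. The parameters in the model are $\alpha$, $\beta$, $\mu$, and $\lambda$. At the $t$th iteration of EM, we use $\theta^{(t)} = (\alpha^{(t)}, \beta^{(t)}, \mu^{(t)}, \lambda^{(t)})$ to denote the current parameter values. The likelihood of the observed data is given by

$$L(\theta \mid n_0, \ldots, n_{16}) \propto \prod_{i=0}^{16} \left[ \frac{\pi_i(\theta)}{i!} \right]^{n_i},$$

where

$$\pi_i(\theta) = \alpha 1_{\{i=0\}} + \beta \mu^i \exp\{-\mu\} + (1 - \alpha - \beta) \lambda^i \exp\{-\lambda\}$$

for $i = 0, \ldots, 16$. The observed data are $n_0, \ldots, n_{16}$. The complete data may be construed to be $n_{z,0}$, $n_{t,0}, \ldots, n_{t,16}$, and $n_{p,0}, \ldots, n_{p,16}$, where $n_{k,i}$ denotes the number of respondents in group $k$ reporting $i$ risky encounters and $k = z$, $t$, and $p$ correspond to the zero, typical, and promiscuous groups, respectively. Thus, $n_0 = n_{z,0} + n_{t,0} + n_{p,0}$ and $n_i = n_{t,i} + n_{p,i}$ for $i = 1, \ldots, 16$. Let $N = \sum_{i=0}^{16} n_i = 1500$. Define

$$z_0(\theta) = \frac{\alpha}{\pi_0(\theta)}, \qquad t_i(\theta) = \frac{\beta \mu^i \exp\{-\mu\}}{\pi_i(\theta)}, \qquad p_i(\theta) = \frac{(1 - \alpha - \beta) \lambda^i \exp\{-\lambda\}}{\pi_i(\theta)}$$

for $i = 0, \ldots, 16$. These correspond to probabilities that respondents with $i$ risky encounters belong to the various groups.

a. Show that the EM algorithm provides the following updates:

$$\alpha^{(t+1)} = \frac{n_0 z_0(\theta^{(t)})}{N}, \qquad \beta^{(t+1)} = \sum_{i=0}^{16} \frac{n_i t_i(\theta^{(t)})}{N},$$

$$\mu^{(t+1)} = \frac{\sum_{i=0}^{16} i\, n_i t_i(\theta^{(t)})}{\sum_{i=0}^{16} n_i t_i(\theta^{(t)})}, \qquad \lambda^{(t+1)} = \frac{\sum_{i=0}^{16} i\, n_i p_i(\theta^{(t)})}{\sum_{i=0}^{16} n_i p_i(\theta^{(t)})}.$$

b. Estimate the parameters of the model, using the observed data.
c. Estimate the standard errors and pairwise correlations of your parameter estimates, using any available method.

Solution:

(1) By Bayes' theorem, a respondent reporting $i$ encounters belongs to the zero, typical, or promiscuous group with probability $z_0(\theta)$ (for $i = 0$ only), $t_i(\theta)$, or $p_i(\theta)$, respectively. Then we have

$$E[n_{z,0} \mid n_0, \theta^{(t)}] = n_0 z_0(\theta^{(t)}), \qquad E[n_{t,i} \mid n_i, \theta^{(t)}] = n_i t_i(\theta^{(t)}), \qquad E[n_{p,i} \mid n_i, \theta^{(t)}] = n_i p_i(\theta^{(t)}),$$

where $k$ in $n_{k,i}$ denotes the group and $i$ denotes the number of risky encounters. This proves the interpretation of $z_0$, $t_i$, and $p_i$.

(2) We now derive the EM updates. The complete-data log-likelihood is, up to an additive constant,

$$\log L_c(\theta) = n_{z,0} \log \alpha + \sum_{i=0}^{16} n_{t,i} \left[ \log \beta + i \log \mu - \mu \right] + \sum_{i=0}^{16} n_{p,i} \left[ \log(1 - \alpha - \beta) + i \log \lambda - \lambda \right],$$

so the Q function of the E-step is

$$Q(\theta \mid \theta^{(t)}) = n_0 z_0(\theta^{(t)}) \log \alpha + \sum_{i=0}^{16} n_i t_i(\theta^{(t)}) \left[ \log \beta + i \log \mu - \mu \right] + \sum_{i=0}^{16} n_i p_i(\theta^{(t)}) \left[ \log(1 - \alpha - \beta) + i \log \lambda - \lambda \right].$$

(3) In the M-step we maximize the Q function.

(i) Write $\gamma = 1 - \alpha - \beta$. Q must be maximized subject to the constraint $\alpha + \beta + \gamma = 1$. Applying the method of Lagrange multipliers with multiplier $\eta$ gives

$$\frac{n_0 z_0(\theta^{(t)})}{\alpha} = \frac{\sum_i n_i t_i(\theta^{(t)})}{\beta} = \frac{\sum_i n_i p_i(\theta^{(t)})}{\gamma} = \eta.$$

Since $z_0 + t_0 + p_0 = 1$ and $t_i + p_i = 1$ for $i \geq 1$, summing the three numerators gives $\eta = \sum_i n_i = N$. It follows that

$$\alpha^{(t+1)} = \frac{n_0 z_0(\theta^{(t)})}{N}, \qquad \beta^{(t+1)} = \sum_{i=0}^{16} \frac{n_i t_i(\theta^{(t)})}{N}.$$

(ii) To maximize Q over the Poisson means, set the partial derivatives with respect to $\mu$ and $\lambda$ to zero:

$$\frac{\partial Q}{\partial \mu} = \sum_{i=0}^{16} n_i t_i(\theta^{(t)}) \left( \frac{i}{\mu} - 1 \right) = 0, \qquad \frac{\partial Q}{\partial \lambda} = \sum_{i=0}^{16} n_i p_i(\theta^{(t)}) \left( \frac{i}{\lambda} - 1 \right) = 0.$$

It follows that

$$\mu^{(t+1)} = \frac{\sum_{i=0}^{16} i\, n_i t_i(\theta^{(t)})}{\sum_{i=0}^{16} n_i t_i(\theta^{(t)})},$$

and likewise

$$\lambda^{(t+1)} = \frac{\sum_{i=0}^{16} i\, n_i p_i(\theta^{(t)})}{\sum_{i=0}^{16} n_i p_i(\theta^{(t)})}.$$

This completes the proof.

Algorithm:
(1) Initialize the parameters of the mixture model to $\theta^{(0)} = (\alpha^{(0)}, \beta^{(0)}, \mu^{(0)}, \lambda^{(0)})$. Note that the model here is a mixture of a point mass at zero and two Poisson distributions, not a normal mixture.
(2) E-step: compute the expectation of the complete-data log-likelihood given the observed data and the current parameter values, $Q(\theta \mid \theta^{(t)})$. For this model the expectation is available in closed form through $z_0$, $t_i$, and $p_i$, so no simulated samples are required.
(3) M-step: maximize this expectation, that is, find the $\theta$ that maximizes $Q(\theta \mid \theta^{(t)})$, and iterate until convergence: $\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$.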
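The EM updates of Problem 4.2 can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution: the function names (`mixture_terms`, `log_likelihood`, `em_step`, `run_em`) are hypothetical, and because Table 4.2 is not reproduced in this document, `toy_counts` is made-up illustrative data, not the survey responses. The starting values are likewise arbitrary assumptions.

```python
import math

def mixture_terms(i, alpha, beta, mu, lam):
    """Three group terms whose sum is pi_i(theta) (the 1/i! factor is left out)."""
    zero = alpha if i == 0 else 0.0
    typical = beta * mu**i * math.exp(-mu)
    promisc = (1.0 - alpha - beta) * lam**i * math.exp(-lam)
    return zero, typical, promisc

def log_likelihood(counts, alpha, beta, mu, lam):
    """Observed-data log-likelihood: sum_i n_i * log(pi_i(theta) / i!)."""
    total = 0.0
    for i, n in enumerate(counts):
        pi_i = sum(mixture_terms(i, alpha, beta, mu, lam))
        total += n * (math.log(pi_i) - math.lgamma(i + 1))
    return total

def em_step(counts, alpha, beta, mu, lam):
    """One EM iteration: E-step weights z_0, t_i, p_i, then closed-form M-step."""
    n_total = sum(counts)
    z0 = 0.0
    sum_t = sum_it = sum_p = sum_ip = 0.0
    for i, n in enumerate(counts):
        zero, typical, promisc = mixture_terms(i, alpha, beta, mu, lam)
        pi_i = zero + typical + promisc
        if i == 0:
            z0 = zero / pi_i
        t_i, p_i = typical / pi_i, promisc / pi_i
        sum_t += n * t_i
        sum_it += i * n * t_i
        sum_p += n * p_i
        sum_ip += i * n * p_i
    return (counts[0] * z0 / n_total,  # alpha update
            sum_t / n_total,           # beta update
            sum_it / sum_t,            # mu update
            sum_ip / sum_p)            # lambda update

def run_em(counts, theta0, n_iter=200):
    """Iterate EM from theta0, recording the log-likelihood after each step."""
    theta = theta0
    history = [log_likelihood(counts, *theta)]
    for _ in range(n_iter):
        theta = em_step(counts, *theta)
        history.append(log_likelihood(counts, *theta))
    return theta, history

# Made-up counts n_0, ..., n_12 for illustration only (NOT Table 4.2).
toy_counts = [70, 20, 30, 25, 15, 8, 5, 6, 7, 6, 4, 2, 1]
theta_hat, loglik_path = run_em(toy_counts, (0.2, 0.5, 2.0, 6.0))
```

Recording the log-likelihood at each iteration gives a simple check on the implementation, since EM should never decrease it. For part (b), the real counts from Table 4.2 would replace `toy_counts`.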

Problem 7.2. Simulating from the mixture distribution in Equation (7.6) is straightforward (see part (a) of Problem 7.1). However, using the Metropolis-Hastings algorithm to simulate realizations from this distribution is useful for exploring the role of the proposal distribution.

a. Implement a Metropolis-Hastings algorithm to simulate from Equation (7.6) with $\delta = 0.7$, using $N(x^{(t)}, 0.01^2)$ as the proposal distribution. For each of three starting values, $x^{(0)} = 0$, 7, and 15, run the chain for 10,000 iterations. Plot the sample path of the output from each chain. If only one of the sample paths was available, what would you conclude about the chain? For each of the simulations, create a histogram of the realizations with the true density superimposed on the histogram. Based on your output from all three chains, what can you say about the behavior of the chain?
b. Now change the proposal distribution to improve the convergence properties of the chain. Using the new proposal distribution, repeat part (a).

Algorithm:
1. Generate 100 simulated observations from the two normal populations, selecting them with probabilities 0.7 and 0.3, respectively.
2. Choose a proposal distribution $g(\cdot \mid x^{(t)})$ and draw a candidate value $X^*$ from it.
3. Compute the Metropolis-Hastings ratio $R(x^{(t)}, X^*) = \dfrac{f(X^*)\, g(x^{(t)} \mid X^*)}{f(x^{(t)})\, g(X^* \mid x^{(t)})}$ (in practice this is often a ratio of posterior densities arising in Bayesian inference).
4. Accept $X^*$ with probability $\min\{R(x^{(t)}, X^*), 1\}$. If it is accepted, set $X^{(t+1)} = X^*$; otherwise, set $X^{(t+1)} = x^{(t)}$.
5. Increment $t$ and repeat the steps above until the chain converges.

Problem 7.5. A clinical trial was conducted to determine whether a hormone
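treatment benefits women who were treated previously for breast cancer. Each subject entered the clinical trial when she had a recurrence. She was then treated by irradiation and assigned to either a hormone therapy group or a control group. The observation of interest is the time until a second recurrence, which may be assumed to follow an exponential distribution with parameter $\tau\theta$ (hormone therapy group) or $\theta$ (control group). Many of the women did not have a second recurrence before the clinical trial was concluded, so that their recurrence times are censored. In Table 7.2, a censoring time M means that a woman was observed for M months and did not have a recurrence during that time period, so that her recurrence time is known to exceed M months. For example, 15 women who received the hormone treatment suffered recurrences, and the total of their recurrence times is 280 months. Let $y_i^H = (x_i^H, \delta_i^H)$ be the data for the $i$th person in the hormone group, where $x_i^H$ is the time and $\delta_i^H$ equals 1 if $x_i^H$ is a recurrence time and 0 if a censored time. The data for the control group can be written similarly. The likelihood is then

$$L(\theta, \tau \mid y) \propto \theta^{\sum_i \delta_i^C + \sum_i \delta_i^H}\, \tau^{\sum_i \delta_i^H} \exp\Big\{ -\theta \sum_i x_i^C - \tau\theta \sum_i x_i^H \Big\}.$$

You've been hired by the drug company to analyze their data. They want to know if the hormone treatment works, so the task is to find the marginal posterior distribution of $\tau$ using the Gibbs sampler. In a Bayesian analysis of these data, use the conjugate prior

$$f(\theta, \tau) \propto \theta^a \tau^b \exp\{ -c\theta - d\theta\tau \}.$$

Physicians who have worked extensively with this hormone treatment have indicated that reasonable values for the hyperparameters are (a, b, c, d) = (3, 1, 60, 120).

a. Summarize and plot the data as appropriate.
b. Derive the conditional distributions necessary to implement the Gibbs sampler.
c. Program and run your Gibbs sampler. Use a suite of convergence diagnostics to evaluate the convergence and mixing of your sampler. Interpret the diagnostics.
d. Compute summary statistics of the estimated joint posterior distribution, including marginal means, standard deviations, and 95% probability intervals for each parameter. Make a table of these results.
e. Create a graph which shows the prior and estimated posterior distribution for $\tau$ superimposed on the same scale.
f. Interpret your results for the drug company. Specifically, what does your estimate of $\tau$ mean for the clinical trial? Are the recurrence times for the hormone group significantly different from those for the control group?
g. A common criticism of Bayesian analyses is that the results are highly dependent on the priors. Investigate this issue by repeating your Gibbs sampler for values of the hyperparameters that are half and double the original hyperparameter values. Provide a table of summary statistics to compare your results. This is called a sensitivity analysis. Based on your results, what recommendations do you have for the drug company regarding the sensitivity of your results to hyperparameter values?

TABLE 7.2 Breast cancer data

Recurrence times, hormone treated: 2 4 6 9 9 9 13 14 18 23 31 32 33 34 43
Recurrence times, control: 1 4 6 7 13 24 25 35 35 39
Censoring times, hormone treated: 10 14 14 16 17 18 18 19 20 20 21 21 23 24 29 29 30 30 31 31 31 33 35 37 40 41 42 42 44 46 48 49 51 53 54 54 55 56
Censoring times, control: 1 1 3 4 5 8 10 11 13 14 14 15 17 19 20 22 24 24 24 25 26 26 26 28 29 29 32 35 38 39 40 41 44 45 47 47 47 50 50 51

Algorithm:
1. Initialize $\theta^{(0)}$ and $\tau^{(0)}$.
2. Compute the conditional distributions $f(\theta \mid \tau, y)$ and $f(\tau \mid \theta, y)$. The derivation is as follows. The joint posterior is

$$f(\theta, \tau \mid y) \propto L(\theta, \tau \mid y)\, f(\theta, \tau) \propto \theta^{a + \sum_i \delta_i^C + \sum_i \delta_i^H}\, \tau^{b + \sum_i \delta_i^H} \exp\Big\{ -\theta \Big( c + \sum_i x_i^C \Big) - \tau\theta \Big( d + \sum_i x_i^H \Big) \Big\}.$$

Holding $\tau$ fixed, we then have

$$f(\theta \mid \tau, y) \propto \theta^{a + \sum_i \delta_i^C + \sum_i \delta_i^H} \exp\Big\{ -\theta \Big[ c + \sum_i x_i^C + \tau \Big( d + \sum_i x_i^H \Big) \Big] \Big\},$$

and similarly, holding $\theta$ fixed,

$$f(\tau \mid \theta, y) \propto \tau^{b + \sum_i \delta_i^H} \exp\Big\{ -\tau\theta \Big( d + \sum_i x_i^H \Big) \Big\}.$$

3. It follows that $\theta \mid \tau, y$ has a Gamma distribution with shape $a + \sum_i \delta_i^C + \sum_i \delta_i^H + 1$ and rate $c + \sum_i x_i^C + \tau ( d + \sum_i x_i^H )$, and that $\tau \mid \theta, y$ has a Gamma distribution with shape $b + \sum_i \delta_i^H + 1$ and rate $\theta ( d + \sum_i x_i^H )$.
4. First draw $\theta^{(t+1)}$ from $f(\theta \mid \tau^{(t)}, y)$, then draw $\tau^{(t+1)}$ from $f(\tau \mid \theta^{(t+1)}, y)$.
5. Repeat the steps above to obtain the sample $(\theta^{(t)}, \tau^{(t)})$, $t = 1, 2, \ldots$.
6. Use the sampled values to estimate the marginal posterior distribution of $\tau$.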
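
The Gibbs steps for Problem 7.5 can be sketched as follows. This is a minimal sketch under stated assumptions: it takes the likelihood to be that of censored exponential data with rates $\tau\theta$ (hormone) and $\theta$ (control), the prior to have the conjugate form $\theta^a \tau^b \exp\{-c\theta - d\theta\tau\}$, and the two full conditionals to be Gamma distributions. The function name `gibbs`, the starting value, the seed, and the burn-in length are hypothetical choices, and the convergence diagnostics requested in part (c) are not included.

```python
import random

# Table 7.2 (times in months), split into recurrence and censoring times.
hormone_recur = [2, 4, 6, 9, 9, 9, 13, 14, 18, 23, 31, 32, 33, 34, 43]
hormone_cens = [10, 14, 14, 16, 17, 18, 18, 19, 20, 20, 21, 21, 23, 24, 29,
                29, 30, 30, 31, 31, 31, 33, 35, 37, 40, 41, 42, 42, 44, 46,
                48, 49, 51, 53, 54, 54, 55, 56]
control_recur = [1, 4, 6, 7, 13, 24, 25, 35, 35, 39]
control_cens = [1, 1, 3, 4, 5, 8, 10, 11, 13, 14, 14, 15, 17, 19, 20, 22,
                24, 24, 24, 25, 26, 26, 26, 28, 29, 29, 32, 35, 38, 39, 40,
                41, 44, 45, 47, 47, 47, 50, 50, 51]

def gibbs(n_iter, a=3, b=1, c=60, d=120, tau0=1.0, seed=0):
    """Alternate draws from theta | tau, y and tau | theta, y (both Gamma)."""
    rng = random.Random(seed)
    n_h = len(hormone_recur)                         # sum of delta_i^H
    n_c = len(control_recur)                         # sum of delta_i^C
    sum_h = sum(hormone_recur) + sum(hormone_cens)   # sum of x_i^H
    sum_c = sum(control_recur) + sum(control_cens)   # sum of x_i^C
    tau = tau0
    draws = []
    for _ in range(n_iter):
        # theta | tau, y ~ Gamma(shape, rate); gammavariate takes scale = 1/rate.
        rate_theta = c + sum_c + tau * (d + sum_h)
        theta = rng.gammavariate(a + n_c + n_h + 1, 1.0 / rate_theta)
        # tau | theta, y ~ Gamma(shape, rate), using the freshly drawn theta.
        rate_tau = theta * (d + sum_h)
        tau = rng.gammavariate(b + n_h + 1, 1.0 / rate_tau)
        draws.append((theta, tau))
    return draws

draws = gibbs(6000)
kept = draws[1000:]                                  # discard an assumed burn-in
tau_mean = sum(t for _, t in kept) / len(kept)
theta_mean = sum(th for th, _ in kept) / len(kept)
```

The retained $\tau$ draws in `kept` approximate the marginal posterior of $\tau$ and can be summarized with means, standard deviations, and quantiles for part (d). Rerunning `gibbs` with halved and doubled values of `a`, `b`, `c`, and `d` gives the sensitivity analysis of part (g).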