10.5 Multiple linear regression model

In real life, a change in the explained variable is rarely caused by a single explanatory variable; there are usually several. For example, output is affected by various input factors: capital, labor, technology, etc. Sales are often influenced by price and by the company's investment in advertising. Therefore, the multiple linear model, in which the number of explanatory variables is at least 2, is more common.

10.5.1 Multiple linear regression model and its assumptions

In practical problems, a variable is often affected by two or more explanatory variables, and it is then necessary to establish a multiple regression model. Suppose that there is a linear relationship between $Y$ and the $k$ variables $X_1, X_2, \dots, X_k$. The multiple linear regression model is expressed as

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \mu_i, \quad i = 1, 2, \dots, n,$$

where $Y$ is the explained variable (dependent variable), $X_j$ ($j = 1, \dots, k$) are the explanatory variables (independent variables), $\mu$ is the random
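error term, and $\beta_j$ ($j = 0, 1, \dots, k$) are the regression parameters (usually unknown). This shows that each $X_j$ is an important explanatory variable for $Y$, while $\mu$ stands for the many small factors that affect the change of $Y$.

Given a sample of size $n$, the observed values are $(Y_i, X_{1i}, X_{2i}, \dots, X_{ki})$, $i = 1, \dots, n$, and we get

$$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \mu_i, \quad i = 1, \dots, n.$$

Let

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{k1} \\ 1 & X_{12} & \cdots & X_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & X_{1n} & \cdots & X_{kn} \end{pmatrix}, \quad B = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix},$$

then we can get

$$Y = XB + \mu.$$

To ensure that the OLS method yields the optimal estimator, the regression model should meet the following assumptions:

The random error term vector $\mu$ is non-autocorrelated and homoskedastic: each term has mean zero and the same finite variance $\sigma^2$, namely $E(\mu) = 0$ and $\mathrm{Var}(\mu) = E(\mu\mu') = \sigma^2 I_n$.

The explanatory variables are independent of the error term, i.e. $E(X'\mu) = 0$.

The explanatory variables are linearly independent: $\mathrm{rank}(X) = k + 1 < n$, where $\mathrm{rank}(\cdot)$ represents the rank of the matrix.

The explanatory variables are non-random, and when $n \to \infty$, $\frac{1}{n} X'X$ converges to a finite nonsingular matrix.

10.5.2 Parameter estimation of multiple linear regression model

1. Ordinary least squares (OLS)

The principle of the ordinary least squares method is to determine the estimates of the regression parameters by minimizing the sum of squared residuals (the residuals being the estimated values of the error terms), which is an extremum problem.

$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \cdots - \hat\beta_k X_{ki})^2$$

is used to represent the sum of squares of residuals, and the regression parameters are estimated under its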
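minimum condition. Setting the partial derivative of $Q$ with respect to each estimate to zero, the following equations are obtained:

$$\frac{\partial Q}{\partial \hat\beta_j} = 0, \quad j = 0, 1, \dots, k.$$

The essence of the parameter estimation is to solve a system of $k + 1$ linear equations in $k + 1$ unknowns.

Normal equations

$$\sum_{i} (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \cdots - \hat\beta_k X_{ki}) = 0,$$
$$\sum_{i} (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \cdots - \hat\beta_k X_{ki}) X_{ji} = 0, \quad j = 1, \dots, k.$$

Matrix representation of least squares

Let $e = Y - X\hat{B}$ be the residual vector. Then $Q = e'e = (Y - X\hat{B})'(Y - X\hat{B})$, the normal equations become $X'X\hat{B} = X'Y$, and since $\mathrm{rank}(X) = k + 1$,

$$\hat{B} = (X'X)^{-1} X'Y.$$

Properties of least squares estimators

Linearity (the estimators are linear combinations of the observed values of the explained variable). Since the elements of $X$ are non-random, $(X'X)^{-1}X'$ is a constant matrix. It is known from the above equation that $\hat{B}$ is a linear combination of $Y$, so $\hat{B}$ is a linear estimator.

Unbiasedness (the mathematical expectation of the estimator equals the true value being estimated). Using $\hat{B} = (X'X)^{-1}X'(XB + \mu) = B + (X'X)^{-1}X'\mu$, we can get

$$E(\hat{B}) = B + (X'X)^{-1}X'E(\mu) = B,$$

so $\hat{B}$ is a linear unbiased estimator of $B$.

Efficiency (the variance of the estimator is the smallest among all linear unbiased estimators). With $\mathrm{Var}(\hat{B}) = \sigma^2 (X'X)^{-1}$, the OLS estimator has the minimum-variance property.

The estimator of the variance of the random error term

If $\hat{B}$ is known, then

$$e = Y - X\hat{B} = Y - X(X'X)^{-1}X'Y = [I - X(X'X)^{-1}X']Y.$$

Define $M = I - X(X'X)^{-1}X'$. Since $MX = 0$, the above equation can be written as $e = MY = M\mu$. The matrix $M$ is symmetric and idempotent, that is, $M' = M$ and $M^2 = M$. This leads to the estimator

$$\hat\sigma^2 = \frac{e'e}{n - k - 1}.$$

Sample size problem

Sample size is an important practical problem: the model depends on the actual sample, and acquiring samples has a cost, so determining the required sample size can reduce the difficulty of data collection.

Minimum sample size: the sample size that meets the basic requirement that $(X'X)^{-1}$ exist, i.e. that $X'X$ is a full-rank matrix of order $k + 1$. There must be $n \ge k + 1$, which is the minimum sample size that meets the basic requirements.

General experience suggests that $n \ge 30$, or at least $n \ge 3(k + 1)$, can satisfy the basic requirements of model estimation. When $n - k \ge 8$, the $t$ distribution is stable and the test is effective.

Regression analysis replaces the true parameters of the population with the parameters estimated from the sample, or the population regression with the sample regression. Although it is known from statistical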
properties that the expected value (mean) of a parameter estimate equals the true population value of the parameter over sufficiently many repeated samples, the estimate does not necessarily equal the true value in a single sample. How large the difference between the estimated value and the true value of the parameter is in a given sample, and whether it is significant, requires further statistical tests.

10.5.3 Statistical test of multiple linear regression model

1. Goodness of fit test

Test how well the fitted model matches the sample observations, starting from the decomposition of the total sum of squared deviations.

[Figure: axes $X$ and $Y$; decomposition of the total deviation of $Y$ from its mean]

$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$

The part explained by the regression equation, $\hat{Y}_i - \bar{Y}$, represents the linear effect of the explanatory variables on $Y$. The residual term, $e_i = Y_i - \hat{Y}_i$, represents the part that the regression equation cannot explain.

Total sum of squares (TSS): $TSS = \sum_i (Y_i - \bar{Y})^2$
Regression sum of squares (ESS): $ESS = \sum_i (\hat{Y}_i - \bar{Y})^2$
Residual sum of squares (RSS): $RSS = \sum_i (Y_i - \hat{Y}_i)^2$

$$TSS = ESS + RSS$$

2. Sample coefficient of determination

It is the most important index for evaluating goodness of fit; the standard deviation of the residuals can also be used as a reference index.

The coefficient of determination:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$

The least squares estimator of the variance of the random term:

$$\hat\sigma^2 = \frac{RSS}{n - k - 1} = \frac{e'e}{n - k - 1}$$

The correlation coefficient is calculated in the same way as the sample coefficient of determination, but the two have different meanings. The sample coefficient of determination is a quantitative index of the goodness of fit between the regression equation and the sample observations, and its implicit premise is that $X$ and $Y$ have a causal relationship. The correlation coefficient measures how close the linear correlation between two random variables is, without considering any causal relationship.

When explanatory variables are added to the model, the coefficient of determination $R^2$ tends to increase, which may
easily lead to the illusion that explanatory variables should keep being added to the regression model. Consider modifying it into the adjusted coefficient of determination, which corrects for degrees of freedom:

$$\bar{R}^2 = 1 - \frac{RSS/(n - k - 1)}{TSS/(n - 1)}$$

Thinking: can the adjusted coefficient of determination be negative? If it is negative, what does that mean?

3. Akaike information criterion and Schwarz criterion

In order to compare the goodness of fit of multiple regression models with different numbers of explanatory variables, two common criteria are the Akaike information criterion (AIC) and the Schwarz criterion (SC). Both penalize additional explanatory variables, and the model with the smaller criterion value is preferred.

4. Significance test of the global linearity of the equation (F test)

The F test examines the statistical significance
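of the estimated regression equation as a whole.

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0, \qquad H_1: \text{at least one } \beta_j \text{ is not } 0.$$

Since $Y_i$ follows a normal distribution, the sum of the squares of a group of $Y_i$ follows a $\chi^2$ distribution. Under $H_0$, $ESS/\sigma^2 \sim \chi^2(k)$ and $RSS/\sigma^2 \sim \chi^2(n - k - 1)$, so

$$F = \frac{ESS/k}{RSS/(n - k - 1)} \sim F(k, n - k - 1).$$

If $F > F_\alpha(k, n - k - 1)$, reject $H_0$; otherwise do not reject $H_0$.

5. t test of parameter estimators

Test the statistical significance of each explanatory variable in the regression equation, with $H_0: \beta_j = 0$ against $H_1: \beta_j \ne 0$.

The elements $c_{jj}$ on the main diagonal of $(X'X)^{-1}$ are called Gaussian multipliers, and $c_{jj}$ multiplied by $\sigma^2$ is the variance of the corresponding coefficient: $\mathrm{Var}(\hat\beta_j) = \sigma^2 c_{jj}$. Replacing $\sigma^2$ with $\hat\sigma^2$ gives

$$t = \frac{\hat\beta_j}{\hat\sigma \sqrt{c_{jj}}} \sim t(n - k - 1).$$

If $|t| > t_{\alpha/2}(n - k - 1)$, reject $H_0$; $\beta_j$ is then considered to be significantly different from 0. Alternatively, look up in the $t$ distribution table the probability $p$ according to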
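$|t|$; if $p < \alpha$, reject $H_0$.

The confidence interval of the parameter: it is easy to deduce that, at the confidence level of $1 - \alpha$, the confidence interval of $\beta_j$ is

$$\left( \hat\beta_j - t_{\alpha/2}\, \hat\sigma \sqrt{c_{jj}},\ \hat\beta_j + t_{\alpha/2}\, \hat\sigma \sqrt{c_{jj}} \right),$$

where $t_{\alpha/2}$ is the critical value of the $t$ distribution with significance level $\alpha$ and degrees of freedom $n - k - 1$.

6. Steps of statistical test of regression model

Check the goodness of fit, conduct the F test, and judge whether the regression equation is valid as a whole. If the F test fails, there is no need to proceed to the next step. Otherwise go to the next step.

Check the t value of each variable and its corresponding probability, and conduct the t test. If the corresponding probability is less than the given significance level, the coefficient of the independent variable is significantly different from 0, and the independent variable has a significant effect on the dependent variable. Otherwise, the coefficient is not significantly different from 0 (it is treated as 0), and the independent variable has no significant effect on the dependent variable. In that case the variable should be deleted from the equation and the equation re-estimated.

10.5.4 Prediction of multiple linear regression model

For the estimated model $\hat{Y} = X\hat{B}$, given the observed values of the explanatory variables outside the sample, $X_0 = (1, X_{10}, X_{20}, \dots, X_{k0})$, the predicted value of the explained variable, $\hat{Y}_0 = X_0 \hat{B}$, can be obtained. It could be a prediction of the population mean $E(Y_0)$ or of the individual value $Y_0$.

1. Confidence interval of $E(Y_0)$

It is easy to know that

$$E(\hat{Y}_0) = E(X_0 \hat{B}) = X_0 E(\hat{B}) = X_0 B = E(Y_0).$$

It is easy to prove that $\mathrm{Var}(\hat{Y}_0) = \sigma^2 X_0 (X'X)^{-1} X_0'$, so

$$\hat{Y}_0 \sim N\left( X_0 B,\ \sigma^2 X_0 (X'X)^{-1} X_0' \right).$$

Take the sample estimator $\hat\sigma^2$ of the variance of the random disturbance term to obtain the estimator of the variance of $\hat{Y}_0$; then

$$t = \frac{\hat{Y}_0 - E(Y_0)}{\hat\sigma \sqrt{X_0 (X'X)^{-1} X_0'}} \sim t(n - k - 1).$$

Therefore, the confidence interval of $E(Y_0)$ at the confidence level of $1 - \alpha$ is obtained:

$$\hat{Y}_0 - t_{\alpha/2}\, \hat\sigma \sqrt{X_0 (X'X)^{-1} X_0'} < E(Y_0) < \hat{Y}_0 + t_{\alpha/2}\, \hat\sigma \sqrt{X_0 (X'X)^{-1} X_0'},$$

where $t_{\alpha/2}$ is the critical value at the confidence level of $1 - \alpha$.

2. Confidence interval of $Y_0$

If the actual value $Y_0$ is known, the prediction error is $e_0 = Y_0 - \hat{Y}_0$. It is then easy to prove that

$$E(e_0) = E(X_0 B + \mu_0 - X_0 \hat{B}) = E\left( \mu_0 - X_0 (\hat{B} - B) \right) = E\left( \mu_0 - X_0 (X'X)^{-1} X'\mu \right) = 0,$$

and that $\mathrm{Var}(e_0) = \sigma^2 \left( 1 + X_0 (X'X)^{-1} X_0' \right)$. $e_0$ follows a normal distribution, which is

$$e_0 \sim N\left( 0,\ \sigma^2 \left( 1 + X_0 (X'X)^{-1} X_0' \right) \right).$$

The sample estimator $\hat\sigma^2$ of the variance of the random disturbance term is taken to obtain the estimator of the variance of $e_0$:

$$\hat\sigma_{e_0}^2 = \hat\sigma^2 \left( 1 + X_0 (X'X)^{-1} X_0' \right).$$

Construct the t statistic

$$t = \frac{\hat{Y}_0 - Y_0}{\hat\sigma_{e_0}} \sim t(n - k - 1).$$

The confidence interval of $Y_0$ at a given confidence level of $1 - \alpha$ can then be obtained:

$$\hat{Y}_0 - t_{\alpha/2}\, \hat\sigma_{e_0} < Y_0 < \hat{Y}_0 + t_{\alpha/2}\, \hat\sigma_{e_0}.$$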
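
The estimation, testing and prediction steps of this section can be sketched numerically. The block below is a minimal pure-Python illustration, not a reference implementation. It assumes the standard textbook formulas: the estimator (X'X)^{-1}X'Y, the error variance estimate RSS/(n-k-1), the F statistic (ESS/k)/(RSS/(n-k-1)), and prediction variances built from X0(X'X)^{-1}X0'. The function names (`ols`, `predict`, `inverse`) and the toy data are made up for illustration.

```python
# Pure-Python sketch of OLS for Y = XB + mu (illustrative names and toy data).

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting; raises if a is singular."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("X'X is singular, so rank(X) < k + 1")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def ols(x_rows, y):
    """Estimate B and the usual test statistics; x_rows excludes the constant."""
    n, k = len(y), len(x_rows[0])
    X = [[1.0] + list(row) for row in x_rows]      # prepend the intercept column
    Xt = transpose(X)
    C = inverse(matmul(Xt, X))                     # (X'X)^{-1}
    XtY = matmul(Xt, [[v] for v in y])
    b = [row[0] for row in matmul(C, XtY)]         # B_hat = (X'X)^{-1} X'Y
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)
    ess = sum((fi - y_bar) ** 2 for fi in fitted)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sigma2 = rss / (n - k - 1)                     # estimator of the error variance
    return {
        "b": b, "C": C, "sigma2": sigma2, "tss": tss, "ess": ess, "rss": rss,
        "r2": 1.0 - rss / tss,
        "adj_r2": 1.0 - (rss / (n - k - 1)) / (tss / (n - 1)),
        "f": (ess / k) / sigma2,
        "t": [b[j] / (sigma2 * C[j][j]) ** 0.5 for j in range(k + 1)],
    }

def predict(res, x0):
    """Point prediction plus standard errors for E(Y0) and for Y0 itself."""
    x0 = [1.0] + list(x0)
    C = res["C"]
    m = len(x0)
    q = sum(x0[i] * C[i][j] * x0[j] for i in range(m) for j in range(m))
    y0_hat = sum(bj * xj for bj, xj in zip(res["b"], x0))
    se_mean = (res["sigma2"] * q) ** 0.5           # for the mean E(Y0)
    se_indiv = (res["sigma2"] * (1.0 + q)) ** 0.5  # for the individual value Y0
    return y0_hat, se_mean, se_indiv

# Toy data (made up for illustration): two explanatory variables, n = 8.
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (7, 8), (8, 7)]
y = [4.2, 5.4, 8.7, 10.6, 14.3, 15.3, 19.1, 20.4]
result = ols(x_rows, y)
y0_hat, se_mean, se_indiv = predict(result, (9, 9))
print(result["b"], result["r2"], result["adj_r2"], result["f"], result["t"])
print(y0_hat, se_mean, se_indiv)
```

To finish the tests, the F and t statistics would be compared with critical values from the F(k, n-k-1) and t(n-k-1) tables (here k = 2 and n-k-1 = 5). The two confidence intervals are then `y0_hat` plus or minus the tabulated critical value times `se_mean` or `se_indiv`. The critical values are left out of the sketch on purpose so that it needs no statistics library.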