Applied Statistics: Chi-Square Tests (应用统计学-卡方检验.ppt)

Uploaded by: wuy****n92 | Document ID: 66030337 | Uploaded: 2022-12-11 | Format: PPT | Pages: 33 | Size: 727KB

Week Six. Analysing categorical data: chi-squared tests

This week's lecture covers:
- Analysing categorical (nominal) data
- Chi-square test of differences between proportions
- Chi-square test of independence

SPSS one-sample nonparametric tests: chi-square test of the population distribution

(1) Purpose: use sample data to infer whether the distribution of the population differs significantly from a known distribution (a goodness-of-fit test). The test is suited to statistical inference on categorical data.
(2) Basic hypothesis: H0: the population distribution does not differ significantly from the theoretical distribution.
(3) Basic method: from the known population proportions, compute the expected frequency of each category in the sample, then measure the gap between the observed and expected frequencies by computing the chi-square value

    χ² = Σ (Oi − Ei)² / Ei

where Oi and Ei are the observed and expected frequencies of category i. A small chi-square value means that the observed and expected frequencies are close. If P is greater than α, H0 cannot be rejected and the population distribution is taken not to differ significantly from the known distribution; otherwise H0 is rejected.
(4) Basic steps:
- Menu: Analyze > Nonparametric Tests > Chi-Square.
- Move the variable to be tested into the Test Variable List box.
- Set the range of case values to be tested (Expected Range): "Get from data" uses the whole sample; "Use specified range" uses a user-defined range of cases.
- Specify the expected frequencies (Expected Values): "All categories equal" gives every category the same proportion; "Values" lets the user enter the proportions.

Categorical variables

- Variables that describe categories of entities.
- We deal with them all the time in statistics.
- Making comparisons among variables: for example, whether consumers prefer a particular brand of a product over competing brands.
- Checking whether there is a relationship between two categorical variables: for example, gender and preference for a product, that is, whether the preference for a product is independent of gender.

Chi-square test for differences between proportions

- This test involves nominal data produced by a multinomial experiment, which is a generalisation of a binomial experiment.
- It tests the null hypothesis that data in the target population follow a particular probability distribution.

Example 1

We might test whether consumers are indifferent to which of four materials (glass, plastic, steel or aluminium) is used to make soft drink containers. The null hypothesis is that they are indifferent, that is, equal numbers prefer glass, plastic, steel and aluminium.

Example 1: data

Let pG be the probability that an individual selected at random will nominate glass as his/her preference if required to make a choice. Similarly for pP (plastic), pS (steel) and pA (aluminium).

Hypotheses:
H0: pG = pP = pS = pA = 0.25
HA: at least one pi ≠ 0.25
The alternative is that at least one material is more preferred (or less preferred) than the others.

Example 1 cont.

Procedure: select a random sample of, say, 100 consumers and determine their preferences. Under the null hypothesis we expect 25 consumers to nominate glass, 25 plastic, 25 steel and 25 aluminium. These are the expected frequencies, Ei = n pi. We compare the expected frequencies with the sample results, the observed frequencies Oi. If they are approximately the same, we have no grounds to reject the null hypothesis:
Oi ≈ Ei ⇒ H0 is probably true.

Example 1 cont.: chi square

We require a test statistic to decide whether the difference is large enough to reject the null hypothesis. We use chi square, χ² = Σ (Oi − Ei)² / Ei, with G − 1 degrees of freedom, where G is the number of groups.

Suppose that in our example 39 prefer glass, 16 prefer plastic, 20 prefer steel and 25 prefer aluminium. Recall that the expected frequencies were all 25.

Obtain the critical value of chi square at the 5% significance level with 3 d.f.: critical χ²(3) = 7.82 (Table E4, page 742, Berenson et al. 2013), i.e. there is only a 5 percent chance or less that χ²(3) > 7.82 if H0 is true.

Comparison of chi square values: χ²(3) = 12.08 > 7.82 ⇒ reject H0.

Conclusion: at the 5% significance level there is sufficient evidence to reject the null hypothesis. At least one of the probabilities (pi) is different. The sample results indicate that the materials are not equally preferred by consumers in the target population. Thus preferences differ for at least two of the materials.

Chi square test using SPSS

Example: suppose that we want to test whether or not customers have a colour preference for packaging. Three different colours, Blue, Green and Purple, are considered. The null hypothesis is that they don't have a colour preference. Use Analyse > Nonparametric Tests > Chi-Square. The default is that the probabilities are equal.

Main display colour

            Observed N   Expected N   Residual
Blue            26          30.0        -4.0
Green           37          30.0         7.0
Purple          27          30.0        -3.0
Total           90

Observed N is the number of consumers actually choosing each colour. Expected N is the number of consumers expected to choose each colour if the null is true. The two are different, but different enough to reject the null?

Test Statistics (Main Display Colour)

Chi-Square     2.467 (a)
df             2
Asymp. Sig.    .291
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 30.0.

Chi-Square is the chi-square statistic, df is the degrees of freedom (groups − 1), and Asymp. Sig. is the value to check to test the null.

H0: Consumers in the target population have no preference for any of the three colours of packaging.
H1: Consumers in the target population have a preference for at least one of the three colours of packaging.

We cannot reject the null (H0) that all three colours are equally preferred, because Sig > 0.05.

Conclusion: at the 5% significance level there is not sufficient evidence to conclude that consumers in the target population have a preference for at least one of the three colours of packaging.

Tests of independence: chi-squared test of a contingency table

This test addresses two different problem objectives:
- Are two nominal variables related?
- Are there differences among two or more populations of nominal variables?

Consider the following three features: height in centimetres, weight in kilograms and colour of eyes. Whilst some people are tall and thin, on average taller people weigh more than shorter people, so weight and height are not independent. It seems unlikely that people with blue eyes weigh more, on average, than people with brown eyes, so weight and eye colour are almost certainly independent.

Frequency analysis with cross-classification

Purpose: to understand how the data are distributed across the levels of different variables.
- Example: is academic performance associated with gender? (two variables)
- Example: are occupation, gender and fondness for shopping associated? (three variables)

Main steps of the analysis:
- Produce the cross-tabulation (contingency table).
- Analyse the relationship between the variables in the contingency table.

Producing the cross-tabulation

What is a contingency table? [Figure: a contingency table with its parts labelled: column variable, row variable, control variable (region), frequencies.]

Basic steps:
(1) Menu: Analyze > Descriptive Statistics > Crosstabs.
(2) Move one variable into the Row box as the row variable.
(3) Move one variable into the Column box as the column variable.
(4) Optionally move one or more variables into the Layer box as control variables. Layer settings: control variables in the same layer contribute the sum of their numbers of levels; control variables in different layers contribute the product of their numbers of levels.
(5) Choose whether to display a bar chart for each group (Display clustered bar charts).

Further calculations: the Cells option selects which percentages are printed in the frequency table.
- Row: row percentages (Row pct)
- Column: column percentages (Col pct)
- Total: total percentages (Tot pct)

Analysing the relationship between the variables in a contingency table

Purpose: to test, through the contingency table, whether the row and column variables are independent.
Method: the chi-square test, which measures the association between qualitative data.

Cross-tabulation of age and wage income (two illustrative cases)

Case 1:
          Low   Middle   High
Young     400     0        0
Middle      0   500        0
Old         0     0      600

Case 2:
          Low   Middle   High
Young       0     0      500
Middle      0   600        0
Old       400     0        0

Basic steps of the chi-square test:
(1) H0: the row and column variables are not associated, that is, they are independent.
(2) Construct the chi-square statistic. The statistic follows a chi-square distribution with (r − 1)(c − 1) degrees of freedom.
- Count: observed (actual) frequency.
- Expected count: expected frequency, which reflects how the data would be distributed if H0 held.
- Residual: observed frequency minus expected frequency.

Judging visually whether two categorical variables are related

1. Contingency table

              No lung cancer   Lung cancer   Total
Non-smoker         7775             42        7817
Smoker             2099             49        2148
Total              9874             91        9965

2. Three-dimensional column chart [Figure: counts by smoking status and lung cancer status.] The chart shows the relative size of each frequency clearly.
3. Two-dimensional bar chart [Figure: lung cancer status within smokers and non-smokers.] The chart shows that the proportion with lung cancer is higher among smokers than among non-smokers.

Tests of independence cont.

Example 2

Suppose we interviewed 400 people and asked them which of three age groups they are in (under 25, 25 to 60, and over 60). We also asked for their response to the statement "All imports of automobiles should be banned in order to protect the local industry" (agree, no view either way, disagree).

Attitudes towards banning imports, by age group

             Agree   No view   Disagree   Total
Under 25       19       53        25        97
25-60          46       94        47       187
Over 60        30       56        30       116
Total          95      203       102       400

Example 2 cont.

Null hypothesis: answers to the two questions are independent.
Under the null: Prob{over 60 and agree} = Prob{over 60} × Prob{agree} (multiplication rule for independent events).
Expected frequency = Prob{over 60} × Prob{agree} × sample size.

Procedure: we set up a cross-tabulation showing the observed frequencies of answers to the two questions, then calculate the expected frequencies.
Test: our test is based on a comparison of the observed and expected frequencies.

Short-cut for expected frequencies: Eij = (row total × column total) / sample size. For example, the expected frequency of "agree" and "over 60" is 95 × 116 / 400.

Age × attitude to banning imports: cross tabulation

                             Agree   No view   Disagree   Total
Under 25   Count              19.0     53.0      25.0      97.0
           Expected Count     23.0     49.2      24.7      96.9
25-60      Count              46.0     94.0      47.0     187.0
           Expected Count     44.4     94.9      47.7     187.0
Over 60    Count              30.0     56.0      30.0     116.0
           Expected Count     27.6     58.9      29.6     116.1
Total      Count              95.0    203.0     102.0     400.0
           Expected Count     95.0    203.0     102.0     400.0

The counts (observed) and the expected counts are different, but different enough to reject the null?

Chi-squared test for independence

Rationale: Oij ≈ Eij ⇒ H0 is probably true.
Test statistic: we require a test statistic to decide whether the difference is large enough to reject the null hypothesis.

Chi-Square Tests

                      Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square    1.438 (a)    4          .837
N of Valid Cases      400
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 23.0.

The SPSS output also lists Likelihood Ratio and Linear-by-Linear Association rows, which the slides do not discuss. Value is the calculated chi-square, and df is the degrees of freedom, (rows − 1)(columns − 1).

H0: attitudes and age are independent.
H1: attitudes and age are dependent.

We cannot reject the null that attitude and age are independent, because Sig > 0.05.

Conclusion: at the 5% significance level we are unable to conclude that age and attitudes towards banning automobile imports are dependent.
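The goodness-of-fit figures in the slides (Example 1 on container materials and the SPSS packaging-colour example) can be checked by hand. The following is a minimal sketch in Python using only the standard library, not SPSS's own procedure. The function names `chi2_sf` and `chi_square_gof` are made up for this illustration, and the p-value is obtained from the standard closed-form series for the chi-square upper tail with integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    #         * sum_{k=1}^{(df-1)/2} x^(k-1) / (1*3*...*(2k-1))
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    tail = math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


def chi_square_gof(observed, probs=None):
    """Return (statistic, df, p_value) for a chi-square goodness-of-fit test.

    probs defaults to equal proportions, matching the slides' null hypothesis.
    """
    n = sum(observed)
    groups = len(observed)
    if probs is None:
        probs = [1.0 / groups] * groups
    expected = [n * p for p in probs]          # Ei = n * pi
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = groups - 1
    return stat, df, chi2_sf(stat, df)


# Example 1: glass, plastic, steel, aluminium (n = 100)
materials = chi_square_gof([39, 16, 20, 25])
# SPSS example: Blue, Green, Purple (n = 90)
colours = chi_square_gof([26, 37, 27])

print("materials: chi2=%.2f df=%d p=%.4f" % materials)
print("colours:   chi2=%.3f df=%d p=%.3f" % colours)
```

The first call reproduces the slide's statistic of 12.08 on 3 degrees of freedom, which exceeds the tabulated critical value of 7.82; the second reproduces the SPSS values of 2.467 on 2 degrees of freedom with Asymp. Sig. of about .291.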
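The test of independence in Example 2 can be sketched the same way: expected counts come from the row and column totals, and the Pearson statistic sums the scaled squared residuals over all cells. This is an illustrative sketch, not SPSS output. The helper names are hypothetical, and `chi2_sf_even_df` is a shortcut that is valid only when the degrees of freedom are even, which holds for this 3 × 3 table (df = 4).

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (statistic, df, expected) for a Pearson chi-square test."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1)(columns - 1)
    return stat, df, expected


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; assumption: df is a positive even integer."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Rows: under 25, 25-60, over 60. Columns: agree, no view, disagree.
observed = [
    [19, 53, 25],
    [46, 94, 47],
    [30, 56, 30],
]

stat, df, expected = chi_square_independence(observed)
p_value = chi2_sf_even_df(stat, df)

print("expected (over 60, agree): %.2f" % expected[2][0])
print("chi2=%.3f df=%d p=%.3f" % (stat, df, p_value))
```

With the table above the sketch gives a statistic of about 1.438 on 4 degrees of freedom and a p-value of about .837, in line with the Pearson row of the SPSS output, so the null of independence is not rejected at the 5% level.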
