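Lab course: Data Analysis
Major: Information and Computing Science
Class:
Student ID:
Name:
School of Science, North University of China

Experiment 1: Using the SAS System

[Objective]
Become familiar with the SAS system, and become proficient in creating SAS data sets and in using the essential SAS statements.

[Content]
1. Copy the contents of the SCORE data set into a temporary data set named test.

SCORE data set
Name      Sex  Math  Chinese  English
Alice     f    90    85       91
Tom       m    95    87       84
Jenny     f    93    90       83
Mike      m    80    85       80
Fred      m    84    85       89
Kate      f    97    83       82
Alex      m    92    90       91
Cook      m    75    78       76
Bennie    f    82    79       84
Hellen    f    85    74       84
Wincelet  f    90    82       87
Butt      m    77    81       79
Geoge     m    86    85       82
Tod       m    89    84       84
Chris     f    89    84       87
Janet     f    86    65       87

2. Split the records of the SCORE data set into three data sets according to the math score: records with math of 90 or above go to the good data set, records with math between 80 and 89 go to the normal data set, and records with math below 80 go to the bad data set.
3. Combine the good, normal, and bad data sets obtained in task 2.

[Equipment and software platform]
SAS

[Method and steps]
1:
DATA SCORE;
    INPUT NAME $ Sex $ Math Chinese English;
    CARDS;
Alice f 90 85 91
Tom m 95 87 84
Jenny f 93 90 83
Mike m 80 85 80
Fred m 84 85 89
Kate f 97 83 82
Alex m 92 90 91
Cook m 75 78 76
Bennie f 82 79 84
Hellen f 85 74 84
Wincelet f 90 82 87
Butt m 77 81 79
Geoge m 86 85 82
Tod m 89 84 84
Chris f 89 84 87
Janet f 86 65 87
;
RUN;
PROC PRINT DATA=SCORE;
RUN;
/* Copy SCORE into the temporary (WORK) data set test */
DATA test;
    SET SCORE;
RUN;

2:
/* Route each record to one of three data sets by math score */
DATA good normal bad;
    SET SCORE;
    SELECT;
        WHEN (math >= 90) OUTPUT good;
        WHEN (math >= 80 & math < 90) OUTPUT normal;
        WHEN (math < 80) OUTPUT bad;
    END;
RUN;
PROC PRINT DATA=good;
RUN;
PROC PRINT DATA=normal;
RUN;
PROC PRINT DATA=bad;
RUN;

3:
/* Concatenate the three data sets */
DATA All;
    SET good normal bad;
RUN;
PROC PRINT DATA=All;
RUN;

[Results]
Result 1:
Result 2:
Result 3:

Experiment 2: Data Analysis of Listed Companies

[Objective]
Use SAS to carry out descriptive analysis and regression analysis on experimental data, become familiar with data analysis methods, and develop the ability to analyze and process real data.

[Content]
Table 2 gives, for a group of listed companies, the earnings per share (eps) in 2023, the size of the tradable shares (scale), and the closing price on the last trading day of 2023 (price).

Table 2  Data for the listed companies
Code    Tradable shares (scale)  Earnings per share (eps)  Stock price (price)
000096  8500                     0.059                     13.27
000099  6000                     0.028                     14.2
000150  12600                    -0.003                    7.12
000151  10500                    0.026                     10.08
000153  2500                     0.056                     22.75
000155  13000                    -0.009                    6.85
000156  3600                     0.033                     14.95
000157  10000                    0.06                      12.65
000158  10000                    0.018                     8.38
000159  7000                     0.008                     12.15
000301  15365                    0.04                      7.31
000488  7700                     0.101                     13.26
000725  6000                     0.044                     12.33
000835  1338                     0.07                      22.58
000869  3200                     0.194                     18.29
000877  7800                     -0.084                    12.55
000885  6000                     -0.073                    12.48
000890  16934                    0.031                     9.12
000892  12023                    0.031                     7.88
000897  14166                    0.002                     6.91
000900  21423                    0.058                     8.59
000901  4800                     0.005                     27.95
000902  6500                     -0.031                    10.92
000903  6000                     0.109                     11.79
000905  9500                     0.046                     9.29
000906  6650                     0.007                     14.47
000908  8988                     0.006                     8.28
000909  6000                     0.002                     9.99
000910  8000                     0.036                     8.9
000911  7280                     0.067                     9.01
000912  15000                    0.112                     8.06
000913  8450                     0.062                     11.86
000915  4599                     0.001                     14.4
000916  34000                    0.038                     5.15
000917  11800                    0.086                     16.23
000918  6000                     -0.045                    10.12

1. For the stock price:
1) Compute the mean, variance, standard deviation, coefficient of variation, skewness, and kurtosis.
2) Compute the median, the upper and lower quartiles, the interquartile range, and the trimean.
3) Draw a histogram.
4) Draw a stem-and-leaf plot.
5) Test for normality (the W test).
6) Compute the covariance matrix and the Pearson correlation matrix.
7) Compute the Spearman correlation matrix.
8) Analyze the correlations among the indicators.
2. Regression:
1) Fit a linear regression model of the stock price on the tradable shares and the earnings per share, and obtain the estimated regression parameters and the residuals.
2) At the significance level $\alpha = 0.05$, test the significance of the regression relationship and the significance of the effect of each independent variable on the dependent variable.
3) Plot the residuals against the fitted values $\hat{Y}$, against $X_1$, $X_2$, and $X_1 X_2$, and draw the normal QQ plot of the residuals. Analyze these residuals and comment on them.

[Equipment and software platform]
SAS

[Method and steps]
data prices;
    /* num is read as character so the leading zeros of the stock code are kept */
    input num $ scale eps price;
    cards;
000096 8500 0.059 13.27
000099 6000 0.028 14.2
000150 12600 -0.003 7.12
000151 10500 0.026 10.08
000153 2500 0.056 22.75
000155 13000 -0.009 6.85
000156 3600 0.033 14.95
000157 10000 0.06 12.65
000158 10000 0.018 8.38
000159 7000 0.008 12.15
000301 15365 0.04 7.31
000488 7700 0.101 13.26
000725 6000 0.044 12.33
000835 1338 0.07 22.58
000869 3200 0.194 18.29
000877 7800 -0.084 12.55
000885 6000 -0.073 12.48
000890 16934 0.031 9.12
000892 12023 0.031 7.88
000897 14166 0.002 6.91
000900 21423 0.058 8.59
000901 4800 0.005 27.95
000902 6500 -0.031 10.92
000903 6000 0.109 11.79
000905 9500 0.046 9.29
000906 6650 0.007 14.47
000908 8988 0.006 8.28
000909 6000 0.002 9.99
000910 8000 0.036 8.9
000911 7280 0.067 9.01
000912 15000 0.112 8.06
000913 8450 0.062 11.86
000915 4599 0.001 14.4
000916 34000 0.038 5.15
000917 11800 0.086 16.23
000918 6000 -0.045 10.12
;
run;
proc print data=prices;
run;
/* Task 1(1): moments and coefficient of variation of price */
proc means data=prices mean var std skewness kurtosis cv;
    var price;
    output out=result;
run;
/* Tasks 1(2), 1(4), 1(5): quantiles, stem-and-leaf plot, normality tests */
proc univariate data=prices plot freq normal;
    var price;
    output out=result2 median=median q1=q1 q3=q3 qrange=qrange;
run;
/* Trimean: (Q1 + 2*median + Q3) / 4 */
data result2;
    set result2;
    trimean = (q1 + 2*median + q3) / 4;
run;
/* Task 1(3): histogram with a fitted normal curve */
proc capability data=prices graphics noprint;
    histogram price / normal;
run;
/* Tasks 1(6)-1(8): covariance, Pearson and Spearman matrices for all three variables */
proc corr data=prices pearson spearman cov nosimple;
    var scale eps price;
run;
/* Task 2: regression of price on scale and eps; p and r hold fitted values and residuals */
proc reg data=prices;
    model price=scale eps / selection=backward noint p r;
    output out=prices p=p r=r;
run;
proc print data=prices;
run;

[Results]
Results for question 2:

Experiment 3: Analysis of the Rates of Seven Crimes in the 50 US States

[Objective]
Use SAS to carry out principal component analysis and factor analysis on experimental data, become familiar with data analysis methods, and develop the ability to analyze and process real data.

[Content]
Table 3 gives the rates, per 100,000 people, of seven types of crime in the 50 US states. The seven crimes are Murder, Rape, Robbery, Assault, Burglary, Larceny, and Auto (motor-vehicle crime).

Table 3  Rates of seven crimes in the 50 US states
State           Murder  Rape  Robbery  Assault  Burglary  Larceny  Auto
Alabama         14.2    25.2  96.8     278.3    1135.5    1881.9   280.7
Alaska          10.8    51.6  96.8     284.0    1331.7    3369.8   753.3
Arizona         9.5     34.2  138.2    312.3    2346.1    4467.4   439.5
Arkansas        8.8     27.6  83.2     203.4    972.6     1862.1   183.4
California      11.5    49.4  287.0    358.0    2139.4    3499.8   663.5
Colorado        6.3     42.0  170.7    292.9    1935.2    3903.2   477.1
Connecticut     4.2     16.8  129.5    131.8    1346.0    2620.7   593.2
Delaware        6.0     24.9  157.0    194.2    1682.6    3678.4   467.0
Florida         10.2    39.6  187.9    449.1    1859.9    3840.5   351.4
Georgia         11.7    31.1  140.5    256.5    1351.1    2170.2   297.9
Hawaii          7.2     25.5  128.0    64.1     1911.5    3920.4   489.4
Idaho           5.5     19.4  39.6     172.5    1050.8    2599.6   237.6
Illinois        9.9     21.8  211.3    209.0    1085.0    2828.5   528.6
Indiana         7.4     26.5  123.2    153.5    1086.2    2498.7   377.4
Iowa            2.3     10.6  41.2     89.8     812.5     2685.1   219.9
Kansas          6.6     22.0  100.7    180.5    1270.4    2739.3   244.3
Kentucky        10.1    19.1  81.1     123.3    872.2     1662.1   245.4
Louisiana       15.5    30.9  142.9    335.5    1165.5    2469.9   337.7
Maine           2.4     13.5  38.7     170.0    1253.1    2350.7   246.9
Maryland        8.0     34.8  292.1    358.9    1400.0    3177.7   428.5
Massachusetts   3.1     20.8  169.1    231.6    1532.2    2311.3   1140.1
Michigan        9.3     38.9  261.9    274.6    1522.7    3159.0   545.5
Minnesota       2.7     19.5  85.9     85.8     1134.7    2559.3   343.1
Mississippi     14.3    19.6  65.7     189.1    915.6     1239.9   144.4
Missouri        9.6     28.3  189.0    233.5    1318.3    2424.2   378.4
Montana         5.4     16.7  39.2     156.8    804.9     2773.2   309.2
Nebraska        3.9     18.1  64.7     112.7    760.0     2316.1   249.1
Nevada          15.8    49.1  323.1    355.0    2453.1    4212.6   559.2
New Hampshire   3.2     10.7  23.2     76.0     1041.7    2343.9   293.4
New Jersey      5.6     21.0  180.4    185.1    1435.8    2774.5   511.5
New Mexico      8.8     39.1  109.6    343.4    1418.7    3008.6   259.5
New York        10.7    29.4  472.6    319.1    1728.0    2782.0   745.8
North Carolina  10.6    17.0  61.3     318.3    1154.1    2037.8   192.1
Ohio            7.8     27.3  190.5    181.1    1216.0    2696.8   400.4
North Dakota    0.9     9.0   13.3     43.8     446.1     1843.0   144.7
Oklahoma        8.6     29.2  73.8     205.0    1288.2    2228.1   326.8
Oregon          4.9     39.9  124.1    286.9    1636.4    3506.1   388.9
Pennsylvania    5.6     19.0  130.3    128.0    877.5     1624.1   333.2
Rhode Island    3.6     10.5  86.5     201.0    1489.5    2844.1   791.4
South Carolina  11.9    33.0  105.9    485.3    1613.6    2342.4   245.1
South Dakota    2.0     13.5  17.9     155.7    570.5     1704.4   147.5
Tennessee       10.1    29.7  145.8    203.9    1259.7    1776.5   314.0
Texas           13.3    33.8  152.4    208.2    1603.1    2988.7   397.6
Utah            3.5     20.3  68.8     147.3    1171.6    3004.6   334.5
Vermont         1.4     15.9  30.8     101.2    1348.2    2201.0   265.2
Virginia        9.0     23.3  92.1     165.7    986.2     2521.2   226.7
Washington      4.3     39.6  106.2    224.8    1605.6    3386.9   360.3
West Virginia   6.0     13.2  42.2     90.9     597.4     1341.7   163.3
Wisconsin       2.8     12.9  52.2     63.7     846.9     2614.2   220.7
Wyoming         5.4     21.9  39.7     173.9    811.6     2772.2   282.0

1. Principal component analysis:
1) Carry out principal component analysis using the sample covariance matrix and using the sample correlation matrix. How do the two sets of results differ?
2) Can the variation in the original data be captured by three or fewer principal components? Give a reasonable interpretation of the components selected.
3) Compute the scores of the first sample principal component obtained from the sample correlation matrix, and rank them.
2. Starting from the sample correlation matrix, carry out a factor analysis.

[Equipment and software platform]
SAS

[Method and steps]
First copy the data above into Excel, then import it through SAS into the data set crime.

Principal component analysis with the sample covariance matrix:
proc princomp data=work.crime covariance;
run;

Principal component analysis with the sample correlation matrix:
proc princomp data=work.crime;
run;

Ranking by the first sample principal component:
/* out=defen keeps the component scores prin1, prin2, ... */
proc princomp data=crime out=defen;
run;
proc sort data=defen;
    by prin1;
run;
proc print data=defen;
run;

2. Program:
proc factor data=work.crime score;
run;

[Results]

Experiment 4: Analysis of the 1991 Monthly Average Income of Urban Residents in China's Provinces, Autonomous Regions, and Municipalities

[Objective]
Use SAS to carry out discriminant analysis and cluster analysis on experimental data, become familiar with data analysis methods, and develop the ability to analyze and process real data.

[Content]
The table below shows the 1991 monthly average income of urban residents in each province, autonomous region, and municipality. The variables are defined as follows (all in yuan per person):
X1: per capita income available for living expenses
X2: per capita wages of employees of state-owned units
X3: per capita standard wages from state-owned units
X4: per capita wages from collectively owned units
X5: per capita standard wages of employees of collectively owned units
X6: per capita bonuses and above-quota wages
X7: per capita allowances
X8: per capita other income that employees receive from their work units
X9: income of self-employed workers

Province         Type              x1      x2     x3     x4    x5    x6     x7     x8    x9
Beijing          1                 170.03  110.2  59.76  8.38  4.49  26.8   16.44  11.9  0.41
Tianjin          1                 141.55  82.58  50.98  13.4  9.33  21.3   12.36  9.21  1.05
Hebei            1                 119.4   83.33  53.39  11    7.52  17.3   11.79  12    0.7
Shanghai         1                 194.53  107.8  60.24  15.6  8.88  31     21.01  11.8  0.16
Shandong         1                 130.46  86.21  52.3   15.9  10.5  20.61  12.14  9.61  0.47
Hubei            1                 119.29  85.41  53.02  13.1  8.44  13.87  16.47  8.38  0.51
Guangxi          1                 134.46  98.61  48.18  8.9   4.34  21.49  26.12  13.6  4.56
Hainan           1                 143.79  99.97  45.6   6.3   1.56  18.67  29.49  11.8  3.82
Sichuan          1                 128.05  74.96  50.13  13.9  9.62  16.14  10.18  14.5  1021
Yunnan           1                 127.41  93.54  50.57  10.5  5.87  19.41  21.2   12.6  0.9
Xinjiang         1                 122.96  101.4  69.7   6.3   3.86  11.3   18.96  5.62  4.62
Shanxi           2                 102.49  71.72  47.72  9.42  6.96  13.12  7.9    6.66  0.61
Inner Mongolia   2                 106.14  76.27  46.19  9.65  6.27  9.655  20.1   6.97  0.96
Jilin            2                 104.93  72.99  44.6   13.7  9.01  9.435  20.61  6.65  1.68
Heilongjiang     2                 103.34  62.99  42.95  11.1  7.41  8.342  10.19  6.45  2.68
Jiangxi          2                 98.089  69.45  43.04  11.4  7.95  10.59  16.5   7.69  1.08
Henan            2                 104.12  72.23  47.31  9.48  6.43  13.14  10.43  8.3   1.11
Guizhou          2                 108.49  80.79  47.52  6.06  3.42  13.69  16.53  8.37  2.85
Shaanxi          2                 113.99  75.6   50.88  5.21  3.86  12.94  9.492  6.77  1.27
Gansu            2                 114.06  84.31  52.78  7.81  5.44  10.82  16.43  3.79  1.19
Qinghai          2                 108.8   80.41  50.45  7.27  4.07  8.371  18.98  5.95  0.83
Ningxia          2                 115.96  88.21  51.85  8.81  5.63  13.95  22.65  4.75  0.97
Liaoning         3                 128.46  68.91  43.41  22.4  15.3  13.88  12.42  9.01  1.41
Jiangsu          3                 135.24  73.18  44.54  23.9  15.2  22.38  9.661  13.9  1.19
Zhejiang         3                 162.53  80.11  45.99  24.3  13.9  29.54  10.9   13    3.47
Anhui            3                 111.77  71.07  43.64  19.4  12.5  16.68  9.698  7.02  0.63
Fujian           3                 139.09  79.09  44.19  18.5  10.5  20.23  16.47  7.67  3.08
Hunan            3                 124     84.66  44.05  13.5  7.47  19.11  20.49  10.3  1.76
Guangdong        to be classified  211.3   114    41.44  33.2  11.2  48.72  30.77  14.9  11.1
Tibet            to be classified  175.93  163.8  57.89  4.22  3.37  17.81  82.32  15.7  0

1. Discriminant analysis:
1) Determine which income type Guangdong and Tibet belong to, and estimate the misclassification rate by the resubstitution method and by cross-validation.
2) Carry out Bayes discrimination, and check the classification results by the resubstitution method and by cross-validation.
2. Cluster analysis:
1) Cluster with the single linkage (shortest distance), complete linkage (longest distance), and average linkage methods, draw the dendrograms, and write out the 3-cluster results.
2) Cluster with the fast clustering method, and write out the 3-cluster result.

[Equipment and software platform]
SAS

[Method and steps]
1:
The X9 value for Sichuan in the data is anomalous. According to Table 5.3 on page 170 of the textbook, the value should be 1.21.
First enter the data above into an Excel sheet, then import it directly through SAS into a data set named shuju. Import the following data into the data set daipang:

Province   x1      x2     x3     x4    x5    x6     x7     x8    x9
Guangdong  211.3   114    41.44  33.2  11.2  48.72  30.77  14.9  11.1
Tibet      175.93  163.8  57.89  4.22  3.37  17.81  82.32  15.7  0

Delete the last two rows of the shuju data set, then run the following program:
/* list: resubstitution results; crosslist: cross-validation results; testlist: results for daipang */
proc discrim data=shuju testdata=daipang method=normal list crosslist testlist;
    class leixing;
    var x1-x9;
run;

2:
The data above are likewise imported into the data set SHUJU.

SINGLE (or SIN): single linkage (shortest distance) method.
proc cluster data=shuju method=sin outtree=y1;
    var x1-x9;    /* cluster on the nine income variables only */
run;
proc tree data=y1 nclusters=3 out=z1;
run;
proc print data=z1;
run;

COMPLETE (or COM): complete linkage (longest distance) method.
proc cluster data=shuju method=com outtree=y2;
    var x1-x9;
run;
proc tree data=y2 nclusters=3 out=z2;
run;
proc print data=z2;
run;

AVERAGE (or AVE): average linkage method.
proc cluster data=shuju method=ave outtree=y3;
    var x1-x9;
run;
proc tree data=y3 nclusters=3 out=z3;
run;
proc print data=z3;
run;

(2) Fast clustering method: proc fastclus
/* maxc=3 requests at most three clusters; cluster=c names the cluster-number variable in a1 */
proc fastclus data=shuju out=a1 maxc=3 cluster=c distance list;
    var x1-x9;
run;
proc plot data=a1;
    plot x2*x1=c;
run;

[Results]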
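Supplementary sketch for Experiment 4, task 1(2). The report gives a single PROC DISCRIM call for both discriminant tasks. With METHOD=NORMAL, PROC DISCRIM classifies by posterior probability under normal populations and uses equal prior probabilities unless a PRIORS statement says otherwise. One way to make the Bayes step explicit is to state the priors, as in the sketch below. It assumes the report's data sets shuju (the 28 classified rows, with class variable leixing and variables x1-x9) and daipang (Guangdong and Tibet). The choice of proportional priors and of a pooled covariance matrix is an assumption made here for illustration; the report does not say which priors or covariance structure were intended.

```sas
/* Sketch only: Bayes discrimination under normal populations.           */
/* Assumes shuju and daipang exist as described in the report.           */
proc discrim data=shuju testdata=daipang
             method=normal
             pool=yes                /* pooled covariance matrix: an assumption */
             list crosslist testlist;
    class leixing;
    priors proportional;             /* priors = class shares in shuju: an assumption */
    var x1-x9;
run;
```

LIST prints the resubstitution classification of each row in shuju, CROSSLIST prints the cross-validation classification, and TESTLIST prints the classification of the rows in daipang. Replacing `priors proportional;` with `priors equal;` gives the equal-prior rule that the program in the report uses by default.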