Chapter 5 Principal Component Analysis (2) (Principal Component Regression, Empirical Orthogonal Function (EOF) Decomposition)
5.4 Principal Component Clustering and Principal Component Regression

5.4.1 Clustering Variables and Classifying Samples

Principal component analysis can be used for clustering, both of variables and of samples.

Variable clustering: variables can be grouped by the differences in their principal component coefficients. In Example 5.5, for instance, the second principal component has negative coefficients on murder, rape, and assult, and positive coefficients on burglary, larceny, and auto. By the sign of the coefficients the 7 variables split into two classes: murder, rape, and assult form the class of more violent crimes, while burglary, larceny, and auto form the class of less violent crimes. In this way the signs of the principal component coefficients can be used to cluster the variables.

Sample clustering: if two principal components summarize the information in the random vector well, compute these two principal component scores for every sample and draw their scatter plot; the samples can then be classified from the plot.

Example 5.5 (continued, 2). Draw the scatter plot of the first and second principal component scores.

data crime; /* create the data set crime */
input state $ 1-15 murder rape robbery assult burglary larceny auto;
/* Define the variables state, murder, rape, robbery, assult, burglary, larceny, and auto.
   state $ 1-15 means that the state name occupies the first 15 columns.
   murder, rape, robbery, assult, burglary, larceny, and auto are the rates of 7 types of crime. */
cards; /* the data lines follow */
Alabama         14.2 25.2 96.8 278.3 1135.5 1881.9 280.7
Alaska          10.8 51.6 96.8 284.0 1331.7 3369.8 753.3
Arizona         9.5 34.2 138.2 312.3 2346.1 4467.4 439.5
Arkansas        8.8 34.2 138.2 312.3 2346.1 4467.4 439.5
California      11.5 49.4 287.0 358.0 2139.4 3499.8 663.5
Colorado        6.3 42.0 170.7 292.9 1935.2 3903.2 477.1
Connecticut     4.2 16.8 129.5 131.8 1346.0 2620.7 593.2
Delaware        6.0 24.9 157.0 194.2 1682.6 3678.4 467.0
Florida         10.2 39.6 187.9 449.1 1859.9 3840.5 351.4
Georgia         11.7 31.1 140.5 256.5 1351.1 2170.2 297.9
Hawaii          7.2 25.5 128.0 64.1 1911.5 3920.4 489.4
Idaho           5.5 19.4 39.6 172.5 1050.8 2599.6 237.6
Illinois        9.9 21.8 211.3 209.0 1085.0 2828.5 528.6
Indiana         7.4 26.5 123.2 153.5 1086.2 2498.7 377.4
Iowa            2.3 10.6 41.2 89.8 812.5 2685.1 219.9
Kansas          6.6 22.0 100.7 180.5 1270.4 2739.3 244.3
Kentucky        10.1 19.1 81.1 123.3 872.2 1662.1 245.4
Louisiana       15.5 30.9 142.9 335.5 1165.5 2469.9 337.7
Maine           2.4 13.5 38.7 170.0 1253.1 2350.7 246.9
Maryland        8.0 34.8 292.1 358.9 1400.0 3177.7 428.5
Massachusetts   3.1 20.8 169.1 231.6 1532.2 2311.3 1140.1
Michigan        9.3 38.9 261.9 274.6 1522.7 3159.0 545.5
Minnesota       2.7 19.5 85.9 85.8 1134.7 2559.3 343.1
Mississippi     14.3 19.6 65.7 189.1 915.6 1239.9 144.4
Missouri        9.6 28.3 189.0 233.5 1318.3 2424.2 378.4
Montana         5.4 16.7 39.2 156.8 804.9 2773.2 309.3
Nebraska        3.9 18.1 64.7 112.7 760.0 2316.1 249.1
Nevada          15.8 49.1 323.1 355.0 2453.1 4212.6 559.2
New Hampshire   3.2 10.7 23.2 76.0 1041.7 2343.9 293.4
New Jersey      5.6 21.0 180.4 185.1 1435.8 2774.5 511.5
New Mexico      8.8 39.1 109.6 343.4 1418.7 3008.6 259.5
New York        10.7 29.4 472.6 319.1 1728.0 2782.0 745.8
North Carolina  10.6 17.0 61.3 318.3 1154.1 2037.8 192.1
North Dakota    100.9 9.0 13.3 43.8 446.1 1843.0 144.7
Ohio            7.8 27.3 190.5 181.1 1216.0 2696.8 400.4
Oklahoma        8.6 29.2 73.8 205.0 1288.2 2228.1 326.8
Oregon          4.9 39.9 124.1 286.9 1636.4 3506.1 388.9
Pennsylvania    5.6 19.0 130.3 128.0 877.5 1624.1 333.2
Rhode Island    3.6 10.5 86.5 201.0 1849.5 2844.1 791.4
South Carolina  11.9 33.0 105.9 485.3 1613.6 2342.4 245.1
South Dakota    2.0 13.5 17.9 155.7 570.5 1704.4 147.5
Tennessee       10.1 29.7 145.8 203.9 1259.7 1776.5 314.0
Texas           13.3 33.8 152.4 208.2 1603.1 2988.7 397.6
Utah            3.5 20.3 68.8 147.3 1171.6 3004.6 334.5
Vermont         1.4 15.9 30.8 101.2 1348.2 2201.0 265.2
Virginia        9.0 23.3 92.1 165.7 986.2 2521.2 226.7
Washington      4.3 39.6 106.2 224.8 1605.6 3386.9 360.3
West Virginia   6.0 13.2 42.2 90.9 597.4 1341.7 163.3
Wisconsin       2.8 12.9 52.2 63.7 846.9 2614.2 220.7
Wyoming         5.4 21.9 39.7 173.9 811.6 2772.2 282.0
;
proc princomp out=crimprin n=2;
var murder rape robbery assult burglary larceny auto;
run;
PROC PLOT data=crimprin;
PLOT PRIN2*PRIN1=STATE/VPOS=31;
TITLE2 'PLOT OF THE FIRST TWO PRINCIPAL COMPONENTS';
RUN;

Example 5.7 (temperature analysis). The input data file for this example (TEMPERAT
) contains the mean daily temperatures in January and July for 64 US cities.

DATA TEMPERAT;
TITLE2 'MEAN TEMPERATURE IN JANUARY AND JULY FOR SELECTED CITIES';
INPUT CITY $1-15 JANUARY JULY;
CARDS;
MOBILE          51.2 81.6
PHOENIX         51.2 91.2
LITTLE ROCK     39.5 81.4
SACRAMENTO      45.1 75.2
DENVER          29.9 73.0
HARTFORD        24.8 72.7
WILMINGTON      32.0 75.8
WASHINGTON DC   35.6 78.7
JACKSONVILLE    54.6 81.0
MIAMI           67.2 82.3
ATLANTA         42.4 78.0
BOISE           29.0 74.5
CHICAGO         22.9 71.9
PEORIA          23.8 75.1
INDIANAPOLIS    27.9 75.0
DES MOINES      19.4 75.1
WICHITA         31.3 80.7
LOUISVILLE      33.3 76.9
NEW ORLEANS     52.9 81.9
PORTLAND, MAINE 21.5 68.0
BALTIMORE       33.4 76.6
BOSTON          29.2 73.3
DETROIT         25.5 73.3
SAULT STE MARIE 14.2 63.8
DULUTH          8.5 65.6
MINNEAPOLIS     12.2 71.9
JACKSON         47.1 81.7
KANSAS CITY     27.8 78.8
ST LOUIS        31.3 78.6
GREAT FALLS     20.5 69.3
OMAHA           22.6 77.2
RENO            31.9 69.3
CONCORD         20.6 69.7
ATLANTIC CITY   32.7 75.1
ALBUQUERQUE     35.2 78.7
ALBANY          21.5 72.0
BUFFALO         23.7 70.1
NEW YORK        32.2 76.6
CHARLOTTE       42.1 78.5
RALEIGH         40.5 77.5
BISMARCK        8.2 70.8
CINCINNATI      31.1 75.6
CLEVELAND       26.9 71.4
COLUMBUS        28.4 73.6
OKLAHOMA CITY   36.8 81.5
PORTLAND, OREG  38.1 67.1
PHILADELPHIA    32.3 76.8
PITTSBURGH      28.1 71.9
PROVIDENCE      28.4 72.1
COLUMBIA        45.4 81.2
SIOUX FALLS     14.2 73.3
MEMPHIS         40.5 79.6
NASHVILLE       38.3 79.6
DALLAS          44.8 84.8
EL PASO         43.6 82.3
HOUSTON         52.1 83.3
SALT LAKE CITY  28.0 76.7
BURLINGTON      16.8 69.8
NORFOLK         40.5 78.3
RICHMOND        37.5 77.9
SPOKANE         25.4 69.7
CHARLESTON, WV  34.5 75.0
MILWAUKEE       19.4 69.9
CHEYENNE        26.6 69.1
;
PROC PLOT;
PLOT JULY*JANUARY=CITY/VPOS=36;
PROC PRINCOMP COV OUT=PRIN;
VAR JULY JANUARY;
PROC PLOT;
PLOT PRIN2*PRIN1=CITY/VPOS=26;
TITLE3 'PLOT OF PRINCIPAL COMPONENTS';
RUN;

Example 5.8. Rankings of US college basketball teams.

data bballm;
label csn = 'Community Sports News (Chapel Hill NC)'
      dursun = 'Durham Sun'
      durher = 'Durham Morning Herald'
      waspost = 'Washington Post'
      usatoda = 'USA Today'
      spormag = 'Sport Magazine'
      insport = 'Inside Sports'
      upi = 'United Press International'
      ap = 'Associated Press'
      sporill = 'Sports Illustrated';
title1 'Pre-Season 1985 College Basketball Rankings';
input school $13. csn dursun durher waspost usatoda spormag insport upi ap sporill;
format csn--sporill 5.1;
cards;
Louisville    1 8 1 9 8 9 6 10 9 9
Georgia Tech  2 2 4 3 1 1 1 2 1 1
Kansas        3 4 5 1 5 11 8 4 5 7
Michigan      4 5 9 4 2 5 3 1 3 2
Duke          5 6 7 5 4 10 4 5 6 5
UNC           6 1 2 2 3 4 2 3 2 3
Syracuse      7 10 6 11 6 6 5 6 4 10
Notre Dame    8 14 15 13 11 20 18 13 12 .
Kentucky      9 15 16 14 14 19 11 12 11 13
LSU           10 9 13 . 13 15 16 9 14 8
DePaul        11 . 21 15 20 . 19 . . 19
Georgetown    12 7 8 6 9 2 9 8 8 4
Navy          13 20 23 10 18 13 15 . 20 .
Illinois      14 3 3 7 7 3 10 7 7 6
Iowa          15 16 . . 23 . . 14 . 20
Arkansas      16 . . . 25 . . . . 16
Memphis State 17 . 11 . 16 8 20 . 15 12
Washington    18 . . . . . . 17 . .
UAB           19 13 10 . 12 17 . 16 16 15
UNLV          20 18 18 19 22 . 14 18 18 .
NC State      21 17 14 16 15 . 12 15 17 18
Maryland      22 . . . 19 . . . 19 14
Pittsburg     23 . . . . . . . . .
Oklahoma      24 19 17 17 17 12 17 . 13 17
Indiana       25 12 20 18 21 . . . . .
Virginia      26 . 22 . . 18 . . . .
Old Dominion  27 . . . . . . . . .
Auburn        28 11 12 8 10 7 7 11 10 11
St. Johns     29 . . . . 14 . . . .
UCLA          30 . . . . . . 19 . .
St. Josephs   . . 19 . . . . . . .
Tennessee     . . 24 . . 16 . . . .
Montana       . . . 20 . . . . . .
Houston       . . . . 24 . . . . .
Virginia Tech . . . . . . 13 . . .
;
/* The step below reads a data set bball that contains a variable named weight.
   Neither is created in the code shown here (only bballm is), so bball is
   presumably derived from bballm in a step that this text omits. */
proc princomp data=bball n=1 out=pcbball standard;
var csn--sporill;
weight weight;
proc sort data=pcbball;
by prin1;
proc print;
var school prin1;
title2 'College Teams as Ordered by PRINCOMP';
run;

Example 5.9. The running records of 55 countries or regions are listed in Table 5-7. Carry out a principal component analysis and classify the 55 countries or regions by their running performance.

Table 5-7 Running records of 55 countries or regions

No.  Country   100m(s)  200m(s)  400m(s)  800m(min)  1500m(min)  5000m(min)  10000m(min)  Marathon(min)
1    argentin  10.39    20.81    46.84    1.81       3.70        14.04       29.36        137.72
2    australi  10.31    20.06    44.84    1.74       3.57        13.28       27.66        128.30
3    austria   10.44    20.81    46.82    1.79       3.60        13.26       27.72        135.90
4    belgium   10.34    20.68    45.04    1.73       3.60        13.22       27.45        129.95
5    bermuda   10.28    20.58    45.91    1.80       3.75        14.68       30.55        146.62
6    brazil    10.22    20.43    45.21    1.73       3.66        13.62       28.62        133.13
7    burma     10.64    21.52    48.30    1.80       3.85        14.45       30.28        139.95
8    canada    10.17    20.22    45.68    1.76       3.63        13.55       28.09        130.15
9    chile     10.34    20.80    46.20    1.79       3.71        13.61       29.30        134.03
10   china     10.51    21.04    47.30    1.81       3.73        13.90       29.13        133.53
11   columbia  10.43    21.05    46.10    1.82       3.74        13.49       27.88        131.35
12   cookis    12.18    23.20    52.94    2.02       4.24        16.70       35.38        164.70
13   costa     10.94    21.90    48.66    1.87       3.84        14.03       28.81        136.58
14   czech     10.35    20.65    45.64    1.76       3.58        13.42       28.19        134.32
15   denmark   10.56    20.52    45.89    1.78       3.61        13.50       28.11        130.78
16   domrep    10.14    20.65    46.80    1.82       3.82        14.91       31.45        154.12
17   finland   10.43    20.69    45.49    1.74       3.61        13.27       27.52        130.87
18   france    10.11    20.38    45.28    1.73       3.57        13.34       27.97        132.30
19   gdr       10.12    20.33    44.87    1.73       3.56        13.17       27.42        129.92
20   frg       10.16    20.37    44.50    1.73       3.53        13.21       27.61        132.23
21   gbni      10.11    20.21    44.93    1.70       3.51        13.01       27.51        129.13
22   greece    10.22    20.71    46.56    1.78       3.64        14.59       28.45        134.60
23   guatemal  10.98    21.82    48.40    1.89       3.80        14.16       30.11        139.33
24   hungary   10.26    20.62    46.02    1.77       3.62        13.49       28.44        132.58
25   india     10.60    21.42    45.73    1.76       3.73        13.77       28.81        131.98
26   indonesi  10.59    21.49    47.80    1.84       3.92        14.73       30.79        148.83
27   ireland   10.61    20.96    46.30    1.79       3.56        13.32       27.81        132.35
28   israel    10.71    21.00    47.80    1.77       3.72        13.66       28.93        137.55
29   italy     10.01    19.72    45.26    1.73       3.60        13.23       27.52        131.08
30   japan     10.34    20.81    45.86    1.79       3.64        13.41       27.72        128.63
31   kenya     10.46    20.66    44.92    1.73       3.55        13.10       27.38        129.75
32   korea     10.34    20.89    46.90    1.79       3.77        13.96       29.23        136.25
33   dprkorea  10.91    21.94    47.30    1.85       3.77        14.13       29.67        130.87
34   luxembou  10.35    20.77    47.40    1.82       3.67        13.64       29.08        141.27
35   malaysia  10.40    20.92    46.30    1.82       3.80        14.64       31.01        154.10
36   mauritiu  11.19    22.45    47.70    1.88       3.83        15.06       31.77        152.23
37   mexico    10.42    21.30    46.10    1.80       3.65        13.46       27.95        129.20
38   netherla  10.52    20.95    45.10    1.74       3.62        13.36       27.61        129.02
39   nz        10.51    20.88    46.10    1.74       3.54        13.21       27.70        128.98
40   norway    10.55    21.16    46.71    1.76       3.62        13.34       27.69        131.48
41   png       10.96    21.78    47.90    1.90       4.01        14.72       31.36        148.22
42   philippi  10.78    21.64    46.24    1.81       3.83        14.74       30.64        145.27
43   poland    10.16    20.24    45.36    1.76       3.60        13.29       27.89        131.58
44   portugal  10.53    21.17    46.70    1.79       3.62        13.13       27.38        128.65
45   rumania   10.41    20.98    45.87    1.76       3.64        13.25       27.67        132.50
46   singapor  10.38    21.28    47.40    1.88       3.89        15.11       31.32        157.77
47   spain     10.42    20.77    45.98    1.76       3.55        13.31       27.73        131.57
48   sweden    10.25    20.61    45.63    1.77       3.61        13.29       27.94        130.63
49   switzerl  10.37    20.46    45.78    1.78       3.55        13.22       27.91        131.20
50   taipei    10.59    21.29    46.80    1.79       3.77        14.07       30.07        139.27
51   thailand  10.39    21.09    47.91    1.83       3.84        15.23       32.56        149.90
52   turkey    10.71    21.43    47.60    1.79       3.67        13.56       28.58        131.50
53   usa       9.93     19.75    43.86    1.73       3.53        13.20       27.43        128.22
54   ussr      10.07    20.00    44.60    1.75       3.59        13.20       27.53        130.55
55   wsamoa    10.82    21.86    49.00    2.02       4.24        16.28       34.71        161.83

The following SAS program performs the principal component analysis and draws the scatter plot of the first and second principal components.

data runrecod;
input country $ x1-x8;
cards;
argentin 10.39 20.81 46.84 1.81 3.70 14.04 29.36 137.72
australi 10.31 20.06 44.84 1.74 3.57 13.28 27.66 128.30
austria 10.44 20.81 46.82 1.79 3.60 13.26 27.72 135.90
belgium 10.34 20.68 45.04 1.73 3.60 13.22 27.45 129.95
bermuda 10.28 20.58 45.91 1.80 3.75 14.68 30.55 146.62
brazil 10.22 20.43 45.21 1.73 3.66 13.62 28.62 133.13
burma 10.64 21.52 48.30 1.80 3.85 14.45 30.28 139.95
canada 10.17 20.22 45.68 1.76 3.63 13.55 28.09 130.15
chile 10.34 20.80 46.20 1.79 3.71 13.61 29.30 134.03
china 10.51 21.04 47.30 1.81 3.73 13.90 29.13 133.53
columbia 10.43 21.05 46.10 1.82 3.74 13.49 27.88 131.35
cookis 12.18 23.20 52.94 2.02 4.24 16.70 35.38 164.70
costa 10.94 21.90 48.66 1.87 3.84 14.03 28.81 136.58
czech 10.35 20.65 45.64 1.76 3.58 13.42 28.19 134.32
denmark 10.56 20.52 45.89 1.78 3.61 13.50 28.11