《大数据应用基础(人邮微课版)-课后提升参考答案汇总 项目2--6MySQL数据库设计--浪潮可视化大数据工具应用.docx》由会员分享,可在线阅读,更多相关《大数据应用基础(人邮微课版)-课后提升参考答案汇总 项目2--6MySQL数据库设计--浪潮可视化大数据工具应用.docx(33页珍藏版)》请在taowenge.com淘文阁网|工程机械CAD图纸|机械工程制图|CAD装配图下载|SolidWorks_CaTia_CAD_UG_PROE_设计图分享下载上搜索。
1、大数据应用基础(人邮微课版)课后提升参考答案一、请根据给出资料完成数据库的概念模型、逻辑模型以及物理模型设计(1)概念模型设计依据需求分析可知,该模型涉及4个实体,假设干属性,且实体之间的关系清晰,可利用 E-R图工具完成概念模型的设计,如下列图所示。(2)逻辑模型设计利用概念模型结果,依据概念模型转换逻辑模型的方法,将E.R图转换成逻辑模型,具 体转换方法如下:1)实体的转换:将E-R图中的每个实体均转换为一个逻辑模型。科目信息(科目代码,科目名称,科目类别,余额方向,数量单位)其中,“科目代码” 为主键。凭证信息(凭证编号,附单据数,制单人,制单时间,借方合计,贷方合计,凭证状态) 其中“
2、凭证编号”为主键。账簿信息(账簿编号,科目代码,累计借方,累计贷方,初期余额)其中“账簿编号” 为主键。2)联系的转换:本案例中“科目信息”与“凭证信息”为包含关系,其类型为1: N,根据转换原那么, 将1端实体主键添加到多端实体转换的模型中作为外键,即将“科目信息”中的“科目代码” 添加到“凭证信息”模型中作为外键;“凭证信息”与“账簿信息”为过账关系,其类型为 1: 1,需将一端实体主键添加到另一端实体转换的模型中作为外键,即将“凭证信息”的主 键“凭证编码”添加到“账簿信息”模型中作为外键;“账簿信息”与“科目余额”为包含 关系,关系类型为N:l,同理,需将1端实体主键添加到多端实体转换
3、的模型中作为外键。 最终转换结果如下:科目信息(科目代码,科目名称,科目类别,数量单位)其中,“科目代码”为主键。凭证信息(凭证编号,附单据数,制单人,制单时间,借方合计,贷方合计,凭证状态, 科目代码)其中“凭证编号”为主键,“科目代码”为外键。账簿信息(账簿编号,科目代码,累计借方,累计贷方,初期余额,凭证编号)其中“账 簿编号”为主键,“凭证编号”作为外键。tiercity一线城市北京、上海、广州、深圳新一线城市成都、重庆、杭州、武汉、西安、郑州、青岛、长沙、天津、苏州、南京、东莞、沈阳、合肥、佛山二线城市宁波、昆明、福州、无锡、厦门、哈尔滨、长春、南昌、济南、大连、贵阳、温州、石家庄、
4、泉州、南宁、金华、常州、珠海三线城市海口、汕头、潍坊、扬州、洛阳、乌鲁木齐、临沂、唐山、镇江、盐城、湖州、赣州、漳州、揭阳、江门、桂林、邯郸、泰州四线城市常德、渭南、孝感、丽水、运城、德州、张家口、鄂尔多斯、阳江、泸州、丹东、曲靖、乐山、许昌、湘潭、晋中、安阳、齐五线城市汉中、辽阳、四平、内江、六盘水、安顺、新余、牡丹江、晋城、自贡、三门峡、赤峰、本溪、防城港、铁岭、随州、广安、其次,创立客户视图。客户表的所有字段,加上计算字段:性别、出生日期、年龄、所属几线城市,形成具有 更全客户信息的客户视图:【课堂小提示工视图是基于SQL语句的结果集的可视化的表,相当于虚拟表,视图 中的字段就是来自一个
5、或多个真实的物理表中的字段。CREATE VIEW创立视图,ALTER VIEW 修改视图,DROP VIEW删除视图,CREATE OR REPLACE VIEW仓ij建或修改视图。 CREATE OR REPLACE VIEW v_customers ASSELECT cust_id,cust_name,cust_country,cust_state,cust_city,(SELECT tier FROM cities WHERE INSTR(cities.city, customers.cust_city)0) cust_tier,cust_district,cust_address,c
6、ust_zip,cust_email,cust_tel,cust_prof,cust_identity,IF(MOD(substr(cust_identity, 17),2)二0;女?男)cust_sex,CONC AT(SUBSTR(cust Jdentity,7,4);-SUBSTR(cust .identity, 11,2);-,SUBSTR(cust_identity,l 3,2) cust_birthday,TIMESTAMPDIFF(YEAR, CONCAT(SUBSTR(custJdentity,7,4);-SUBSTR(cust.identity, 11 SUBSTR(cust
7、_identity, 13,2), CURDATE() cust_ageFROM customers;查询视图:SELECT * FROM v_customers;信息结果1剖析状态cust citycost tiercust districtcust address cust zipcust emailcust telcustprofcust identitycust sexcust birthdaycust age湾西五线城市滴道区黑龙江省鸡西i 158100科研女1984-04-1137哈尔滨二线城市道里区哈尔滨市道里150000制造110101197S05136456男1975-05-
8、1346绥化五统蜗庆安县黑龙江省庆安1152400蜥男1991-05-2530哈尔滨二线城市通河县照龙江省哈尔;150900服务女1987-12-1233贾阳二线城市修文昌贵州省费阳市(550200加工女1988-02-1433西安雁塔区陕西省西安市)710000tKW女1990-08 1531哈尔滨二线城市松北区哈尔滨市松北H 50000砂女1984-09-2136哈尔滨二线城市双城区黑龙江舍哈尔;150100制造女1975-03-1946弁齐哈尔 畦ISELECT * FROM v customers只M匕查询时间:0.331s第3条记录(共1717条)c(1)统计客户性别分布。按照性别分
9、组GROUP BY cust_sex,汇总客户个数使用聚合函数COUNT(custJd):【课堂小提示工分组语句中,SELECT子句的字段只能是2种情况:第1种是分组的 字段(GROUP BY后的字段,即按照什么来分组或分类,可以是1个字段,也可以是多个 字段),第2种是聚合函数(用于汇总数据,包括:COUNT、SUM、AVG、MIN、MAX, 可以在聚合函数基础上嵌套其他函数使用)oSELECT cust_sex,COUNT(cust_id) cnumsFROM v_customersGROUP BY cust_sexORDER BY cnums DESC;custsex cnums男868
10、女849(2)统计客户职业分布。按照职业分组GROUP BY cust_prof,汇总客户个数使用聚合函数COUNT(custJd):SELECT cust_prof,COUNT(cust_id) cnumsFROM v_customersGROUP BY cust_profORDER BY cnums DESC;custprofcnums教育367服务353琳203加工185制造146死141102科研85政治69农业66(3)统计客户地域分布,分别按照省、市统计,再按照所属几线城市统计。按照客户所在城市的“一线、新一线、二线、三线、四线、五线”划分,分类汇总客 户人数:SELECT cus
11、t_tier,COUNT(cust_id) cnums FROM v_customers GROUP BY cust_tier ORDER BY cnums DESC;custtier cnums,二线城市633五线城市416四线城市253三线城市220新一线城市145一线城市50按照客户所在省,分类汇总客户人数:SELECT cust_state,COUNT(cust_id) cnums FROM v_customers GROUP BY cust_state ORDER BY cnums DESC;custstatecnumscuststatecnums黑龙江910 吉林11陕西274湖北
12、10custstatecnums贵州2689浙江4山东41安徽8海南3广东34新疆7江苏3江西33四川6重庆2河南24山西6云南2北京17福建6青海2上海13辽宁5河北1广西12天津5 湖南1按照客户所在省、市,汇总有哪些区县和客户人数:GROUP_CONCAT通过分组连接所有非NULL的字符串。DISTINCT按字段值去重。此应 用中同一区县有多个客户,汇总各市有哪些区县时需去掉重复值。SELECT cust_state,cust_city,GROUP_CONCAT(DISTINCT cust_district) cust_district,COUNT(cust_id) cnums第1条记录
13、供136条)第1条记录供136条)FROM v_customers GROUP BY cust_state,cust_city ORDER BY cnums DESC;信息结果1剖析状态custstatecustcitycustdistrictcnums哈尔滨五常市,依兰县,南岗区,双城区,呼兰区,宾县,尚志市,巴彦县平房区,延寿县,方正县,木兰县,松北区,通河县,道外区,道里区,阿城区,香坊区503陕西西安临潼区,周至县,新城区,未央区,满桥区,碑林区,莲湖区,蓝田县,长安区,雁塔区,高陵区103贵州贵阳乌当区,云岩区,修文县商明区,开阳县,息烽县,清镇市,白云区,花溪区,观山湖区97黑龙江
14、佳木斯东风区,前进区,同江市,向阳区,富锦市,抚远市,桦南县,桦J11县,汤原县,郊区76贵州遵义习水县,仁怀市,余庆县,凤冈县,播州区,桐梓县,正安县,汇川区,泊潭县修工花岗区71黑龙江雌兰西县d匕林区,安达市,庆安县,明水县,望奎县,海伦市,绥棱县,肇东市65黑龙江柝大同区,杜尔伯特蒙古族自治县,林甸县,单州县,至源县,萨尔图区让胡路区,龙凤区54黑龙江标的克东县,克山艮富拉尔基区,建华区,拜泉县,昂昂溪区,梅里斯达斡尔族区,泰来县,日南县讷河市,铁锋区,龙江县,龙沙区46SELECT cust_state,cust_city, GROUP_CONCAT(DISTINCT cust dis
15、trict) cust district,COUNT(cust_id) cnums FROh 只读直询时间:0.336s(4)统计客户的年龄分布,按照出生年代统计客户分布情况,如:60后、70后、80 后、90后、00后等。年代的计算方法:出生日期1987-12-12的年份值1987除以10向下取整得到198,再乘 以10为1980,表示80后。YEAR函数计算日期型数据的年值,FLOOR向下取整函数,计 算字段命别名为ages:SELECT cust_birthday,FLOOR(YEAR(cust_birthday)/l0)*10 ages FROM v_customers;信息结果1剖析
16、状态custbirthday ages1975-05-1319701991-05-251990 1987-12-1219801988-02-1419801990-08-1519901984-09-2119801975-03-1919701979-01-011970+ - X C SELECT cust birth (共 1717 条)用计算字段年代值作为分组字段,汇总客户个数:SELECT FLOOR(YEAR(cust_birthday)/10)*10 ages,COUNT(cust_id) cnumsFROM v_customers GROUP BY FLOOR(YEAR(cust_bir
17、thday)/l 0)* 10 ORDER BY ages;agescnums197056919805531990564200031(5)统计客户年龄分布,按照年龄段统计客户分布情况,如:1620岁、21-25岁、2630岁、3135岁、3640岁、4145岁、4650岁、5155岁、5660岁、6165岁 等。年龄段的计算方法:年龄-1除以5向下取整,如:30-1=29, 29/5向下取整=5, 5表示 26-30岁段,计算字段命别名为ages:SELECT DISTINCT cust_age,FLOOR(cust_age-1 )/5) ages FROM v_customers ORDER
18、 BY ages;custage agesagescustage ages4182143164282243264382343364482443464582543564692653674792753774892853874992953975093054075110custage用计算字段年龄段作为分组字段,汇总客户个数,年龄段在显示方式上用条件语句CASE WHEN THEN ELSE END显示具体的年龄段情况:SELECT CASE FLOOR(cust_age-l)/5)WHEN 3 THEN 116-20 岁, WHEN 4 THEN *21-25 岁WHEN 5 THEN 26-30
19、岁WHEN 6 THEN 31-35 岁 WHEN 7 THEN 36-40 岁WHEN 8 THEN 41-45 岁WHEN 9 THEN 46-50 岁WHEN 10 THEN 51-55 岁 WHEN 11 THEN 56-60 罗 WHEN 12 THEN *61-65 罗 ELSE 其他 END ages,COUNT(cust_id) cnums FROM v_customers GROUP BY FLOOR(cust_age-1 )/5) ORDER BY ages;拓展四分析销售趋势agescnums21-25岁26626-30岁28231-35岁28236-40岁26741-4
20、5岁28946-50岁32051-55岁11(1)统计多年的订单数、销售数量、销售金额、销售利润,并进行趋势比照。SELECT YEAR(order_date) years,COUNT(DISTINCT orders.order_id) onums,SUM(item_quantity) quantities,SUM(item_quantity*item_price) amouns,SUM(item_quantity*(item_price-prod_price) profitFROM orders INNER JOIN orderitems ON orders.order_id=orderit
21、ems.order_idINNER JOIN products on products.prod_id=orderitems.prod_idGROUP BY YEAR(order_date) ORDER BY years;(2)统计多年的订单数、销售数量、销售金额、销售利润,并进行趋势比照,从月份 角度比照销售情况。将分组由年改为月:SELECT month(order_date) months,COUNT(DISTINCT orders.order_id) onums,SUM(item_quantity) quantities,SUM(item_quantity*item_price) am
22、ouns,SUM(item_quantity*(item_price-prod_price) profitFROM orders INNER JOIN orderitems ON orders.order_id=orderitems.order_idINNER JOIN products on products.prod_id=orderitems.prod_idGROUP BY month(order_date) ORDER BY months;months onums quantities amounsprofit1139295212477423.201645270.64212527821
23、2011663.201564464.38310724289975037.331273519.634130290412169236.501544585.065120264311304861.091486340.106138318413481590.731756067.807117271311683279.571542149.048143324813716739.891768147.239122278211847310.461536414.1510131292712825283.991692100.6511136310812833718.061646445.0712134294812449815.
24、841611843.00(3)统计不同类别的商品各年的订单数、销售数量、俏售金额、俏售利润,并进行趋 势比照。仍然是商品与订单、订单明细3张表连接查询,但分组字段根据分析主题进行选择,再 分类统计:SELECT YEAR(order_date) years,prod_category,COUNT(DISTINCT orders.order_id) onums,SUM(item_quantity) quantities,SUM(item_quantity*item_price) amouns,SUM(item_quantity*(item_price-prod_price) profitFROM
25、 orderitems INNER JOIN orders ON orders.order_id=orderitems.order_idINNER JOIN products on products.prod_id=orderitems.prod_idGROUP BY YEAR(order_date),prod_categoryORDER BY prod_category,YEAR(order_date);years prodcategory onums quantities amounsprofityears 20182018201820182018201820182018201820182
26、01820182018years 2018201820182018201820182018201820182018201820182018monthsonumsquantitiesamounsprofit1479784253895.02554448.7725011334699458.90605471.913418753710876.50464226.544367783200240.17405433.695409454066062.13525475.4764911014541276.65619375.527399053866492.91485434.6485312515273753.286731
27、37.5594911264813635.76614086.8810449974378558.92574435.32114610794406770.72566113.63124810804390302.03569026.91合计5421224851601322.996656666.83(4)打印各年月销售总报表。2018图书5252454175501.3823146.702019图书4872292161733.0221370.212020图书4842206155459.1120464.162018手机541504718362105.122393273.242019手机50346141685535
28、2.112194380.352020手机497459116878464.312215291.082018电脑541474733063716.494240246.892019电脑502429430115436.283940974.682020电脑497437431008192.044018199.44按照年、月份组统计,添加WITH ROLLUP后能将分组后的信息按组聚合,但双分组 字段聚合后有null值,加上IFNULL判空函数显示“总计”、“合计:SELECT IFNULL(YEAR(order_date),总计)years,IFNULL(MONTH(order_date),合计)month
29、s, COUNT(DISTINCT orders.order_id) onums,SUM(item_quantity) quantities,SUM(item_quantity*item_price) amouns,SUM(item_quantityJ!i(item_price-prod_price) profitFROM orders INNER JOIN orderitems ON orders.order_id=orderitems.order_idINNER JOIN products on products.prod_id=orderitems.prod_idGROUP BY YEA
30、R(order_date),MONTH(order_date) WITH ROLLUP;yearsmonthsonumsquantitiesamounsprofit20191449443917857.73524695.0020192439283923209.62514812.2120193419874205928.42538050.2420194429363778176.92476002.6220195346982761455.67377404.00201964610244223361.44536950.4920197399213868884.56515704.7720198501073457
31、4154.50598765.2320199327463140146.03394943.74201910459864437256.16594675.03201911449924117970.26538636.98201912439654184120.10546084.932019合计5031120047132521.416156725.24yearsmonthsonumsquantitiesamounsprofit2019合计5031120047132521.416156725.24202014810304305670.45566126.8720202327213388994.68444180.
32、2620203255662058232.41271242.85202045211905190819.41663148.75202054610004477343.29583460.63202064310594716952.64599741.7920207398873947902.10541009.6320208409243868832.11496244.4520209419103893528.67527383.53202010429444009468.91522990.302020114610374308977.08541694.46202012439033875393.71496731.162
33、020合计4971117148042115.466253954.68总计合计154234619146775959.8619067346.75import requestsfrom bs4 import BeautifulSoupheads = User-Agent: nMozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36H)def getShowList(startPos, endPos):f = open(HD:/jd
34、_shoe_Iist.txtn, mode=a+,encoding=nutf-8H)for i in range(startPos, endPos):strUrll = n 1729,11731,9772&page=M + str(i) +n&s=297&click=0ncategory 二“女鞋”try:res = requests.get(strUrl 1, headers=heads)html = res.textsoup = BeautifulSoup(html, *html.parser)items = soup.find_all(nlin, attrs=class: gl-item
35、”) for item in items:name 二 item.find(,div, class_ = ,p-name,).find(emn).getText()price = item.find(ndivn, class_ = ,p-price,).find(,i,),getText()image= s:”+item.find(Hdivn,class_=np-img,).find(Himgn),data-lazy-img,desc = item.find(HdivH, class_=np-namen).find(,a,),title,f.write(name+”;+price+”;+cat
36、egory+”;+image+”;+desc + “n”) except:print。连接异常”)f.close()def getJDShoe(): getShowList(l, 100)getJDShoe()import requestsfrom bs4 import BeautifulSoupheadDict =Huser-agentH:nn )def downWebContent(url):r = requests.get(url, headers = headDict)r.encoding 二 r.apparent_encodingreturn r.text fff下载全部列表数据 f
37、ffdef gctHouscInfoO:for i in range(l, 100):strurl= s: cd.lianjia /ershoufang/pg+str +rs%E6%88%90%E9%83%BD/content 二 downWebContent(strurl)parseHouseList(content)Vfl解析二手房列表fffdef parseHouseList(webContent):soup = BeautifulSoup(webContent,nhtml.parser)ultags = soup.find(,urclass_=nsellListContent,)lit
38、ags = ultags.find_all(nli)for litag in litags:url = litag.find(Han,class_=nnoresultRecommend img LOGCLICKDATA)href content = downWebContent(url)parseHouseDetail(content)解析二手房详情def parseHouseDetail(webContent):soup = BeautifulSoup(webContent, nhtml.parsern)f = open(HD:/house_list.txtn, mode=a+, encod
39、ing=utf-8”) contentList =div2 = soup.find(div, class_=noverviewn)contentTag = div2.find(,div,class_=Hcontentn)total = contentTag.find(Hspann, class_=ntotaln).getText() unit = contentTag.find(Hspann, class_=nunitn).getText() price = total + unitmianjimianjicontentTag.find(ndivn,class_=,unitPrice,).fi
40、nd(,spann,class_=HunitPriceValuen).getText()class_=communityName).find(a,class_=,areaNameH).find(nspann,cIass_=areaName,).find(nspanH,community = contentTag.find(,divH, class_=ninfoH).getText()area =contentTag.findCdiv,class_=ninfon).find_all(nan)0.getText()town = contentTag.find(ndivn,class_=ninfon
41、).find_all(nan)l.getText()bianhao = contentTag.find(Hdivn, class_=nhouseRecordn).getText().replace(n 链 家二 “)replace(举报二*contentList.append(bianhao) contentList.append(area) contentList.append(town) contentList.append(community) contcntList.appcnd(mianji) contentList.append(price)mContentTag = soup.f
42、ind(ndivn, class_=nm-contentH)#基本属性ulTag = mContentTag.find(div”,class_=Hbasen).find(ndivn, class_=Hcontentn).ul lis = ulTag.find_all(Hlin)huxing = lisO.getText().replace(房屋户型”,”).replace(“nJ”) contentList.append(huxing)louceng = lisl.getText().replace(所在楼层:“).replace(“nJ) contentList.append(louceng
43、)mianji = lis2.getText().replace(建筑面积”,replace。面 contentList.append(mianj i)jiegou = lis3.getText().replace(“户型结构,).replace( W) contentList.append(jiegou)taoneimianji =由4名仃1()/(“套内面积“,”)4碗(”11“,) contentList.append(taoneimianji)leixing = lis5.getText().replace(建筑类型”,).replace(nJ) contentList.append(leixing)chaoxiang = lis6.getText().replace(房屋朝向“,”).replace(“nJ) contentList.ap