Big Data Introduction (English Version) PPT Courseware.ppt

Uploaded by: 飞****2 | Document ID: 29416933 | Uploaded: 2022-07-30 | Format: PPT | Pages: 33 | Size: 1.50 MB

BIG DATA

EVERY MINUTE
- Didi rides hailed: 1,388 cabs and 2,777 private cars
- 395,833 people log in to WeChat; 194,444 people are video or audio chatting
- 625,000 Youku Tudou videos are being watched
- 64,814 posts and reposts appear on Weibo
- 4,166,667 search queries are made
- 774 people buy something on Alibaba's marketplaces; US$1,133,942 is spent on Alibaba

CONTENTS
1 Definition
2 Characteristic
3 NoSQL
4 RDBMS
5 MapReduce
6 Applications

1 Definition
Big data:
- volume of data
- important data
- on a day-to-day basis
- for better decisions

2 Characteristic
- Volume: The quantity of generated and stored data.
- Variety: The type and nature of the data.
- Velocity: The speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.
- Variability: Inconsistency of the data set can hamper processes to handle and manage it.
- Veracity: The quality of captured data can vary greatly, affecting accurate analysis.

3 NoSQL
- NoSQL here refers to document-oriented databases.
- SQL doesn't scale well horizontally.
- It is schemaless, but not formless (JSON format).
- JSON: a data interchange format.
- Examples: Mongo Database, Couch Database.

3 NoSQL: BASE Model
- Basic Availability: spread data across many storage systems with a high degree of replication.
- Soft State: data consistency is the developer's problem and should not be handled by the database.
- Eventual Consistency: at some point in the future, data will converge to a consistent state. No guarantees are made about "when".
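
The BASE properties above can be made concrete with a small simulation. This is only a sketch under stated assumptions: the `Replica` class, the version counter, and the `anti_entropy` function are hypothetical names invented for illustration, and the last-write-wins rule is one simple convergence strategy, not something the slides prescribe.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One storage node mapping key -> (version, value). Hypothetical, for illustration."""
    store: dict = field(default_factory=dict)

    def write(self, key, value, version):
        # Keep the write only if it is newer than what this replica already holds.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Push every entry to every replica; the highest version wins (last write wins)."""
    for source in replicas:
        for key, (version, value) in list(source.store.items()):
            for target in replicas:
                target.write(key, value, version)


replicas = [Replica(), Replica(), Replica()]
replicas[0].write("views", 1250000, version=1)  # accepted by one node only
stale_read = replicas[2].read("views")           # soft state: not replicated yet
anti_entropy(replicas)                           # runs at some later, unspecified time
converged_reads = [r.read("views") for r in replicas]
```

Before the synchronization step a reader may see no value or an old one; afterwards all replicas agree. The sketch does not resolve two different values written with the same version, which a real system would have to handle.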

3 NoSQL: JSON Structure

    { field1: value1, field2: value2, ..., fieldN: valueN }

    var mydoc = {
        _id: ObjectId("5099803df3f4948bd2f98391"),
        name: { first: "Alan", last: "Turing" },
        birth: new Date('Jun 23, 1912'),
        death: new Date('Jun 07, 1954'),
        contribs: [ "Turing machine", "Turing test" ],
        views: NumberLong(1250000)
    }

3 NoSQL: RDBMS vs NoSQL

Row DB:
    001:10,Smith,Joe,40000;
    002:12,Jones,Mary,50000;
    003:11,Johnson,Cathy,44000;
    004:22,Jones,Bob,55000;

Index:
    001:40000;002:50000;003:44000;004:55000;

Column DB:
    10:001,12:002,11:003,22:004;
    Smith:001,Jones:002,Johnson:003,Jones:004;
    Joe:001,Mary:002,Cathy:003,Bob:004;
    40000:001,50000:002,44000:003,55000:004;

Column DB, repeated values merged:
    Smith:001,Jones:002,004,Johnson:003;

3 NoSQL: Benefits
- Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
- Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
- Row-oriented organizations are more efficient when many columns of a single row are required at the same time, and when row size is relatively small, as the entire row can be retrieved with a single disk seek.
- Row-oriented organizations are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.
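
The two layouts can be reproduced with a short Python sketch. It assumes the four records shown on the RDBMS vs NoSQL slide; the field meanings (row id, employee id, last name, first name, salary) and the function names are assumptions made for illustration, not part of the original deck.

```python
# Records from the slide; field meanings are assumed:
# (row id, employee id, last name, first name, salary).
rows = [
    ("001", 10, "Smith", "Joe", 40000),
    ("002", 12, "Jones", "Mary", 50000),
    ("003", 11, "Johnson", "Cathy", 44000),
    ("004", 22, "Jones", "Bob", 55000),
]


def to_row_layout(rows):
    """Serialize row by row: all fields of one record sit next to each other."""
    return "".join(f"{rid}:{','.join(str(v) for v in values)};" for rid, *values in rows)


def to_column_layout(rows):
    """Serialize column by column: all values of one field sit next to each other."""
    row_ids = [row[0] for row in rows]
    columns = zip(*(row[1:] for row in rows))
    return [
        ",".join(f"{value}:{rid}" for value, rid in zip(column, row_ids)) + ";"
        for column in columns
    ]


row_layout = to_row_layout(rows)
column_layout = to_column_layout(rows)

# Aggregate over one column: the column layout only needs its salary segment ...
salary_segment = column_layout[3]
total_from_columns = sum(
    int(pair.split(":")[0]) for pair in salary_segment.rstrip(";").split(",")
)
# ... while the row layout has to walk through every field of every record.
total_from_rows = sum(
    int(record.split(",")[-1]) for record in row_layout.rstrip(";").split(";")
)
```

Both totals are equal, but the column layout reaches the answer by touching a single segment, which is the aggregate benefit listed for column-oriented organizations.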

3 NoSQL: SQL vs Non SQL

A good compromise is to design your system with 3 logical DBs:
1. A normal SQL DB used by your admin application to create content.
2. A NoSQL DB for the front-end, high-volume application used by the public internet.
3. A DB for the analytical reporting system, using OLAP cubes and similar tools.

Data flows from the admin DB to the client NoSQL DB when someone publishes a piece of content. The client (NoSQL) DB provides very fast read access and records user interactions with the content. A scheduled job then pulls the data from the client DB into the reporting system. Since admin, client, and reporting are often separate apps, each application team can work with data in the format that best serves its application, and the transition from one system to the other is handled in the service layers.
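
The data flow between the three logical DBs can be sketched in a few lines of Python. Everything here is a stand-in chosen for illustration: an in-memory SQLite database plays the admin SQL DB, a dict of JSON strings plays the NoSQL client DB, another dict plays the reporting system, and the function names `publish`, `record_view`, and `reporting_job` are hypothetical.

```python
import json
import sqlite3

# Admin side: in-memory SQLite stands in for the "normal SQL DB".
admin_db = sqlite3.connect(":memory:")
admin_db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
admin_db.execute(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    ("Big Data", "Volume, variety, velocity"),
)

# Client side: a dict of JSON documents stands in for the NoSQL store.
client_db = {}
# Reporting side: a dict of counters stands in for the analytical system.
reporting_db = {}


def publish(article_id):
    """Service layer: copy one article from the admin DB into the client store."""
    row = admin_db.execute(
        "SELECT id, title, body FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    client_db[row[0]] = json.dumps({"title": row[1], "body": row[2], "views": 0})


def record_view(article_id):
    """Front end: read the document and record the user interaction."""
    doc = json.loads(client_db[article_id])
    doc["views"] += 1
    client_db[article_id] = json.dumps(doc)
    return doc["title"]


def reporting_job():
    """Scheduled job: pull interaction data from the client store into reporting."""
    for article_id, raw in client_db.items():
        reporting_db[article_id] = json.loads(raw)["views"]


publish(1)
record_view(1)
record_view(1)
reporting_job()
```

Each store keeps the data in the shape its application needs (rows, documents, counters), and the three small functions play the role of the service layers that move data between them.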

4 RDBMS
- Fixed-schema, row-oriented databases with ACID properties and a sophisticated SQL query engine.
- The emphasis is on strong consistency, referential integrity, abstraction from the physical layer, and complex queries through the SQL language.
- You can easily create secondary indexes, perform complex inner and outer joins, and count, sum, sort, group, and page your data across a number of tables, rows, and columns.

5 MapReduce
- Dividing and conquering
- Highly fault tolerant
- Every data block replicated on 3 nodes
- Difficult to implement

5 MapReduce: Comparison

             RDBMS                    MapReduce
Data size    GB                       PB
Access       Interactive and batch    Batch
Updates      Read/write many times    Write once, read many times
Structure    Static schema            Dynamic schema
Integrity    High (ACID)              Low
Scaling      Nonlinear                Linear
DBA ratio    1:40                     1:3000

5 MapReduce: How Does MapReduce Work
Two steps: Map and Reduce.
- Map: MapReduce uses key/value pairs, where an RDBMS traditionally uses rows and columns. The map function emits intermediate key/value pairs for each input record.
- Reduce: All the intermediate values for a given output key are combined together into a list.
- Reduce: The reduce function then combines the intermediate values into one or more final values for the same key.

6 Applications

6 Government
The use and adoption of big data within governmental processes is beneficial and allows efficiencies in terms of cost, productivity, and innovation, but it does not come without its flaws. Data analysis often requires multiple parts of government (central and local) to work in collaboration and create new and innovative processes to deliver the desired outcome.

6 Healthcare
Big data analytics has helped healthcare improve by providing personalized medicine and prescriptive analytics, clinical risk intervention and predictive analytics, waste and care variability reduction, automated external and internal reporting of patient data, standardized medical terms and patient registries, and fragmented point solutions.

6 Education
A McKinsey Global Institute study found a shortage of 1.5 million highly trained data professionals and managers, and a number of universities, including the University of Tennessee and UC Berkeley, have created master's programs to meet this demand. Private bootcamps have also developed programs to meet that demand, including free programs like The Data Incubator and paid programs like General Assembly.

6 Internet of Things
Big data and the IoT work in conjunction. From a media perspective, data is the key derivative of device inter-connectivity and allows accurate targeting. The Internet of Things, with the help of big data, therefore transforms the media industry, companies, and even governments, opening up a new era of economic growth and competitiveness. The intersection of people, data, and intelligent algorithms has far-reaching impacts on media efficiency. The wealth of data generated allows an elaborate layer on the present targeting mechanisms of the industry.

6 Sports
Big data can be used to improve training and to understand competitors, using sport sensors. It is also possible to predict winners in a match using big data analytics. Future performance of players could be predicted as well. Thus, players' value and salary are determined by data collected throughout the season.

THANKS

5 Comparison
1 KB = 2^10 B = 1024 B
1 MB = 2^10 KB = 1024 KB
1 GB = 2^10 MB = 1024 MB
1 TB = 2^10 GB = 1024 GB
1 PB = 2^10 TB = 1024 TB
1 EB = 2^10 PB = 1024 PB
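
Returning to the Map and Reduce steps from section 5, the flow can be sketched in plain Python with a word count. This is a single-process illustration under stated assumptions: the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample input are invented here, and no distribution, replication, or fault tolerance is modeled.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit an intermediate (key, value) pair for every word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group: combine all intermediate values for a given key into a list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's list of values into one final value."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["big data big decisions", "data every minute"]
counts = reduce_phase(shuffle(map_phase(lines)))
```

In a real cluster the map calls would run on the nodes that hold each data block and the grouped lists would be routed to reducers over the network; the three functions only show how the key/value pairs change shape at each step.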
