Mastering Data Warehouse Design

Part One: Concepts

We have found that understanding why a particular approach is adopted helps us recognize its value and apply it. We therefore begin this part with an introduction to the Corporate Information Factory (CIF), a proven and stable architecture. Within this architecture, business intelligence (BI) relies on two kinds of data store, each with a specific role in the BI environment.

The first kind of data store is the data warehouse. Its major role is to serve as a data repository that stores data from disparate sources and makes it accessible to the second kind of data store, the data mart. In general, the most effective design approach for the data warehouse is based on the entity-relationship data model and the normalization techniques that Codd and Date developed for relational databases in the 1970s, 1980s, and 1990s.

The major role of the data mart is to give business users easy access to quality, integrated information. Chapter 1 describes several types of data marts. The most common type is built to support online analytical processing (OLAP), and the most effective design approach for it is the dimensional data model.

Chapter 2 continues the conceptual theme: it explains the most important relational modeling techniques, introduces the different types of models that are needed, and provides a process for building a relational model. It also explains the relationships among the business, system, and technology data models used to build a solid foundation for the enterprise, and how these models share or inherit characteristics from one another.

Chapter 1 Introduction

Welcome to the first book that thoroughly describes the data modeling techniques used in constructing a multipurpose, stable, and sustainable data warehouse used to support business intelligence (BI). This chapter introduces the data warehouse by describing the objectives of BI and the data warehouse and by explaining how these fit into the overall Corporate Information Factory (CIF) architecture. It discusses the iterative nature of data warehouse construction and demonstrates the importance of the data warehouse data model and the justification for the type of data model format suggested in this book. We discuss why the format of the model should be based on relational design techniques, illustrating the need to maximize nonredundancy, stability, and maintainability. Another section of the chapter outlines the characteristics of a maintainable data warehouse environment. The chapter ends with a discussion of the impact of this modeling approach on the ultimate delivery of the data marts. This chapter sets up the reader to understand the rationale behind the ensuing chapters, which describe in detail how to create the data warehouse data model.

1.1 Overview of Business Intelligence

BI, in the context of the data warehouse, is the ability of an enterprise to study past behaviors and actions in order to understand where the organization has been, determine its current situation, and predict or change what will happen in the future. BI has been maturing for more than 20 years. Let's briefly go over the past decade of this fascinating and innovative history.

You're probably familiar with the technology adoption curve. The first companies to adopt the new technology are called innovators. The next category is known as the early adopters, then there are members of the early majority, members of the late majority, and finally the laggards. The curve is a traditional bell curve, with exponential growth in the beginning and a slowdown in market growth occurring during the late majority period. When new technology is introduced, it is usually hard to get, expensive, and imperfect. Over time, its availability, cost, and features improve to the point where just about anyone can benefit from ownership. Cell phones are a good example of this. Once, only
the innovators (doctors and lawyers?) carried them. The phones were big, heavy, and expensive. The service was spotty at best, and you got “dropped” a lot. Now, there are deals where you can obtain a cell phone for about $60, the service providers throw in $25 of airtime with no monthly fees, and service is quite reliable.

Data warehousing is another good example of the adoption curve. In fact, if you haven't started your first data warehouse project, there has never been a better time. Executives today expect, and often get, most of the good, timely information they need to make informed decisions to lead their companies into the next decade. But this wasn't always the case.

Just a decade ago, these same executives sanctioned the development of executive information systems (EIS) to meet
their needs. The concept behind EIS initiatives was sound: to provide executives with easily accessible key performance information in a timely manner. However, many of these systems fell short of their objectives, largely because the underlying architecture could not respond fast enough to the enterprise's changing environment. Another significant shortcoming of the early EIS days was the enormous effort required to provide the executives with the data they desired. Data acquisition, or the extract, transform, and load (ETL) process, is a complex set of activities whose sole purpose is to attain the most accurate and integrated data possible and make it accessible to the enterprise through the data warehouse or operational data store (ODS).

The entire process began as a manually intensive set of activities. Hard-coded “data suckers” were the only means of getting data out of the operational systems for access by business analysts. This is similar to the early days of telephony, when operators on skates had to connect your phone with the one you were calling by racing back and forth and manually plugging in the appropriate cords.

Fortunately, we have come a long way from those days, and the data warehouse industry has developed a plethora of tools and technologies to support the data acquisition process. Now, progress has allowed most of this process to be automated, as it has in today's telephony
world. Also, similar to telephony advances, this process remains a difficult, if not temperamental and complicated, one. No two companies will ever have the same data acquisition activities or even the same set of problems. Today, most major corporations with significant data warehousing efforts rely heavily on their ETL tools for design, construction, and maintenance of their BI environments.

Another major change during the last decade is the introduction of tools and modeling techniques that bring the phrase “easy to use” to life. The dimensional modeling concepts developed by Dr. Ralph Kimball and others are largely responsible for the widespread use of multidimensional data marts to support online analytical processing.

In addition to multidimensional analyses, other sophisticated technologies have evolved to support data mining, statistical analysis, and exploration needs. Now mature BI environments require much more than star schemas: flat files, statistical subsets of unbiased data, and normalized data structures, in addition to star schemas, are all significant data requirements that must be supported by your data warehouse.

Of course, we shouldn't underestimate the impact of the Internet on data warehousing. The Internet helped remove the mystique of the computer. Executives use the Internet in their daily lives and are no longer wary of touching the keyboard. The end-user tool vendors recognized the impact of the Internet, and most of them seized upon that realization: they designed their interfaces to replicate some of the look-and-feel features of the popular Internet browsers and search engines. The sophistication, and simplicity, of these tools has led to a widespread use of BI by business analysts and executives.

Another important event taking place in the last few years is the transformation from technology chasing the business to the business demanding technology. In the early days
of BI, the information technology (IT) group recognized its value and tried to sell its merits to the business community. In some unfortunate cases, the IT folks set out to build a data warehouse with the hope that the business community would use it. Today, the value of a sophisticated decision support environment is widely recognized throughout the business. As an example, an effective customer relationship management program could not exist without strategic (data warehouse with associated marts) and tactical (operational data store and oper mart) decision-making capabilities (see Figure 1.1).

BI Architecture

One of the most significant developments during the last 10 years has been the introduction of a widely accepted architecture to support all BI technological demands. This architecture recognized that the EIS approach had several major flaws, the most significant of which was that the EIS data structures were often fed directly from source
systems, resulting in a very complex data acquisition environment that required significant human and computer resources to maintain. The Corporate Information Factory (CIF) (see Figure 1.2), the architecture used in most decision support environments today, addressed that deficiency by segregating data into five major databases (operational systems, data warehouse, operational data store, data marts, and oper marts) and incorporating processes to effectively and efficiently move data from the source systems to the business users.

[Figure 1.2: the Corporate Information Factory, shown rotated 90 degrees]

These components were further separated into two major groupings of components and processes.

Getting data in consists of the processes and databases involved in acquiring data from the operational systems, integrating it, cleaning it up, and putting it into a database for easy usage. The components of the CIF that are found in this function are:

The operational system databases (source systems) contain the data used to run the day-to-day business of the company. These are still the major source of data for the decision support environment.

The data warehouse is a collection or repository of integrated, detailed, historical data to support strategic decision making.

The operational data store is a collection of integrated, detailed, current data to support tactical decision making.

Data acquisition is a set of processes and programs that extracts data for the data warehouse and operational data store from the operational systems. The data acquisition programs perform the cleansing as well as the integration of the data and transformation into an enterprise format. This enterprise format reflects an integrated set of enterprise business rules, which usually makes the data acquisition layer the most complex component in the CIF. In addition to programs that transform and clean up data, the data acquisition layer also includes audit and control processes and programs to ensure the integrity of the data as it enters the data warehouse or operational data store.

Getting information out consists of the processes and databases involved in delivering BI to the
ultimate business consumer or analyst. The components of the CIF that are found in this function are:

The data marts are derivatives from the data warehouse used to provide the business community with access to various types of strategic analysis.

The oper marts are derivatives of the ODS used to provide the business community with dimensional access to current operational data.

Data delivery is the process that moves data from the data warehouse into data and oper marts. Like the data acquisition layer, it manipulates the data as it moves it. In the case of data delivery, however, the origin is the data warehouse or ODS, which already contains high-quality, integrated data that conforms to the enterprise business rules.

The CIF didn't just happen. In the beginning, it consisted of the data warehouse and sets of lightly summarized and highly summarized data, initially a collection of the historical data needed to support strategic decisions. Over time, it spawned the operational data store with a focus on the tactical decision support requirements as well. The lightly and highly summarized sets of data evolved into what we now know as data marts.

Let's look at the CIF in action. Customer Relationship Management (CRM) is a highly popular initiative that needs the components for tactical information (operational systems, operational data store, and oper marts) and for strategic information (data warehouse and various types of data marts). Certainly this technology is necessary for CRM, but CRM requires more than just the technology; it also requires a business strategy, a corporate culture and organization, and customer information in order to provide long-term value to both the customer and the organization. The architecture described here fits this environment well, and each component has a specific design and function within it.
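
The two CIF groupings described above, getting data in (data acquisition) and getting information out (data delivery), can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration rather than anything specified by the book: the `WarehouseRow` enterprise format, the raw field names `cust`, `date`, and `amt`, the source names `billing` and `web`, and the monthly sales mart are all assumptions chosen to keep the example small.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WarehouseRow:
    """One integrated, detailed record in a hypothetical enterprise format."""
    customer_id: str
    order_date: str  # ISO date string, YYYY-MM-DD
    amount: float
    source: str


def acquire(source_name, raw_rows):
    """Data acquisition: cleanse, transform to the enterprise format, and audit."""
    loaded, rejected = [], []
    for raw in raw_rows:
        try:
            row = WarehouseRow(
                customer_id=str(raw["cust"]).strip().upper(),
                order_date=str(raw["date"]).strip(),
                amount=round(float(raw["amt"]), 2),
                source=source_name,
            )
        except (KeyError, TypeError, ValueError):
            # Missing field or non-numeric amount: keep the row for review.
            rejected.append(raw)
            continue
        if not row.customer_id:
            rejected.append(raw)
            continue
        loaded.append(row)
    # Audit and control: every extracted row is either loaded or rejected.
    assert len(loaded) + len(rejected) == len(raw_rows)
    return loaded, rejected


def deliver_monthly_sales_mart(warehouse):
    """Data delivery: derive a summarized data mart from warehouse detail."""
    totals = defaultdict(float)
    for row in warehouse:
        month = row.order_date[:7]  # YYYY-MM
        totals[(row.customer_id, month)] += row.amount
    return dict(totals)


# Two hypothetical operational source systems with inconsistent formats.
billing = [
    {"cust": " c001 ", "date": "2003-01-15", "amt": "100.50"},
    {"cust": "c002", "date": "2003-01-20", "amt": "n/a"},
]
web = [{"cust": "C001", "date": "2003-01-28", "amt": 25}]

warehouse, rejects = [], []
for name, rows in (("billing", billing), ("web", web)):
    ok, bad = acquire(name, rows)
    warehouse.extend(ok)
    rejects.extend(bad)

mart = deliver_monthly_sales_mart(warehouse)
print(mart)
```

In this sketch the cleansing and integration rules live only in `acquire`, so `deliver_monthly_sales_mart` can assume its input already conforms to the enterprise format. That mirrors the point made above that data delivery starts from high-quality, integrated data.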
