《Greenplum 数据库开发基础.ppt》由会员分享,可在线阅读,更多相关《Greenplum 数据库开发基础.ppt(102页珍藏版)》请在taowenge.com淘文阁网|工程机械CAD图纸|机械工程制图|CAD装配图下载|SolidWorks_CaTia_CAD_UG_PROE_设计图分享下载上搜索。
1、学习学习学习学习 创造创造创造创造 超越超越超越超越 共享共享共享共享Greenplum 数据库开发基础数据库开发基础12014年10月China Construction Bank.|2目录Greenplum概述和数据分布数据加载和外部表客户端工具Greenplum数据库基础Greenplum与Oracle优化策略其他要点及示例China Construction Bank.|3MPP架构MPP(MassivelyParallelProcessing)Shared-NothingArchitectureChina Construction Bank.|4MPP无共享架构的优势Intercon
2、nectLoading数据分布在所有的并行节点上每个节点只处理其中一部分数据最优化的I/O处理所有的节点同时进行并行处理节点之间完全无共享,无I/O冲突自动化的并行处理机制内部处理自动化并行,无需人工分区或优化加载与访问方式与一般数据库相同最易于扩展的架构BI和数据分析的最佳选择增加节点实现线性扩展增加节点可线性增加存储、查询和加载性能China Construction Bank.|5Greenplum基本架构MPP(MassivelyParallelProcessing)Shared-NothingArchitectureNetworkInterconnect.MasterSeversQu
3、ery planning&dispatch SegmentSeversQuery processing&data storageSQLMapReduceExternalSourcesLoading,streaming,etc.China Construction Bank.|6Greenplum的并行处理特性v并行处理由系统自动完成,无需人工干预并行处理由系统自动完成,无需人工干预所有数据均匀分布到所有节点,每个节点都计算自己的部分数据,所以并行处理无需人工干预,系统自动完成。v无需复杂的调优需求,只需要加载数据和查询无需复杂的调优需求,只需要加载数据和查询DBA工作量极少,无需复杂的调优工作
4、和维护工作。v扩展性扩展性可线性扩展到10,000个节点每增加一个节点,查询、加载性能都成线性增长v客户端访问及第三方工具支持客户端访问及第三方工具支持完全支持数据库技术接口标准,例如:SQL,ODBC,JDBC,OLEDB等。同时,广泛地支持各个BI和ETL软件工具。China Construction Bank.|7Greenplum 基本体系架构基本体系架构China Construction Bank.|8客户端接口和程序客户端接口和程序lpsqllpgAdminIIIlODBC/DatadirectlJDBClPerlDBIlPythonllibpqlOLEDBChina Const
5、ruction Bank.|9Master Hostl访问系统的入口l建立与客户端的连接和管理lSQL的解析并形成执行计划l执行计划向Segment的分发l收集Segment的执行结果l协调工作处理过程lMaster不存储业务数据,只存储系统目录表和元数据(数据字典)China Construction Bank.|10Segmentv 每段(Segment)存放一部分用户数据v 一个系统可以有多段v 用户不能直接存取访问v 所有对段的访问都经过Masterv用户查询SQL的执行China Construction Bank.|11Interconnectv Greenplum数据库之间的连接
6、层v 进程间协调和管理v 基于千兆以太网架构v 属于系统内部私网配置v 支持两种协议:TCPorUDPChina Construction Bank.|12Greenplum 高可用性体系架构高可用性体系架构China Construction Bank.|13Master/Standby镜像镜像保护保护lStandby实时与 Master节点的 Catalog和事务日志保持同步lStandby节点用于当 Primary Master节点损坏时提供 Master服务China Construction Bank.|14数据冗余数据冗余-Segment镜像镜像保护保护l每个Segment的数据冗
7、余存放在另一个Segment上,数据实时同步l当PrimarySegment失败时,MirrorSegment将自动提供服务lPrimarySegment恢复正常后,使用gprecoversegF 同步数据。China Construction Bank.|15表分布的策略表分布的策略-并行计算的并行计算的基础基础lHash分布CREATETABLEDISTRIBUTEDBY(column,)同样数值的内容被分配到同一个Segment上l 循环分布 CREATETABLEDISTRIBUTEDRANDOMLY 具有同样数值的行内容并不一定在同一个Segment上China Constructi
8、on Bank.|16分布键(DistributionKeys)v用于将数据平均分布到Segments之中的一个或者多个字段v用表的主键作分布键可以使数据分布均匀v建表时使用 DISTRIBUTEDBY 子句定义表的分布键CREATETABLEsales(dtdate,prcfloat,qtyint,cust_idint,prod_idint,vend_idint)DISTRIBUTEDBY(dt,cust_id,prod_id);v如果表没有主键,或者没有合适的字段作为分布键,可以使用随机分布键(DISTRIBUTEDRANDOMLY)v如果没有明确定义分布键,系统会把第一个字段作为表的分布
9、键China Construction Bank.|17分布分布存储存储China Construction Bank.|18数据均匀分布-并行处理的关键43Oct 20 20051264Oct 20 20051145Oct 20 20054246Oct 20 20056477Oct 20 20053248Oct 20 200512OrderOrder#OrderDateCustomerID50Oct 20 20053456Oct 20 20052163Oct 20 20051544Oct 20 20051053Oct 20 20058255Oct 20 200555策略策略:数据尽可能的均匀
10、分布到每个节点China Construction Bank.|19查询命令的查询命令的执行执行China Construction Bank.|20SQL查询处理查询处理机制机制China Construction Bank.|21并行并行查询查询计划计划SELECT customer,amount FROM sales JOIN customer USING(cust_id)WHERE date=04302008;China Construction Bank.|22压缩存储和行列存储压缩存储和行列存储v压缩存储支持ZLIB和QUICKLZ方式的压缩,压缩比可到10:1压缩表只能是Appe
11、ndOnly方式压缩数据不一定会带来性能的下降,压缩表将消耗CPU资源,而减少I/O资源占用语法CREATETABLEfoo(aint,btext)WITH(appendonly=true,compresstype=zlib,compresslevel=5);v行或列存储模式列模式目前只支持AppendOnly 如果常用的查询只取表中少量字段,则列模式效率更高,如查询需要取表中的大量字段,行模式效率更高语法:CREATETABLEsales2(LIKEsales)WITH(appendonly=true,orientation=column);China Construction Bank.|
12、23锁China Construction Bank.|24停止活动的SQL查询v查看要停止的SQL查询的进程ID 执行 select*from pg_stat_activity 查看到当前数据库连接的IP 地址,用户名,提交的查询等。(另外也可以在master 主机上查看进程,对每个客户端连接,master 都会创建一个进程。ps-ef|grep-i postgres|grep-i con)#查询表是否被锁select procpid,t.*from pg_stat_activity t where usename=lds_betl and datname=ldsdb and waiting
13、=t;v停止SQL:执行 Select pg_cancel_backend(procpid)或者 Select pg_terminate_backend(procpid)或者 在MASTER OS:$kill procpid注:极端情况下,kill 不能停止SQL时,采用kill-11停止进程 千万不要使用kill-9,该操作导致数据库崩溃;生产系统请不要采用kill操作。China Construction Bank.|25表分区的概念表分区的概念v将一张大表逻辑性地分成多个部分,如按照分区条件进行查询,将减少数据的扫描范围,提高系统性能。v提高对于特定类型数据的查询速度和性能v也可以更方便
14、数据库的维护和更新v两种类型:Range分区(日期范围或数字范围)/如日期、价格等List 分区,例如地区、产品等vGreenplum中的表分区在使用中具有总表的继承性,并通过Check参数指定相应的子表v分区的子表依然根据分布策略分布在各segment上v分区是一种非常有用的优化措施,例如一年的交易按交易日期分区后,查询一天的交易性能将提高365倍!China Construction Bank.|26DataDistribution&PartitioningSegment1ASegment1BSegment1CSegment1DSegment2ASegment2BSegment2CSegm
15、ent2DSegment3ASegment3BSegment3CSegment3DJan2005Feb2005Mar2005Apr2005May2005Jun2005Jul2005Aug2005Sep2005Oct2005Nov2005Dec2005每个分区表的数据平均分布到各个节点表分区可减少数据的搜索范围,提高查询性能China Construction Bank.|27FullTableScanVS.PartitionPruningSegment1ASegment1BSegment1CSegment1DSegment2ASegment2BSegment2CSegment2DSegment
16、3ASegment3BSegment3CSegment3DSegment1ASegment1BSegment1CSegment1DSegment2ASegment2BSegment2CSegment2DSegment3ASegment3BSegment3CSegment3DSegment1ASegment1BSegment1CSegment1DSegment2ASegment2BSegment2CSegment2DSegment3ASegment3BSegment3CSegment3DSELECT COUNT(*)FROM orders WHERE order_date=Oct 20 2005
17、 AND order_date(non-superuser)vTo show a list of all databases:lvTo connect to another database:cdb_namevUse PGDATABASE environment variable to set the default databaseChina Construction Bank.|49SchemavTo Create:CREATESCHEMAvTo Drop:DROPSCHEMAvTo Edit:ALTERSCHEMAChange nameAssign new ownerPSQLTipsvT
18、o see the current schema:SELECTcurrent_schema();vTo see a list of all schemas in the database:dnvTo see the schema search path:SHOWsearch_path;vTo set the search path for a database:ALTERDATABASESETsearch_pathTOmyschema,public,pg_catalog;China Construction Bank.|50表vTo Create:CREATETABLEAdditional D
19、ISTRIBUTEDBY or DISTRIBUTEDRANDOMLY clauseSome syntax not supportedvTo Edit:ALTERTABLECannot alter distribution key columnsvTo Drop:DROPTABLEPSQLTipsvTo list tables in the database:dtvTo see structure of a table:d+table_namevTo list system catalog tables:dtSvTo list external tables only:dxvTo see th
20、e distribution key columns of a table:dtable_nameChina Construction Bank.|51表和字段约束vCHECK table or column constraintsvNOTNULL column constraintsvUNIQUE column constraintsOne allowed per tableUnique columns must also be in distribution keyNot allowed if table also has a primary keyvPRIMARYKEY table co
21、nstraintsUsed as distribution key by defaultvFOREIGNKEY constraints definitions are supported but not enforcedvForeign key relationships are utilized by the query planner to improve query plans.China Construction Bank.|52视图vView SQL Commands:CREATEVIEWDROPVIEW vPSQL Tips:To list all views while in p
22、sql:dv To see a view definition:d+view_name EXAMPLE:CREATEVIEWtoptenASSELECTname,rank,gender,yearFROMnames,rankWHERErank11ANDnames.id=rank.id;SELECT*FROMtoptenORDERBYyear,rank;China Construction Bank.|53索引v在Greenplum数据库中应谨慎创建索引v索引不一定都能优化查询v应测试索引是否真正提升了性能v删除没用的索引vPRIMARY KEY索引会自动创建v唯一性索引只能在分布键字段创建Chi
23、na Construction Bank.|54索引(续)v索引类型:B-tree Bitmapv索引相关SQL命令:CREATEINDEXALTERINDEXDROPINDEXREINDEXvPSQL Tips:在PSQL显示所有索引:di显示索引定义:d+index_name v大批量ETL加工最好不建索引,对性能提升作用不大vB-TREE适用每次通过单一字段筛选查询少量数据vB-MAP适用每次通过多个字段筛选查询大量数据China Construction Bank.|55其他数据库对象vFunctions and operatorsvSequencesvTriggersvTablesp
24、acesChina Construction Bank.|56数据类型v常用数据类型CHAR,VARCHAR,TEXTSmallint,integer,bigintNumeric,real,doubleprecisionTimestamp,date,timeBooleanArray 类型。如 integerv其它数据类型请参考China Construction Bank.|57常用系统表及常用系统表及视图视图v所有系统表在pg_catalogschemav标准PostgreSQL系统表(pg_*)v常用系统表:pg_stat_activitypg_tablespg_classpg_attri
25、bute pg_namespacev在psql显示所有系统表:dtSvPsql显示所有系统视图:dvSv其它catalog参考China Construction Bank.|58函数v日期函数Extract(day|month|year。Fromdate);Selectdate+1day:interval,date+1month:intervalSELECTdate_part(day,TIMESTAMP2001-02-1620:38:40);Result:16SELECTdate_trunc(hour,TIMESTAMP2001-02-1620:38:40);Result:2001-02-1
26、620:00:00pg_sleep(seconds);系统日期变量Current_dateCurrent_timeCurrent_timestampNow()Timeofday()在 事务中发生变化,以上函数在事务中不变China Construction Bank.|59函数v字符串处理函数Substr/length/lower/upper/trim/replace/positionrPad/lpadTo_char,|(字符串连接)substringlike,simillar to (模式匹配)v其它杂类Case。When/Coalescenullifgenerate_seriesIn/no
27、tin/exists/any/allChina Construction Bank.|60Built-InFunctions(SELECT)FunctionDescriptionExampleCURRENT_DATEReturns the current system date2006-11-06CURRENT_TIMEReturns the current system time16:50:54CURRENT_TIMESTAMPReturns the current system date and time2008-01-06 16:51:44.430000+00:00LOCALTIMERe
28、turns the current system time with time zone adjustment19:50:54LOCALTIMESTAMPReturns the current system date and time with time zone adjustment2008-01-06 19:51:44.430000+00:00CURRENT_ROLEROLEReturns the current database userjdoeChina Construction Bank.|61MathematicalFunctionsFunctionReturnsDescripti
29、onExampleResults+-*/sameAdd,Subtract,Multiply&Divide1+12%IntegerModulo10%20SameExponentiation224|/NumericSquare Root|/93|/NumericCube Root|/82!NumericFactorial!36&|#NumericBitwiseAnd,Or,XOR,Not91&1511NumericBitwiseShift left,right1 2162China Construction Bank.|62MathematicalFunctions(Continued)Funct
30、ionReturnsDescriptionExampleResultsabssameAbsolute Valueabs(-998.2)998.2ceiling(numeric)NumericReturns smallest integer not less than argumentceiling(48.2)49floor(numeric)NumericReturns largest integer not greater than argumentfloor(48.2)48pi()NumericThe constantpi()3.1419random()NumericRandom value
31、 between 0.0 and 1.0random().87663round()NumericRound to nearest integerround(22.7)23China Construction Bank.|63StringFunctionsFunctionReturnsDescriptionExampleResultsstring|stringTextString concatenationmy|mymymychar_length(string)Integernumber of chars in stringchar_length(mymy)4position(string in
32、 string)IntegerLocation of specified sub-stringposition(my in ohmy)3lower(string)TextConverts to lower caselower(MYMY)mymyupper(string)TextConverts to upper caseupper(mymy)MYMYsubstring(string from n for n)TextDisplays portion of stringsubstring(myohmy from 3 for 2)ohtrim(both,leading,trailing from
33、string)TextRemove leading and/or trailing characterstrim(mymy )mymyChina Construction Bank.|64StringFunctions(Continued)FunctionReturnsDescriptionExampleResultsinitcap(string)TextChanges caseinitcap(my my)My Mylength(string)IntegerReturns string lengthlength(mymy)4split_part(string,delimiter,occurre
34、nce)TextSeparates delimited listsplit_part(one|two|three,|,2)twoChina Construction Bank.|65DateFunctionsFunctionReturnsDescriptionExampleResultsage(timestamp,timestamp)TimestampDifference in years,months and daysage(2008-08-12 timestamp,current_timestamp)0 years 1 month 11 daysextract(field from tim
35、estamp)IntegerReturns year,month,day,hour,minute or secondextract(day from current_date)11now()TimestampReturns current date&timenow()2008-09-22 11:00:01overlapsBooleanSimplifies comparing date rangesWHERE(2008-01-01,2008-02-11)overlaps(2008-02-01,2008-09-11)TRUEChina Construction Bank.|66存储过程vGreen
36、plum支持SQL/PYTHON/PERL/C语言构建函数,以下着重介绍SQL 存储过程。一个存储过程就是一个事务,包括对子过程的调用都在一个事务内存储过程结构:CREATEFUNCTIONsomefunc()RETURNSintegerAS$DECLAREquantityinteger:=30;BEGINRETURN.;END;$LANGUAGEplpgsql;赋值给一个变量或行/记录赋值用下面方法:identifier:=expression例子:user_id:=20;执行一个没有结果的查询:PERFORMquery;一个例子:PERFORMcreate_mv(cs_session_pa
37、ge_requests_mv,my_query);存储过程请参考:存储过程请参考:China Construction Bank.|67存储过程v动态SQLEXECUTEcommand-stringINTOSTRICTtarget;vSELECTINTOExample:SELECTIDINTOVAR_IDFROMTABLEAv获取结果状态GETDIAGNOSTICSvariable=item,.;一个例子:GETDIAGNOSTICSinteger_var=ROW_COUNT;vSQL返回变量SQLERRM(SQL出错信息),SQLSTATE(SQL执行返回状态编码)v控制结构IF.THEN.
38、ELSEIF.THEN.ELSELOOP,EXIT,CONTINUE,WHILE,FORv从函数返回有两个命令可以用来从函数中返回数据:RETURN和RETURNNEXT。Syntax:RETURNexpression;v设置回调EXECSQLWHENEVERconditionaction;condition可以是下列之一:SQLERROR,SQLWARNING,NOTFOUNDChina Construction Bank.|68存储过程v异常处理EXCEPTIONWHENunique_violationTHEN-donothingEND;忽略错误:EXCEPTIONWHENOTHERSTH
39、ENRAISENOTICEanEXCEPTIONisabouttoberaised;RAISEEXCEPTIONNUM:%,DETAILS:%,SQLSTATE,SQLERRM;END;v错误和消息RAISElevelformat,expression,.;Level:Info:信息输入Notice:信息提示Exception:产生一个例外,将退出存储过程Example:RAISENOTICECallingcs_create_job(%),v_job_id;China Construction Bank.|69OVER(PARTITIONBY)ExampleSELECT*,row_number
40、()OVER()FROMsaleORDERBYcn;SELECT*,row_number()OVER(PARTITION BY cn)FROMsaleORDERBYcn;row_number|cn|vn|pn|dt|qty|prc-+-+-+-+-+-+-1|1|10|200|1401-03-01|1|02|1|30|300|1401-05-02|1|03|1|50|400|1401-06-01|1|04|1|30|500|1401-06-01|12|55|1|20|100|1401-05-01|1|01|2|50|400|1401-06-01|1|02|2|40|100|1401-01-01
41、|1100|24001|3|40|200|1401-04-01|1|0(8rows)row_number|cn|vn|pn|dt|qty|prc-+-+-+-+-+-+-1|1|10|200|1401-03-01|1|02|1|30|300|1401-05-02|1|03|1|50|400|1401-06-01|1|04|1|30|500|1401-06-01|12|55|1|20|100|1401-05-01|1|06|2|50|400|1401-06-01|1|07|2|40|100|1401-01-01|1100|24008|3|40|200|1401-04-01|1|0(8rows)C
42、hina Construction Bank.|70OVER(ORDERBY)ExampleSELECTvn,sum(prc*qty)FROMsaleGROUPBYvnORDERBY2DESC;SELECTvn,sum(prc*qty),rank()OVER(ORDER BY sum(prc*qty)DESC)FROMsaleGROUPBYvnORDERBY2DESC;vn|sum|rank-+-+-40|2640002|130|180|250|0|320|0|310|0|3(5rows)vn|sum-+-40|264000230|18050|020|010|0(5rows)China Con
43、struction Bank.|71事务事务v事务将多个语句捆绑为all-or-nothing操作v事务相关命令BEGINorSTARTTRANSACTIONENDorCOMMITROLLBACKSAVEPOINTandROLLBACKTOSAVEPOINTvPsql中设置自动提交模式:setautocommiton|offChina Construction Bank.|72目录Greenplum概述和数据分布数据加载和外部表客户端工具Greenplum数据库基础Greenplum与Oracle优化策略其他要点及示例China Construction Bank.|73数据类型数据类型Dat
44、aTypeORACLEGreenplumNumericNUMBER(p,s)SMALLINT (2 bytes)INTEGER (4 bytes)BIGINT (8 bytes)DECIMAL(p,s)(11+p/2 bytes)NUMERIC(p,s)(11+p/2 bytes)REAL (4 bytes)DOUBLE (8 bytes)CharacterCHAR and NCHARCHARVARCHAR2 and NVARCHAR2VARCHARDate&TimeDATE(includes time to sec)DATE or TIMESTAMP without Time ZoneTIM
45、ESTAMPTIMESTAMPINTERVALINTERVAL/TIMEBinaryBFILE(1 GB)Large Objects(up to 2 GB)RAW BFILE(1 GB)BYTEACLOB and NCLOBTEXTChina Construction Bank.|74Greenplum与与Oracle比较(比较(1)DifferenceOracleGreenplumDUALSELECT 1+1 FROM DUALSELECT 1+1;orCREATE VIEW dual AS SELECT X:VARCHAR(1)AS DUMMY;SELECT 1+1 FROM dual;N
46、EXTVALSELECT A_TABLE_SEQUENCE.NEXTVAL FROM DUALSELECT NEXTVAL(A_TABLE_SEQUENCE)FROM DUALROWNUMSELECT*FROM AGE_TYPE WHERE ROWNUM=5SELECT*FROM AGE_TYPELIMIT 5 OFFSET 0SELECT*FROM AGE_TYPE WHERE CODE IS NOT NULLAND ROWNUM=5 ORDER BY CODE DESCSELECT*FROM AGE_TYPEWHERE CODE IS NOT NULLORDER BY CODE DESCL
47、IMIT 5 OFFSET 0ASSELECT A.COL1 A_COL1,A.COL2 A_COL2FROM A_TABLE ASELECT A.COL1 AS A_COL1,A.COL2 AS A_COL2FROM A_TABLE AChina Construction Bank.|75Greenplum与与Oracle比较(比较(2)DifferenceOracleGreenplum(+)SELECT*FROM A_TABLE A,B_TABLE B WHERE A.ID(+)=B.IDSELECT*FROM A_TABLE A RIGHT OUTER JOIN B_TABLE B ON
48、 A.ID=B.IDSELECT*FROM A_TABLE A,B_TABLE B WHERE A.ID(+)=B.IDAND A.COL1=COL1_VALUESELECT*FROM A_TABLE A RIGHT OUTER JOIN B_TABLE B ON A.ID=B.ID AND A.COL1=COL1_VALUESELECT*FROM A_TABLE A,B_TABLE B,C_TABLE C,D_TABLE DWHERE A.ID=B.ID(+)AND A.ID=C.ID(+)ANDA.COL1=D.COL1SELECT*FROM(A_TABLE ALEFT OUTER JOI
49、N B_TABLE BON A.ID=B.ID)LEFT OUTER JOIN C_TABLE CON A.ID=C.ID,D_TABLE DWHERE A.COL1=D.COL1SELECT*FROM A_TABLE AWHERE A.COL1(+)=0 AND A.COL2(+)=A_VALUE2SELECT*FROM A_TABLE AWHERE A.COL1=0 AND A.COL2=A_VALUE2China Construction Bank.|76Greenplum与与Oracle比较(比较(3)DifferenceOracleGreenplumNVLSELECT NVL(SUM
50、(VALUE11),0)FS_VALUE1,NVL(SUM(VALUE21),0)FS_VALUE2FROM FIELD_SUMSELECT COALESCE(SUM(VALUE11),0)AS FS_VALUE1,COALESCE(SUM(VALUE21),0)AS FS_VALUE2FROM FIELD_SUMTO_NUMBERSELECT COL1 FROM A_TABLEORDER BY TO_NUMBER(COL1)SELECT COL1 FROM A_TABLEORDER BY TO_NUMBER(COL1,999999)note:999999 is length of COL1D