Tools Used by Large-Scale Websites

How to scale up web service in the past?
Source: http:/

Intro
王耀聰, 陳威宇
jazz@nchc.org.tw, waue@nchc.org.tw
Training course

HBase is a distributed column-oriented database built on top of HDFS.

HBase is...
- A distributed data store that can scale horizontally to 1,000s of commodity servers and petabytes of indexed storage.
- Designed to operate on top of the Hadoop distributed file system (HDFS) or Kosmos File System (KFS, aka Cloudstore) for scalability, fault tolerance, and high availability.
- Integrated into the Hadoop map-reduce platform and paradigm.

Benefits
- Distributed storage
- Table-like data structure
  - Multi-dimensional map
- High scalability
- High availability
- High performance

Who uses HBase
- Adobe: internal use (structured data)
- Kalooga: image search engine http:/

HBase Is Not
- Tables have one primary index, the row key.
- No join operators.
- Scans and queries can select a subset of available columns, perhaps by using a wildcard.
- There are three types of lookups:
  - Fast lookup using row key and optional timestamp.
  - Full table scan.
  - Range scan from region start to end.

HBase Is Not (2)
- Limited atomicity and transaction support.
  - HBase supports multiple batched mutations of single rows only.
  - Data is unstructured and untyped.
- Not accessed or manipulated via SQL.
  - Programmatic access via Java, REST, or Thrift APIs.
  - Scripting via JRuby.

Why Bigtable?
- RDBMS performance is good for transaction processing, but for very large scale analytic processing the solutions are commercial, expensive, and specialized.
- Very large scale analytic processing
  - Big queries: typically range or table scans.
  - Big databases (100s of TB)

Why Bigtable? (2)
- MapReduce on Bigtable, optionally with Cascading on top to support some relational algebra, may be a cost-effective solution.
- Sharding is not a solution for scaling open source RDBMS platforms
  - Application specific
  - Labor-intensive (re)partitioning

Why HBase?
- HBase is a Bigtable clone.
- It is open source.
- It has a good community and promise for the future.
- It is developed on top of the Hadoop platform and integrates well with it, if you are using Hadoop already.
- It has a Cascading connector.

HBase benefits over RDBMS
- No real indexes
- Automatic partitioning
- Scales linearly and automatically with new nodes
- Commodity hardware
- Fault tolerance
- Batch processing
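
The three lookup types listed under "HBase Is Not" (row-key lookup with an optional timestamp, full table scan, range scan) can be illustrated with a toy in-memory table kept sorted by row key. This is only a sketch: `TinyTable` and its methods are hypothetical names and not the HBase client API, and the rule "newest version at or before the given timestamp" is an assumption made for the example.

```python
from bisect import bisect_left, insort


class TinyTable:
    """Toy stand-in for a table sorted by row key (hypothetical, not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column -> {timestamp -> value}}

    def put(self, row, column, timestamp, value):
        # A column only exists once something is inserted into it.
        if row not in self._rows:
            insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Lookup by row key; assumed rule: newest version at or before `timestamp`."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start=None, stop=None):
        """Full table scan when no bounds are given, else range scan over [start, stop)."""
        lo = 0 if start is None else bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = TinyTable()
table.put("row2", "data:2", 20, "value2")
table.put("row1", "data:1", 10, "value1")
table.put("row1", "data:1", 15, "value1b")
table.put("row3", "data:3", 30, "value3")

latest = table.get("row1", "data:1")             # row key only
older = table.get("row1", "data:1", 12)          # row key plus timestamp
all_rows = [key for key, _ in table.scan()]      # full table scan
some_rows = [key for key, _ in table.scan("row2", "row3")]  # range scan
```

Because the row keys are kept sorted, a range scan is a slice between two binary-search positions, which is why the row key is the one index the slides say a table has.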
Data Model
- Tables are sorted by row.
- The table schema only defines its column families.
  - Each family consists of any number of columns.
  - Each column consists of any number of versions.
  - Columns only exist when inserted; NULLs are free.
  - Columns within a family are sorted and stored together.
- Everything except table names is byte[].
- (Row, Family:Column, Timestamp) → Value

[Figure labels: Row key, Column Family, value, TimeStamp]

Members
- Master
  - Responsible for monitoring region servers
  - Load balancing for regions
  - Redirects clients to the correct region servers
  - The current SPOF
- Region server slaves
  - Serve client requests (Write/Read/Scan)
  - Send heartbeats to the Master
  - Throughput and the number of regions scale with the number of region servers

Regions
- A table consists of one or more regions.
  - A region is identified by its startKey and endKey.
- Each region may reside on several different nodes and is made up of several HDFS files and blocks; Hadoop takes care of replicating them.

Case Study: Blog
- Logical data model
  - A blog entry consists of the fields title, date, author, type, and text.
  - A user consists of fields such as username and password.
  - Each blog entry can have many comments.
  - Each comment consists of title, author, and text.
- ERD

Blog: HBase Table Schema
- Row key
  - Composed of the type (as a two-character abbreviation) and the timestamp.
  - Rows are therefore sorted first by type and then by timestamp, which makes it convenient to read the table with scan().
- The one-to-many relationship between BLOGENTRY and COMMENT is represented by a dynamic number of columns within the column families comment_title, comment_author, and comment_text.
- Each column is named by the timestamp of its comment, so the columns within each column family are automatically sorted by time.

Architecture

ZooKeeper
- HBase depends on ZooKeeper (Chapter 13), and by default it manages a ZooKeeper instance as the authority on cluster state.

Operation
- The -ROOT- table holds the list of .META. table regions.
- The .META. table holds the list of all user-space regions.

Installation (1)
  $ wget http:/
  Host1,Host2
  hbase.zookeeper.property.dataDir: /var/hadoop/hbase-data

Startup & Stop
- Start/stop everything
  $ bin/start-hbase.sh
  $ bin/stop-hbase.sh
- Start/stop individual daemons
  $ bin/hbase-daemon.sh start/stop zookeeper
  $ bin/hbase-daemon.sh start/stop master
  $ bin/hbase-daemon.sh start/stop regionserver
  $ bin/hbase-daemon.sh start/stop thrift
  $ bin/hbase-daemon.sh start/stop rest

Testing (4)
  $ hbase shell
  > create 'test', 'data'
  0 row(s) in 4.3066 seconds
  > list
  test
  1 row(s) in 0.1485 seconds
  > put 'test', 'row1', 'data:1', 'value1'
  0 row(s) in 0.0454 seconds
  > put 'test', 'row2', 'data:2', 'value2'
  0 row(s) in 0.0035 seconds
  > put 'test', 'row3', 'data:3', 'value3'
  0 row(s) in 0.0090 seconds
  > scan 'test'
  ROW    COLUMN+CELL
  row1   column=data:1, timestamp=1240148026198, value=value1
  row2   column=data:2, timestamp=1240148040035, value=value2
  row3   column=data:3, timestamp=1240148047497, value=value3
  3 row(s) in 0.0825 seconds
  > disable 'test'
  09/04/19 06:40:13 INFO client.HBaseAdmin: Disabled test
  0 row(s) in 6.0426 seconds
  > drop 'test'
  09/04/19 06:40:17 INFO client.HBaseAdmin: Deleted test
  0 row(s) in 0.0210 seconds
  > list
  0 row(s) in 2.0645 seconds

Connecting to HBase
- Java client
  - get(byte[] row, byte[] column, long timestamp, int versions);
- Non-Java clients
  - Thrift server hosting an HBase client instance
    - Sample Ruby, C++, & Java (via Thrift) clients
  - REST server hosts an HBase client
- TableInput/OutputFormat for MapReduce
  - HBase as MR source or sink
- HBase Shell
  - JRuby IRB with a "DSL" to add get, scan, and admin
  - ./bin/hbase shell YOUR_SCRIPT

Thrift
- A software framework for scalable cross-language services development.
- By Facebook.
- Works seamlessly between C++, Java, Python, PHP, and Ruby.
- The start command below launches the server instance, by default on port 9090.
- The other similar project is "rest".
  $ hbase-daemon.sh start thrift
  $ hbase-daemon.sh stop thrift

References
- HBase Introduction (HBase 介紹): http://www.wretch.cc/blog/trendnop09/21192672
- Hadoop: The Definitive Guide (book), by Tom White
- HBase Architecture 101: http:/
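
As a rough sketch of the blog case study, the snippet below builds row keys from a two-character type plus a timestamp, stores comments as columns named by their timestamp, and finds which region holds a row key by comparing it with region start keys. All names here (`make_row_key`, `add_comment`, `find_region`, the type abbreviations "te" and "sp") are hypothetical. Zero-padding timestamps to 13 digits is an assumption added so that string order matches time order, and the empty first start key is likewise assumed for the example.

```python
from bisect import bisect_right


def make_row_key(entry_type, timestamp):
    """Row key = two-character type abbreviation + zero-padded timestamp (assumed width 13)."""
    if len(entry_type) != 2:
        raise ValueError("type abbreviation must be exactly two characters")
    return f"{entry_type}{timestamp:013d}"


def add_comment(row, timestamp, title, author, text):
    """Add one comment as a column named by its timestamp in each comment family."""
    column = f"{timestamp:013d}"
    row.setdefault("comment_title", {})[column] = title
    row.setdefault("comment_author", {})[column] = author
    row.setdefault("comment_text", {})[column] = text


def find_region(start_keys, row_key):
    """Index of the region whose [startKey, endKey) range contains row_key.

    `start_keys` must be sorted, and the first start key is assumed to be "".
    """
    return bisect_right(start_keys, row_key) - 1


# Hypothetical type abbreviations: "te" and "sp".
row_keys = sorted(
    make_row_key(kind, ts)
    for kind, ts in [("te", 1240148047497), ("sp", 1240148026198), ("te", 1240148040035)]
)

# One blog entry row with two comments, inserted out of order.
entry = {}
add_comment(entry, 1240148047497, "Second", "bob", "Nice post")
add_comment(entry, 1240148026198, "First", "alice", "Thanks")
comment_columns = sorted(entry["comment_title"])

# Two regions: ["", "te") and ["te", end of table).
region_start_keys = ["", "te"]
regions = [find_region(region_start_keys, key) for key in row_keys]
```

Sorting the keys groups rows by type first and by time within each type, which is what makes a single scan() over one type cheap in this schema.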