Learning with Bayesian Networks
David Heckerman
Presented by Colin Rickert

Introduction to Bayesian Networks
Bayesian networks represent an advanced form of general Bayesian probability. A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest [1]. The model has several advantages for data analysis over rule-based decision trees [1].

Outline
1. Bayesian vs. classical probability methods
2. Advantages of Bayesian techniques
3. The coin toss prediction model from a Bayesian perspective
4. Constructing a Bayesian network with prior knowledge
5. Optimizing a Bayesian network with observed knowledge (data)
6. Exam questions

Bayesian vs. the Classical Approach
- The Bayesian probability of an event x represents a person's degree of belief or confidence in that event's occurrence, based on prior and observed facts.
- Classical probability refers to the true or actual probability of the event and is not concerned with observed behavior.
Bayesian vs. the Classical Approach
- The Bayesian approach restricts its prediction to the next (N+1) occurrence of an event, given the N previously observed events.
- The classical approach predicts the likelihood of any given event regardless of the number of occurrences.

Example
Imagine a coin with irregular surfaces, such that the probabilities of landing heads and tails are not equal.
- The classical approach would be to analyze the surfaces to create a physical model of how the coin is likely to land on any given throw.
- The Bayesian approach simply restricts attention to predicting the next toss based on previous tosses.

Advantages of Bayesian Techniques
How do Bayesian techniques compare to other learning models?
1. Bayesian networks can readily handle incomplete data sets.
2. Bayesian networks allow one to learn about causal relationships.
3. Bayesian networks readily facilitate the use of prior knowledge.
4. Bayesian methods provide an efficient way to prevent overfitting of the data (there is no need for pre-processing).
Handling of Incomplete Data
Imagine a data sample in which two attribute values are strongly anti-correlated.
- With decision trees, both values must be present to avoid confusing the learning model.
- Bayesian networks need only one of the values to be present and can infer the absence of the other:
  - Imagine two variables, one for gun-owner and the other for peace activist.
  - The data should indicate that you do not need to check both values.
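As an illustration, here is a minimal Python sketch of this kind of inference. The conditional distribution p(peace_activist | gun_owner) and its numbers are invented for the example; in practice the network would learn it from data.

```python
# Hypothetical learned conditional p(peace_activist = 1 | gun_owner); the strong
# anti-correlation means one value largely determines the other.
p_activist_given_gun = {0: 0.7, 1: 0.1}  # invented numbers for illustration

def fill_missing(gun_owner, peace_activist=None):
    """Return the observed value, or its expected value inferred from gun_owner."""
    if peace_activist is not None:
        return peace_activist
    return p_activist_given_gun[gun_owner]

# A record with peace_activist missing: infer it from gun_owner instead.
print(fill_missing(gun_owner=1))  # -> 0.1, i.e. almost certainly not an activist
```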
Learning about Causal Relationships
We can use observed knowledge to assess the validity of the acyclic graph that represents the Bayesian network. For instance, is running a cause of knee damage?
- Prior knowledge may indicate that this is the case.
- Observed knowledge may strengthen or weaken this argument.

Use of Prior Knowledge and Observed Behavior
Constructing prior knowledge is relatively straightforward: add a "causal" edge between any two factors that are believed to be correlated. Causal networks represent prior knowledge, whereas the weights of the directed edges can be updated in a posterior manner based on new data.

Avoidance of Over Fitting Data
Contradictions do not need to be removed from the data. The data can be "smoothed" so that all available data can be used.
The "Irregular" Coin Toss from a Bayesian Perspective
Start with the set of parameters θ = {θ_1, ..., θ_n} for our hypothesis. For the coin toss we have only one parameter, θ, representing our belief that we will toss "heads" (and 1−θ for tails). We predict the outcome of the (N+1)th flip based on the previous N flips:
- X_l for l = 1, ..., N
- D = {X_1 = x_1, ..., X_N = x_N}
- We want to know the probability that X_{N+1} = x_{N+1} = heads.
- ξ represents the information we have observed thus far, i.e., the state of knowledge we condition on in addition to D.

Bayesian Probabilities
- Posterior probability, p(θ|D,ξ): the probability of a particular value of θ given that D has been observed (our final belief about θ).
- Prior probability, p(θ|ξ): the probability of a particular value of θ given no observed data (our previous "belief").
- Observed probability or "likelihood", p(D|θ,ξ): the likelihood of the sequence of coin tosses D being observed, given that θ takes a particular value.
- p(D|ξ): the raw (marginal) probability of D.

Bayesian Formulas for the Weighted Coin Toss (Irregular Coin)
p(θ|D,ξ) = p(D|θ,ξ) p(θ|ξ) / p(D|ξ), where p(D|ξ) = ∫ p(D|θ,ξ) p(θ|ξ) dθ
*Only the likelihood p(D|θ,ξ) and the prior p(θ|ξ) need to be specified; the rest can be derived.

Integration
To find the probability that X_{N+1} = heads, we must integrate over all possible values of θ to find the average value of θ, which yields:
p(X_{N+1} = heads | D, ξ) = ∫ θ p(θ|D,ξ) dθ

Expansion of Terms
1. Expand the observed probability p(D|θ,ξ): for a sequence containing h heads and t tails, p(D|θ,ξ) = θ^h (1−θ)^t.
2. Expand the prior probability p(θ|ξ) as a Beta distribution:
   p(θ|ξ) = Beta(θ|α_h, α_t) = Γ(α_h+α_t) / (Γ(α_h) Γ(α_t)) · θ^(α_h−1) (1−θ)^(α_t−1)
*The Beta function yields a bell curve that integrates to one, a typical probability distribution. It can be viewed as our expectation of the shape of the curve.
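Combining these two expansions shows why the Beta prior is convenient (it is conjugate to the coin-toss likelihood). A short worked derivation, using the h and t counts defined above (with N = h + t):

$$
p(\theta \mid D, \xi) \;\propto\; p(D \mid \theta, \xi)\, p(\theta \mid \xi)
\;\propto\; \theta^{h}(1-\theta)^{t}\cdot\theta^{\alpha_h-1}(1-\theta)^{\alpha_t-1}
\;=\; \theta^{\alpha_h+h-1}(1-\theta)^{\alpha_t+t-1}
$$

So the posterior is again a Beta distribution, $\mathrm{Beta}(\theta \mid \alpha_h+h,\ \alpha_t+t)$, whose mean gives the prediction on the next slide.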
Beta Function and Integration
Combining the product of both functions yields a Beta posterior:
p(θ|D,ξ) = Beta(θ | α_h+h, α_t+t)
Integrating gives the desired result:
p(X_{N+1} = heads | D, ξ) = (α_h + h) / (α_h + α_t + N)

Key Points
Multiply the result of the Beta function (the prior probability) by the coin-toss likelihood for θ (the observed probability). The result is our confidence in that value of θ. Integrating the product of the two with respect to θ over all values 0 ≤ θ ≤ 1 gives the expected value of θ, i.e., the probability of heads on the next toss.
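As a quick check of the formula above, here is a minimal Python sketch of the update rule; the prior hyperparameters, the function name, and the toss sequence are made up for the example.

```python
# Minimal sketch of the Beta-Bernoulli update derived above:
# with a Beta(alpha_h, alpha_t) prior and h heads / t tails observed,
# the predictive probability of heads is (alpha_h + h) / (alpha_h + alpha_t + N).

def predict_heads(tosses, alpha_h=1.0, alpha_t=1.0):
    """Posterior predictive p(X_{N+1} = heads | D) for a string of 'H'/'T'."""
    h = sum(1 for x in tosses if x == "H")  # observed heads
    t = len(tosses) - h                     # observed tails
    return (alpha_h + h) / (alpha_h + alpha_t + h + t)

# Example: a uniform Beta(1, 1) prior plus 7 heads in 10 tosses -> 8/12
print(predict_heads("HHTHHTHHTH"))  # 0.666...
```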
For all nodes that do not have a causal link, we can check for conditional independence between those nodes.

Example
Using the above graph of expected causes, we can check for conditional independence of the following probabilities given initial sample data:
- p(a|f) = p(a)
- p(s|f,a) = p(s)
- p(g|f,a,s) = p(g|f)
- p(j|f,a,s,g) = p(j|f,a,s)
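As an illustration, here is a minimal Python sketch of checking one such statement, p(a|f) = p(a), directly from raw samples; the records and their binary encoding are invented for the example.

```python
# Sketch: empirically comparing p(a) with p(a | f) on a toy data set.
# Each record is an (f, a) pair of binary values; the data are invented.
records = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1), (0, 1), (1, 0)]

def marginal(records, idx, value):
    """Empirical p(variable at position idx == value)."""
    return sum(1 for r in records if r[idx] == value) / len(records)

def conditional(records, idx, value, cond_idx, cond_value):
    """Empirical p(variable idx == value | variable cond_idx == cond_value)."""
    matching = [r for r in records if r[cond_idx] == cond_value]
    return sum(1 for r in matching if r[idx] == value) / len(matching)

# If p(a | f) stays close to p(a) for every value of f, the sample data
# support omitting a direct edge between f and a in the network.
for f_val in (0, 1):
    print(f_val, marginal(records, 1, 1), conditional(records, 1, 1, 0, f_val))
```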
Construction of "Posterior" Knowledge
Posterior knowledge is constructed from observed data:
- For every node i, we construct a vector of probabilities θ_ij = (θ_ij1, ..., θ_ijn), where θ_ij is represented as a row entry in a table over all possible configurations j of the parent nodes 1, ..., n.
- The entries in this table are the weights representing the degree of confidence that nodes 1, ..., n influence node i (though we don't know these values yet).

Determining Table Values for θ_i
How do we determine the values for θ_ij? Perform multivariate integration to find the average θ_ij for all i and j, in a similar manner to the coin-toss integration:
- Count all instances m that satisfy a configuration (i,j,k); the observed probability for θ_ijk becomes θ_ijk^m (1−θ_ijk)^(n−m).
- Integrate over all vectors θ_ij to find the average value of each θ_ijk.
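Analogous to the coin toss, this integration has a closed form: the posterior mean of each table entry is its count plus pseudo-count, divided by the configuration total. A minimal Python sketch, assuming a Dirichlet prior with pseudo-count 1 per value (a common default); the helper name, variable names, and data set are invented for illustration.

```python
# Sketch: posterior-mean conditional probability table for one node, assuming
# a Dirichlet(1, ..., 1) prior per parent configuration; data are invented.
from collections import defaultdict

def cpt_posterior_means(samples, child, parents, child_values):
    """samples: list of dicts mapping variable name -> observed value."""
    counts = defaultdict(lambda: defaultdict(int))  # parent config -> value -> count
    for s in samples:
        config = tuple(s[p] for p in parents)
        counts[config][s[child]] += 1
    table = {}
    for config, value_counts in counts.items():
        total = sum(value_counts.values()) + len(child_values)  # data + pseudo-counts
        table[config] = {v: (value_counts.get(v, 0) + 1) / total for v in child_values}
    return table

samples = [
    {"f": 0, "g": 0}, {"f": 0, "g": 0}, {"f": 0, "g": 1},
    {"f": 1, "g": 1}, {"f": 1, "g": 1}, {"f": 1, "g": 0},
]
# Estimated p(g | f): {(0,): {0: 0.6, 1: 0.4}, (1,): {0: 0.4, 1: 0.6}}
print(cpt_posterior_means(samples, child="g", parents=["f"], child_values=[0, 1]))
```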
Question 1: What is Bayesian probability?
A person's degree of belief in a certain event, i.e., your own degree of certainty that a tossed coin will land "heads".

Question 2: What are the advantages and disadvantages of the Bayesian and classical approaches to probability?
Bayesian probability:
+ Reflects an expert's knowledge
+ Complies with the rules of probability
− Arbitrary
Classical probability:
+ Objective and unbiased
− Generally not available

Question 3: Mention at least 3 advantages of Bayesian analysis.
- Handles incomplete data sets
- Supports learning about causal relationships
- Combines domain knowledge and data
- Avoids overfitting

Conclusion
Bayesian networks can be used to express expert knowledge about a problem domain even when a precise model does not exist.