Data Mining Review Questions and Answers (4 pages).doc

Uploader: 1595****071  Document ID: 36713716  Upload date: 2022-08-28  Format: DOC  Pages: 4  Size: 193.50 KB

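V. Consider the training examples for a binary classification problem shown in the table.

1. What is the entropy of the whole training set with respect to the class attribute?
2. What are the information gains of a1 and a2 on this training set?
3. For the continuous attribute a3, compute the information gain for every possible split.
4. According to information gain, which of a1, a2, and a3 gives the best split?
5. According to classification error rate, which of a1 and a2 gives the best split?
6. According to the Gini index, which of a1 and a2 gives the best split?

Answer 1: P(+) = 4/9 and P(-) = 5/9, so the entropy is -4/9 log2(4/9) - 5/9 log2(5/9) = 0.9911.
Answer 2: (probably not on the exam)
Answer 3:
Answer 4: According to information gain, a1 produces the best split.
Answer 5: For attribute a1, error rate = 2/9. For attribute a2, error rate = 4/9. Therefore, according to error rate, a1 produces the best split.
Answer 6:

VI. Consider the following data set for a binary classification problem.

1. Compute the information gain. Which attribute would the decision tree induction algorithm choose?
2. Compute the Gini index. Which attribute would decision tree induction choose? (The answer to this one is fine.)
3. Figure 4-13 shows that entropy and the Gini index are both monotonically increasing on [0, 0.5] and monotonically decreasing on [0.5, 1]. Is it possible for information gain and the gain in the Gini index to favor different attributes? Explain your reasoning.

Yes. Even though these measures have a similar range and similar monotonic behavior, their respective gains, Δ, which are scaled differences of the measures, do not necessarily behave in the same way, as illustrated by the results in parts (a) and (b), i.e. parts 1 and 2 above.

Bayesian classification

1. P(A = 1|-) = 2/5 = 0.4, P(B = 1|-) = 2/5 = 0.4, P(C = 1|-) = 1, P(A = 0|-) = 3/5 = 0.6, P(B = 0|-) = 3/5 = 0.6, P(C = 0|-) = 0; P(A = 1|+) = 3/5 = 0.6, P(B = 1|+) = 1/5 = 0.2, P(C = 1|+) = 2/5 = 0.4, P(A = 0|+) = 2/5 = 0.4, P(B = 0|+) = 4/5 = 0.8, P(C = 0|+) = 3/5 = 0.6.
2. P(A = 0|+) = (2 + 2)/(5 + 4) = 4/9, P(A = 0|-) = (3 + 2)/(5 + 4) = 5/9, P(B = 1|+) = (1 + 2)/(5 + 4) = 3/9, P(B = 1|-) = (2 + 2)/(5 + 4) = 4/9, P(C = 0|+) = (3 + 2)/(5 + 4) = 5/9, P(C = 0|-) = (0 + 2)/(5 + 4) = 2/9.
3. Let P(A = 0, B = 1, C = 0) = K.
4. When one of the conditional probabilities is zero, estimating the conditional probabilities with the m-estimate approach is better, because we do not want the entire product to become zero.

1. P(A = 1|+) = 0.6, P(B = 1|+) = 0.4, P(C = 1|+) = 0.8, P(A = 1|-) = 0.4, P(B = 1|-) = 0.4, and P(C = 1|-) = 0.2.
2. Let R : (A = 1, B = 1, C = 1) be the test record. To determine its class, we need to compute P(+|R) and P(-|R). Using Bayes' theorem, P(+|R) = P(R|+)P(+)/P(R) and P(-|R) = P(R|-)P(-)/P(R). Since P(+) = P(-) = 0.5 and P(R) is constant, R can be classified by comparing P(+|R) and P(-|R), which here reduces to comparing P(R|+) and P(R|-).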
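The comparison described above can be sketched in Python. This is only a minimal illustration, assuming binary attributes and the conditional probabilities listed in answer 1 of this problem; the names `cond`, `prior`, `likelihood`, and `classify` are hypothetical and do not come from the original document.

```python
from math import prod

# P(attribute = 1 | class), copied from answer 1 of this problem.
cond = {
    "+": {"A": 0.6, "B": 0.4, "C": 0.8},
    "-": {"A": 0.4, "B": 0.4, "C": 0.2},
}
# Equal class priors, as stated in the answer.
prior = {"+": 0.5, "-": 0.5}


def likelihood(record, label):
    """P(record | label) under the naive Bayes independence assumption."""
    return prod(
        cond[label][attr] if value == 1 else 1.0 - cond[label][attr]
        for attr, value in record.items()
    )


def classify(record):
    """Return the label with the larger P(record | label) * P(label)."""
    # P(R) is the same for every class, so it can be left out of the comparison.
    scores = {label: likelihood(record, label) * prior[label] for label in prior}
    return max(scores, key=scores.get), scores


R = {"A": 1, "B": 1, "C": 1}
label, scores = classify(R)
print(label, scores)
```

Because the priors are equal, ranking by `likelihood` alone would give the same label; the prior is kept in `classify` so the sketch still applies when the classes are unbalanced.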

For this question,
P(R|+) = P(A = 1|+) × P(B = 1|+) × P(C = 1|+) = 0.192
P(R|-) = P(A = 1|-) × P(B = 1|-) × P(C = 1|-) = 0.032
Since P(R|+) is larger, the record is assigned to the (+) class.
3. P(A = 1) = 0.5, P(B = 1) = 0.4, and P(A = 1, B = 1) = P(A = 1) × P(B = 1) = 0.2. Therefore, A and B are independent.
4. P(A = 1) = 0.5, P(B = 0) = 0.6, and P(A = 1, B = 0) = P(A = 1) × P(B = 0) = 0.3. A and B are still independent.
5. Compare P(A = 1, B = 1|+) = 0.2 against P(A = 1|+) = 0.6 and P(B = 1|+) = 0.4. Since the product P(A = 1|+) × P(B = 1|+) = 0.24 is not equal to P(A = 1, B = 1|+) = 0.2, A and B are not conditionally independent given the class.

VII. Use the similarity matrix in the table below to perform single-link and complete-link hierarchical clustering. Draw dendrograms to show the results; each dendrogram should clearly show the order in which the clusters are merged.

There are no apparent relationships between s1, s2, c1, and c2.
A2: Percentage of frequent itemsets = 16/32 = 50.0% (including the null set).
A4: The false alarm rate is the ratio of I to the total number of itemsets. Since the count of I is 5, the false alarm rate is 5/32 = 15.6%.
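As a cross-check on the arithmetic in the answers above, the following Python sketch recomputes the entropy of a training set with 4 positive and 5 negative examples, the m-estimates from the Bayesian classification answers, and the independence comparisons. The function names are hypothetical, and the m-estimate parameters m = 4 and p = 1/2 are an assumption inferred from the (count + 2)/(5 + 4) pattern in the answers; the original does not state them.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def m_estimate(n_c, n, m, p):
    """m-estimate of a conditional probability: (n_c + m * p) / (n + m)."""
    return (n_c + m * p) / (n + m)


def is_independent(p_a, p_b, p_ab, tol=1e-9):
    """True when the joint probability equals the product of the marginals."""
    return abs(p_a * p_b - p_ab) <= tol


# Entropy for class proportions 4/9 and 5/9.
print(round(entropy([4 / 9, 5 / 9]), 4))  # 0.9911

# Assumed m = 4, p = 1/2: reproduces (2 + 2)/(5 + 4) and (0 + 2)/(5 + 4).
print(m_estimate(n_c=2, n=5, m=4, p=0.5))
print(m_estimate(n_c=0, n=5, m=4, p=0.5))

# Unconditional check: 0.5 * 0.4 matches the joint probability 0.2.
print(is_independent(0.5, 0.4, 0.2))
# Conditional on class +: 0.6 * 0.4 = 0.24 does not match 0.2.
print(is_independent(0.6, 0.4, 0.2))
```

The second `m_estimate` call shows why the m-estimate is preferred when a raw count is zero: the estimate stays strictly positive, so the naive Bayes product does not collapse to zero.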
