Summary of Data Mining Technology

Abstract: With the development of computer and network technology, it has become very easy to obtain relevant information. However, traditional statistical methods cannot complete the analysis of such large volumes of large-scale data. Data mining technology, which intelligently and comprehensively applies statistical analysis, database and artificial intelligence techniques to analyze large data sets, therefore came into being. This paper introduces the basic concept of data mining and its main methods, and also describes the applications of data mining and its development prospects.

Keywords: data mining; method; application; prospects

1 Introduction

With the rapid development of information technology, databases have kept growing in scale and now hold large amounts of data. Behind this surge of data lies much important information, and people want to analyze it at a higher level in order to make better use of it. Data warehouses have been established in many fields to give decision makers a unified, global view. However, the sheer volume of data often makes it impossible to identify the hidden information that could support decision-making, and traditional query and reporting tools cannot meet the need to mine this information. A new data analysis technology was therefore needed to process large amounts of data and extract valuable potential knowledge from them, and data mining technology came into being. Data mining technology has also matured gradually alongside the development of data warehouse technology.

2 Data Mining Technology

2.1 Definition of data mining

Data mining refers to the non-trivial process of automatically extracting useful information hidden in a data set. The information is represented in the form of rules, concepts, regularities and patterns. It helps decision makers analyze historical and current data and discover hidden relationships and patterns in order to predict behaviors that may occur in the future. The process of data mining is also called the process of knowledge discovery. It is an interdisciplinary subject involving databases, artificial intelligence, mathematical statistics, visualization and parallel computing. Data mining is a new information processing technology whose main feature is that it extracts, transforms, analyzes and models large amounts of data in a database, and extracts the key data that assist decision-making. Data mining is an important technology in KDD (Knowledge Discovery in Databases). Rather than querying with a standard database query language (such as SQL), it summarizes the content being queried into patterns and searches for the inherent regularities. Traditional queries and reports only show the results of events, without studying in depth why those events occurred. Data mining, in contrast, mainly seeks to understand the causes, and makes forecasts with a certain degree of confidence to provide sound support for decision-making.

2.2 Methods of data mining

Data mining research combines the techniques and results of a number of different disciplines, so current data mining methods take a variety of forms. From the perspective of statistical analysis, the data mining models used in statistical analysis include linear and non-linear analysis, regression analysis, logistic regression analysis, univariate analysis, multivariate analysis, time series analysis, nearest-sequence analysis, the nearest-neighbor algorithm and cluster analysis. With these techniques one can examine unusual forms in the data and then interpret the data with various statistical and mathematical models, explaining the market rules and business opportunities hidden behind them. Knowledge-discovery data mining is a kind of mining technology completely different from statistical-analysis data mining. It includes artificial neural networks, support vector machines, decision trees, genetic algorithms, rough sets, rule discovery and association sequences.

2.2.1 Statistical methods

Traditional statistics provides data mining with a number of discriminant and regression analysis methods. Commonly used techniques include Bayesian reasoning, regression analysis and analysis of variance. Bayesian reasoning is the basic tool for revising the probability distribution of a data set once new information becomes known, and it is used for classification problems in data mining. Regression analysis is used to find the best model of the relationship between input variables and output variables. Within regression analysis, linear regression describes how the trend of one variable relates to other variables, and logistic regression is used to predict whether certain events occur. The third technique, analysis of variance (ANOVA), is closely related to regression.
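Regression analysis, and the analysis-of-variance decomposition commonly used to judge a fitted regression line, can be sketched as follows. This is a minimal illustration under stated assumptions: a single input variable, ordinary least squares, and hypothetical helper names (`fit_line`, `regression_anova`). The F statistic compares the variation explained by the line with the residual variation, using 1 and n - 2 degrees of freedom.

```python
"""Simple linear regression by least squares plus its ANOVA decomposition.

A minimal sketch: one predictor only; helper names are hypothetical.
"""


def fit_line(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def regression_anova(xs, ys):
    """Split total variation into explained (regression) and residual parts.

    Returns (ss_regression, ss_residual, f_statistic) with 1 and n - 2
    degrees of freedom, as for a single predictor.
    """
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    fitted = [a + b * x for x in xs]
    ss_reg = sum((f - mean_y) ** 2 for f in fitted)             # explained by the line
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))      # left unexplained
    df_res = len(xs) - 2
    f_stat = (ss_reg / 1) / (ss_res / df_res)
    return ss_reg, ss_res, f_stat


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(fit_line(xs, ys))
    print(regression_anova(xs, ys))
```

A large F value indicates that the independent variable explains much more of the variation than is left in the residuals. The two sums of squares always add up to the total sum of squares around the mean.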
ANOVA is generally used to evaluate the performance of the estimated regression line and the influence of the independent variables on the final regression, which makes it one of the powerful tools in many mining applications.

2.2.2 Association rules

Association rules are a simple and practical form of analysis rule that describes regularities and patterns in which certain attributes of an item occur together. They are one of the most mature and important technologies in data mining. Association rules were first proposed by R. Agrawal et al., and the most classical association rule mining algorithm is Apriori, which first mines all frequent itemsets and then generates association rules from those frequent itemsets. Many frequent-itemset mining algorithms evolved from Apriori. Association rules are widely used in data mining to find meaningful relationships among data in large data sets. One reason is that they do not require choosing a single dependent variable. The most typical application of association rules in data mining is market basket analysis. Most association rule mining algorithms can discover all the associations hidden in the data being mined, so the number of association rules found is often very large. However, not every relationship between attributes obtained in this way has practical value. It is therefore particularly important to evaluate these rules effectively and to screen out the meaningful rules that users are really interested in.

2.2.3 Clustering analysis

Cluster analysis divides the selected samples into several groups according to some criterion of association, so that samples in the same group are highly similar and samples in different groups differ. Commonly used techniques include divisive (splitting) algorithms and agglomerative (merging) algorithms.
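The agglomerative approach can be sketched in a few lines. Every sample starts as its own cluster, and the two closest clusters are merged repeatedly until the requested number remains. This is a minimal illustration, not the paper's own method. It assumes Euclidean distance (`math.dist`) and single linkage, where the distance between two clusters is the distance of their closest pair of points. The function names are hypothetical.

```python
"""Agglomerative (bottom-up) clustering with single linkage: a minimal sketch.

Assumptions: Euclidean distance and single linkage; helper names are
hypothetical. The naive pairwise search is fine for small data only.
"""
import math


def single_linkage(a, b):
    """Cluster distance = Euclidean distance of the closest pair of points."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, k):
    """Start with one cluster per point; merge the closest pair until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]                  # j > i, so index i stays valid
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
    print(agglomerative(pts, 3))
```

Swapping `single_linkage` for a complete-linkage or average-linkage function changes which clusters merge first. This is one concrete case of the text's point that the choice of metric matters as much as the choice of algorithm.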
Partitional clustering and incremental clustering are also commonly used. Clustering is suited to revealing the internal relationships among samples, so that the structure of the sample set can be evaluated reasonably. Cluster analysis is also used to detect outliers (isolated points). Sometimes the aim of clustering is not to group objects together but to make it easier to separate one object from the others. Cluster analysis has been applied in many areas, such as economic analysis, pattern recognition and image processing, and especially in business, where it can help marketers discover groups with different characteristics among their customers. Apart from the choice of algorithm, the key to cluster analysis is the choice of a similarity metric for the samples. Not every class produced by a clustering algorithm is useful for decision-making, so the clustering tendency of the data is usually checked before an algorithm is applied.

2.2.4 Decision tree method

Decision tree learning is a method for approximating discrete-valued target functions. It classifies an instance by sorting it down the tree from the root node to a leaf node, and the leaf node gives the classification of the instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one possible value of that attribute. To classify an instance, start at the root node, test the attribute specified by that node, and move down the branch corresponding to the instance's value of that attribute. The process is then repeated for the subtree rooted at the new node. The decision tree method is applied to classification in data mining.

2.2.5 Neural network

A neural network is a self-learning mathematical model. It can analyze large amounts of complex data and carry out pattern extraction and trend analysis that would be extremely complex for the human brain or for other computing methods. A neural network can be used for supervised (guided) learning or for unsupervised clustering. In either case, the inputs to the network are numeric values. Artificial neural networks are built to simulate biological nervous systems.
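The supervised (guided) case mentioned above can be illustrated with the simplest feedforward network, a single-layer perceptron trained with the classic perceptron update rule. This is a minimal sketch under stated assumptions: labels are 0/1, the data are linearly separable, and the function names and default parameters are hypothetical.

```python
"""Single-layer perceptron trained with the perceptron rule: a minimal sketch.

Assumptions: 0/1 labels, linearly separable data; names and defaults
(epochs, learning rate) are hypothetical.
"""


def predict(weights, bias, x):
    """Threshold unit: output 1 if w.x + b > 0, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Adjust weights only when a training sample is misclassified."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return weights, bias


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1)]
    y = [0, 0, 0, 1]  # logical AND, which is linearly separable
    w, b = train_perceptron(X, y)
    print(w, b, [predict(w, b, x) for x in X])
```

For linearly separable data the perceptron rule is guaranteed to converge. Problems that are not linearly separable, such as XOR, need a multi-layer network such as the BP network.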
An artificial neural network simulates the structure of neurons in the human brain. Based on the MP neuron model and the Hebb learning rule, three kinds of neural networks have been established. They are characterized by non-linear mapping, information storage, parallel processing and global collective behavior, and by a high degree of self-learning, self-organizing and adaptive ability. Feedforward networks, represented by the perceptron network and the BP network, can be used for classification and prediction. Feedback networks, represented by the Hopfield network, are used for associative memory and optimization. Self-organizing networks, represented by the ART model and the Kohonen model, are used for clustering.

2.2.6 Support vector machine

The support vector machine (SVM) is a relatively new machine learning method developed on the basis of statistical learning theory. Based on the principle of structural risk minimization, it improves the generalization ability of the learning machine as far as possible. It generalizes well, achieves good classification accuracy and can effectively solve learning problems, and it has become an alternative method for training multi-layer perceptrons, RBF neural networks and polynomial neural networks. In addition, training an SVM is a convex optimization problem, so any local optimum is also the global optimum. Other algorithms, including neural networks, lack this property. In data mining, support vector machines can be applied to classification and regression.
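The convexity point can be seen in a small linear SVM. The sketch below minimizes the regularized hinge loss, lam/2 * ||w||^2 + mean(max(0, 1 - y(w.x + b))), by full-batch subgradient descent. Because this objective is convex, descent with a suitable step size approaches the global minimum rather than getting stuck in a local one. This is a swapped-in training method chosen for brevity, not the quadratic-programming (dual) solvers SVMs are usually trained with. It handles a linear kernel only, and the hyperparameters and names are illustrative assumptions.

```python
"""Linear soft-margin SVM trained by subgradient descent: a minimal sketch.

Assumptions: labels are +1/-1, linear kernel only; lam, lr, epochs and
the function names are hypothetical choices for illustration.
"""


def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    n, dim = len(samples), len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0                     # the bias is not regularized
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge loss is active for this sample
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def classify(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(X, y)
    print(w, b, [classify(w, b, x) for x in X])
```

Only samples with margin below 1 contribute to the update. These are the candidates for support vectors, and they alone determine the final hyperplane.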
Support vector machines can also be applied to exploring unknown phenomena, among other tasks. Besides the methods above, there are also techniques that convert data and results into visual form, cloud model methods, and inductive logic programming.

In practice, the appropriate mining method for any mining tool is chosen according to the specific problem. It is hard to say in general that one method is good and another inferior, because this depends on the problem at hand.

2.3 Data mining process

The data mining process can be divided into three main stages: data preparation, data mining, and evaluation and expression of the results. The evaluation and expression stage can be further broken down into pattern evaluation, pattern interpretation, knowledge consolidation and use of knowledge. Knowledge discovery in databases is a multi-step process in which these three stages are repeated.

2.3.1 Data preparation

KDD processes large amounts of data, generally accumulated over a long period in database systems. These data are often not suitable for mining knowledge from directly, so data preparation is needed. It generally includes data selection (selecting the relevant data), cleaning (eliminating noisy data), estimation (estimating missing data), conversion (converting between discrete-valued and continuous-valued data, grouping and classifying data values, computing combinations of data items, etc.) and data reduction (reducing the volume of data). Much of this work is often done when the data warehouse is built. Data preparation is the first step in KDD. How well it is done affects the efficiency and accuracy of data mining and the effectiveness of the final model.

2.3.2 Data mining

Data mining is the most critical step of KDD and also the most technically difficult. Most KDD researchers study data mining techniques. Commonly used techniques include decision trees, classification, clustering, rough sets, association rules, neural networks and genetic algorithms. In this step, the algorithm and its parameters are selected according to the goal of KDD, and the data are analyzed.
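Because the quality of this mining step depends on the data preparation described in 2.3.1, here is a minimal sketch of three of those operations on one numeric column: cleaning, estimating missing data, and converting continuous values to discrete ones. The valid-range rule for noise, mean imputation and equal-width binning are illustrative choices, not methods prescribed by the text. The function names are hypothetical.

```python
"""Three data-preparation operations on one numeric column: a minimal sketch.

Assumptions: missing values are None, noise means out-of-range readings,
missing values are estimated by the mean, and conversion uses equal-width
bins. All function names are hypothetical.
"""


def clean(values, low, high):
    """Cleaning: treat readings outside [low, high] as noise (mark as None)."""
    return [v if v is None or low <= v <= high else None for v in values]


def impute_mean(values):
    """Estimation: replace missing entries (None) with the mean of the rest."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def discretize(values, bins):
    """Conversion: map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero if all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]


if __name__ == "__main__":
    ages = [23, None, 35, 999, 41, 29]  # 999 is an implausible, noisy reading
    filled = impute_mean(clean(ages, 0, 120))
    print(filled, discretize(filled, 3))
```

The order matters. Cleaning runs before estimation so that the noisy reading does not distort the mean used to fill in missing values.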
The output of the data mining step is a set of models (patterns) that may constitute knowledge.

2.3.3 Results evaluation and expression

Evaluating the models: the patterns obtained in the previous step may have no practical significance or value. They may not accurately reflect the true meaning of the data, and in some cases may even contradict the facts. They therefore need to be evaluated to determine which patterns are valid and useful. Evaluation can draw on accumulated experience, and some models can also be tested for accuracy directly against data. This step also includes presenting the patterns to the user in an easy-to-understand way.

Consolidating knowledge: the patterns that the user understands and considers consistent with reality and valuable form knowledge. The consistency of this knowledge must also be checked, and any conflicts or contradictions with previously obtained knowledge must be resolved so that the knowledge is consolidated.
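The evaluation step above notes that some models can be tested for accuracy directly against data. A minimal sketch of that check uses a holdout split: part of the data is set aside, the mined model predicts labels for it, and the fraction of correct predictions is reported. The 30% test fraction, the fixed seed and the function names are illustrative assumptions. Other schemes, such as cross-validation, are also common.

```python
"""Testing a mined model's accuracy against held-out data: a minimal sketch.

Assumptions: rows are (features, label) pairs and a model is any callable
mapping features to a label; names and defaults are hypothetical.
"""
import random


def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(model, test_rows):
    """Fraction of (features, label) pairs that the model predicts correctly."""
    correct = sum(1 for x, label in test_rows if model(x) == label)
    return correct / len(test_rows)


if __name__ == "__main__":
    rows = [(x, int(x >= 5)) for x in range(10)]
    train, test = holdout_split(rows)
    rule = lambda x: int(x >= 5)  # a hypothetical mined rule: "x >= 5 -> class 1"
    print(len(train), len(test), accuracy(rule, test))
```

Scoring the model on rows it was not mined from gives a fairer picture of how it will behave on new data than re-scoring it on its own training set.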
Using knowledge: knowledge is discovered in order to be used, and making it usable is one of the steps of KDD. There are two ways to use knowledge. One is to rely directly on the relationships or results that the knowledge describes to support decision-making. The other is to apply the knowledge to new data, which may raise new problems and require the knowledge to be further optimized. The KDD process may need to be repeated several times: whenever a step does not meet the expected goal, return to the previous step, adjust it, and execute it again.

3 Data mining applications

The potential applications of data mining are very broad, including government management decision-making, business management, scientific research, and decision support in industrial enterprises.

3.1 Application in scientific research

From the point of view of scientific research methodology, scientific research can be divided into three categories: theoretical science, experimental science and computational science. Computational science is an important hallmark of modern science. Computational scientists deal with data and analyze a wide variety of experimental or observational data every day. With advanced scientific data collection tools such as observation satellites, remote sensors and DNA molecular technology, the amount of data is very large and traditional data analysis tools cannot cope with it, so there must