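Thesis title (Chinese): | 基于约束的最大频繁项目集挖掘算法与实现 (Constraint-Based Maximal Frequent Itemset Mining: Algorithm and Implementation) |
Name: | |
Student ID: | 05257 |
Confidentiality level: | Public |
Discipline code: | 081203 |
Discipline name: | Computer Application Technology |
Student type: | Master's |
School/Department: | |
Major: | |
Research area: | Data mining |
Primary supervisor name: | |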
Thesis title (foreign language): | Research and Implementation of an Algorithms for Mining Constrained Maximum Frequent Itemsets |
Keywords (Chinese): | |
Keywords (foreign language): | Data mining; Association rule; Maximum frequent itemsets; Incremental updating |
Abstract (Chinese version): |
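The gradual maturing of database technology and the rapid spread of network technology have greatly increased people's ability to collect data, leading to a sharp increase in the volume of data stored worldwide. "Data explosion and knowledge scarcity" is a serious problem facing the information age, and data mining is one effective means of addressing it. Data mining extracts useful information from large volumes of data and discovers implicit, previously unknown knowledge that is potentially valuable for decision-making, so research on data mining techniques is of considerable significance. This thesis takes association rule mining, an important area of data mining, as its research topic, and studies and analyzes methods for mining association rules.
The research work of this thesis covers the following two aspects: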
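First, an incremental updating algorithm for association rules under database changes is proposed. Discovering frequent itemsets is the key problem in association rule mining. A frequent itemset is a set of items whose support in a given database meets the minimum support threshold; rules derived from it must additionally meet the minimum confidence threshold. As the database changes, however, different frequent itemsets arise. The question studied is how, when the database changes, to reuse the frequent itemsets already mined to carry out the update mining.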
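Second, a constraint-based maximal frequent itemset mining algorithm is proposed. The algorithm applies constraint conditions within the mining process, which reduces the number of candidate itemsets and improves execution efficiency. The algorithm is analyzed in detail. Experimental results show that the algorithm is practical to operate and, to some extent, addresses the problem of generating many irrelevant or worthless association rules.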
|
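The abstract does not spell out the incremental updating algorithm, so the following is only a minimal sketch of the general idea for the insertion-only case, not the thesis's method. It relies on one fact: an itemset that is infrequent in both the old database and the newly added transactions cannot be frequent in their union. All function names, the brute-force counting, and the `max_size` cap are illustrative assumptions.

```python
from itertools import combinations

def count_itemsets(transactions, max_size=3):
    """Count every itemset of up to max_size items (brute force, small data only)."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts

def frequent(counts, n, min_sup):
    """Keep itemsets whose count reaches min_sup * n."""
    return {s: c for s, c in counts.items() if c >= min_sup * n}

def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= set(t))

def incremental_update(old_db, old_frequent, new_db, min_sup):
    """Frequent itemsets of old_db + new_db (insertions only, hypothetical helper).

    old_frequent maps each itemset that was frequent in old_db to its count there.
    """
    total = len(old_db) + len(new_db)
    new_counts = count_itemsets(new_db)
    updated = {}
    # Case 1: previously frequent itemsets only need a scan of the increment.
    for s, c in old_frequent.items():
        merged = c + new_counts.get(s, 0)
        if merged >= min_sup * total:
            updated[s] = merged
    # Case 2: an itemset that was not frequent before can become frequent
    # overall only if it is frequent within new_db; only those trigger a
    # rescan of old_db.
    for s, c in new_counts.items():
        if s in old_frequent or c < min_sup * len(new_db):
            continue
        merged = c + support_count(s, old_db)
        if merged >= min_sup * total:
            updated[s] = merged
    return updated

old_db = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
new_db = [["c", "d"], ["c", "d"]]
old_frequent = frequent(count_itemsets(old_db), len(old_db), 0.5)
updated = incremental_update(old_db, old_frequent, new_db, 0.5)
```

The point of the sketch is that the old database is rescanned only for the few itemsets that are frequent in the increment but were not frequent before; handling deleted transactions would need a separate case that this sketch does not cover.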
Abstract (foreign language): |
With the maturing of database technology and the spread of network technology, the capacity to collect data has grown rapidly, and the volume of stored data has expanded enormously around the world. "Data explosion and knowledge scarcity" is a pressing problem in the information society. Data mining is one effective way to tackle it: it is the process of extracting useful information and identifying valid, novel, potentially useful, and ultimately understandable patterns from large volumes of raw data. Research on data mining techniques is therefore of practical importance. This thesis takes association rule mining as its research field and analyzes and studies association rule mining algorithms in detail.
The author's work focuses on the following two aspects:
First, an incremental updating algorithm for mining association rules under database changes is proposed. Discovering frequent itemsets is a key problem in association rule mining. A frequent itemset is a set of items whose support in a given transactional database meets the minimum support threshold; rules derived from it must additionally meet the minimum confidence threshold. As transactions are added to or removed from the database, different frequent itemsets are produced. The question that needs study is how to reuse previously mined results to maintain the frequent itemsets as the database changes.
Second, a constrained maximal frequent itemset mining algorithm is proposed. Constraint conditions are applied within the mining algorithm, which reduces the number of candidate itemsets and improves the algorithm's efficiency. The experimental results show that the algorithm is effective and practical to operate. To a certain extent, it reduces the number of irrelevant and worthless association rules.
|
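How a constraint can cut the number of candidate itemsets may be illustrated as follows. This is a sketch under stated assumptions, not the algorithm of the thesis: it uses a plain level-wise (Apriori-style) search, assumes the constraint is anti-monotone (once a set violates it, every superset does too), and reports itemsets that are maximal among those satisfying both the support threshold and the constraint. The function names, the price table, and the budget constraint are hypothetical.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def constrained_maximal_itemsets(transactions, min_count, constraint):
    """Return (maximal itemsets, number of candidates counted against the data).

    constraint must be anti-monotone: if it fails for a set,
    it fails for every superset of that set.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = []
    candidates_tested = 0
    while level:
        size = len(level[0]) + 1
        survivors = []
        for cand in level:
            if not constraint(cand):
                continue  # pruned before any database scan
            candidates_tested += 1
            if support_count(cand, transactions) >= min_count:
                survivors.append(cand)
        frequent.extend(survivors)
        # Join step: unions of two survivors that are exactly one item larger.
        surviving = set(survivors)
        joined = {a | b for a, b in combinations(survivors, 2) if len(a | b) == size}
        # Apriori prune: every subset one item smaller must itself have survived.
        level = [c for c in joined if all(c - {i} in surviving for i in c)]
    # Maximal: no proper superset among the itemsets that were kept.
    maximal = [s for s in frequent if not any(s < other for other in frequent)]
    return maximal, candidates_tested

# Hypothetical data: item prices and a budget constraint (total price <= 5).
prices = {"a": 1, "b": 2, "c": 3, "d": 10}
baskets = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}, {"a", "d"}]

def within_budget(itemset):
    return sum(prices[i] for i in itemset) <= 5

maximal, tested = constrained_maximal_itemsets(baskets, 2, within_budget)
unconstrained, tested_all = constrained_maximal_itemsets(baskets, 2, lambda s: True)
```

On this toy data the constrained run counts 6 candidates against the database and the unconstrained run counts 11, because item `d` and every set containing it are discarded before any support counting takes place.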
Chinese Library Classification number: | TP182 |
Release date: | 2009-05-13 |