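Thesis title (Chinese): | 多值属性关联规则的研究与实现 |
Name: | |
Student ID: | 05236 |
Confidentiality level: | Public |
Discipline code: | 081203 |
Discipline name: | Computer Application Technology |
Student type: | Master's |
Department: | |
Major: | |
Research direction: | Data mining |
Primary advisor name: | |
Thesis title (foreign language): | The Research and Implementation of Quantitative Association Rules |
Thesis keywords (Chinese): | |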
Thesis keywords (foreign language): | Data mining; Association rule; Quantitative; Frequent set; Cluster |
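Thesis abstract (Chinese original, translated): |
In an age of information explosion, data mining emerged and has flourished in response to the challenge that "people are drowning in data yet still hungry for knowledge". Association rule mining is an important area of this research. Current work concentrates mainly on mining Boolean data under the support-confidence framework and has produced a number of results. When potential rules must be mined from data with multi-valued (quantitative) attributes, however, the existing Boolean association rule methods fall short. How to partition the data into intervals is the key to converting a multi-valued attribute association rule problem into a Boolean one: the critical step in mining multi-valued attribute association rules is to divide the domain of each numeric attribute into several intervals.
For interval partitioning, most existing methods divide the domain of a numeric attribute into equi-width or equi-depth intervals, or apply a clustering algorithm to a single attribute (or a group of attributes). Although these algorithms handle the mining of multi-valued data well, they cannot avoid the conflict between minimum support and minimum confidence, and they may miss some important rules. The method proposed in this thesis treats each transaction as an n-dimensional vector and clusters these vectors over all attributes with the Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA). Because ISODATA is heuristic and can be embedded in a human-computer interactive structure, it can use the experience gained from intermediate results to classify better. The clusters are projected onto the domains of the numeric attributes to form intervals that may overlap, and a Boolean association rule mining algorithm is then used to mine the rules. The algorithm considers both the distances between transactions and the relations among attributes, and it avoids the conflict between minimum support and minimum confidence. Experimental results show that the method mines multi-valued attribute association rules effectively and can discover important rules that earlier algorithms may miss.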
|
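The equi-width and equi-depth partitioning that the abstract attributes to existing methods can be sketched as follows. This is a minimal illustration, not code from the thesis: the function names `equi_width_bins` and `equi_depth_bins` and the sample `ages` list are assumptions made for this example, and `k` is assumed not to exceed the number of values.

```python
def equi_width_bins(values, k):
    """Split [min, max] into k intervals of equal width (illustrative helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [(lo + i * width, lo + (i + 1) * width) for i in range(k)]


def equi_depth_bins(values, k):
    """Split the sorted values into k intervals holding roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for i in range(k):
        start = i * n // k          # index of the first value in bin i
        end = (i + 1) * n // k - 1  # index of the last value in bin i
        bins.append((ordered[start], ordered[end]))
    return bins


# Hypothetical sample data: most values are small, two are outliers.
ages = [18, 19, 20, 21, 22, 23, 40, 65]
width_bins = equi_width_bins(ages, 2)
depth_bins = equi_depth_bins(ages, 2)
```

With skewed data like this, the equi-width split puts seven of the eight values into its first interval, while the equi-depth split balances the counts but produces a very wide second interval. Neither looks at the other attributes of a transaction, which is the limitation the abstract points to.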
Thesis abstract (foreign language): |
In the era of information explosion, data mining techniques emerged and have flourished in response to the challenge that "people are drowning in data yet still starved for knowledge". Association rule mining has become an important research area. Current research concentrates mainly on mining Boolean data under the support-confidence framework and has produced a number of results. When potential rules must be mined from data with quantitative attributes, however, the existing Boolean association rule methods are insufficient. How to partition the data into intervals is the key to transforming a quantitative association rule problem into a Boolean one: the critical step in quantitative association rule mining is to partition the domains of the quantitative attributes into intervals.
Existing partitioning methods divide the domains of quantitative attributes into equi-depth or equi-width intervals, or apply a clustering algorithm to a single attribute (or a set of attributes) alone. Although these algorithms handle quantitative data mining reasonably well, they cannot avoid the conflict between minimum support and minimum confidence, and they risk missing some important rules. The method proposed in this thesis treats each transaction as an n-dimensional vector and clusters these vectors over all attributes with the Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA). Because ISODATA is heuristic and can be combined with human-computer interaction, it can use the experience gained from intermediate results to classify better. The clusters are then projected onto the domains of the quantitative attributes to form possibly overlapping intervals, and finally a classical Boolean association rule algorithm is used to find the rules. This approach takes into account both the distances among transactions and the relations among attributes, and it resolves the conflict between minimum support and minimum confidence by allowing intervals to overlap. Experimental results show that the approach mines quantitative association rules effectively and can find important rules that previous algorithms may miss.
|
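The projection step described in the abstract, where clusters of transaction vectors are mapped onto per-attribute intervals that may overlap, can be sketched as follows. This is a hedged illustration only: the cluster labels are assumed to come from some clustering run (the thesis uses ISODATA, which is not implemented here), and the names `project_clusters`, `to_boolean_items`, and the sample transactions are hypothetical.

```python
def project_clusters(transactions, labels):
    """For each cluster label, return one (min, max) interval per attribute."""
    intervals = {}
    for row, label in zip(transactions, labels):
        current = intervals.get(label)
        if current is None:
            # First member of the cluster: each interval starts as a single point.
            intervals[label] = [(v, v) for v in row]
        else:
            # Widen each attribute's interval to cover the new member.
            intervals[label] = [
                (min(lo, v), max(hi, v)) for (lo, hi), v in zip(current, row)
            ]
    return intervals


def to_boolean_items(row, intervals):
    """Map a numeric transaction to the (attribute, cluster) intervals it falls in."""
    items = set()
    for label, bounds in intervals.items():
        for attr, ((lo, hi), v) in enumerate(zip(bounds, row)):
            if lo <= v <= hi:
                items.add((attr, label))
    return items


# Hypothetical (age, income) transactions with assumed cluster labels.
transactions = [(25, 3000), (30, 3500), (28, 8000), (45, 9000)]
labels = [0, 0, 1, 1]

intervals = project_clusters(transactions, labels)
items = to_boolean_items((28, 8000), intervals)
```

Here the age intervals of the two clusters, [25, 30] and [28, 45], overlap, so the transaction (28, 8000) maps to both age items. The resulting item sets are plain Boolean transactions, which is the form a classical Boolean association rule algorithm expects as input.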
Chinese Library Classification number: | TP311.13 |
Release date: | 2009-05-12 |