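Thesis title (Chinese): | 多值关联规则挖掘算法的研究 (Research on Algorithms for Mining Quantitative Association Rules) |
Name: | |
Student ID: | 20070362 |
Confidentiality level: | Public |
Discipline code: | 081203 |
Discipline name: | Computer Application Technology |
Student type: | Master's |
Degree year: | 2010 |
Department: | |
Major: | |
First supervisor: | |
Thesis title (foreign language): | The Research on the Algorithm of Mining Quantitative Association Rules |
Keywords (Chinese): | |
Keywords (foreign language): | Data mining; Quantitative association rules; Discretization; Frequent item se |
Abstract (Chinese): |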
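The volume of data in today's world keeps growing, and large datasets hide a great deal of important information; discovering valuable information or knowledge in them is a very demanding task. Data mining developed rapidly to meet this need. Data mining refers to extracting implicit, previously unknown knowledge and rules with potential value for decision making from large databases or data warehouses. Mining association rules in transaction databases is a very important research topic in the field of data mining.
By the type of data they handle, association rules can be divided into Boolean association rules and quantitative (multi-valued) association rules. Mining Boolean association rules means finding relationships among attributes that simultaneously take the value "1" in a relational table whose attribute values are Boolean. In practice, however, the attributes in transaction databases are often multi-valued, so studying how to mine quantitative association rules is of considerable importance.
This thesis studies the mining of quantitative association rules. The main contributions are as follows:
A new quantitative association rule mining algorithm, MQAR, is proposed. It combines the FP-tree used in frequent itemset mining with CLIQUE, a clustering algorithm for high-dimensional data, and introduces a tree structure, the DGFP-tree, to store the information in the transaction database. By searching paths in this tree, MQAR finds the low-dimensional subspaces that contain clusters, which turns quantitative association rule mining into three steps: building the DGFP-tree, using the tree to search for high-density units, and forming clusters. The algorithm avoids the "minimum support" and "minimum confidence" problems of traditional quantitative association rule mining algorithms and can mine association rules among subsets of the attributes. Experimental results show that the algorithm mines quantitative association rules effectively.
To address the shortcomings of attribute discretization methods and the combinatorial explosion that arise in quantitative association rule mining, an algorithm based on fuzzy clustering and mutual information, FMI-Miner, is proposed. FMI-Miner first discretizes the quantitative attributes with fuzzy C-means clustering and then mines frequent fuzzy itemsets, guided by the magnitude of the mutual information between the discretized attributes, to generate association rules. Experimental results show that FMI-Miner effectively reduces the computation required during mining, improves performance, and yields quantitative association rules that are easier to understand.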
|
Abstract (foreign language): |
Nowadays, the amount of data, which contains much important information, is growing rapidly, and mining valuable information or knowledge from it is an arduous task. Data mining has developed quickly to meet this demand. Data mining has been defined as “The nontrivial extraction of implicit, previously unknown, and potentially useful information from data in large database or data warehouse”. Mining association rules in large relational databases is an important research topic in data mining.
Depending on the type of objects they handle, association rules can be classified into Boolean association rules and quantitative association rules. Mining Boolean association rules means finding associations among attributes whose values are all “1” in a relational table where every attribute is Boolean. In real life, however, the attributes in relational databases are often quantitative. Therefore, studying how to mine quantitative association rules is of great significance.
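To make the Boolean/quantitative distinction concrete, the following Python sketch maps a quantitative attribute onto Boolean interval items and then computes the usual support and confidence of a rule such as ⟨age in [20,29]⟩ ⇒ ⟨married = no⟩. It is a minimal illustration only: the table, the attribute names, and the fixed interval boundaries are hypothetical and are not taken from the thesis, which argues that choosing such intervals is itself the hard part.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical toy table: one quantitative attribute (age) and one
# categorical attribute (married). Not data from the thesis.
RECORDS: List[dict] = [
    {"age": 23, "married": "no"},
    {"age": 25, "married": "no"},
    {"age": 29, "married": "yes"},
    {"age": 34, "married": "yes"},
    {"age": 38, "married": "no"},
    {"age": 52, "married": "yes"},
]

# Assumed fixed intervals for age (hypothetical choice for this sketch).
AGE_BINS: List[Tuple[int, int]] = [(20, 29), (30, 39), (40, 59)]


def to_items(record: dict) -> FrozenSet[str]:
    """Turn one record into Boolean items such as 'age in [20,29]'."""
    items = {f"married={record['married']}"}
    for lo, hi in AGE_BINS:
        if lo <= record["age"] <= hi:
            items.add(f"age in [{lo},{hi}]")
    return frozenset(items)


TRANSACTIONS = [to_items(r) for r in RECORDS]


def support(itemset: FrozenSet[str]) -> float:
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """conf(lhs => rhs) = supp(lhs | rhs) / supp(lhs)."""
    return support(lhs | rhs) / support(lhs)


rule_lhs = frozenset({"age in [20,29]"})
rule_rhs = frozenset({"married=no"})
print(support(rule_lhs | rule_rhs), confidence(rule_lhs, rule_rhs))
```

Once each record is a set of Boolean items, any Boolean miner applies unchanged; the quality of the rules then depends entirely on the interval boundaries, which is why wider or narrower bins trade support against confidence.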
In this thesis, we study the problem of mining quantitative association rules. Our main contributions are as follows:
A novel algorithm, MQAR, is proposed to mine quantitative association rules. The algorithm combines the FP-tree used in frequent pattern mining with CLIQUE, a clustering algorithm for high-dimensional data, and uses a new data structure, the DGFP-tree, to store the information of the database; the subspaces that contain clusters are found by searching paths in the tree. Mining quantitative association rules is thus transformed into constructing the DGFP-tree and forming clusters by searching for dense units in it. The algorithm not only avoids the minimum support and minimum confidence problems of traditional quantitative association rule mining algorithms, but can also mine important associations among subsets of the attributes that previous algorithms may miss. Experimental results show that MQAR can efficiently find quantitative association rules.
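The abstract only names the DGFP-tree, so it is not modelled here. The Python sketch below illustrates just the CLIQUE ingredient MQAR builds on: each dimension is split into `xi` equal-width intervals, 1-D units whose fraction of points exceeds a density threshold `tau` are kept, and higher-dimensional dense units are grown bottom-up with an Apriori-style join and prune. Counting is done by a plain scan over grid cells rather than by walking tree paths; `xi`, `tau`, the function names, and the toy points are all assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# A unit is a tuple of (dimension, interval index) pairs, sorted by dimension.
Unit = Tuple[Tuple[int, int], ...]


def grid_cells(points: Sequence[Sequence[float]], xi: int) -> List[Tuple[int, ...]]:
    """Map every point to its grid cell using xi equal-width intervals per dimension."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    cells = []
    for p in points:
        row = []
        for d in range(dims):
            width = (highs[d] - lows[d]) / xi
            idx = int((p[d] - lows[d]) / width) if width > 0 else 0
            row.append(min(idx, xi - 1))  # the maximum value goes in the last interval
        cells.append(tuple(row))
    return cells


def dense_units(points: Sequence[Sequence[float]], xi: int, tau: float) -> Dict[int, List[Unit]]:
    """Return dense units per subspace dimensionality k (CLIQUE-style, bottom-up)."""
    cells = grid_cells(points, xi)
    n, dims = len(points), len(points[0])

    def selectivity(unit: Unit) -> float:
        # Fraction of points whose cell matches the unit in every listed dimension.
        return sum(all(c[d] == i for d, i in unit) for c in cells) / n

    level: List[Unit] = [((d, i),) for d in range(dims) for i in range(xi)
                         if selectivity(((d, i),)) > tau]
    result: Dict[int, List[Unit]] = {}
    k = 1
    while level:
        result[k] = level
        dense = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            # Join two k-dim units that share their first k-1 pairs
            # and end in different dimensions.
            if a[:-1] == b[:-1] and a[-1][0] != b[-1][0]:
                first, second = (a, b) if a[-1][0] < b[-1][0] else (b, a)
                cand = first + (second[-1],)
                # Apriori prune: every k-dim projection must itself be dense.
                if all(cand[:j] + cand[j + 1:] in dense for j in range(len(cand))):
                    candidates.add(cand)
        level = [u for u in sorted(candidates) if selectivity(u) > tau]
        k += 1
    return result


points = [(0, 0), (1, 1), (1, 0), (0, 1), (1.5, 1.5), (10, 10), (5, 8), (9, 3)]
print(dense_units(points, xi=5, tau=0.3))
```

On these toy points the five values near the origin make interval 0 of both dimensions dense, and their combination `((0, 0), (1, 0))` is the only dense 2-D unit. Under the abstract's description, dense units of this kind in low-dimensional subspaces are what get merged into clusters and then read off as rules.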
To address the shortcomings of attribute discretization and the combinatorial explosion in mining quantitative association rules, we propose a novel algorithm, FMI-Miner, based on fuzzy clustering and mutual information. The algorithm first partitions the quantitative attributes with fuzzy C-means clustering and then mines frequent fuzzy itemsets according to the mutual information between the discretized attributes to form association rules. Experimental results show that FMI-Miner reduces the computation required during mining, improves mining efficiency, and produces association rules that are easier to understand.
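The abstract does not give FMI-Miner's fuzzy item construction or its mutual-information threshold, so the Python sketch below shows only the two building blocks it names, under stated assumptions: a plain 1-D fuzzy C-means per attribute (with deterministic, evenly spaced initial centers, a choice made here), memberships hardened to their highest-membership cluster (an assumption; FMI-Miner works with fuzzy itemsets), and Shannon mutual information in bits between the resulting labels. Attribute pairs with high mutual information would be the promising ones to combine into itemsets. All names and data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def fcm_1d(values: Sequence[float], c: int, m: float = 2.0,
           iters: int = 100) -> Tuple[List[float], List[List[float]]]:
    """Fuzzy C-means on one numeric attribute; returns centers and memberships."""
    lo, hi = min(values), max(values)
    # Assumed initialization: centers evenly spaced over [lo, hi].
    centers = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]
    u: List[List[float]] = []
    for _ in range(iters):
        # Membership update: u_kj = 1 / sum_l (d_kj / d_kl)^(2 / (m - 1)).
        u = []
        for x in values:
            d = [abs(x - cj) for cj in centers]
            if min(d) == 0.0:  # point sits exactly on a center: crisp membership
                j0 = d.index(0.0)
                u.append([1.0 if j == j0 else 0.0 for j in range(c)])
            else:
                u.append([1.0 / sum((d[j] / d[l]) ** (2.0 / (m - 1.0)) for l in range(c))
                          for j in range(c)])
        # Center update: mean of the values weighted by u_kj^m.
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            if total > 0:
                centers[j] = sum(wk * x for wk, x in zip(w, values)) / total
    return centers, u


def discretize(values: Sequence[float], c: int) -> List[int]:
    """Label each value with the rank (0 = lowest center) of its top-membership cluster."""
    centers, u = fcm_1d(values, c)
    rank = {j: r for r, j in enumerate(sorted(range(c), key=lambda j: centers[j]))}
    return [rank[max(range(c), key=lambda j: row[j])] for row in u]


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Shannon mutual information I(X;Y) in bits between two label sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((cnt / n) * math.log2(cnt * n / (px[x] * py[y]))
               for (x, y), cnt in pxy.items())


income = [1, 2, 3, 10, 11, 12, 20, 21, 22]
spending = [5, 6, 5, 50, 52, 51, 100, 98, 99]
unrelated = [1, 50, 99] * 3
inc_l, spd_l, unr_l = (discretize(v, 3) for v in (income, spending, unrelated))
print(mutual_information(inc_l, spd_l), mutual_information(inc_l, unr_l))
```

Here the income and spending labels move together, so their mutual information reaches log2(3), about 1.585 bits, while the unrelated attribute yields zero. Ranking attribute pairs this way lets a miner skip combinations that carry no information instead of enumerating every interval combination.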
|
Chinese Library Classification number: | TP311.13 |
Release date: | 2011-04-02 |