View thesis information

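Thesis title (Chinese):

 数据流的频繁模式挖掘算法研究 (Research on Frequent Pattern Mining Algorithms over Data Streams)

Name:

 黄威 (Huang Wei)

Student ID:

 20070346

Confidentiality level:

 Public

Discipline code:

 081203

Discipline name:

 Computer Application Technology

Student type:

 Master's

Degree year:

 2010

School:

 School of Computer Science and Technology

Major:

 Computer Application Technology

First supervisor:

 君锐 (Jun Rui)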

Thesis title (foreign language):

 The Research on the Algorithm of Mining Frequent Patterns over Data Streams    

Keywords (Chinese):

 Data Mining ; Data Streams ; Frequent Pattern ; Frequent Pattern Tree ; Landmark Window ; Sliding Window

Keywords (foreign language):

 Data Mining ; Data Streams ; Frequent Pattern ; Frequent Pattern Tree    

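Abstract (Chinese):
With the development of information technology, massive databases have grown rapidly, and the lack of effective techniques for analyzing and processing them has become increasingly apparent. Driven by this need, Knowledge Discovery in Databases (KDD) emerged. Data Mining (DM) is an important step in KDD, in which the system applies intelligent algorithms to extract useful patterns from data. Frequent pattern mining is an important research problem in DM. In recent years, large volumes of data have been generated in the form of data streams, such as network data and transaction data. Unlike traditional static data, data streams are continuous, unordered, unbounded, and real-time, which poses new research challenges for mining knowledge from them. Mining frequent patterns in data streams has become a research hotspot in data mining.

This thesis studies one of the important problems in data stream mining, namely frequent pattern mining over data streams. The main contents are as follows:

First, data stream mining technology and its characteristics are introduced, followed by the basic concepts and key problems of frequent pattern mining over data streams. Several typical algorithms for frequent pattern mining over data streams are then studied.

Second, Prefix-stream, a frequent pattern mining algorithm for data streams based on the landmark window, is proposed. The algorithm uses the proposed data structure P-tree to mine, store, and update the frequent patterns of the whole data stream at the same time. In addition, the method applies a logarithmic tilted-time window to gradually reduce the weight of historical transactions, thereby distinguishing recent transactions from historical ones. Experimental results show that the algorithm outperforms the comparable FP-stream algorithm.

Finally, PSW, a frequent pattern mining algorithm for data streams based on the sliding window, is proposed. The algorithm divides the sliding window into several basic windows, takes the basic window as the unit of update, and uses the proposed prefix sliding window tree (PSW-tree) to mine the frequent patterns of each basic window. During mining, the frequent patterns are stored in the same PSW-tree, while expired and infrequent pattern branches are deleted from it. Mining and updating all frequent patterns in the sliding window therefore take place together in the PSW-tree. Experimental results show that the algorithm performs well.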
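The logarithmic tilted-time window mentioned in the abstract can be illustrated with a minimal Python sketch. It follows the general scheme known from the FP-stream literature, in which recent batches are kept at fine granularity and older batches are merged into progressively coarser slots. It is not the thesis's Prefix-stream implementation: the class name `LogTiltedWindow`, its methods, and the per-batch counts are assumptions made for illustration, and the weighting of historical transactions described in the abstract is not modeled.

```python
class LogTiltedWindow:
    """Hypothetical sketch of a logarithmic tilted-time window for one pattern.

    slots[i] summarizes 2**i batches; buffers[i] holds one pending value
    of the same size that is waiting to be merged into level i + 1.
    """

    def __init__(self):
        self.slots = []
        self.buffers = []

    def add(self, count):
        """Insert the pattern's support count for the newest batch."""
        carry = count
        level = 0
        while True:
            if level == len(self.slots):
                # First value to reach this level: open a new, coarser slot.
                self.slots.append(carry)
                self.buffers.append(None)
                return
            old = self.slots[level]
            self.slots[level] = carry
            if self.buffers[level] is None:
                # Park the displaced value until a partner arrives.
                self.buffers[level] = old
                return
            # Merge two values of equal span and push them one level up.
            carry = self.buffers[level] + old
            self.buffers[level] = None
            level += 1

    def total(self):
        """Sum of all counts currently retained (no batch is dropped here)."""
        pending = sum(b for b in self.buffers if b is not None)
        return sum(self.slots) + pending
```

With this scheme the number of slots grows roughly with the logarithm of the number of batches seen, which is why the structure suits an unbounded stream.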
Abstract (foreign language):
With the development of information technology, massive databases have grown rapidly, and the lack of effective techniques for analyzing and processing them has become increasingly apparent. This demand has driven the emergence of Knowledge Discovery in Databases (KDD). Data mining is an important step in KDD, in which intelligent algorithms are used to discover interesting patterns in large amounts of data. Frequent pattern mining is a very important problem in data mining. Recently, large amounts of data have been generated in the form of data streams, such as web data and transaction data. Unlike traditional static databases, data streams have features such as continuity, disorder, and real-time arrival, which pose many new challenges for mining them, and mining frequent patterns over data streams has become both a difficulty and a hotspot of current research.

This thesis studies one problem in data stream mining: mining frequent patterns over data streams. The main contributions are as follows.

First, the thesis introduces data stream mining technology and its characteristics, then the basic concepts and key problems of frequent pattern mining over data streams, and finally studies several typical algorithms for mining frequent patterns in data streams.

Second, a new algorithm, Prefix-stream, based on the landmark window, is proposed for mining frequent patterns over data streams. A new data structure proposed in the thesis, the P-tree, is used to mine, maintain, and update frequent patterns over the data stream at the same time. The algorithm also uses a logarithmic tilted-time window to differentiate the patterns of recently generated transactions from those of historical transactions. The experimental results show that the proposed algorithm outperforms the earlier FP-stream algorithm.

Finally, a new algorithm, PSW, based on the sliding window, is proposed for mining frequent patterns in data streams. The sliding window is divided into several basic windows, and the basic window serves as the unit of update. A compact PSW-tree is used to mine the frequent patterns in each basic window and to maintain all the frequent patterns. Obsolete and infrequent pattern branches are deleted from the tree. The experimental results indicate that the PSW algorithm performs efficiently.
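The idea of updating a sliding window one basic window at a time can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the PSW algorithm itself: it uses plain counters instead of a PSW-tree, enumerates itemsets only up to a small length, and the names `BasicWindowMiner`, `add_basic_window`, `frequent_patterns`, and `max_len` are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


class BasicWindowMiner:
    """Hypothetical sketch: sliding window made of a fixed number of basic windows."""

    def __init__(self, num_basic_windows, min_support, max_len=2):
        self.num_basic_windows = num_basic_windows
        self.min_support = min_support  # fraction of transactions in the window
        self.max_len = max_len          # assumption: cap on itemset length
        self.windows = deque()          # (itemset counts, transaction count)
        self.totals = Counter()         # counts over the whole sliding window
        self.n_transactions = 0

    def add_basic_window(self, transactions):
        """Count the itemsets of one basic window, then expire the oldest one."""
        counts = Counter()
        for transaction in transactions:
            items = sorted(set(transaction))
            for k in range(1, self.max_len + 1):
                counts.update(combinations(items, k))
        self.windows.append((counts, len(transactions)))
        self.totals.update(counts)
        self.n_transactions += len(transactions)
        if len(self.windows) > self.num_basic_windows:
            old_counts, old_n = self.windows.popleft()
            self.totals.subtract(old_counts)
            self.n_transactions -= old_n
            # Unary plus drops itemsets whose count fell to zero.
            self.totals = +self.totals

    def frequent_patterns(self):
        """Return the itemsets whose support meets the threshold."""
        threshold = self.min_support * self.n_transactions
        return {p: c for p, c in self.totals.items() if c >= threshold}
```

A caller would feed one list of transactions per basic window, for example `miner.add_basic_window([["a", "b"], ["a", "c"]])`, and query `miner.frequent_patterns()` after each update. Keeping per-window counts makes expiry a single subtraction, which mirrors the role of the basic window as the unit of update.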
Chinese Library Classification (CLC) number:

 TP311.131    

Open access date:

 2011-04-02    
