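Thesis title (Chinese): | 基于关联规则的Web日志挖掘技术研究 (Research on Web Log Mining Technology Based on Association Rules) |
Name: | |
Student ID: | 06319 |
Confidentiality level: | Public |
Discipline code: | 081203 |
Discipline name: | Computer Application Technology |
Student type: | Master's |
Degree year: | 2009 |
Department: | |
Major: | |
First supervisor name: | |
Thesis title (foreign language): | Research on Web Log Mining Based on Association Rule |
Keywords (Chinese): | |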
Keywords (foreign language): | Web log mining; Association rule; Data preprocessing; Apriori algorithm; Frequent |
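Abstract (Chinese): |
With the spread of personal computers and the rapid development of Internet technology, more and more people look for and collect the resources they need online. Web servers record this behavior in the form of logs. As online activity and transactions grow and high-capacity storage devices come into use, the volume of log records on Web servers keeps increasing. This makes it possible to study in depth how users browse a site, to analyze the performance of the Web site, to improve the site's topology and the hyperlink structure between its pages, and so to provide users with better service. Web log mining arose from this.
Based on the log records of the 50th anniversary celebration of Xi'an University of Science and Technology, this thesis analyzes and studies Web log mining systematically from the following aspects. First, it gives an overview of data mining, Web data mining, and Web log mining, describes the content and format of log records in detail, and presents the Web log mining process. Second, it analyzes and studies data preprocessing techniques in Web log mining: it examines each task in the traditional preprocessing stage in detail and, on that basis, simplifies the preprocessing steps; the simplified algorithm goes directly from session identification to transaction identification without path completion. Next, it introduces the basic concepts of association rules and then focuses on Apriori, the classic association-rule algorithm for mining frequent patterns. It works through an example of how Apriori finds frequent itemsets, proposes an improved algorithm based on the site's topology, and uses an example to show that the improved Apriori algorithm is effective and feasible. Finally, it introduces the method for deriving association rules from frequent itemsets, designs and implements a simple data mining prototype system, obtains association rules from the log data, and analyzes those rules using screenshots of the actual web pages. The results show that this kind of mining analysis helps in understanding users' browsing habits and in improving site design.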
|
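The abstract describes a simplified preprocessing step that goes from session identification directly to transaction identification, skipping path completion, but it does not give the algorithm itself. The following is only a minimal sketch of that general idea, not the thesis's method. It assumes log records have already been cleaned and reduced to hypothetical `(user_id, timestamp, url)` tuples, that a session ends after 30 minutes of inactivity (a common heuristic, not a value stated in the source), and that each session's set of pages is taken as one transaction.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessions_to_transactions(records, timeout=timedelta(minutes=30)):
    """Group (user_id, timestamp, url) records into per-session page sets.

    The record layout and the 30-minute timeout are assumptions made for
    this sketch; the thesis abstract does not specify either.
    """
    # Collect each user's visits.
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))

    transactions = []
    for visits in by_user.values():
        visits.sort(key=lambda v: v[0])  # order each user's visits by time
        current, last_ts = set(), None
        for ts, url in visits:
            # A gap longer than the timeout closes the current session.
            if last_ts is not None and ts - last_ts > timeout:
                transactions.append(current)
                current = set()
            current.add(url)
            last_ts = ts
        if current:
            transactions.append(current)
    return transactions


# Hypothetical records; the users, times, and URLs are illustrative only.
records = [
    ("a", datetime(2008, 9, 1, 10, 0), "/index"),
    ("a", datetime(2008, 9, 1, 10, 5), "/news"),
    ("a", datetime(2008, 9, 1, 12, 0), "/index"),
    ("b", datetime(2008, 9, 1, 10, 1), "/alumni"),
]
transactions = sessions_to_transactions(records)
```

Each session is used directly as a transaction, so no missing pages are inferred from the site's link structure; that is the step path completion would otherwise perform.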
Abstract (foreign language): |
With the popularization of personal computers and the rapid development of Internet technology, more and more people look for and gather resources on the Internet to meet their various needs. Web servers record this behavior in the form of logs. At the same time, with the growth of online activities and transactions and the advent and application of mass storage devices, more and more log records are written on Web servers. This makes it possible to study users' browsing behavior in depth, to analyze Web site performance, to improve the site topology and the hyperlink structure between pages, and thus to provide better service to users. Consequently, the technology of Web Log Mining has emerged.
Based on the Web log of the Xi’an University of Science and Technology 50th Anniversary celebration, this paper analyzes and studies Web Log Mining from the following aspects. Firstly, it introduces Data Mining, Web Mining, and Web Log Mining, describes the content and format of the Web log in detail, and gives the process of Web Log Mining. Secondly, the paper studies data preprocessing technology in Web Log Mining and analyzes in detail every task in the traditional data preprocessing phase. On this basis, the paper proposes an algorithm that simplifies the preprocessing steps: it identifies user transactions directly from user sessions rather than going through path completion. Thirdly, the paper introduces the concepts of association rules and then presents the Apriori algorithm, a classic association-rule algorithm for mining frequent patterns. It shows through specific examples how the algorithm obtains the frequent itemsets, proposes an improved algorithm based on the Web topology structure, and shows through specific examples that the improved Apriori algorithm is effective and feasible. Finally, the paper introduces a method for deriving association rules from frequent itemsets, and designs and implements a simple data mining prototype system based on the foregoing chapters. Association rules are obtained from the Web logs and analyzed using screenshots of the web pages. The results show that mining based on association rules helps to discover users’ browsing habits and to improve the design of the web site.
|
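The abstract refers to finding frequent itemsets with the Apriori algorithm and then deriving association rules from them. As an illustration only, the sketch below implements the classic textbook Apriori (join and prune steps) followed by confidence-based rule generation. It is not the thesis's improved, topology-based variant, whose details the abstract does not give. The function names, the toy transactions, and the thresholds are all assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count. Classic Apriori, not the improved
    algorithm described in the thesis.
    """
    transactions = [frozenset(t) for t in transactions]
    # Candidate 1-itemsets: every distinct item.
    current = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples.

    confidence(X -> Y) = support(X u Y) / support(X).
    """
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # lhs is frequent by downward closure, so the lookup is safe.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Toy transactions; "A", "B", "C" stand for hypothetical page identifiers.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent = apriori(transactions, min_support=3)
rules = association_rules(frequent, min_confidence=0.75)
```

In this toy data each single page appears in four transactions and each pair in three, so every pair is frequent at a minimum support of 3, while the triple (two transactions) is not. Each pair yields two rules with confidence 3/4.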
Chinese Library Classification number: | TP311.13 |
Release date: | 2010-04-02 |