View Thesis Information

Thesis title (Chinese):

 垂直搜索引擎在校园网中的研究与应用    

Name:

 姜美英    

Student ID:

 200907323    

Confidentiality level:

 Public

Discipline code:

 081001    

Discipline name:

 Communication and Information Systems

Student type:

 Master's

Degree year:

 2012    

School/Department:

 School of Communication and Information Engineering

Major:

 Communication and Information Systems

First advisor:

 冀汶莉    

Thesis title (English):

 The Research and Application of a Vertical Search Engine in Campus Network    

Keywords (Chinese):

 垂直搜索引擎 ; 校园网 ; 网络爬虫 ; Lucene    

Keywords (English):

 Vertical Search Engine ; Campus Network ; Web Crawler ; Lucene

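Abstract (translated from the Chinese):
With the rapid development of the Internet, finding useful information in massive amounts of data has become an important problem. Although excellent general-purpose search engines such as Google and Baidu already exist, they cannot fully and accurately obtain information inside a local area network, nor can they guarantee that the information is up to date, so they are not well suited to accurately retrieving information with a specific industry background. University campus networks are by now fairly mature, and the public information inside them, such as undergraduate and graduate admissions and publicity information, has grown substantially. With a general-purpose search engine, users cannot obtain campus network information very effectively. Therefore, to improve the efficiency of information retrieval, this thesis designs and implements a vertical search engine system for university campus networks. The thesis first describes the working principle and main components of general-purpose search engines, and then analyzes how vertical search engines are implemented. It designs and completes the four core modules of the search engine system: web page crawling, preprocessing, indexing, and querying. The crawling module implements web page downloading and the filtering of already-visited URLs. The preprocessing module compares two schemes and uses the better one for web page denoising, and it also performs Chinese word segmentation and web page deduplication. Because Lucene's Chinese word segmentation is relatively weak, the thesis studies Chinese word segmentation techniques and improves segmentation by addressing the shortcomings of the maximum matching method, which raises the query accuracy of the search engine. The indexing and query modules build an inverted index and rank web pages with the PageRank algorithm, which outperforms Lucene's built-in ranking algorithm. Finally, the system is validated experimentally. A comparison with Baidu search results shows that the system achieves higher precision and better meets the needs of users who want information about the campus network.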
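The abstract names the maximum matching method as the baseline for Chinese word segmentation but gives no details of the thesis's improved version. As a rough illustration of the baseline only, here is a minimal sketch of plain forward maximum matching. The function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are assumptions made for this example and do not come from the thesis.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest dictionary word that starts there;
    fall back to a single character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {
    "垂直", "搜索", "搜索引擎", "校园", "校园网",
    "研究", "研究生", "生命", "招生",
}
```

With this toy dictionary, `forward_max_match("校园网搜索引擎", TOY_DICTIONARY)` yields `["校园网", "搜索引擎"]`. The greedy strategy can also missegment ambiguous text: `"研究生命"` comes out as `["研究生", "命"]` rather than `["研究", "生命"]`, which is one commonly cited weakness of this kind of method. Whether this is the specific defect the thesis addresses is not stated in the abstract.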
Abstract (English):
With the rapid development of the Internet, searching efficiently for useful information in massive amounts of data has become an important problem. Although there are many outstanding general-purpose search engines such as Google and Baidu, they cannot fully and accurately collect information inside a LAN, nor can they guarantee the timeliness of that information, so they do not retrieve industry-specific information well. University campus networks are now more mature than ever, and the public information inside them, such as undergraduate and graduate admissions and publicity information, has grown greatly. With a general-purpose search engine, however, users cannot obtain campus network information effectively. Therefore, to improve the efficiency of industry-specific information retrieval, we designed and implemented a vertical search engine system suited to university campus networks. In this thesis, a vertical search engine applied to the Xi'an University Campus Network was researched and designed. First, the working principle and main components of general-purpose search engines are introduced, and the implementation principle of vertical search engines is analysed. The thesis designs and completes the search engine's core modules: the web page crawling module, the preprocessing module, and the index and query module. In the web page crawling module, web page downloading and filtering of visited URLs are implemented. In the preprocessing module, two web page cleaning schemes are compared and the better one is adopted. Chinese word segmentation is also completed; because Lucene's Chinese word segmentation is weak, the thesis studies Chinese word segmentation techniques and remedies the defects of the maximum matching method in order to improve query accuracy. In the index and query module, an inverted index is built, and the PageRank algorithm, which performs better than Lucene's built-in ranking algorithm, is used to rank web pages. The experimental results show that the system achieves higher precision than Baidu search results and better meets the needs of users who want to learn about campus network information.
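The abstract states that PageRank is used for ranking but does not describe its configuration. For readers unfamiliar with the algorithm, the following is a minimal power-iteration sketch. The damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the toy link graph are all assumptions made for this example and are not taken from the thesis.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping every page to its rank; ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly (assumed policy).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Every page passes an equal share of its rank along each outgoing link.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical campus-site link graph, for illustration only.
TOY_LINKS = {
    "home": ["admissions", "news"],
    "admissions": ["home"],
    "news": ["home"],
    "archive": ["home"],
}
ranks = pagerank(TOY_LINKS)
```

In this toy graph every other page links to `home`, so it receives the highest rank, while `archive`, which no page links to, receives the lowest. A production system would typically combine such a link-based score with a text relevance score. The abstract does not say how the thesis combines the two.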
Chinese Library Classification number:

 TP391.3    

Release date:

 2012-06-19    
