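View thesis information

Thesis title (Chinese):

 Web图像搜索器的关键技术研究    

Name:

 王月强 (Wang Yueqiang)    

Student ID:

 20070367    

Confidentiality level:

 Public    

Discipline code:

 081203    

Discipline name:

 Computer Application Technology    

Student type:

 Master's    

Degree year:

 2010    

School:

 School of Computer Science and Technology    

Major:

 Computer Application Technology    

Primary supervisor:

 李爱国 (Li Aiguo)    

Thesis title (English):

 The Key Technology Research of Web Image Crawler    

Keywords (translated from Chinese):

 Web image crawler; multi-threading; disk I/O buffering; depth-based breadth-first search strategy; content-based image search engine    

Keywords (English, truncated in the record):

 Web image searcher; Multi-thread technology; Disk I/O buffer; Depth-based bre    
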
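Abstract (translated from Chinese):
Using content-based image retrieval to find the images users need on the Internet is an important and challenging research problem. A web image crawler can supply a content-based image search engine with a steady stream of image data, which matters for the quality of service the engine offers its users. Building on the content-based image search engine system V1.0 developed by our research group, this thesis introduces multi-threading and proposes a disk I/O buffering method for a multi-threaded web image crawler. After analyzing and comparing several common search strategies in depth, it develops a new search strategy suited to a multi-threaded web image crawler. Finally, a multi-threaded web image crawler subsystem is developed and integrated with the image retrieval subsystem to form the content-based image search engine system V2.0.
A disk I/O buffering method for the multi-threaded web image crawler is proposed. Frequent disk I/O operations significantly degrade the crawler's performance. The proposed method comprises two measures: double-queue buffering for the URLs awaiting collection, and circular buffer pools for image storage and URL storage. In the pending-URL queue, while one queue is in use, the other reads new URLs from disk, so that every thread can fetch URLs without interruption. Two circular buffer pools, which work on the same principle, serve the disk writes of images and of URLs respectively. Experimental results show that the method significantly improves the performance of the multi-threaded web image crawler.
A depth-based breadth-first search strategy for the web image crawler is proposed. The thesis statistically analyzes where images of different quality are located within Internet sites; the experiments show that deep site pages contain more high-quality images than shallow ones. Based on a study of the breadth-first and depth-first strategies of traditional crawlers, a depth-based breadth-first search strategy is proposed. To build a crawler on this strategy, the thesis proposes DR-BTree (Determine Repeat-BTree), a method for detecting duplicate page URLs, together with a database storage scheme for page URLs, and combines the strategy with an image filtering method to filter downloaded images. Comparative experiments show that the three strategies download a similar number of images in the same time, but the proposed strategy downloads 3.6 times as many high-quality images as the breadth-first strategy and 2.7 times as many as the depth-first strategy, indicating that it is better suited to a multi-threaded web image crawler.
Based on these results, a multi-threaded web image crawler subsystem was designed and developed as an important component of the content-based image search engine system V2.0. The subsystem uses multi-threading, the disk I/O buffering method, and the depth-based breadth-first search strategy. Tests show that it increases image download speed, supplies the content-based image search engine with a large volume of image data, and meets its intended goals.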
Abstract (English):
Searching for images on the Internet with content-based image retrieval is an important and challenging research problem. A web image crawler can supply a content-based image search engine with a continuous stream of image data, which is significant for the quality of service the engine provides to its users. Building on the independently developed content-based image search engine system V1.0, this thesis introduces multi-threading and develops a multi-threaded web image crawler. A disk I/O buffering scheme for the multi-threaded crawler is also proposed. Several common search strategies are analyzed and compared in depth, the search strategy of a web-oriented image crawler is studied, and a new search strategy suited to a multi-threaded web image crawler is developed. Finally, a multi-threaded web image crawler subsystem is developed and combined with the image retrieval subsystem to form the content-based image search engine system V2.0.
A disk I/O buffering method for the multi-threaded web image crawler is proposed. Frequent disk I/O operations degrade the performance of the multi-threaded web image crawler. The proposed method includes double-queue buffering for the URLs to be collected and circular buffer pools for image storage and URL storage. Double-queue buffering is applied to the queue of URLs awaiting processing: while one queue supplies URLs to all threads, the other reads new URLs from disk. Because these two operations run concurrently, each thread receives new URLs without interruption. Separate circular buffer pools are used for image storage and for URL storage, and both work on the same principle. Experimental results show that the performance of the multi-threaded web image crawler improves markedly when these disk I/O buffering methods are applied.
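
The double-queue buffering described above can be sketched as follows. The abstract gives no implementation details, so this is only a minimal illustration under stated assumptions: the class name `DoubleQueueURLBuffer` and the `load_batch` callable (a stand-in for the disk read that returns an empty list once no URLs remain) are hypothetical, and a single background thread performs each refill.

```python
import threading
from collections import deque
from typing import Callable, List, Optional


class DoubleQueueURLBuffer:
    """Two in-memory URL queues: workers drain one while the other is refilled."""

    def __init__(self, load_batch: Callable[[], List[str]]):
        # Assumption: load_batch stands in for the disk read and
        # returns [] when no URLs remain.
        self._load_batch = load_batch
        self._lock = threading.Lock()
        self._active = deque(load_batch())   # queue currently serving workers
        self._standby = deque()              # queue being refilled in the background
        self._start_refill()

    def _start_refill(self) -> None:
        # Fill the standby queue on a background thread so workers keep running.
        def work() -> None:
            self._standby.extend(self._load_batch())

        self._refill = threading.Thread(target=work, daemon=True)
        self._refill.start()

    def get(self) -> Optional[str]:
        """Return the next URL, or None once both queues and the source are empty."""
        with self._lock:
            if not self._active:
                self._refill.join()          # wait for the pending refill to finish
                self._active, self._standby = self._standby, deque()
                if not self._active:
                    return None
                self._start_refill()         # begin loading the next batch at once
            return self._active.popleft()


if __name__ == "__main__":
    batches = iter([["u1", "u2"], ["u3"], ["u4", "u5"]])
    buf = DoubleQueueURLBuffer(lambda: next(batches, []))
    while (url := buf.get()) is not None:
        print(url)
```

In this sketch a worker only waits when it empties the active queue before the background read has finished; otherwise the swap is immediate. A real crawler would size the batches so that the disk read normally completes first.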
A depth-based breadth-first search strategy for the web image crawler is proposed. The positions of images of different quality within web sites are counted and analyzed, and experimental results show that deep site pages contain more high-quality images than shallow ones. Based on a study of the breadth-first and depth-first search strategies of traditional crawlers, the depth-based breadth-first search strategy is proposed. To build a web image crawler on this strategy, two techniques are proposed: DR-BTree (Determine Repeat-BTree) for detecting duplicate page URLs, and database storage of page URLs. The proposed search strategy is combined with an image filtering method to filter the downloaded images. Comparative experiments show that the proposed strategy downloads 3.6 times as many high-quality images as the breadth-first strategy and 2.7 times as many as the depth-first strategy, which indicates that it is better suited to a multi-threaded web image crawler.
On the basis of this research, a multi-threaded web image crawler subsystem is designed and developed as an important part of the content-based image search engine system V2.0. The subsystem incorporates multi-threading, disk I/O buffering, and depth-based breadth-first search. It increases the speed of image downloading, supplies a large number of images to the content-based image search engine, and achieves the intended goal.
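
The abstract does not define the depth-based breadth-first strategy precisely. The sketch below shows one possible reading, not the thesis's algorithm: a breadth-first traversal that records each page's depth and collects images only from pages at or below a minimum depth, since deeper pages are reported to hold more high-quality images. Everything here is an assumption made for illustration: the function name `depth_aware_bfs`, the `min_depth` and `max_depth` parameters, the in-memory `links` and `images` dictionaries that stand in for fetched pages, and the plain `set` that stands in for the DR-BTree duplicate check.

```python
from collections import deque
from typing import Dict, List


def depth_aware_bfs(
    links: Dict[str, List[str]],
    images: Dict[str, List[str]],
    seed: str,
    min_depth: int,
    max_depth: int,
) -> List[str]:
    """Visit pages breadth-first; collect images only from pages at depth >= min_depth."""
    seen = {seed}                      # stand-in for the DR-BTree duplicate check
    frontier = deque([(seed, 0)])      # FIFO queue of (url, depth) pairs
    collected: List[str] = []
    while frontier:
        url, depth = frontier.popleft()
        if depth >= min_depth:
            collected.extend(images.get(url, []))
        if depth < max_depth:          # do not expand pages beyond the depth limit
            for nxt in links.get(url, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return collected


if __name__ == "__main__":
    # Hypothetical site: shallow pages carry logos and banners, deep pages carry photos.
    links = {"/": ["/a", "/b"], "/a": ["/a/1"], "/b": ["/a/1", "/b/1"], "/a/1": ["/"]}
    images = {
        "/": ["logo.gif"],
        "/a": ["banner.jpg"],
        "/a/1": ["photo1.jpg"],
        "/b/1": ["photo2.jpg"],
    }
    print(depth_aware_bfs(links, images, "/", min_depth=2, max_depth=3))
```

Setting `min_depth=0` reduces this to an ordinary depth-limited breadth-first crawl, which makes it easy to compare the two behaviours on the same link graph.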
Chinese Library Classification (CLC) number:

 TP391.41    

Open-access date:

 2011-04-02    
