View thesis information


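Thesis title (Chinese):	基于内容的视频数据检索技术研究
Name:	Guo Renjie (郭仁杰)
Student ID:	19208208053
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	085212
Discipline name:	Engineering (工学) - Engineering (工程) - Software Engineering
Student type:	Master's
Degree level:	Master of Engineering
Degree year:	2022
Institution:	Xi'an University of Science and Technology
School:	College of Computer Science and Technology
Major:	Software Engineering
Research direction:	Computer graphics and image processing
First supervisor:	Fu Yan (付燕)
First supervisor's institution:	Xi'an University of Science and Technology
Thesis submission date:	2022-06-22
Thesis defense date:	2022-06-06
Thesis title (English):	Research on content-based video data retrieval technology
Keywords (Chinese):	镜头边界检测 ; 大数据处理 ; 关键帧提取 ; 特征提取 ; 局部敏感哈希
Keywords (English):	shot boundary detection ; big data processing ; key frame extraction ; feature extraction ; LSH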
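Thesis abstract (translated from the Chinese):	︿In recent years, with the rapid development of computer and network technology, massive amounts of information are collected, transmitted, circulated, and used worldwide, and online information has shifted from large volumes of text to images and video in rich and varied forms. At the same time, short-video platforms, which are convenient to watch and offer well-developed social features, have grown rapidly and drawn more and more people into short-video production. Video data is growing explosively and user demand for video retrieval keeps increasing, which has made video retrieval an active research direction. Faced with massive video data, traditional tag-based video retrieval can no longer meet user needs, which has led to research on content-based video retrieval. Content-based video retrieval mainly comprises shot boundary detection, key frame extraction, feature extraction, and similarity retrieval; video content can be represented through semantic features of the video. Because video data consists of a series of continuously captured shots, managing and retrieving video directly is complicated. The video is usually segmented into shots first and retrieved afterwards, so correctly cutting a video into shots is the first problem to solve. This thesis first studies traditional shot boundary detection algorithms and finds that, owing to the complexity of video content, unpredictable illumination changes and motion effects easily cause false detections. It therefore proposes a shot boundary detection method that combines visual color information with BRISK features. The method consists of abrupt transition detection and gradual transition detection. Abrupt transition detection uses the CIEDE2000 color-difference formula and an adaptive threshold for an initial screening of candidate abrupt frames, then uses BRISK features to remove false detections. Gradual transition detection first finds candidate groups of gradual-transition frames from the luminance change pattern of the video frames, then uses the CIEDE2000 color difference and a cumulative-frame algorithm based on BRISK feature-point matching to detect the true gradual transition frames. The method is evaluated on the TRECVid2001 and ClipShots datasets, and the results show that it effectively improves the accuracy of shot boundary detection. Second, the thesis studies existing video retrieval methods and finds that frame-based retrieval, although accurate, is inefficient. To address this, it proposes a multi-feature, parallel Top-N distributed retrieval method for video big data. The video is first segmented into shots with the proposed shot boundary detection algorithm; key frames are then extracted from the shots and stored in HBase, and the features of the video frames are extracted in a distributed manner with the Spark framework. Because the thesis concatenates multiple features of each video frame into one high-dimensional vector for retrieval, which is accurate but computationally expensive, it uses locality sensitive hashing (LSH) to bucket and compactly encode the distributed data and so speed up computation. During retrieval over video big data, the volume of frame data is so large that directly sorting and merging the similarity results would cause a severe shuffle of massive data. The thesis proposes a custom heap algorithm model based on the aggregate operator, which optimizes the shuffle by reducing the amount of data within each partition. The approach is evaluated on the I2V dataset, and the results show that retrieval with multiple features is more accurate and that LSH together with the custom heap algorithm greatly speeds up distributed retrieval.﹀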
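The initial screening step for abrupt transitions can be illustrated with a small sketch. The abstract does not state how the adaptive threshold is computed, so the rule used here (local mean plus `k` standard deviations over a sliding window) is an assumption, as are the names `abrupt_candidates`, `window`, and `k`. The input values stand in for per-frame-pair color differences such as a mean CIEDE2000 distance; they are made up for illustration and the BRISK verification step is not shown.

```python
from statistics import mean, stdev


def abrupt_candidates(diffs, window=5, k=3.0):
    """Return indices i where diffs[i] exceeds a local adaptive threshold.

    diffs[i] is assumed to be the mean color difference between frame i
    and frame i + 1 (for example a per-pixel CIEDE2000 average).
    The threshold rule (mean + k * stdev of neighbours) is an assumption.
    """
    candidates = []
    for i, d in enumerate(diffs):
        # Neighbouring differences on both sides, excluding position i itself.
        lo, hi = max(0, i - window), min(len(diffs), i + window + 1)
        neighbours = diffs[lo:i] + diffs[i + 1:hi]
        if len(neighbours) < 2:
            continue  # not enough context to estimate a threshold
        threshold = mean(neighbours) + k * stdev(neighbours)
        if d > threshold:
            candidates.append(i)
    return candidates


# Hypothetical difference values: a quiet sequence with one spike at index 6.
diffs = [1.2, 1.0, 1.1, 1.3, 0.9, 1.2, 24.0, 1.1, 1.0, 1.2, 1.1]
print(abrupt_candidates(diffs))
```

Excluding the frame under test from its own neighbourhood keeps a large spike from inflating the threshold it is compared against. Candidates found this way would still need a second check, which in the thesis is done with BRISK features.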
Thesis abstract (English):	︿ In recent years, with the rapid development of computer and network technology, massive amounts of information have been collected, transmitted, circulated, and applied all over the world, and information on the network has shifted from large volumes of text to images and video in many forms. At the same time, short-video platforms with convenient viewing and well-developed social features have grown rapidly and attracted more and more people into short-video production. Video data is growing explosively and user demand for video retrieval is increasingly strong, which has made video retrieval a popular research direction. Faced with massive video data, traditional tag-based video retrieval can no longer meet users' needs, which has motivated content-based video retrieval. Content-based video retrieval mainly includes shot boundary detection, key frame extraction, feature extraction, and similarity retrieval, and video content can be represented by semantic features of the video. Because video data is composed of a series of continuously captured shots, managing and retrieving the video directly is complicated. The video is usually segmented into shots first and then retrieved, so correctly cutting the video into shots becomes the first problem to solve. This paper first studies traditional shot boundary detection algorithms and finds that, owing to the complexity of video content, unexpected lighting changes and motion effects easily cause false detections. Therefore, this paper proposes a shot boundary detection method that combines visual color information with BRISK features. The method includes abrupt transition detection and gradual transition detection. 
Abrupt transition detection uses the CIEDE2000 color-difference formula and an adaptive threshold for an initial screening of candidate abrupt frames, then uses BRISK features to remove falsely detected frames. Gradual transition detection first detects groups of frames that may contain a gradual transition from the brightness change pattern of the video frames, and then uses the CIEDE2000 color difference and a cumulative-frame algorithm based on BRISK feature-point matching to detect the true gradual transition frames. The method is evaluated on the TRECVid2001 and ClipShots datasets. Experimental results show that it improves the accuracy of shot boundary detection. Secondly, this paper studies existing video retrieval methods and finds that frame-based retrieval, although accurate, is inefficient. To solve this problem, this paper proposes a parallel top-N distributed retrieval method for video big data based on multiple features. First, the video is divided into shots with the shot boundary detection algorithm proposed in this paper, and key frames are extracted from the shots and stored in HBase. The features of the video frames are then extracted in a distributed manner with the Spark framework. In this paper, multiple features of each video frame are concatenated into a high-dimensional vector for retrieval, which is accurate but computationally expensive. Therefore, the locality sensitive hashing (LSH) algorithm is used to bucket and compactly encode the distributed data and speed up computation. Because of the large amount of video frame data in video big data retrieval, directly sorting and merging the similarity results would cause a massive data shuffle. 
To solve this problem, this paper proposes a custom heap algorithm model based on the aggregate operator, which optimizes the shuffle process by reducing the amount of data within each partition. The method is evaluated on the I2V dataset. Experimental results show that multi-feature video retrieval is more accurate, and that LSH and the custom heap algorithm greatly speed up distributed retrieval. ﹀
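The idea of cutting shuffle volume with a heap inside an aggregate operator can be sketched as follows. The abstract does not describe the thesis's heap model in detail, so this is only one plausible reading: each partition keeps a bounded min-heap of its best `TOP_N` results, and only those small heaps are merged. The code simulates partitions with plain Python lists; `seq_op`, `comb_op`, `top_n`, `TOP_N`, the frame identifiers, and the scores are all hypothetical, and larger scores are assumed to mean higher similarity.

```python
import heapq
from functools import reduce

TOP_N = 3  # hypothetical N; the thesis does not fix a value in the abstract


def seq_op(heap, item):
    """Fold one (score, frame_id) pair into a partition-local min-heap."""
    if len(heap) < TOP_N:
        heapq.heappush(heap, item)
    elif item > heap[0]:
        heapq.heapreplace(heap, item)  # drop the current smallest entry
    return heap


def comb_op(left, right):
    """Merge two partition-local heaps, again keeping at most TOP_N items."""
    for item in right:
        seq_op(left, item)
    return left


def top_n(partitions):
    """Simulate aggregate: seq_op within each partition, comb_op across them."""
    partials = [reduce(seq_op, part, []) for part in partitions]
    merged = reduce(comb_op, partials, [])
    return sorted(merged, reverse=True)


# Made-up similarity scores spread over three simulated partitions.
partitions = [
    [(0.91, "f01"), (0.15, "f02"), (0.42, "f03"), (0.77, "f04")],
    [(0.88, "f05"), (0.30, "f06")],
    [(0.95, "f07"), (0.61, "f08"), (0.05, "f09")],
]
print(top_n(partitions))
```

Each partition contributes at most `TOP_N` items to the merge, regardless of how many frames it holds, which is what keeps the data moved between partitions small. With PySpark, the same two functions could in principle be passed to `RDD.aggregate([], seq_op, comb_op)`; whether the thesis does exactly this is not stated in the abstract.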
Chinese Library Classification (CLC) number:	TP391.4
Open access date:	2022-06-22
