Thesis Information

Title (Chinese): 基于卷积神经网络的图像标注算法研究
Title (English): Research on Image Annotation Algorithm Based on Convolutional Neural Network
Author: 车子豪
Student ID: 18208088014
Confidentiality: Public
Language: Chinese
Discipline Code: 083500
Discipline: Engineering - Software Engineering
Student Type: Master's
Degree: Master of Engineering
Degree Year: 2021
Degree-Granting Institution: Xi'an University of Science and Technology
School: College of Computer Science and Technology
Major: Software Engineering
Research Direction: Artificial Intelligence and Information Processing
First Advisor: 厍向阳
First Advisor's Institution: Xi'an University of Science and Technology
Submission Date: 2021-06-21
Defense Date: 2021-06-03
Keywords: Automatic Image Annotation; Deep Learning; Convolutional Neural Network; Feature Fusion; Generative Adversarial Networks

Abstract:

With the spread of smartphones, home computers, and other digital devices and the development of communication technology, images and other visual data are everywhere on Internet sharing platforms. To manage and use these data effectively, researchers proposed image retrieval technology. Because of technical limitations and user habits, search engines offer keyword-based image retrieval, which requires images to be annotated with keywords in advance; doing this purely by hand carries unmanageable time and labor costs, so automatic image annotation has developed rapidly. Traditional automatic annotation algorithms suffer from complex models, poor generalization, and low annotation accuracy. This thesis therefore proposes two automatic image annotation algorithms based on convolutional neural networks. The main work is as follows:

(1) To address the low annotation accuracy for small-scale objects in images and the imbalance among annotated categories, an image annotation method combining multi-scale features and cost-sensitive learning is proposed. The method adjusts the VGG16 architecture and adds a feature fusion module consisting of two parts: a multi-scale feature extraction module that extracts multi-scale features from the convolutional feature maps, and a feature fusion module that fuses these features adaptively during training. On top of the standard multi-label loss, a cost-sensitive multi-label loss function is proposed (a hedged sketch is given below). Experiments show that the method improves annotation performance on low-frequency labels while maintaining performance on high-frequency labels.
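
The abstract does not give the exact form of the cost-sensitive multi-label loss. A minimal sketch of the general idea, assuming per-label weights derived from inverse label frequency; the `beta` smoothing term and all names here are illustrative, not the thesis's actual formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CostSensitiveMultiLabelLoss(nn.Module):
    """Weighted binary cross-entropy for multi-label image annotation.

    Positive errors on rare labels are penalized more heavily, which is
    one plausible realization of a cost-sensitive multi-label loss; the
    thesis's exact weighting scheme may differ.
    """

    def __init__(self, label_counts: torch.Tensor, beta: float = 1.0):
        super().__init__()
        freq = label_counts.float() + beta   # smooth away zero counts
        # Inverse-frequency weights: labels rarer than average get > 1.
        self.register_buffer("pos_weight", freq.mean() / freq)

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # logits, targets: (batch, num_labels); targets are 0/1 indicators.
        return F.binary_cross_entropy_with_logits(
            logits, targets, pos_weight=self.pos_weight
        )

# Toy usage: label frequencies in the training set drive the weights.
counts = torch.tensor([5000, 1200, 40, 7])
criterion = CostSensitiveMultiLabelLoss(counts)
loss = criterion(torch.randn(8, 4), torch.randint(0, 2, (8, 4)).float())
```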

(2) To address insufficient training samples and imbalanced annotation categories in image annotation datasets, an image annotation method based on dual convolutional neural networks is designed. First, an image augmentation method based on generative adversarial networks is proposed and combined with traditional augmentation to relieve the shortage of training samples. Second, the convolutional network structure is improved: deformable convolution and filtered pooling are introduced to strengthen the annotation of objects at different scales. Finally, the data are split into the full dataset and a low-frequency-label subset, two convolutional neural network models are trained on them independently, and a fusion module combines the two models' annotation results (a sketch of one possible fusion rule follows); the model trained on the low-frequency subset is better suited to rare labels, which reduces the impact of class imbalance on them. Experiments show that the dual-CNN annotation algorithm improves annotation accuracy.
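
The abstract does not specify how the fusion module combines the two models' outputs. A minimal sketch of one rule consistent with the description, assuming per-label probabilities and a convex combination that favors the low-frequency model on rare labels; the mask, `alpha`, and threshold are illustrative assumptions:

```python
import torch

def fuse_annotations(scores_full: torch.Tensor,
                     scores_low: torch.Tensor,
                     low_freq_mask: torch.Tensor,
                     alpha: float = 0.7) -> torch.Tensor:
    """Fuse per-label scores from the two independently trained CNNs.

    scores_full:   (batch, num_labels) probabilities from the model
                   trained on the full dataset.
    scores_low:    (batch, num_labels) probabilities from the model
                   trained on the low-frequency-label subset.
    low_freq_mask: (num_labels,) boolean mask marking rare labels.
    alpha:         weight given to the specialist model on rare labels
                   (an illustrative value, not taken from the thesis).
    """
    weight = torch.where(low_freq_mask,
                         torch.tensor(alpha),
                         torch.tensor(0.0))
    return weight * scores_low + (1.0 - weight) * scores_full

# Toy usage: 5 labels, the last two rare; annotate labels above a threshold.
full = torch.rand(2, 5)
low = torch.rand(2, 5)
mask = torch.tensor([False, False, False, True, True])
tags = fuse_annotations(full, low, mask) > 0.5
```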



CLC Number: TP301.6
Open Date: 2021-06-21
