Thesis Information

Chinese Title:

Research on Image Inpainting Algorithms Based on Deep Learning

Name:

王冠懿

Student ID:

20208223053

Confidentiality Level:

Public

Thesis Language:

chi (Chinese)

Discipline Code:

085400

Discipline Name:

Engineering - Electronic Information

Student Type:

Master's

Degree Level:

Master of Engineering

Degree Year:

2023

Degree-Granting Institution:

Xi'an University of Science and Technology

School/Department:

College of Computer Science and Technology

Major:

Software Engineering

Research Direction:

Digital Image Processing

First Advisor:

李洪安

First Advisor's Institution:

Xi'an University of Science and Technology

Submission Date:

2023-06-15

Defense Date:

2023-06-05

English Title:

Research on Image Inpainting Method Based on Deep Learning

Keywords:

Image Inpainting; Gated Convolution; Self-attention Mechanism; Two-stage Inpainting Network; Transformer

Chinese Abstract:

Images are among the media most frequently used by people to convey and acquire information, and damage to image content hinders both the retrieval of that information and its subsequent processing. Restoring damaged images is an important research area in computer vision; its goal is to recover the missing information in a damaged image. In recent years, with the rise of deep learning, deep-learning-based image inpainting methods have achieved remarkable results, but in some cases the restored images still suffer from blurring and distortion and mismatched semantics, and training remains slow. To address these problems, this thesis carries out the following work:

(1) To address the missing semantics, mismatched content, and slow training of current lightweight image inpainting methods, this thesis proposes a pyramid image inpainting method based on gated convolution and a self-attention mechanism. First, the method builds on a U-shaped network and incorporates gated convolution, changing the feature extraction strategy to reduce computation on redundant information and improve the model's computational efficiency. Second, a self-attention module and an attention transfer module are designed to guide the transformation between high-level semantic features and image information more effectively, reducing the information loss caused by long paths through the network. Finally, content, perceptual, and pyramid losses are designed and added to improve the network's learning speed and capacity, producing a data distribution close to that of real images. Experiments show that this method yields semantically more complete and better-matched results and trains faster.
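To make the gated convolution concrete, the following is a minimal PyTorch sketch of a gated convolution layer in the spirit of the description above; the channel sizes, kernel size, and ELU activation are illustrative assumptions rather than the thesis's exact configuration.

```python
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """Convolution whose output is modulated by a learned soft gate,
    letting the network suppress features computed from invalid
    (masked) pixels instead of treating all pixels equally."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3,
                 stride: int = 1, padding: int = 1):
        super().__init__()
        # One branch produces features, the other per-pixel gates.
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.act = nn.ELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sigmoid gate in [0, 1] scales each feature channel and location.
        return self.act(self.feature(x)) * torch.sigmoid(self.gate(x))

# Usage: a 4-channel input (RGB image plus mask) mapped to 32 feature maps.
layer = GatedConv2d(4, 32)
out = layer(torch.randn(1, 4, 256, 256))   # -> (1, 32, 256, 256)
```

The learned sigmoid gate lets the layer down-weight features computed from masked pixels, which is the property that makes gated convolution attractive for inpainting with irregular holes.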

(2) To address the poor semantic and edge consistency, low image clarity, detail loss, and slow training of methods for high-quality image inpainting, this thesis proposes a high-quality image inpainting method based on an adaptive Transformer. First, a two-stage generator network is designed: the first stage is a convolutional neural network combined with a self-attention mechanism, and the second stage is a Transformer-based generator; the two-stage design strengthens the model's inpainting ability. Second, an adaptive multi-head self-attention mechanism is designed to increase the focus on core feature regions and speed up the model's forward pass. Finally, content, perceptual, and pyramid losses are combined as the generator's loss function, improving training speed and learning accuracy. Experiments show that this method increases the clarity and texture detail of the restored images, achieves higher semantic and edge consistency, and trains faster.
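The abstract does not specify how the multi-head self-attention is made adaptive, so the sketch below shows one plausible reading: standard multi-head self-attention over a flattened feature map, with a learnable per-head temperature as the adaptive element. All class names and sizes here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveMHSA(nn.Module):
    """Multi-head self-attention over a feature map, with a learnable
    per-head temperature so each head can sharpen or flatten its
    attention distribution (the 'adaptive' assumption in this sketch)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.dk = heads, dim // heads
        self.qkv = nn.Linear(dim, dim * 3)   # joint Q, K, V projection
        self.proj = nn.Linear(dim, dim)      # output projection
        self.temperature = nn.Parameter(torch.ones(heads, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)             # (b, h*w, c)
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        # Reshape each of Q, K, V to (b, heads, h*w, dk).
        shape = lambda t: t.reshape(b, -1, self.heads, self.dk).transpose(1, 2)
        q, k, v = shape(q), shape(k), shape(v)
        # Scaled dot-product attention, rescaled per head by the temperature.
        attn = (q @ k.transpose(-2, -1)) * self.temperature / self.dk ** 0.5
        out = F.softmax(attn, dim=-1) @ v                 # (b, heads, h*w, dk)
        out = out.transpose(1, 2).reshape(b, h * w, c)
        return self.proj(out).transpose(1, 2).reshape(b, c, h, w)

# Usage on a 64-channel, 32x32 feature map.
feats = torch.randn(1, 64, 32, 32)
out = AdaptiveMHSA(64)(feats)   # -> (1, 64, 32, 32)
```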

English Abstract:

Images are among the most frequently used media for conveying and acquiring information today, and damage to image information affects both the retrieval of that information and its subsequent processing. Image inpainting is an important research area in computer vision whose goal is to restore the missing information in damaged images. In recent years, deep-learning-based image inpainting methods have achieved remarkable results. In some cases, however, the inpainted images still suffer from blurring, distortion, and semantic mismatch, and training remains slow. To address these issues, the contributions of this thesis are as follows:

(1) To address the issues of semantic information loss, content mismatch, and slow training in current lightweight image inpainting methods, this thesis proposes a pyramid image inpainting method based on gated convolution and a self-attention mechanism. Firstly, the method builds on a U-Net and incorporates gated convolution, changing the feature extraction strategy to reduce computation on redundant information and improve the model's computational efficiency. Secondly, a self-attention module and an attention transfer module are designed to guide the transformation between high-level semantic features and image information more effectively, reducing the information loss caused by long paths within the network. Lastly, content, perceptual, and pyramid losses are designed and added to enhance the network's learning speed and capacity, generating data distributions close to those of real images. Experimental results show that the proposed method yields more complete semantics, better content matching, and faster training.
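As a hedged illustration of the three-term loss named above, the sketch below combines a pixel-wise content loss, a perceptual loss on frozen VGG-16 features, and a multi-scale pyramid loss; the loss weights and the choice of VGG layers are assumptions, not values from the thesis.

```python
import torch
import torch.nn.functional as F
import torchvision

# Frozen VGG-16 feature extractor for the perceptual term (in practice
# the inputs would be ImageNet-normalized first; omitted for brevity).
vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def inpainting_loss(output, target, w_content=1.0, w_percep=0.1, w_pyramid=0.5):
    # Content loss: pixel-wise L1 between output and ground truth.
    content = F.l1_loss(output, target)
    # Perceptual loss: L1 between mid-level VGG features.
    percep = F.l1_loss(vgg(output), vgg(target))
    # Pyramid loss: L1 at several downsampled scales, so coarse structure
    # is supervised alongside fine detail.
    pyramid = sum(F.l1_loss(F.avg_pool2d(output, 2 ** s),
                            F.avg_pool2d(target, 2 ** s)) for s in (1, 2, 3))
    return w_content * content + w_percep * percep + w_pyramid * pyramid

# Usage with random stand-ins for the generator output and ground truth.
x = torch.rand(1, 3, 256, 256, requires_grad=True)
y = torch.rand(1, 3, 256, 256)
inpainting_loss(x, y).backward()
```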

(2) To address the problems of poor semantic and edge consistency, low image clarity, detail loss, and slow training in high-quality image inpainting methods, this thesis proposes a high-quality image inpainting method based on an adaptive Transformer. Firstly, a two-stage generator network is designed: the first stage is a convolutional neural network combined with a self-attention mechanism, and the second stage is a Transformer-based generator. The two-stage design enhances the model's inpainting capability. Secondly, an adaptive multi-head self-attention mechanism is employed to increase the focus on core feature regions and accelerate the model's forward pass. Lastly, content, perceptual, and pyramid losses are integrated as the generator's loss function, improving training speed and learning accuracy. Experimental results show that the proposed method increases the clarity and texture detail of inpainted images, achieves higher semantic and edge consistency, and trains faster.
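The two-stage generator can be sketched at a high level as follows: a small convolutional stage produces a coarse fill, and a Transformer-based stage refines it. The depths, widths, and patch size below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CoarseStage(nn.Module):
    """Stage 1: a small CNN that produces a rough fill from the
    masked image and the mask."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),     # RGB + mask in
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())  # coarse RGB out

    def forward(self, img, mask):
        return self.net(torch.cat([img, mask], dim=1))

class RefineStage(nn.Module):
    """Stage 2: patch-embed the coarse result and refine it with
    Transformer encoder blocks."""
    def __init__(self, dim: int = 96, patch: int = 8):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, patch, stride=patch)    # patchify
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.unembed = nn.ConvTranspose2d(dim, 3, patch, stride=patch)

    def forward(self, coarse):
        t = self.embed(coarse)                            # (b, dim, H/p, W/p)
        b, d, h, w = t.shape
        t = self.encoder(t.flatten(2).transpose(1, 2))    # tokens (b, hw, dim)
        t = t.transpose(1, 2).reshape(b, d, h, w)
        return torch.sigmoid(self.unembed(t))             # refined RGB

# Usage: mask out a region, fill coarsely, then refine.
img = torch.rand(1, 3, 256, 256)
mask = torch.zeros(1, 1, 256, 256)   # 1 marks missing pixels
coarse = CoarseStage()(img * (1 - mask), mask)
refined = RefineStage()(coarse)      # -> (1, 3, 256, 256)
```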


CLC Number:

 TP391    

Open Access Date:

 2023-06-15    

