View Thesis Information


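Thesis title (Chinese):	基于生成对抗网络的图像色彩渲染算法研究
Name:	Zhang Min (张敏)
Student ID:	19208049009
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	0812
Discipline name:	Engineering - Computer Science and Technology (degrees in Engineering or Science may be conferred)
Student type:	Master's
Degree level:	Master of Engineering
Degree year:	2022
Institution:	Xi'an University of Science and Technology
School:	College of Computer Science and Technology
Major:	Computer Science and Technology
Research direction:	Graphics and image processing
First supervisor:	Li Hong'an (李洪安)
First supervisor's affiliation:	Xi'an University of Science and Technology
Thesis submission date:	2022-06-20
Thesis defense date:	2022-06-07
Thesis title (English):	Research on Image Color Rendering Algorithm Based on Generative Adversarial Networks
Keywords (translated from Chinese):	Image color rendering ; Generative adversarial networks ; Gabor filter ; Loss function ; Channel attention mechanism
Keywords (English):	Image color rendering ; Generative adversarial networks ; Gabor filter ; Loss function ; Channel attention mechanism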
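Abstract (translated from Chinese):	Image color rendering assigns colors to the grayscale pixels of a target image in order to improve visual quality and draw attention to important information. It is one of the key image enhancement techniques in image processing and is widely used in product appearance color design, augmented reality, digital entertainment, and other fields. Common color rendering methods currently suffer from detail problems such as color bleeding across boundaries and blurred edges. Traditional methods are limited by high labor costs and demanding requirements on reference image quality, while deep learning-based methods suffer from feature loss caused by upsampling, difficult model pre-training, vanishing or exploding gradients, underfitting or overfitting, and poor robustness. To address these problems, the thesis studies three aspects: algorithm robustness and generalization, the stability of generative adversarial networks, and the balance between lightweight design and high performance. The main contributions are as follows. (1) To address the poor robustness and generalization of color rendering algorithms on low-quality images, an improved pix2pix model based on Gabor filters is studied. First, Gabor filters extract multi-scale, multi-orientation texture feature maps to preprocess the image. Second, the pix2pix model is trained with a least-squares loss function and a gradient penalty term, which preserves the diversity of generated samples and stabilizes training. Finally, the generator renders low-quality images; experiments show that the method reduces, to some extent, the influence of illumination changes and noise on the rendered image. (2) To address the unstable training of generative adversarial networks, a Hinge-Cross-Entropy GAN model is studied for image color rendering. First, the self-attention mechanism is improved so that it captures long-range feature dependencies effectively. Second, a hinge-cross-entropy loss function is designed to strengthen training and keep the loss near its optimum, which stabilizes the model. Finally, the Hinge-Cross-Entropy GAN is proposed and used for automatic image rendering on the DIV2K and COCO datasets; experiments show improvements in both rendering quality and visual effect. (3) To address the difficulty of balancing a lightweight model against high performance in color rendering networks, a frequency channel attention GAN is studied. First, because global average pooling corresponds to the lowest-frequency component of the discrete cosine transform, a frequency channel attention mechanism is designed to bring the remaining frequency components into channel attention; it reduces the number of parameters and the amount of computation while capturing richer information about the input patterns. Second, the frequency channel attention mechanism is integrated into a U-Net network to form the frequency channel attention GAN, which lowers model complexity while maintaining high performance. Finally, the method is implemented in the Jittor framework, which reduces computing resource overhead compared with the PyTorch framework.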
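The Gabor preprocessing step in contribution (1) can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the kernel size, the two wavelengths, the four orientations, the `sigma = 0.5 * wavelength` rule, and all function names are assumptions chosen for illustration, and the filtering uses circular (FFT-based) convolution for brevity.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor kernel of shape (size, size); size should be odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame by the orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / lambd + psi)
    return envelope * carrier

def gabor_bank(wavelengths=(4.0, 8.0), n_orientations=4, size=15):
    """Kernels ordered by wavelength (outer loop), then orientation (inner loop)."""
    kernels = []
    for lambd in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            # Assumed rule of thumb: envelope width proportional to wavelength.
            kernels.append(gabor_kernel(size, 0.5 * lambd, theta, lambd))
    return kernels

def filter_circular(image, kernel):
    """Circular convolution via FFT; the image must be at least kernel-sized."""
    h, w = image.shape
    kh, kw = kernel.shape
    padded = np.zeros((h, w))
    padded[:kh, :kw] = kernel
    # Move the kernel centre to the origin so the response is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def texture_features(gray):
    """Stack one response map per (wavelength, orientation) pair: (N, H, W)."""
    return np.stack([filter_circular(gray, k) for k in gabor_bank()], axis=0)
```

Calling `texture_features` on a 2-D grayscale array returns one response map per filter. Such maps could, for example, be concatenated with the grayscale input before it reaches the generator, although the abstract does not specify how the thesis combines them.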
Abstract (English):	Image color rendering refers to representing the grayscale pixels of a target image with colors in order to improve the visual effect and highlight useful information. It is one of the important image enhancement technologies in the field of image processing, and has been widely used in product appearance color design, augmented reality, digital entertainment, and other fields. At present, common color rendering methods face detail problems such as color bleeding across boundaries and blurred boundaries. Traditional color rendering methods are limited by high labor costs and strict requirements on reference image quality. Deep learning-based color rendering methods suffer from feature loss caused by upsampling, difficulty in model pre-training, vanishing or exploding gradients, under-fitting or over-fitting, and poor robustness. To solve these problems, this thesis studies the robustness and generalization of the algorithm, the stability of generative adversarial networks, and the balance between lightweight design and high performance. It mainly includes the following contents: (1) To address the poor robustness and generalization of color rendering algorithms on low-quality images, an improved pix2pix model based on the Gabor filter is studied. First, the Gabor filter is used to extract multi-scale, multi-orientation texture feature maps to preprocess the image. Second, based on the pix2pix model, a least-squares loss function and a gradient penalty term are used to preserve the diversity of generated samples and stabilize network training. Finally, the generator is used to render low-quality images; experiments show that this reduces the influence of illumination changes and noise on the image to a certain extent. (2) To solve the problem of unstable training of generative adversarial networks, a Hinge-Cross-Entropy GAN model is studied. First, the self-attention mechanism is improved to effectively capture long-range feature dependencies. Second, the Hinge-Cross-Entropy loss function is designed to strengthen training and keep the loss near its optimum, which stabilizes the model. Finally, a Hinge-Cross-Entropy GAN is proposed to achieve automatic image rendering on the DIV2K and COCO datasets. Experiments show that the rendering quality and visual effect are improved. (3) To address the difficulty of balancing a lightweight design against high performance in color rendering models, a frequency channel attention GAN model is studied. First, global average pooling is the lowest-frequency component of the discrete cosine transform. In order to integrate the remaining frequency components into the channel attention mechanism, a frequency channel attention mechanism is designed. It not only reduces the number of parameters and the amount of computation, but also better captures rich input pattern information. Second, the frequency channel attention GAN is proposed by integrating the frequency channel attention mechanism into the U-Net network, which maintains high model performance while reducing model complexity. Finally, the method is implemented in the Jittor framework, which reduces computing resource costs compared with the PyTorch framework.
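The observation in contribution (3), that global average pooling is the lowest-frequency component of the discrete cosine transform, can be checked numerically. The sketch below uses an unnormalised 2-D DCT-II basis and shows one possible way to pool different channel groups at different frequencies. The grouping scheme, the chosen frequencies, and the function names are illustrative assumptions, not the mechanism defined in the thesis.

```python
import numpy as np

def dct_basis(u, v, h, w):
    """Unnormalised 2-D DCT-II basis function of frequency (u, v), shape (h, w)."""
    i = np.arange(h)[:, None]
    j = np.arange(w)[None, :]
    return np.cos(np.pi * u * (i + 0.5) / h) * np.cos(np.pi * v * (j + 0.5) / w)

def frequency_pool(feature_map, u, v):
    """Project every channel of a (C, H, W) map onto one DCT basis function."""
    _, h, w = feature_map.shape
    return (feature_map * dct_basis(u, v, h, w)).sum(axis=(1, 2))

def multi_frequency_descriptor(feature_map, freqs):
    """Split channels into len(freqs) groups and pool each at its own frequency.

    Assumed scheme for illustration: group k uses frequency freqs[k].
    Returns a vector with one scalar per channel.
    """
    groups = np.array_split(feature_map, len(freqs), axis=0)
    pooled = [frequency_pool(g, u, v) for g, (u, v) in zip(groups, freqs)]
    return np.concatenate(pooled)

# At frequency (0, 0) the basis is all ones, so the projection is the plain
# sum over the spatial axes, i.e. global average pooling times H * W.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 7, 7))
gap = x.mean(axis=(1, 2))
lowest = frequency_pool(x, 0, 0) / (7 * 7)
descriptor = multi_frequency_descriptor(x, [(0, 0), (0, 1), (1, 0), (1, 1)])
```

Here `lowest` matches `gap` up to floating-point error, while `descriptor` keeps one scalar per channel, as global average pooling does, but draws on four frequency components instead of one. In a channel attention module this vector would then pass through a small learned gating network, which is omitted here.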
Chinese Library Classification number:	TP391
Open-access date:	2022-06-20
