Thesis Information

Thesis Title (Chinese):

 Research on Transformer-based Image Enhancement Methods

Name:

 An Jinpeng (安金鹏)

Student ID:

 20208223069    

Confidentiality Level:

 Confidential (open after 1 year)

Thesis Language:

 Chinese (chi)

Discipline Code:

 085400    

Discipline Name:

 Engineering - Electronic Information

Student Type:

 Master's student

Degree Level:

 Master of Engineering

Degree Year:

 2023    

Degree-Granting Institution:

 Xi'an University of Science and Technology

School/Department:

 College of Computer Science and Technology

Major:

 Software Engineering

Research Direction:

 Visualization Technology and Applications

First Supervisor:

 Ma Tian (马天)

First Supervisor's Institution:

 Xi'an University of Science and Technology

Submission Date:

 2023-06-13    

Defense Date:

 2023-06-05    

Thesis Title (English):

 Study on Image Enhancement Method based on Transformer    

Keywords (Chinese):

 Image enhancement; low-light image; Transformer; generative adversarial network; curve adjustment

Keywords (English):

 Image enhancement; low-light image; Transformer; generative adversarial network; curve adjustment

Abstract (Chinese):

Images are an important way for humans to acquire information, but limitations such as lighting conditions and device performance can leave captured images too dark, with unclear details and distorted colors, so processing them with image enhancement techniques is necessary. This thesis studies two sub-problems in the field of image enhancement: image retouching and low-light image enhancement. To address the problems of existing image enhancement methods, namely complex network model design, difficulty in handling unevenly lit images, and limited training data, two image enhancement methods based on the Transformer architecture are proposed, one fully supervised and one semi-supervised.

The main work and innovations of this thesis are as follows:

(1) To address the poor real-time performance caused by complex image enhancement network designs, a fully supervised Transformer image enhancement method based on curve adjustment (Transformer Photo Enhancer, TPE) is proposed. TPE uses a Transformer architecture as the backbone of the encoder and combines it with a two-stage curve adjustment function to enhance images. First, the encoder obtains adjustment parameters through a lightweight self-attention mechanism, mapping the input image to adjustment parameters; this enables enhancement at arbitrary resolutions and improves real-time performance on ultra-high-resolution images. Meanwhile, by analyzing how Transformer hyperparameter settings affect enhancement results, a lightweight model is designed to improve efficiency. Second, the two-stage curve adjustment strategy strengthens the adjustment capability of the curve adjustment function: the second stage fine-tunes the result of the first stage, so the method offers both global enhancement and local fine-tuning. Finally, quantitative and qualitative experiments on the MIT Adobe FiveK and LOL datasets show that the method improves significantly on PSNR, SSIM, LPIPS, and other metrics, effectively raises image brightness and contrast, and restores more detail in both foreground and background.
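The record does not give the exact form of the curve adjustment function, so the sketch below assumes a quadratic curve family of the kind popularized by Zero-DCE, LE(x) = x + a*x*(1-x); the function names, iteration counts, and per-stage parameters are illustrative, not taken from the thesis:

```python
import numpy as np

def curve_adjust(image, alpha, iterations=4):
    """Iteratively apply the quadratic adjustment curve LE(x) = x + a*x*(1-x).

    `image` is a float array in [0, 1]; `alpha` (a scalar or per-pixel map in
    [-1, 1]) plays the role of the parameters predicted by the encoder.
    """
    x = image
    for _ in range(iterations):
        x = x + alpha * x * (1.0 - x)  # stays within [0, 1] for alpha in [-1, 1]
    return x

def two_stage_enhance(image, alpha_coarse, alpha_fine):
    """Stage 1 applies a global enhancement; stage 2 fine-tunes its result."""
    coarse = curve_adjust(image, alpha_coarse)
    return curve_adjust(coarse, alpha_fine, iterations=1)
```

With a positive alpha the curve brightens dark pixels more than already-bright ones, which is why a second, smaller-magnitude stage can serve as a local fine-tune on top of the global adjustment.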

(2) To address the high cost of acquiring paired datasets, the difficulty of enhancing images with uneven illumination, and the cross-shaped artifacts that a single image partitioning strategy tends to produce, a semi-supervised image enhancement method combining a generative adversarial network with a Transformer (Semi-supervised TransGAN Image Enhancer, STGIE) is proposed. STGIE uses a Transformer architecture as the backbone of the generative adversarial network and applies curve adjustment functions to improve image quality and detail. First, the generative adversarial network performs semi-supervised learning on unpaired datasets, overcoming the difficulty of obtaining paired data. Second, a grayscale image is used as the generator's illumination attention map to balance the exposure levels of the enhanced result across different regions. Finally, to avoid the fixed partition boundaries created by a single cropping strategy, the generator and discriminator networks alternate between equal cropping and sliding-window cropping, which strengthens feature extraction and eliminates the cross-shaped artifacts. In addition, to improve the generator's perception of image detail, a reconstruction loss is introduced to help the generator produce more realistic and natural images. Quantitative and qualitative experiments on the MIT Adobe FiveK, LOL, NPE, and MEF datasets show that the method improves significantly on NIQE and user subjective scores, and that its brightness and color adjustments are more realistic and natural, especially for unevenly lit images.
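The grayscale illumination attention idea can be sketched as follows, assuming the common convention that darker regions receive larger enhancement weights; the BT.601 luma coefficients and the quadratic adjustment form are assumptions of this sketch, not details taken from the thesis:

```python
import numpy as np

def illumination_attention(rgb):
    """Derive a per-pixel attention map from the grayscale version of `rgb`.

    Dark regions map to weights near 1 (enhance strongly) and bright regions
    to weights near 0, balancing exposure across unevenly lit areas.
    """
    gray = rgb @ np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma
    return 1.0 - gray

def attention_guided_adjust(rgb, alpha):
    """Modulate a quadratic adjustment curve by the illumination attention."""
    attn = illumination_attention(rgb)[..., None]  # broadcast over channels
    return rgb + alpha * attn * rgb * (1.0 - rgb)
```

Because the attention weight scales the adjustment strength, a dark region receives a larger boost than a bright one, which is the balancing behavior the abstract describes.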

In summary, this thesis studies image enhancement methods based on the Transformer architecture, providing new ideas and approaches for enhancing images captured under insufficient or unbalanced lighting. In addition, the curve adjustment functions enhance image detail and color distribution, improving the readability and recognizability of images and better meeting the needs of practical applications.

Abstract (English):

Images are an important means for humans to obtain information, but due to limitations such as lighting conditions and device performance, captured images may be too dark, lack clear detail, or suffer from color distortion, making image enhancement techniques necessary. This paper focuses on two sub-problems in the field of image enhancement: image retouching and low-light image enhancement. To address the complex network model designs, difficulty in processing unevenly lit images, and limited training data of existing image enhancement methods, two Transformer-based image enhancement methods are proposed, one fully supervised and one semi-supervised.

The main contributions and innovations of this paper are as follows:

(1) To address the poor real-time performance caused by complex image enhancement network structures, a fully supervised Transformer image enhancement method based on curve adjustment (Transformer Photo Enhancer, TPE) is proposed. TPE uses the Transformer architecture as the backbone of the encoder and combines it with a two-stage curve adjustment function to achieve image enhancement. First, the encoder uses a lightweight self-attention mechanism to obtain adjustment parameters, mapping the image to those parameters, thereby achieving enhancement at any resolution and improving real-time performance on ultra-high-resolution images. Meanwhile, by analyzing the impact of Transformer hyperparameter settings on enhancement results, a lightweight model is designed to improve efficiency. Second, the two-stage curve adjustment strategy enhances the adjustment ability of the curve adjustment function: the second stage fine-tunes the first-stage enhancement results, making the method capable of both global enhancement and local fine-tuning. Finally, quantitative and qualitative experiments on the MIT Adobe FiveK and LOL datasets show that the method significantly improves evaluation metrics such as PSNR, SSIM, and LPIPS, effectively improves image brightness and contrast, and restores more details in the foreground and background.

(2) To address the high cost of acquiring paired datasets, the difficulty of enhancing unevenly lit images, and the tendency of a single image partitioning strategy to produce cross-shaped artifacts, a semi-supervised image enhancement method combining generative adversarial networks with a Transformer (Semi-supervised TransGAN Image Enhancer, STGIE) is proposed. STGIE uses the Transformer architecture as the backbone of the generative adversarial network and uses curve adjustment functions to enhance image quality and details. First, a generative adversarial network performs semi-supervised learning on unpaired datasets to overcome the difficulty of obtaining paired data. Second, a grayscale image is used as the illumination attention map of the generator network to balance the exposure levels of the enhanced results across different regions. Finally, to avoid the fixed partition boundaries formed by a single image partitioning strategy, the generator and discriminator networks alternate between equal cropping and sliding-window cropping strategies, which enhances the network's feature extraction ability and eliminates the cross-shaped artifacts. In addition, to improve the generator's perception of image details, a reconstruction loss is introduced to assist the generator in producing more realistic and natural images. Quantitative and qualitative experiments on datasets including MIT Adobe FiveK, LOL, NPE, and MEF show that the method significantly improves NIQE and user subjective scores, and that its brightness and color adjustments are more realistic and natural, especially when processing unevenly lit images.
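The two cropping strategies can be illustrated as follows; tile sizes, strides, and function names are hypothetical, since the record does not specify them:

```python
import numpy as np

def equal_crops(img, grid=2):
    """Split the image into a grid x grid set of equal, non-overlapping tiles.

    Used alone, the fixed tile borders are what can leave cross-shaped
    artifacts in the stitched output.
    """
    h, w = img.shape[:2]
    th, tw = h // grid, w // grid
    return [img[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            for i in range(grid) for j in range(grid)]

def sliding_window_crops(img, size, stride):
    """Overlapping sliding-window tiles whose overlaps cover those borders."""
    h, w = img.shape[:2]
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, stride)
            for j in range(0, w - size + 1, stride)]
```

Alternating the two strategies between the generator and discriminator means neither network consistently sees the same fixed boundary during training, which is the stated remedy for the artifact problem.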

In summary, this paper studies image enhancement methods based on the Transformer network architecture, providing new ideas and methods for image enhancement in situations of insufficient or imbalanced lighting. In addition, curve adjustment functions can enhance image detail information and color distribution, thereby improving the readability and recognizability of images and better meeting the needs of practical applications.


CLC Number:

 TP391.41    

Open Access Date:

 2024-06-13    
