Thesis Information

Thesis Title (Chinese):

 基于改进GAN与多尺度对齐融合的超分辨率重建算法研究

Name:

 杨庆豪

Student ID:

 21208223051

Confidentiality Level:

 Public

Thesis Language:

 Chinese (chi)

Discipline Code:

 085400

Discipline Name:

 Engineering - Electronic Information

Student Type:

 Master's

Degree Level:

 Master of Engineering

Degree Year:

 2024

Degree-Granting Institution:

 西安科技大学

School (Department):

 计算机科学与技术学院

Major:

 Software Engineering

Research Direction:

 Image Processing

First Supervisor:

 厍向阳

First Supervisor's Institution:

 西安科技大学

Thesis Submission Date:

 2024-06-18

Thesis Defense Date:

 2024-05-30

Thesis Title (English):

 Research on Super-Resolution Reconstruction Algorithm Based on Improved GAN and Multi-Scale Alignment Fusion

Keywords (Chinese):

 超分辨率重建; 深度学习; 密集残差; 多尺度; 特征融合

Keywords (English):

 Super-resolution Reconstruction; Deep Learning; Dense Residual; Multi-scale; Feature Fusion

Abstract (Chinese):

Super-resolution is a classic problem in computer vision. With advances in imaging technology, the demand for high-definition images and videos has surged. Super-resolution techniques can effectively reconstruct high-resolution images and videos with rich texture detail, and thanks to their low cost and flexibility they are being applied ever more widely, attracting the attention of many researchers. This thesis therefore studies deep-learning-based super-resolution reconstruction algorithms in depth. The main research content and contributions are summarized as follows:

(1) To address edge over-smoothing, artifacts, and insufficient extraction of high-frequency information in single-image super-resolution reconstruction, an image super-resolution reconstruction algorithm based on an improved enhanced generative adversarial network is proposed. First, a multi-scale depthwise-separable feature extraction module is introduced: the multi-scale structure helps capture image features at different scales, while depthwise-separable convolution reduces the number of parameters and the computational cost, improving the stability of network training. Second, multi-scale large-kernel attention is introduced to build a multi-scale depthwise-separable densely connected module, which fuses the inputs of the convolutional layers through dense connections, better combining local perception with long-range dependencies and extracting image features more fully. Finally, a multi-level residual network combined with a large-kernel-attention tail module further improves the integration of high-frequency details and key information, eases the training of the deep network, and significantly improves reconstruction quality. Experiments verify the effectiveness and feasibility of the algorithm.

(2) To address insufficient extraction of image feature information, low feature-alignment accuracy, and inadequate extraction of temporal information during feature fusion in video super-resolution reconstruction, a video super-resolution reconstruction algorithm based on multi-scale fusion and axial deformable convolution is proposed. First, a multi-scale feature-alignment strategy performs alignment between the target frame and its neighboring frames at different scales, effectively extracting local and global features. Second, an axial deformable-convolution alignment block is introduced, which maintains the balance between local and global information, improves offset prediction, and ensures effective alignment of the target and neighboring frames at every scale. Finally, a multi-scale region-attention feature-fusion strategy strengthens attention to regions of complex texture within video frames and fuses aligned features from different scales, enhancing the temporal information that the aligned frames contribute and thereby improving reconstruction quality. Experiments verify the effectiveness and feasibility of the algorithm.

Keywords: Super-resolution Reconstruction; Deep Learning; Dense Residual; Multi-scale; Feature Fusion

Research Type: Applied Research

Abstract (English):

Super-resolution is a classic problem in computer vision, and with advances in imaging technology the demand for high-definition images and videos has surged. Super-resolution techniques can effectively reconstruct high-resolution images and videos with rich texture details, and because of their low cost and flexibility they are applied increasingly widely, attracting the attention of many researchers. This thesis therefore conducts an in-depth study of deep-learning-based super-resolution reconstruction algorithms; the main research content and innovations are summarized as follows:

(1) To address problems in single-frame image super-resolution reconstruction such as edge over-smoothing, artifacts, and insufficient extraction of high-frequency information, a super-resolution reconstruction algorithm based on an improved enhanced generative adversarial network is proposed. First, a multi-scale depthwise-separable feature extraction module is introduced: the multi-scale structure helps capture image features at different scales, and depthwise-separable convolution reduces the parameter count and computational load, improving the stability of network training. Second, multi-scale large-kernel attention is introduced to build a multi-scale depthwise-separable densely connected module, which merges the inputs of the convolutional layers through dense connections, better combining local perception with long-range dependencies and extracting image features more thoroughly. Finally, a multi-level residual network combined with a large-kernel-attention tail module further optimizes the integration of high-frequency details and key information, facilitates training of the deep network, and significantly improves image reconstruction quality. Experimental results show clear gains in both objective and subjective evaluation metrics, verifying the effectiveness and feasibility of the proposed algorithm.
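
To make the module descriptions above concrete, the following is a minimal PyTorch-style sketch of a multi-scale depthwise-separable block gated by large-kernel attention. It is an illustration only, not the thesis's implementation: the class names (DepthwiseSeparableConv, LargeKernelAttention, MultiScaleDSBlock), the kernel sizes (3/5/7 branches, and the 5 + dilated-7 + 1x1 decomposition of the large kernel), and the channel counts are assumptions made here for readability.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    # Depthwise convolution followed by a 1x1 pointwise convolution.
    def __init__(self, channels, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class LargeKernelAttention(nn.Module):
    # Large-kernel attention: a decomposed large depthwise convolution
    # (5x5 depthwise, 7x7 dilated depthwise, 1x1) whose output gates the input.
    def __init__(self, channels):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw_dilated = nn.Conv2d(channels, channels, 7, padding=9,
                                    dilation=3, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return x * self.pw(self.dw_dilated(self.dw(x)))

class MultiScaleDSBlock(nn.Module):
    # Parallel depthwise-separable branches with different kernel sizes,
    # concatenated, fused by a 1x1 convolution, gated by large-kernel
    # attention, and added back to the input (residual connection).
    def __init__(self, channels=64, kernels=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([DepthwiseSeparableConv(channels, k)
                                       for k in kernels])
        self.fuse = nn.Conv2d(channels * len(kernels), channels, 1)
        self.attn = LargeKernelAttention(channels)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feats = torch.cat([self.act(branch(x)) for branch in self.branches], dim=1)
        return x + self.attn(self.fuse(feats))

if __name__ == "__main__":
    x = torch.randn(1, 64, 48, 48)       # a 64-channel low-resolution feature map
    print(MultiScaleDSBlock()(x).shape)  # torch.Size([1, 64, 48, 48])

The dense connections and the multi-level residual tail described in the abstract would stack several such blocks, concatenate their outputs, and add a long skip connection; those parts are omitted here for brevity.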

(2) To address insufficient feature extraction, low feature-alignment accuracy, and inadequate extraction of temporal information during feature fusion in video super-resolution reconstruction, a video super-resolution reconstruction algorithm based on multi-scale fusion and axial deformable convolution is proposed. The algorithm improves the alignment and fusion modules. First, a multi-scale feature-alignment strategy aligns the target frame with its adjacent frames at different scales, effectively extracting local and global features. Then, an axial deformable-convolution alignment block is introduced, which maintains the balance between local and global information, improves offset prediction, and ensures effective alignment of the target and adjacent frames at each scale. Finally, a multi-scale region-attention feature-fusion strategy strengthens the focus on regions of complex texture within video frames and fuses the aligned features from different scales, enhancing the temporal information that the aligned frames contribute and thereby improving reconstruction quality. Experimental results show clear gains in both objective and subjective evaluation metrics, validating the effectiveness and feasibility of the proposed algorithm.
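
For the alignment step, the sketch below shows one common way of implementing deformable-convolution alignment between a target frame and a neighboring frame (in the spirit of TDAN/EDVR, references [46] and [47]), using torchvision's DeformConv2d. The class name DeformAlignBlock and the offset-predictor layout are assumptions, and zeroing one of the two offset components per sampling point is only a crude stand-in for the axial constraint; the thesis's actual axial deformable convolution and multi-scale region-attention fusion are not reproduced here.

import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformAlignBlock(nn.Module):
    # Predict per-pixel sampling offsets from the concatenated target/neighbor
    # features, then warp the neighbor features with a deformable convolution.
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        self.offset_conv = nn.Sequential(
            nn.Conv2d(channels * 2, channels, 3, padding=1),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(channels, 2 * kernel_size * kernel_size, 3, padding=1),
        )
        self.deform = DeformConv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2)

    def forward(self, neighbor_feat, target_feat, axis=0):
        offsets = self.offset_conv(torch.cat([neighbor_feat, target_feat], dim=1))
        # Each sampling point carries two offset components; zeroing one of
        # them restricts sampling to a single axis, a rough approximation of
        # the axial constraint mentioned in the abstract.
        first, second = offsets[:, 0::2], offsets[:, 1::2]
        if axis == 0:
            second = torch.zeros_like(second)
        else:
            first = torch.zeros_like(first)
        offsets = torch.stack([first, second], dim=2).flatten(1, 2)
        return self.deform(neighbor_feat, offsets)

if __name__ == "__main__":
    target = torch.randn(1, 64, 32, 32)    # features of the frame to restore
    neighbor = torch.randn(1, 64, 32, 32)  # features of an adjacent frame
    aligned = DeformAlignBlock()(neighbor, target)
    print(aligned.shape)                   # torch.Size([1, 64, 32, 32])

A multi-scale version would apply such a block on a feature pyramid (for example at strides 1, 2, and 4) and fuse the aligned features with region attention before reconstruction.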

Keywords: Super-resolution Reconstruction; Deep Learning; Dense Residual; Multi-scale; Feature Fusion

Research Type: Applied Research

References:

[1] Harris J L. Diffraction and resolving power[J]. Journal of the Optical Society of America, 1964, 54(7): 931-936.

[2] 钟梦圆, 姜麟. 超分辨率图像重建算法综述[J]. 计算机科学与探索, 2022, 16(5): 972-990.

[3] 李佳星, 赵勇先, 王京华. 基于深度学习的单幅图像超分辨率重建算法综述[J]. 自动化学报, 2021, 47(10): 2341-2363.

[4] 黄健, 赵元元, 郭苹, 等. 深度学习的单幅图像超分辨率重建方法综述[J]. 计算机工程与应用, 2021, 57(18): 13-23.

[5] 吴靖, 叶晓晶, 黄峰, 等. 基于深度学习的单帧图像超分辨率重建综述[J]. 电子学报, 2022, 50(9): 2265-2294.

[6] 于亚龙, 穆远彪. 插值算法的研究[J]. 现代计算机(中旬刊), 2014(2): 32-35.

[7] 李艳玲. 图像的最近邻缩放原理及实现[J]. 长治学院学报, 2016, 33(5): 31-32.

[8] 王森, 杨克俭. 基于双线性插值的图像缩放算法的研究与实现[J]. 自动化技术与应用, 2008, 27(7): 44-45.

[9] 王会鹏, 周利莉, 张杰. 一种基于区域的双三次图像插值算法[J]. 计算机工程, 2010, 36(19): 216-218.

[10] Unaldi N, Asari V K. Undecimated wavelet transform-based image interpolation[C]//International Symposium on Visual Computing. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010: 474-483.

[11] Wang Z, Qi F. Analysis of multi frame super-resolution reconstruction for image anti-aliasing and deblurring[J]. Image and Vision Computing, 2005, 23(4): 393-404.

[12] Cao Y, Liu X, Wang W, et al. Super-resolution image reconstruction algorithm based on projection onto convex sets and wavelet fusion[J]. Journal of Biomedical Engineering, 2009, 26(5): 947-952.

[13] Wan B, Meng L, Ming D, et al. Video image super-resolution restoration based on iterative back-projection algorithm[C]//2009 IEEE International Conference on Computational Intelligence for Measurement Systems and Applications. IEEE, 2009: 46-49.

[14] 张磊, 杨建峰, 薛彬, 等. 改进的最大后验概率估计法实现单幅图像超分辨率重建[J]. 激光与光电子学进展, 2011, 48(1): 82-27.

[15] Freeman W T, Jones T R, Pasztor E C. Example-based super-resolution[J]. IEEE Computer Graphics and Applications, 2002, 22(2): 56-65.

[16] Yang J, Wright J, Huang T, et al. Image super-resolution as sparse representation of raw image patches[C]//2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2008: 1-8.

[17] Chang H, Yeung D Y, Xiong Y. Super-resolution through neighbor embedding[C]//Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004. IEEE, 2004: 1-8.

[18] Timofte R, De Smet V, Van Gool L. Anchored neighborhood regression for fast example-based super-resolution[C]//Proceedings of the IEEE International Conference on Computer Vision. 2013: 1920-1927.

[19] Schulter S, Leistner C, Bischof H. Fast and accurate image upscaling with super-resolution forests[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3791-3799.

[20] Dong C, Loy C C, He K, et al. Image super-resolution using deep convolutional networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(2): 295-307.

[21] Dong C, Loy C C, Tang X. Accelerating the super-resolution convolutional neural network[C]//Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Springer International Publishing, 2016: 391-407.

[22] Shi W, Caballero J, Huszár F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1874-1883.

[23] Kim J, Lee J K, Lee K M. Accurate image super-resolution using very deep convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1646-1654.

[24] Zhang K, Zuo W, Zhang L. Deep plug-and-play super-resolution for arbitrary blur kernels[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 1671-1681.

[25] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017: 5998-6008.

[26] Yang F, Yang H, Fu J, et al. Learning texture transformer network for image super-resolution[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 5791-5800.

[27] 陈洪刚, 李自强, 张永飞, 等. 基于迭代交替优化的图像盲超分辨率重建[J]. 电子与信息学报, 2022, 44(10): 3343-3352.

[28] Ledig C, Theis L, Huszár F, et al. Photo-realistic single image super-resolution using a generative adversarial network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 4681-4690.

[29] Lim B, Son S, Kim H, et al. Enhanced deep residual networks for single image super-resolution[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017: 136-144.

[30] Wang X T, Yu K, Wu S X, et al. ESRGAN: Enhanced super-resolution generative adversarial networks[C]//Proceedings of the European Conference on Computer Vision (ECCV) Workshops. Munich: Springer, 2018: 63-79.

[31] 江俊君, 程豪, 李震宇, 等. 深度学习视频超分辨率技术概述[J]. 中国图象图形学报, 2023(7): 1927-1964.

[32] 何小海, 吴媛媛, 陈为龙, 等. 视频超分辨率重建技术综述[J]. 信息与电子工程, 2011, 9(1): 1-6.

[33] Jo Y, Oh S W, Kang J, et al. Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 3224-3232.

[34] Sajjadi M S M, Vemulapalli R, Brown M. Frame-recurrent video super-resolution[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6626-6634.

[35] Yan B, Lin C, Tan W. Frame and feature-context video super-resolution[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 5597-5604.

[36] Zhu X, Li Z, Zhang X Y, et al. Residual invertible spatio-temporal network for video super-resolution[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 5981-5988.

[37] Fuoli D, Gu S, Timofte R. Efficient video super-resolution through recurrent latent space propagation[C]//2019 IEEE/CVF International Conference on Computer Vision Workshop. IEEE, 2019: 3476-3485.

[38] Kappeler A, Yoo S, Dai Q, et al. Video super-resolution with convolutional neural networks[J]. IEEE Transactions on Computational Imaging, 2016, 2(2): 109-122.

[39] Caballero J, Ledig C, Aitken A, et al. Real-time video super-resolution with spatio-temporal networks and motion compensation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 4778-4787.

[40] Wang Z, Yi P, Jiang K, et al. Multi-memory convolutional neural network for video super-resolution[J]. IEEE Transactions on Image Processing, 2018, 28(5): 2530-2544.

[41] Xue T, Chen B, Wu J, et al. Video enhancement with task-oriented flow[J]. International Journal of Computer Vision, 2019, 127: 1106-1125.

[42] Haris M, Shakhnarovich G, Ukita N. Recurrent back-projection network for video super-resolution[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 3897-3906.

[43] Yi P, Wang Z, Jiang K, et al. Multi-temporal ultra dense memory network for video super-resolution[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 30(8): 2503-2516.

[44] Li F, Bai H, Zhao Y. Learning a deep dual attention network for video super-resolution[J]. IEEE Transactions on Image Processing, 2020, 29: 4474-4488.

[45] Wang L, Guo Y, Liu L, et al. Deep video super-resolution using HR optical flow estimation[J]. IEEE Transactions on Image Processing, 2020, 29: 4323-4336.

[46] Tian Y, Zhang Y, Fu Y, et al. TDAN: Temporally-deformable alignment network for video super-resolution[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 3360-3369.

[47] Wang X, Chan K C K, Yu K, et al. EDVR: Video restoration with enhanced deformable convolutional networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2019: 1954-1963.

[48] Chan K C K, Wang X, Yu K, et al. Understanding deformable alignment in video super-resolution[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(2): 973-981.

[49] Chauhan R, Ghanshala K K, Joshi R C. Convolutional neural network (CNN) for image detection and recognition[C]//Proceedings of the IEEE First International Conference on Secure Cyber Computing and Communication (ICSCCC). 2018: 278-282.

[50] Targ S, Almeida D, Lyman K. Resnet in resnet: generalizing residual architectures[C]//International Conference on Learning Representations, 2016: 1-7.

[51] Creswell A, White T, Dumoulin V, et al. Generative adversarial networks: an overview[J]. IEEE Signal Processing Magazine, 2018, 35(1): 53-65.

[52] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7132-7141.

[53] Woo S, Park J, Lee J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 3-19.

[54] Zhao H, Jia J, Koltun V. Exploring self-attention for image recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10076-10085.

[55] Dai J, Qi H, Xiong Y, et al. Deformable convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 764-773.

[56] Shi W, Caballero J, Huszár F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1874-1883.

[57] Huynh-Thu Q, Ghanbari M. Scope of validity of PSNR in image/video quality assessment[J]. Electronics Letters, 2008, 44(13): 800-801.

[58] Wang Z, Bovik A C, Sheikh H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.

[59] Wang L, Shen J, Tang E, et al. Multi-scale attention network for image super-resolution[J]. Journal of Visual Communication and Image Representation, 2021, 80: 1-12.

[60] Gao F, Yang Y, Wang J, et al. A deep convolutional generative adversarial networks (DCGANs)-based semi-supervised method for object recognition in synthetic aperture radar (SAR) images[J]. Remote Sensing, 2018, 10(6): 846-866.

[61] Jolicoeur-Martineau A. The relativistic discriminator: a key element missing from standard GAN[C]//International Conference on Learning Representations. 2018: 1-25.

[62] Agustsson E, Timofte R. NTIRE 2017 challenge on single image super-resolution: Dataset and study[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017: 126-135.

[63] Bevilacqua M, Roumy A, Guillemot C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding[C]//Proceedings of the British Machine Vision Conference. Durham: BMVA Press, 2012: 135.1-135.10.

[64] Zeyde R, Elad M, Protter M. On single image scale-up using sparse-representations[C]//Curves and Surfaces: 7th International Conference, Avignon, France, June 24-30, 2010, Revised Selected Papers 7. Springer Berlin Heidelberg, 2012: 711-730.

[65] Martin D, Fowlkes C, Tal D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics[C]//Proceedings of the Eighth IEEE International Conference on Computer Vision. ICCV, 2001, 2: 416-423.

[66] Valanarasu J M J, Oza P, Hacihaliloglu I, et al. Medical transformer: Gated axial-attention for medical image segmentation[C]//Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24. Springer International Publishing, 2021: 36-46.

[67] Xue T, Chen B, Wu J, et al. Video enhancement with task-oriented flow[J]. International Journal of Computer Vision, 2019, 127: 1106-1125.

[68] Yi P, Wang Z, Jiang K, et al. Progressive fusion video super-resolution network via exploiting non-local spatio-temporal correlations[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 3106-3115.

[69] Nah S, Baik S, Hong S, et al. NTIRE 2019 challenge on video deblurring and super-resolution: Dataset and study[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2019: 1985-1995.

[70] Liu J, Liu Z, Ou W, et al. CFGPFSR: A generative method combining facial and GAN priors for face super-resolution[J]. Neural Processing Letters, 2024, 56(2): 97-115.

[71] Zhao Y, Teng Q, Chen H, et al. Activating more information in arbitrary-scale image super-resolution[J]. IEEE Transactions on Multimedia, 2024: 7946-7961.

[72] Wang H, Han X, Wen T, et al. Fresnel incoherent compressive holography toward 3D videography via dual-channel simultaneous phase-shifting interferometry[J]. Optics Express, 2024, 32(6): 10563-10576.

[73] Zhu H, Han G, Peng Y, et al. Functional-realistic CT image super-resolution for early-stage pulmonary nodule detection[J]. Future Generation Computer Systems, 2021, 115: 475-485.

CLC Number:

 TP391

Open Access Date:

 2024-06-18
