Thesis Information

Chinese Title:

 Research on Transformer-Based Low-Light Image and Video Enhancement Methods

Name:

 Fan Lu (樊璐)

Student ID:

 21208049012    

Confidentiality Level:

 Public

Thesis Language:

 Chinese (chi)

Discipline Code:

 081203    

Discipline Name:

 Engineering - Computer Science and Technology (degrees may be conferred in Engineering or Science) - Computer Application Technology

Student Type:

 Master's

Degree Level:

 Master of Engineering

Degree Year:

 2024    

Institution:

 Xi'an University of Science and Technology

School/Department:

 College of Computer Science and Technology

Major:

 Computer Science and Technology

Research Direction:

 Image Processing

First Supervisor:

 Fu Yan (付燕)

First Supervisor's Institution:

 Xi'an University of Science and Technology

Thesis Submission Date:

 2024-12-12    

Thesis Defense Date:

 2024-12-03    

English Title:

 Research on Low-Illumination Image and Video Enhancement Methods Based on Transformer

Chinese Keywords:

 Low-light image ; Transformer ; Swin Transformer ; Grouped spatiotemporal shift ; Self-calibrated convolution

English Keywords:

 Low-light image ; Transformer ; Swin Transformer ; Grouped spatiotemporal shift ; Self-calibrated convolution

Chinese Abstract:

Images and videos are among the most important carriers of everyday information and hold much valuable research content. However, under unfavorable lighting conditions or unavoidable environmental factors, captured images and videos can be too dark to interpret, which severely hinders subsequent high-level computer vision tasks. Studying low-light image and video enhancement methods is therefore of great significance. Drawing on the strengths of the Transformer and Swin Transformer network models, this thesis develops trained Transformer-based enhancement algorithms for both low-light images and low-light videos. The main research contents are as follows:

(1) Existing low-light image enhancement methods typically consider only spatial-domain information and ignore the frequency domain, leaving texture details in the restored image unclear. To address this, the thesis proposes a low-light image enhancement algorithm based on the fusion of the spatial and frequency domains. Building on the strengths of the Transformer and the applicability of the Fourier transform, the algorithm enhances a low-light image in three stages: a spatial-domain stage, a frequency-domain stage, and a weighted-fusion stage. The spatial-domain stage obtains its result by fusing several Transformer modules with channel self-attention; the frequency-domain stage applies the Fourier transform together with an amplitude-phase convolution module; and the weighted-fusion stage blends the two results into the final enhanced image. The method was validated and analyzed on the public LOL and FiveK datasets. Experimental results show that it effectively raises image brightness and successfully restores texture details. Compared with the DEFormer algorithm, it improves PSNR and SSIM by 1.3 dB and 0.13 on the LOL dataset, and by 0.2 dB and 0.05 on the FiveK dataset.
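The three-stage data flow described above (a spatial branch, a Fourier-based frequency branch, and weighted fusion) can be sketched as follows. This is a minimal illustration, not the thesis's model: the learned channel-self-attention Transformer branch and the amplitude-phase convolution module are replaced by simple stand-in operations (a gamma curve and an amplitude gain), and all function names and parameter values here are hypothetical.

```python
import numpy as np

def frequency_branch(img, gain=1.5):
    """Frequency-domain stage: scale the FFT amplitude, keep the phase,
    then invert. `gain` is an illustrative stand-in for the learned
    amplitude-phase convolution module."""
    f = np.fft.fft2(img)
    amp, phase = np.abs(f), np.angle(f)
    f_enh = (amp * gain) * np.exp(1j * phase)  # amplitude carries brightness
    return np.real(np.fft.ifft2(f_enh))

def spatial_branch(img, gamma=0.6):
    """Spatial-domain stage: a gamma curve that lifts dark pixels,
    standing in for the channel-self-attention Transformer blocks."""
    return np.clip(img, 0, 1) ** gamma

def enhance(img, w=0.5):
    """Weighted-fusion stage: blend the two branch outputs."""
    return w * spatial_branch(img) + (1 - w) * frequency_branch(img)

low = np.full((8, 8), 0.1)          # a uniformly dark "image" in [0, 1]
out = enhance(low)
print(out.mean() > low.mean())      # True: the fused result is brighter
```

The fusion weight `w` is fixed here for simplicity; in a learned model it would be predicted or trained.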

(2) Existing low-light video enhancement methods fail to fully consider the continuity across consecutive video frames, which causes flicker in the enhanced results. To address this, the thesis proposes a low-light video enhancement algorithm based on the Swin Transformer. The algorithm first uses Swin Transformer modules as an encoder to extract multi-scale video-frame features at different levels. A feature alignment and fusion module is then introduced, which aligns multi-frame features through grouped spatiotemporal shift and self-calibrated convolution, effectively mitigating flicker. To avoid over- or under-enhancement of the output, a redundancy measurement and brightness-adaptive adjustment module is designed to correct frame brightness. Finally, an unsupervised denoising module is designed to reduce frame noise and yield clearer enhancement results. The method was validated and analyzed on the public SDSD and DID-MW datasets. Experimental results show that it effectively alleviates the flicker that arises when frame brightness is restored. Compared with the UVENet algorithm, it improves PSNR and SSIM by 1.16 dB and 0.06 on the SDSD dataset, and by 2.65 dB and 0.01 on the DID-MW dataset.
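The grouped spatiotemporal shift mentioned above can be sketched as follows: split the channel dimension into groups and shift one group forward in time and one backward, so each frame's features mix with those of its neighbors. This is a minimal, hypothetical illustration of the general shift idea, not the thesis's module; the function name, group count, and tensor layout are all assumptions.

```python
import numpy as np

def grouped_temporal_shift(feats, shift_div=4):
    """feats: (T, C, H, W). One channel group of size C // shift_div is
    pulled from the previous frame, one from the next frame; the rest
    stay in place."""
    T, C, H, W = feats.shape
    g = C // shift_div
    out = feats.copy()
    out[1:, :g] = feats[:-1, :g]          # group 0: features from t-1
    out[:-1, g:2 * g] = feats[1:, g:2 * g]  # group 1: features from t+1
    return out

# Two frames, four channels, 1x1 spatial size, values 0..7 for tracing.
feats = np.arange(2 * 4 * 1 * 1, dtype=float).reshape(2, 4, 1, 1)
shifted = grouped_temporal_shift(feats)
print(shifted[1, 0, 0, 0])  # 0.0: frame 1, channel 0 now holds frame 0's value
```

Because the shift is a pure memory move, it adds temporal mixing at essentially zero FLOP cost, which is why shift-based alignment is attractive for video models.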

English Abstract:

As important carriers of everyday information, images and videos hold much valuable research content. However, due to unfavorable lighting conditions or unavoidable environmental factors, captured images and videos can be too dark to interpret, severely hindering subsequent high-level computer vision tasks. Studying low-light image and video enhancement methods is therefore of great significance. Drawing on the strengths of the Transformer and Swin Transformer network models, this thesis develops trained Transformer-based low-light enhancement algorithms for both images and videos. The main research contents are as follows:

(1) Current low-light image enhancement methods usually consider only spatial-domain information and ignore frequency-domain information, leaving the texture details of the restored image unclear. This thesis therefore proposes a low-light image enhancement algorithm based on the fusion of the spatial and frequency domains. Building on the strengths of the Transformer and the applicability of the Fourier transform, the algorithm enhances low-light images in three stages: a spatial-domain stage, a frequency-domain stage, and a weighted-fusion stage. The spatial-domain stage fuses multiple Transformer modules based on channel self-attention to obtain the spatial-domain result; the frequency-domain stage uses the Fourier transform and an amplitude-phase convolution module to obtain the frequency-domain result; and the weighted-fusion stage combines the two results into the final enhanced image. The proposed method is verified and analyzed on the public LOL and FiveK datasets. The experimental results show that it effectively improves image brightness and successfully restores texture details. Compared with the DEFormer algorithm, the proposed method improves PSNR and SSIM by 1.3 dB and 0.13 on the LOL dataset, and by 0.2 dB and 0.05 on the FiveK dataset.
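For reference, the PSNR figures quoted in dB throughout the abstract follow the standard definition, 10 log10(peak² / MSE). A minimal implementation, with an illustrative function name:

```python
import numpy as np

def psnr_db(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((ref - test) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

ref = np.zeros((4, 4))
noisy = ref + 0.1                       # uniform error of 0.1 -> MSE = 0.01
print(round(psnr_db(ref, noisy), 1))    # 20.0
```

A gain of 1.3 dB, as reported against DEFormer on LOL, corresponds to roughly a 26% reduction in mean squared error, since MSE scales as 10^(-PSNR/10).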

(2) Current low-light video enhancement methods fail to fully consider the continuity of consecutive video frames, which leads to flicker in the enhanced results. This thesis therefore proposes a low-light video enhancement algorithm based on the Swin Transformer. The algorithm first uses Swin Transformer modules as an encoder to extract multi-scale video-frame features at different levels. A feature alignment and fusion module is then introduced, which aligns multi-frame features through grouped spatiotemporal shift and self-calibrated convolution to effectively alleviate flicker. To avoid over- or under-enhancement of the output, a redundancy measurement and brightness-adaptive adjustment module is designed to correct frame brightness. Finally, an unsupervised denoising module is designed to reduce frame noise and obtain clearer enhancement results. The proposed method is verified and analyzed on the public SDSD and DID-MW datasets. The experimental results show that it effectively alleviates the flicker that arises when frame brightness is restored. Compared with the UVENet algorithm, the proposed method improves PSNR and SSIM by 1.16 dB and 0.06 on the SDSD dataset, and by 2.65 dB and 0.01 on the DID-MW dataset.

CLC Number:

 TP301.6    

Open Access Date:

 2024-12-12    
