Chinese title: | 基于红外与可见光图像融合的低光照区域行人检测 (Pedestrian Detection in Low-Light Areas Based on Infrared and Visible Image Fusion) |
Name: | |
Student ID: | 21206223055 |
Confidentiality level: | Confidential (open after 1 year) |
Thesis language: | Chinese |
Discipline code: | 085400 |
Discipline: | Engineering - Electronic Information |
Student type: | Master's |
Degree: | Master of Engineering |
Degree year: | 2024 |
Institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Image fusion and object detection |
First supervisor: | |
First supervisor's institution: | |
Submission date: | 2024-06-19 |
Defense date: | 2024-06-06 |
English title: | Pedestrian Detection in Low-Light Areas Based on Infrared and Visible Image Fusion |
Chinese keywords (translated): | deep learning ; infrared and visible image fusion ; pedestrian detection in low-light areas ; YOLOv8 |
English keywords: | deep learning ; infrared and visible image fusion ; pedestrian detection in low-light areas ; YOLOv8 |
Chinese abstract (translated): |
Pedestrian detection is an essential technology in fields such as autonomous driving and intelligent surveillance; it refers to recognizing and localizing pedestrians in videos or images. In complex scenes, however, unavoidable practical problems such as poor lighting and pedestrians being occluded by objects mean that detection relying solely on information from a single infrared or visible image can miss targets or produce false detections. Fusing infrared and visible images before detection not only improves detection accuracy but also preserves the thermal radiation information of the infrared image and the texture details of the visible image. This thesis therefore proposes a pedestrian detection method for low-light areas based on infrared and visible image fusion. The specific work is as follows:

(1) To address the information loss that source images of different resolutions suffer in the preprocessing stage, and the problem that the receptive field of the convolution kernel gradually shrinks as the network deepens so that global information is overwhelmed by local information during feature extraction, this thesis proposes a meta-learning-based joint super-resolution fusion method for infrared and visible images (MSADRCN). First, a meta-learning super-resolution network is designed for the super-resolution reconstruction of infrared and visible images, unifying the resolution of the input images in an information-lossless manner. Second, an autoencoder and dual-discriminator conditional generative adversarial fusion network captures the correlation between global and local information to generate a preliminary fused image. Finally, a residual-compensation dual-backup fusion network compensates for the information lost during feature extraction, producing the final fused image.

(2) To address the problem that small- and medium-scale pedestrian targets, being relatively small, easily lose feature information during network sampling, this thesis proposes a CCT-YOLO-based pedestrian detection method for fused images in low-light areas. First, the CARAFE operator is introduced into the upsampling process to capture rich semantic and spatial features, increasing the feature information available for pedestrian targets. Second, CFC_C2F is used to reconstruct the bottleneck layer of the YOLOv8 network, enriching the network's gradient paths while reducing the model's parameter count and computational cost. Finally, a three-branch attention mechanism is added to the detection head to combine the positional information of low-level feature maps with the semantic information of high-level feature maps.

(3) Ablation experiments on each network, comparison experiments with different fusion methods, and comparison experiments on joint super-resolution were conducted on the TNO and LLVIP datasets. The results show that the proposed MSADRCN performs well in both subjective visual quality and objective evaluation metrics and is suitable for image fusion at different resolutions. Comparison experiments on pedestrian detection in low-light areas were also conducted on the LLVIP dataset to verify the effectiveness of CCT-YOLO. The results show that the proposed method outperforms existing detection methods in recall, precision, and mean average precision for pedestrian detection on fused images in low-light areas.

This thesis studies infrared and visible image fusion and pedestrian detection in low-light areas, providing a feasible detection method for autonomous driving, intelligent surveillance, and related fields in low-light scenarios. In the future, exploring multimodal data fusion methods will become a new research direction, helping to further improve pedestrian detection accuracy in extreme environments such as low light, occlusion, rain, and snow, and promoting the development of autonomous driving and intelligent surveillance. |
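Work (2) above introduces the CARAFE operator for upsampling. As a rough illustration of the idea behind content-aware reassembly (a minimal NumPy sketch under simplifying assumptions, not the thesis's or the original operator's implementation: the kernel-prediction subnetwork is omitted, and the per-location kernel logits are taken as a given input):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def carafe_upsample(feat, kernel_logits, scale=2, k=3):
    """Content-aware reassembly of features (CARAFE-style sketch).

    Each output pixel is a weighted sum over a k x k neighborhood of its
    source location in the input, with weights predicted per output location.
    feat: (C, H, W) feature map.
    kernel_logits: (scale*H, scale*W, k*k) raw reassembly-kernel logits
                   (in the real operator these come from a small conv branch).
    """
    C, H, W = feat.shape
    pad = k // 2
    fp = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    w = softmax(kernel_logits, axis=-1)  # each kernel sums to 1
    out = np.zeros((C, scale * H, scale * W))
    for i in range(scale * H):
        for j in range(scale * W):
            si, sj = i // scale, j // scale        # source pixel in the input
            patch = fp[:, si:si + k, sj:sj + k].reshape(C, -1)
            out[:, i, j] = patch @ w[i, j]          # content-dependent blend
    return out
```

Because every reassembly kernel is softmax-normalized, the operator interpolates within a local window using content-dependent weights, which is what lets it carry richer semantic and spatial information than a fixed bilinear or nearest-neighbor upsampler.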
English abstract: |
Pedestrian detection is an essential technology in fields such as autonomous driving and intelligent surveillance; it refers to the recognition and localization of pedestrians in videos or images. However, in complex real-world scenarios, unavoidable practical problems such as poor lighting and pedestrians being occluded by objects mean that relying solely on information obtained from a single infrared or visible image may result in missed or false detections. Fusing infrared and visible images before detection not only improves detection accuracy but also preserves the thermal radiation information of infrared images and the texture details of visible images. Therefore, this thesis proposes a pedestrian detection method for low-light areas based on the fusion of infrared and visible images. The specific work is as follows.

(1) This thesis proposes a meta-learning-based joint super-resolution fusion method for infrared and visible images (MSADRCN) to address the information loss and blurring that source images of different resolutions suffer in the preprocessing stage, as well as the problem that the receptive field of the convolution kernel gradually shrinks as the network deepens, causing global information to be overwhelmed by local information during feature extraction. First, a meta-learning super-resolution network is designed for the super-resolution reconstruction of infrared and visible images, unifying the resolution of the input images in an information-lossless manner. Second, an autoencoder and dual-discriminator conditional generative adversarial fusion network is used to capture the correlation between global and local information and generate a preliminary fused image. Finally, a residual-compensation dual-backup fusion network compensates for the information lost during feature extraction, producing the final fused image.

(2) This thesis proposes CCT-YOLO, a pedestrian detection method for fused images in low-light areas, to address the problem that small- and medium-scale pedestrian targets easily lose feature information during network sampling because of their relatively small size. First, the CARAFE operator is introduced into the upsampling process to capture rich semantic and spatial features and thereby increase the feature information available for pedestrian targets. Second, CFC_C2F is used to reconstruct the bottleneck layer of the YOLOv8 network, enriching the network's gradient paths while reducing the model's parameter count and computational cost. Finally, a three-branch attention mechanism is added to the detection head to combine the positional information of low-level feature maps with the semantic information of high-level feature maps.

(3) Ablation experiments on each network, comparison experiments with different fusion methods, and comparison experiments on joint super-resolution were conducted on the TNO and LLVIP datasets. The results show that the proposed MSADRCN performs well in both subjective visual quality and objective evaluation metrics and is suitable for image fusion at different resolutions. Comparison experiments on pedestrian detection in low-light areas were also conducted on the LLVIP dataset to verify the effectiveness of CCT-YOLO. The results show that the proposed method outperforms existing detection methods in recall, precision, and mean average precision for pedestrian detection on fused images in low-light areas.

This thesis studies infrared and visible image fusion and pedestrian detection in low-light areas, providing a feasible detection method for autonomous driving, intelligent surveillance, and related fields in low-light scenarios. In the future, exploring multimodal data fusion methods will become a new research direction, helping to further improve pedestrian detection accuracy in extreme environments such as low light, occlusion, rain, and snow, and promoting the development of autonomous driving and intelligent surveillance. |
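The metrics reported in work (3), recall, precision, and mean average precision, follow standard detection practice. A minimal sketch of a VOC-style all-point average-precision computation, assuming detections have already been matched to ground truth (generic illustration code with toy inputs, not the thesis's evaluation script):

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """All-point AP: sort detections by confidence, accumulate TP/FP,
    then integrate precision over recall with the monotone envelope.

    scores: confidence of each detection.
    is_tp: 1 if the detection matched a ground-truth box, else 0.
    num_gt: total number of ground-truth pedestrians (for recall).
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / num_gt
    precision = tp_cum / (tp_cum + fp_cum)
    # Envelope: make precision monotonically non-increasing in recall.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # Sum rectangle areas where recall increases.
    r = np.concatenate(([0.0], recall))
    return float(np.sum((r[1:] - r[:-1]) * precision))
```

Mean average precision (mAP) is then the mean of this AP over all classes; for a single-class pedestrian detector, mAP reduces to the pedestrian AP.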
CLC number: | TP391.4 |
Open date: | 2025-06-20 |