Thesis Information

Chinese Title:

无人机可见光与红外图像小目标检测研究 (Research on Small Object Detection in UAV Visible and Infrared Images)

Author:

彭林森 (Peng Linsen)

Student ID:

22210226086

Confidentiality Level:

Public

Language:

Chinese (chi)

Discipline Code:

085700

Discipline:

Engineering - Resources and Environment

Student Type:

Master's

Degree Level:

Master of Engineering

Degree Year:

2025

Institution:

西安科技大学 (Xi'an University of Science and Technology)

School:

测绘科学与技术学院 (College of Surveying and Mapping Science and Technology)

Major:

Surveying and Mapping Engineering

Research Direction:

Remote Sensing Image Processing

First Supervisor:

黄远程 (Huang Yuancheng)

First Supervisor's Institution:

西安科技大学 (Xi'an University of Science and Technology)

Submission Date:

2025-06-18

Defense Date:

2025-06-08

English Title:

Research on Small Object Detection in UAV Visible and Infrared Images

Chinese Keywords:

图像融合 (Image Fusion); 小目标检测 (Small Object Detection); 尺度序列特征融合 (Scale Sequence Feature Fusion); 无源域自适应 (Source-Free Domain Adaptation); 均值教师模型 (Mean Teacher Model)

English Keywords:

Image Fusion; Small Object Detection; Scale Sequence Feature Fusion; Source-Free Domain Adaptation; Mean Teacher Model

Chinese Abstract:

In UAV small object detection tasks, visible-light images offer rich texture information, but their detection quality is poor in low-visibility environments. Infrared images, in contrast, are unaffected by visibility and suit all-weather operation, yet lack texture detail of their own. Single-modality small object detection from UAVs is therefore severely limited, and small object detection research faces two further difficulties: (1) the scarce texture of small objects in UAV imagery makes their features hard to extract, while multi-scale targets and the sensitivity of the regression branch also hurt detection accuracy; (2) small object detection depends on large-scale, high-quality annotated datasets, which are costly to produce. To address these problems, this study first registers and fuses visible and infrared images to produce high-quality fused images; it then improves the YOLOv8 small object detection network to raise small object detection accuracy; finally, it introduces a source-free domain adaptation method so that detection no longer requires large numbers of annotated target-domain images. The specific work is as follows:

(1) To address the limitations of single-modality images for small object detection in low visibility, a visible-infrared image fusion method based on a dynamic channel attention mechanism is introduced. In the bimodal preprocessing stage, median filtering is combined with adaptive histogram equalization to improve image contrast and edge information. In the registration stage, the Scharr operator is used to improve the traditional ORB algorithm for precise registration; experiments on the DroneVehicle and LLVIP datasets show registration accuracies of 88.23% and 86.20%, improvements of 17.86% and 7.94% over the traditional ORB method. In the fusion stage, a dynamic channel attention fusion method adaptively mines the complementary information of the two modalities; compared with other fusion methods on these datasets, it achieves the best scores on the EN, MI, CC, and PSNR metrics. In addition, running YOLOv8 detection on the visible, infrared, and fused images, the fused images reach mAP of 90.5% and 91.6% respectively, improving detection accuracy in both cases and providing a high-quality dataset for the subsequent small object detection work.

(2) To address scarce texture, multi-scale targets, and regression-branch sensitivity in small object detection on fused images, an SFW-YOLOv8 model based on scale sequence feature fusion is proposed on top of YOLOv8. Through a scale sequence feature fusion module and a triple feature encoding module, the model strengthens detail capture and context awareness for small objects, while a GWN Loss based on the Wasserstein distance reduces the sensitivity of the regression branch. Ablation experiments on the fused DV and LL datasets show mAP of 93.8% and 93.4%, gains of 3.3% and 1.8% over YOLOv8. Robustness experiments and heat-map visualizations further show that the proposed model is robust and detects small objects more precisely.

(3) To address the YOLO model's dependence on large-scale, high-quality datasets for small object detection, a source-free domain adaptation method is introduced and an ST-YOLO model built on the mean teacher framework is proposed. With SFW-YOLOv8 as the baseline, a target-domain adaptive enhancement module and a student self-stabilization module effectively suppress the propagation of pseudo-label noise without any source-domain data or target-domain annotations. To verify its practicality, the DroneVehicle dataset is split into a daytime DD dataset (source domain) and a nighttime BD dataset (target domain), alongside a second pair consisting of the fog-free Cityscapes (source domain) and Foggy Cityscapes (target domain) datasets, and cross-domain comparison experiments are run on both pairs. Compared with classic Faster R-CNN-based source-free domain adaptation methods and YOLO-based unsupervised domain adaptation methods, the proposed method reaches mAP of 64.7% and 52.9%, leading the alternatives and achieving adaptive detection without source-domain data.

In summary, facing the limitations of single-modality UAV small object detection in low-visibility environments, the inherent difficulties of small object detection, and the difficulty of obtaining datasets, this study proposes a complete solution and offers a new approach for UAV visible-infrared image small object detection.
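A minimal OpenCV sketch of the preprocessing and registration steps described above (median filtering plus adaptive histogram equalization, then ORB keypoints detected on Scharr gradient maps) may help make the pipeline concrete. The filter size, CLAHE settings, feature count, and match limit below are illustrative assumptions, not the thesis's actual parameters:

```python
import cv2
import numpy as np

def preprocess(gray):
    """Median filter to suppress noise, then CLAHE to lift contrast and edges."""
    denoised = cv2.medianBlur(gray, 3)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(denoised)

def scharr_edge_map(gray):
    """Scharr gradient magnitude, used as the detection image for ORB so that
    keypoints favour structure shared by the visible and infrared modalities."""
    gx = cv2.Scharr(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Scharr(gray, cv2.CV_32F, 0, 1)
    mag = cv2.magnitude(gx, gy)
    return cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

def register(vis_gray, ir_gray):
    """Estimate a homography aligning the infrared image to the visible image."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(scharr_edge_map(preprocess(vis_gray)), None)
    kp2, des2 = orb.detectAndCompute(scharr_edge_map(preprocess(ir_gray)), None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)[:200]
    src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H  # apply with cv2.warpPerspective(ir_gray, H, vis_gray.shape[::-1])
```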

English Abstract:

In unmanned aerial vehicle (UAV) small object detection tasks, although visible-light images contain rich texture information, their detection performance degrades significantly under low-visibility conditions. In contrast, infrared images are unaffected by visibility and are suitable for all-weather operation, but they lack fine-grained texture detail. Consequently, single-modality detection faces substantial limitations. Moreover, small object detection presents two main challenges: (1) UAV images often contain sparse texture information for small objects, making feature extraction difficult; issues such as multi-scale target representation and regression-branch sensitivity further reduce detection accuracy. (2) Small object detection relies heavily on large-scale, high-quality annotated datasets, which are time-consuming and costly to produce. To address these issues, this study first registers and fuses visible-infrared image pairs to generate high-quality fused images. Second, it improves the YOLOv8-based detection network to enhance the detection accuracy of small objects. Finally, it introduces a source-free domain adaptation strategy to eliminate the reliance on annotated target-domain images.

(1) To overcome the limitations of single-modality detection under low-visibility conditions, a visible-infrared image fusion method based on a dynamic channel attention mechanism is proposed. In the preprocessing stage, median filtering and CLAHE are combined to enhance image contrast and edge features. In the registration stage, the traditional ORB algorithm is improved using the Scharr operator for more accurate alignment. Experiments on the DroneVehicle and LLVIP datasets show that the registration accuracy reaches 88.23% and 86.20%, improving by 17.86% and 7.94%, respectively, over the traditional ORB method. In the fusion stage, the dynamic channel attention mechanism is introduced to adaptively extract complementary information from both modalities. The fusion results outperform other methods on the EN, MI, CC, and PSNR metrics. Detection results using YOLOv8 on fused images also show improved mAP values of 90.5% and 91.6%, respectively, offering high-quality data for small object detection.

(2) To address the challenges of texture sparsity, multi-scale representation, and regression-branch sensitivity in detecting small objects from fused images, an improved SFW-YOLOv8 model is proposed. It incorporates a Scale Sequence Feature Fusion (SSF) module and a Triple Feature Coding (TFC) module to enhance detail capture and context awareness. Additionally, a GWN Loss based on the Wasserstein distance is introduced to reduce the sensitivity of the regression branch. Ablation experiments on the DV and LL datasets show that the improved model achieves mAP values of 93.8% and 93.4%, outperforming YOLOv8 by 3.3% and 1.8%, respectively. Robustness testing and heatmap visualizations further confirm the model's superior performance in detecting small objects accurately.

(3) To eliminate the dependency on large-scale labeled datasets, a source-free domain adaptation model named ST-YOLO is proposed, built on a Mean Teacher framework. Taking SFW-YOLOv8 as the base model, it introduces a target-domain adaptive enhancement module and a student self-stabilization module to effectively suppress noisy pseudo-label propagation without requiring source data or labeled target data.
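The mean teacher mechanism underlying ST-YOLO can be stated compactly: the teacher's weights are an exponential moving average (EMA) of the student's, and only confident teacher predictions survive as pseudo-labels, which is what dampens pseudo-label noise. The PyTorch sketch below illustrates this general pattern; the momentum and confidence threshold are placeholder values, and the thesis's target-domain enhancement and student self-stabilization modules are not reproduced here:

```python
import copy
import torch

def make_teacher(student: torch.nn.Module) -> torch.nn.Module:
    """Initialize the teacher as a frozen copy of the student."""
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """teacher <- momentum * teacher + (1 - momentum) * student.
    Averaging smooths out gradients driven by noisy pseudo-labels."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)

def filter_pseudo_labels(boxes, scores, conf_thresh=0.5):
    """Keep only confident teacher detections as pseudo-labels for the student."""
    keep = scores >= conf_thresh
    return boxes[keep], scores[keep]
```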
Experiments were conducted on two cross-domain dataset pairs: DroneVehicle, split into a daytime DD dataset (source) and a nighttime BD dataset (target), and Cityscapes (source) paired with Foggy Cityscapes (target). The proposed method achieves mAP values of 64.7% and 52.9%, outperforming existing Faster R-CNN-based and YOLO-based domain adaptation methods and demonstrating its effectiveness in source-free domain adaptation.

In summary, this study presents a comprehensive solution to the challenges of low-visibility UAV single-modality detection, the inherent difficulties of small object detection, and data scarcity. The proposed framework offers a novel approach to visible-infrared UAV small object detection.
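Losses built on the Wasserstein distance for small boxes, such as the GWN Loss named above, typically start from the normalized Wasserstein distance (NWD): each box (cx, cy, w, h) is modeled as a 2-D Gaussian, for which the 2-Wasserstein distance has a closed form. The sketch below computes that similarity; the constant C and the exact way the thesis combines this term into its GWN Loss are assumptions here, not the thesis's formulation:

```python
import torch

def nwd(pred: torch.Tensor, target: torch.Tensor, C: float = 12.8) -> torch.Tensor:
    """Normalized Wasserstein distance between (cx, cy, w, h) boxes modeled as
    Gaussians N((cx, cy), diag(w^2/4, h^2/4)); returns a similarity in (0, 1]."""
    dx = pred[..., 0] - target[..., 0]
    dy = pred[..., 1] - target[..., 1]
    dw = (pred[..., 2] - target[..., 2]) / 2.0
    dh = (pred[..., 3] - target[..., 3]) / 2.0
    w2 = dx ** 2 + dy ** 2 + dw ** 2 + dh ** 2  # squared 2-Wasserstein distance
    return torch.exp(-torch.sqrt(w2.clamp_min(1e-7)) / C)

def wasserstein_box_loss(pred, target):
    """1 - NWD: unlike IoU, it stays informative even when tiny boxes barely overlap."""
    return (1.0 - nwd(pred, target)).mean()
```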


CLC Number:

TP391.41

Open Access Date:

2025-06-18
