Thesis Information

Chinese title:

 Research on Lightweight Infrared Target Detection Algorithm Based on Deep Learning

Name:

 高重阳

Student ID:

 21208223082

Confidentiality level:

 Public

Thesis language:

 Chinese

Subject code:

 085212

Subject name:

 Engineering - Engineering - Software Engineering

Student type:

 Master's

Degree level:

 Master of Engineering

Degree year:

 2024

Training institution:

 Xi'an University of Science and Technology

Department:

 College of Computer Science and Technology

Major:

 Software Engineering

Research direction:

 Object detection

First supervisor:

 许晓阳

First supervisor's institution:

 Xi'an University of Science and Technology

Submission date:

 2024-06-14

Defense date:

 2024-05-30

English title:

 Research on Lightweight Infrared Target Detection Algorithm Based on Deep Learning

Chinese keywords:

 Infrared target detection; Knowledge distillation; Lightweight; Loss function

English keywords:

 Infrared Target Detection; Knowledge Distillation; Lightweight; Loss Function

Chinese abstract:

       Infrared target detection recognizes and detects targets in various scenes based on infrared images. In the traffic domain, real-time detection of vehicles and pedestrians in infrared scenes faces three main problems: first, detection algorithms achieve low recognition accuracy in infrared scenes and small targets are hard to detect; second, infrared detection algorithms carry large parameter counts and computational costs, recognize slowly, and are hard to deploy; third, infrared datasets for the relevant scenes are lacking. To address these problems, this thesis adopts deep-learning-based object detection and studies lightweight infrared target detection algorithms through lightweight module and network-architecture design. The main research work is as follows:

     (1) To address the difficulty of recognizing and detecting multiple targets and dense crowds in infrared pedestrian scenes, where missed and false detections occur easily, a lightweight infrared pedestrian detection algorithm, YOLO-SC, is proposed. First, the lightweight ShuffleNetV2 network is modified for infrared scenes, using a parallel architecture design to optimize training speed and balance lightweight design against detection accuracy. Meanwhile, the CA (coordinate attention) mechanism accounts for inter-channel and positional relationships to improve model accuracy more effectively. Next, the lightweight GhostNetV1 module and bilinear interpolation are introduced in the feature-fusion layer to improve the model's utilization of learned features. Finally, while maintaining precision, the Focal-EIoU loss function further improves accuracy and corrects the positive/negative sample imbalance brought by the dataset, improving the algorithm's generalization and robustness. Experiments show that the proposed YOLO-SC model improves accuracy by 4.3% while its parameter count and computation are only 40% and 47.7% of the original YOLOv5 model, respectively.
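As a rough illustration of the Focal-EIoU idea mentioned above, the sketch below computes the loss for a single pair of axis-aligned boxes in `(x1, y1, x2, y2)` form. The function name, the gamma = 0.5 default, and the epsilon terms are assumptions of this sketch, not the thesis's implementation.

```python
def eiou_loss(box_a, box_b, gamma=0.5):
    """Focal-EIoU sketch: EIoU = 1 - IoU + center-distance, width, and
    height penalties, then focally weighted by IoU**gamma."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection-over-union of the two boxes
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + 1e-9)
    # Smallest enclosing box: diagonal and side lengths
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw * cw + ch * ch + 1e-9
    # Squared distance between box centers
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
       + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Width and height mismatch penalties
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    l_eiou = (1 - iou + d2 / c2
              + (wa - wb) ** 2 / (cw * cw + 1e-9)
              + (ha - hb) ** 2 / (ch * ch + 1e-9))
    # Focal weighting emphasizes high-quality (high-IoU) boxes
    return iou ** gamma * l_eiou
```

A perfectly matched pair yields a loss near zero, while partially overlapping boxes incur a positive penalty from the distance and shape terms.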

     (2) To address the poor high-speed recognition ability in infrared driving scenes, and the difficulty YOLO-SC has with overlapping targets and its low FPS, a lightweight infrared target detection algorithm based on ELAN-DW, KD-YOLO-DW, is proposed. First, the ELAN-DW module is obtained by optimizing the original ELAN module with fused depthwise separable convolution and a gradient-path design strategy, greatly reducing network parameters and computation. Second, the GhostNetV2 module improves the feature-fusion capability of the fusion layer. Then, a multi-scale fusion strategy inspired by residual structures improves the fusion of features at different scales. Finally, knowledge distillation condenses the lightweight model once more, further improving its accuracy in detecting infrared targets. Experiments show that KD-YOLO-DW reduces parameters and computation by 24.6% and 16.7% relative to YOLOv7-tiny, its model size is only 9.2 MB, and mAP improves by 3.27% and 3.15%, respectively.
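The knowledge-distillation step can be illustrated with a minimal soft-label loss between teacher and student logits. The temperature value and helper names below are assumptions chosen for illustration; the thesis's actual distillation formulation may differ.

```python
import math

def softmax_t(logits, t):
    """Temperature-scaled softmax (higher t softens the distribution)."""
    m = max(z / t for z in logits)  # subtract max for numerical stability
    exps = [math.exp(z / t - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, t=4.0):
    """Soft-label distillation term: KL divergence between the teacher's
    and student's softened class distributions, scaled by t**2 so its
    gradients stay comparable to a hard-label loss term."""
    p_t = softmax_t(teacher_logits, t)
    p_s = softmax_t(student_logits, t)
    return t * t * sum(pt * (math.log(pt + 1e-12) - math.log(ps + 1e-12))
                       for pt, ps in zip(p_t, p_s))
```

When student and teacher agree exactly, the loss is zero; any disagreement in the softened distributions produces a positive penalty for the student to minimize.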

     (3) A lightweight infrared target detection system is designed and implemented. The system integrates the two algorithms proposed in this thesis, YOLO-SC and KD-YOLO-DW, and can detect infrared images in real time. An infrared-image dataset collection platform is built on this system, which alleviates the shortage of infrared image datasets to some extent.

       In summary, this thesis proposes two deep-learning-based lightweight infrared target detection algorithms, YOLO-SC and KD-YOLO-DW. Both can be applied to target detection in infrared images, and an online infrared target detection system integrating the two algorithms is implemented.

English abstract:

       Infrared target detection identifies and detects targets in various scenes based on infrared images. In the traffic domain, real-time detection of vehicles and pedestrians in infrared scenes faces three main problems: first, recognition accuracy in infrared scenes is low and small targets are difficult to detect; second, infrared detection algorithms carry large parameter counts and computational loads, run slowly, and are difficult to deploy; third, infrared datasets for the relevant scenes are lacking. To resolve these problems, this thesis adopts deep-learning-based object detection and develops lightweight infrared detection algorithms through lightweight module and network-architecture design. The main research work includes:

       (1) A lightweight infrared pedestrian detection algorithm named YOLO-SC is proposed to address the difficulty of identifying and detecting multiple targets and dense crowds in infrared pedestrian scenes, where missed and false detections occur easily. First, the lightweight ShuffleNetV2 network is adapted to infrared scenes, and a parallel architecture design speeds up training, striking a balance between being lightweight and detection accuracy. A CA (coordinate attention) mechanism models inter-channel and positional relationships to improve model accuracy more effectively. Then, the lightweight GhostNetV1 module and bilinear interpolation are introduced in the feature-fusion layer to improve the model's use of learned features. Finally, the Focal-EIoU loss function further improves accuracy and corrects the positive/negative sample imbalance introduced by the dataset, improving the algorithm's generalization and robustness. Experiments show that YOLO-SC improves accuracy by 4.3% while requiring only 40% of the parameters and 47.7% of the computation of the original YOLOv5 model.
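The bilinear-interpolation upsampling used in the feature-fusion design above can be sketched on a plain 2-D grid. The align-corners convention and the function name are assumptions of this sketch, not the thesis's implementation.

```python
def bilinear_upsample(grid, scale=2):
    """Upsample a 2-D list-of-lists by bilinear interpolation
    (align-corners convention: the four corner values are preserved)."""
    h, w = len(grid), len(grid[0])
    H, W = h * scale, w * scale
    out = []
    for i in range(H):
        # Map each output row back to a fractional input coordinate
        y = i * (h - 1) / (H - 1) if H > 1 else 0.0
        y0 = int(y); y1 = min(y0 + 1, h - 1); fy = y - y0
        row = []
        for j in range(W):
            x = j * (w - 1) / (W - 1) if W > 1 else 0.0
            x0 = int(x); x1 = min(x0 + 1, w - 1); fx = x - x0
            # Blend the four surrounding input values
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bot = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```

Unlike nearest-neighbor upsampling, this produces smooth transitions between feature values, which is why it is often preferred when enlarging feature maps before fusion.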

     (2) To address poor high-speed recognition in infrared driving scenes, along with YOLO-SC's difficulty with overlapping targets and its low FPS, a lightweight infrared detection algorithm based on ELAN-DW, named KD-YOLO-DW, is proposed. First, the ELAN-DW module is derived from the original ELAN module by fusing depthwise separable convolution with a gradient-path design strategy, greatly reducing network parameters and computation. Second, the GhostNetV2 module strengthens the feature-fusion layer. Then, a multi-scale fusion strategy built on residual connections improves the fusion of features across scales. Finally, knowledge distillation further compresses the lightweight model, improving its accuracy on infrared targets. Experiments show that, compared with YOLOv7-tiny, KD-YOLO-DW reduces parameters by 24.6% and computation by 16.7%, its model size is only 9.2 MB, and mAP rises by 3.27% and 3.15%, respectively.
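To see why depthwise separable convolution shrinks the network, a back-of-the-envelope parameter count helps. The layer sizes below are arbitrary, chosen only for illustration:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k×k convolution (bias ignored):
    every output channel mixes every input channel."""
    return k * k * c_in * c_out

def dwsep_params(k, c_in, c_out):
    """Depthwise separable form: one k×k filter per input channel,
    followed by a 1×1 pointwise convolution to mix channels."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 128, 128)    # 3*3*128*128 = 147456 weights
dws = dwsep_params(3, 128, 128)   # 9*128 + 128*128 = 17536 weights
```

For a 3×3 layer with 128 input and output channels, the separable form needs roughly 8× fewer weights; this is the kind of saving a module such as ELAN-DW exploits.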

       (3) A lightweight infrared target detection system is designed and implemented. It integrates the two proposed algorithms, YOLO-SC and KD-YOLO-DW, and detects infrared images in real time. An infrared-image dataset collection platform built on this system partly alleviates the shortage of infrared image datasets.

       In conclusion, this thesis proposes two deep-learning-based lightweight infrared target detection algorithms, YOLO-SC and KD-YOLO-DW. Both can be applied to target detection in infrared images, and an online infrared target detection system integrating the two algorithms has been realized.


CLC number:

 TP391

Open access date:

 2024-06-14
