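Title (Chinese):	基于深度学习的无人机航拍目标检测算法研究
Author:	吴琰 (Wu Yan)
Student ID:	21207040018
Confidentiality level:	Public
Language:	chi (Chinese)
Discipline code:	0810
Discipline:	Engineering - Information and Communication Engineering
Student type:	Master's student
Degree:	Master of Engineering
Degree year:	2024
Institution:	Xi'an University of Science and Technology
School:	School of Communication and Information Engineering
Major:	Information and Communication Engineering
Research area:	Image processing
First supervisor:	侯颖 (Hou Ying)
First supervisor's institution:	Xi'an University of Science and Technology
Submission date:	2024-06-12
Defense date:	2024-06-05
Title (English):	Research on UAV Aerial Photography Target Detection Algorithm Based on Deep Learning
Keywords (Chinese, translated):	deep learning; UAV aerial images; small target detection; YOLOv8s; model lightweighting
Keywords (English):	Deep learning; UAV aerial images; Small target detection; YOLOv8s; Model lightweighting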
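Abstract (Chinese, translated):	In recent years, as UAV technology has matured, UAV aerial target detection has gained important application value in military reconnaissance, safety operations, and technology-driven agriculture. However, target detection in UAV aerial images faces several challenges: complex background interference caused by shooting altitude, viewing angle, and weather; large variations in target scale; and dense small targets. In addition, current UAV detection models are large and slow, which makes it difficult to meet the lightweight and real-time requirements of edge deployment. To address these difficulties, this thesis proposes two deep-learning-based target detection models. The main work is as follows. (1) To address the low accuracy and high false-detection rate of target detection algorithms in UAV aerial scenes, a UAV small-target detection algorithm, YOLO-UAVSOD, is proposed. SPD-Conv convolution and the BiFormer module are combined into a small-target detection module group that improves the backbone network, preserving the fine-grained features of small targets and focusing on useful information. To handle varying target scales, an improved Rep-PAN neck network processes images at different resolutions, improving detection performance. Finally, a small-target detection layer is added and the loss function is optimized to strengthen the model's ability to recognize and localize small targets in aerial images. On the VisDrone2019 dataset, the two accuracy metrics of the improved algorithm reach 51.2% and 41.0%, which are 10.9% and 8.0% higher than YOLOv8s, respectively, enabling high-accuracy UAV aerial target detection. (2) To address the large size and parameter count of the YOLO-UAVSOD aerial detection model, a lightweight UAV target detection algorithm, YOLO-UAVGC, is proposed by restructuring and slimming the network. It retains the BiFormer module and the PIoU v2 loss function, which suit small targets, to preserve small-target accuracy. The network is rebuilt with lightweight Ghost convolution and a purpose-designed C2f_Ghost module; in addition, an SFN shallow fusion network is constructed and a C3STR module is designed to reduce the parameter count and model size, meeting the real-time and lightweight requirements of low-power devices. On VisDrone2019, the two accuracy metrics of the improved algorithm reach 49.7% and 39.9%, with only 5.9M parameters and a model size of 12.3 MB, reductions of 77.4% and 76.7% relative to the original model, and a detection speed of 78.1 frames per second, meeting the requirements of real-time UAV aerial target detection.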
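SPD-Conv, used in the small-target module group described above, replaces strided convolution or pooling with a space-to-depth (SPD) rearrangement followed by a non-strided convolution, so that downsampling does not discard the pixels small targets depend on. The NumPy sketch below illustrates that idea under stated assumptions: channel-first (C, H, W) arrays, a downsampling scale of 2, and a 1×1 convolution standing in for the non-strided convolution (the thesis's actual kernel size and layer settings are not given here). The function names `space_to_depth` and `spd_conv` are hypothetical, not the author's code.

```python
import numpy as np


def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) map into (C * scale**2, H // scale, W // scale).

    Every input value is kept: resolution is traded for channels rather than
    discarded, as a stride-2 convolution or pooling layer would do.
    """
    _, h, w = x.shape
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    # One sub-map per (row offset i, column offset j), stacked along channels.
    subs = [x[:, i::scale, j::scale] for j in range(scale) for i in range(scale)]
    return np.concatenate(subs, axis=0)


def spd_conv(x: np.ndarray, weight: np.ndarray, scale: int = 2) -> np.ndarray:
    """SPD followed by a non-strided 1x1 convolution (hypothetical stand-in).

    weight has shape (C_out, C * scale**2); a 1x1 convolution is a per-pixel
    matrix product over the channel axis.
    """
    y = space_to_depth(x, scale)
    return np.einsum("oc,chw->ohw", weight, y)


if __name__ == "__main__":
    x = np.arange(16, dtype=float).reshape(1, 4, 4)
    y = space_to_depth(x)
    print(y.shape)  # (4, 2, 2)
    print(y[0])     # top-left pixel of every 2x2 block
```

A stride-2 convolution on the same input would sample only some of these positions; SPD keeps all 16 values, now spread over 4 channels, and leaves it to the following convolution to learn how to combine them.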
Abstract (English):	In recent years, with the continuing maturation of unmanned aerial vehicle (UAV) technology, UAV-based target detection has found important applications in military reconnaissance, security operations, and technology-driven agriculture. However, target detection in UAV aerial images faces multiple challenges, including complex background interference caused by shooting altitude, viewing angle, and weather conditions, large variations in target scale, and dense small targets. Current UAV detection models are also too large and too slow to meet the lightweight and real-time requirements of edge deployment. To address these difficulties, this thesis proposes two deep-learning-based target detection models. The main work is as follows. (1) To address the low accuracy and high false-detection rate of target detection algorithms in UAV aerial photography scenarios, a UAV small-target detection algorithm, YOLO-UAVSOD, is proposed. The backbone network is improved by combining SPD-Conv convolution and BiFormer modules into a small-target detection module group, which retains the fine-grained features of small targets and focuses on useful information. To handle differing target scales, an improved Rep-PAN neck network processes images of different resolutions to improve detection performance. Finally, a small-target detection layer is added and the loss function is optimized to enhance the model's ability to recognize and localize small targets in aerial images. On the VisDrone2019 dataset, the two accuracy metrics of the improved algorithm reach 51.2% and 41.0%, improvements of 10.9% and 8.0%, respectively, over YOLOv8s, enabling high-accuracy detection of UAV aerial photography targets.
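The loss optimization in part (1) is identified in part (2) of the abstract as PIoU v2 (Powerful-IoU v2). The sketch below is our reading of that formulation, not the thesis's code: the penalty P averages the four edge gaps between the predicted and target boxes, each normalized by the target's width or height; L_PIoU = 1 − IoU + (1 − e^(−P²)); and PIoU v2 multiplies this by a non-monotonic factor u(λq) = 3λq·e^(−(λq)²) with q = e^(−P). The default λ = 1.3, the (x1, y1, x2, y2) box format, and the single-pair interface are assumptions; check them against the Powerful-IoU paper before relying on them.

```python
import math


def piou_v2_loss(pred, target, lam=1.3):
    """PIoU v2 loss for one predicted/target box pair, boxes as (x1, y1, x2, y2).

    Sketch of our reading of Powerful-IoU (lam = 1.3 is an assumed default):
      P        = mean of the 4 edge gaps, normalised by target width/height
      L_PIoU   = 1 - IoU + (1 - exp(-P**2))
      L_PIoUv2 = u(lam * q) * L_PIoU,  q = exp(-P),  u(x) = 3x * exp(-x**2)
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    tw, th = tx2 - tx1, ty2 - ty1
    if tw <= 0 or th <= 0 or px2 <= px1 or py2 <= py1:
        raise ValueError("boxes must have positive width and height")

    # Plain IoU of the two axis-aligned boxes.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + tw * th - inter
    iou = inter / union

    # Edge-gap penalty P, scaled by the target size rather than an enclosing box.
    p = (abs(px1 - tx1) / tw + abs(px2 - tx2) / tw
         + abs(py1 - ty1) / th + abs(py2 - ty2) / th) / 4
    l_piou = 1 - iou + (1 - math.exp(-p ** 2))

    # Non-monotonic focusing factor u(lam * q) applied to the PIoU loss.
    x = lam * math.exp(-p)
    return 3 * x * math.exp(-x ** 2) * l_piou


if __name__ == "__main__":
    print(piou_v2_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0, perfect match
    print(piou_v2_loss((1, 0, 11, 10), (0, 0, 10, 10)))  # small: 1-px shift
```

Because u(x) peaks at x = 1/√2 and falls toward zero as q shrinks, this factor gives the most weight to medium-quality boxes and less to both near-perfect and very poor ones, which is the intended benefit for noisy small-target annotations.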
(2) To address the large size and parameter count of the YOLO-UAVSOD aerial target detection model, the thesis restructures and slims its network and proposes a lightweight UAV target detection algorithm, YOLO-UAVGC. The algorithm retains the BiFormer module and the PIoU v2 loss function, which suit small-target detection, to preserve small-target accuracy. The network is rebuilt with lightweight Ghost convolution and a purpose-designed C2f_Ghost module, and an SFN shallow fusion network and a C3STR module are introduced to reduce the parameter count and model size, meeting the real-time and lightweight requirements of low-power devices. On the VisDrone2019 dataset, the two accuracy metrics of the improved algorithm reach 49.7% and 39.9%. The model has only 5.9M parameters and a size of 12.3 MB, 77.4% and 76.7% smaller than the original model, and runs at 78.1 frames per second, enabling real-time detection of UAV aerial photography targets.
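Part (2) attributes much of YOLO-UAVGC's size reduction to Ghost convolution, which produces part of a layer's output maps with cheap depthwise operations instead of a full convolution. The counting sketch below compares weight counts for a plain convolution and a GhostNet-style Ghost module. It ignores biases and batch normalization, and the ratio s = 2, the cheap-kernel size d = 3, and the layer sizes in the example are illustrative assumptions rather than the thesis's settings.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a plain k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Weight count of a Ghost module with c_out outputs (bias and BN ignored).

    A primary k x k convolution produces m = c_out // s "intrinsic" maps; each
    of them then feeds (s - 1) cheap depthwise d x d operations that produce
    the remaining "ghost" maps, so the output has m * s channels.
    """
    if c_out % s:
        raise ValueError("c_out must be divisible by s")
    m = c_out // s
    primary = c_in * m * k * k
    cheap = (s - 1) * m * d * d  # depthwise: one d x d kernel per generated map
    return primary + cheap


if __name__ == "__main__":
    plain = standard_conv_params(128, 256, 3)
    ghost = ghost_module_params(128, 256, 3)
    print(plain, ghost, round(plain / ghost, 2))  # 294912 148608 1.98
```

With s = 2 the module needs roughly half the weights of the plain convolution, because the cheap depthwise term is tiny next to the primary convolution. For reference, the abstract's 5.9M parameters after a 77.4% reduction imply roughly 5.9 / (1 − 0.774) ≈ 26.1M parameters in the model it was reduced from.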
Chinese Library Classification (CLC) number:	TP391.4
Public release date:	2024-06-12
