Thesis Information

Title (Chinese):

 基于深度学习的无人机航拍车辆小目标检测    

Name:

 黄金磊    

Student ID:

 20206223074    

Confidentiality level:

 Classified (open after 1 year)

Language:

 Chinese (chi)

Discipline code:

 085400    

Discipline:

 Engineering - Electronic Information

Student type:

 Master's student

Degree level:

 Master of Engineering

Degree year:

 2023    

Degree-granting institution:

 Xi'an University of Science and Technology

School:

 College of Electrical and Control Engineering

Major:

 Control Science and Engineering

Research area:

 Image processing

First supervisor:

 刘宝    

First supervisor's institution:

 Xi'an University of Science and Technology

Submission date:

 2023-06-14    

Defense date:

 2023-06-02    

Title (English):

 Small Object Detection of UAV Aerial Photography Vehicles Based on Deep Learning    

Keywords (Chinese):

 深度学习 ; 车辆小目标检测 ; 无人机航拍检测 ; 注意力机制 ; 特征融合    

Keywords (English):

 Deep learning ; Vehicle small target detection ; UAV aerial photography detection ; Attention mechanism ; Feature fusion    

Abstract (Chinese):

Vehicle detection in UAV aerial imagery is an important research topic at the intersection of unmanned aerial vehicles and image-based detection. In recent years, deep learning, owing to its strong generalization ability, has been a key focus of exploration across many industries. In deep-learning-based detection of vehicles in UAV aerial imagery, however, the small size of vehicle targets and interference from complex environments cause classical detection algorithms to miss targets and raise false detections, resulting in low detection accuracy.

To address missed vehicle detections in UAV aerial imagery, a small-object detection model based on a global-local attention mechanism is proposed. On the one hand, the model attends to small objects at both local and global scales, increasing the weights assigned to small vehicle targets; on the other hand, it retains small-object information into the deeper layers of the network so that richer semantic information can be extracted. To address false vehicle detections, a small-object detection model based on a multi-branch feature pyramid is proposed: multi-scale feature maps are fused across branches, suppressing interference from spurious information and strengthening the network's extraction of fine-grained small-object features. The two models are then combined, according to their complementary characteristics, into a single small-object detection model for UAV aerial vehicles. Effective anchor box parameters are obtained by combining K-means++ clustering with a genetic mutation algorithm. The model structure and loss function are then optimized, and the training scheme of the detection model is redesigned.

The experiments fall into three parts. First, the global-local attention model is compared with channel, spatial, and mixed-domain attention mechanisms to verify that it lowers the missed detection rate, and its robustness and generalization are validated on the VOC2007 dataset. Second, the multi-branch feature pyramid model is compared with the standard feature pyramid and the global-local attention model to verify that it lowers the false detection rate. Third, ablation experiments are conducted on the combined model, which is then compared on the Chinese Academy of Sciences aerial photography dataset with classical aerial vehicle detection algorithms such as YOLOv5s, SSD, DSSD, MobileNetv3, and HRDNet.
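
The anchor-generation step described above (K-means++ clustering combined with genetic mutation) can be sketched as follows. This is an illustrative reconstruction, not the thesis's code: the (w, h) box representation, the 1 - IoU distance, the simplified deterministic seeding, and the `mutate_anchors` helper are all assumptions.

```python
import random

def iou_wh(a, b):
    """IoU of two boxes given as (w, h) pairs, both anchored at the origin."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def cluster_anchors(boxes, k, iters=30):
    """Cluster ground-truth (w, h) pairs into k anchors using 1 - IoU as the
    distance. Seeding is k-means++-style, simplified here to a deterministic
    farthest-point rule for reproducibility."""
    centers = [boxes[0]]
    while len(centers) < k:
        # pick the box farthest (in 1 - IoU) from its nearest center
        centers.append(max(boxes,
                           key=lambda b: min(1 - iou_wh(b, c) for c in centers)))
    for _ in range(iters):
        # Lloyd step: assign each box to the anchor it overlaps most
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            clusters[best].append(b)
        # recompute each anchor as the mean (w, h) of its cluster
        centers = [(sum(b[0] for b in cl) / len(cl),
                    sum(b[1] for b in cl) / len(cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return sorted(centers)

def mutate_anchors(centers, rate=0.1, seed=0):
    """One genetic-mutation step: jitter each anchor by up to +/- rate."""
    rng = random.Random(seed)
    return [(w * (1 + rng.uniform(-rate, rate)),
             h * (1 + rng.uniform(-rate, rate))) for w, h in centers]
```

In a full pipeline, mutated candidates would be kept only when they improve a fitness score (e.g. mean best IoU over the dataset); that selection loop is omitted here.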

The experimental results show that the proposed model improves the recall, precision, mAP, and F1 score of aerial vehicle detection, verifying its advantages. It alleviates the missed and false detections that currently affect UAV aerial vehicle detection and offers a useful direction for the field.

Abstract (English):

UAV aerial vehicle detection is an important research topic in the fields of UAVs and image-based detection. In recent years, deep learning, owing to its powerful generalization ability, has become a key focus of exploration in many industries. In deep-learning-based aerial vehicle detection, however, the small size of vehicle targets and interference from complex environments cause classical detection algorithms to produce missed and false detections, resulting in low detection accuracy.

To address missed vehicle detections in UAV aerial imagery, a small-object detection model based on a global-local attention mechanism is proposed. On the one hand, the model attends to small objects at both local and global scales and increases the weights of small vehicle targets; on the other hand, it retains small-object information into the deeper layers of the network to extract richer semantic information. To address false vehicle detections, a small-object detection model based on a multi-branch feature pyramid is proposed: multi-scale feature maps are fused across branches to suppress the interference of spurious information and to strengthen the network's extraction of small-object detail features. The two models are then combined, according to their characteristics, into a single small-object detection model for UAV aerial vehicles. Effective anchor box parameters are obtained by combining K-means++ clustering with a genetic mutation algorithm. The model structure and loss function are then optimized, and the training scheme of the detection model is redesigned.

The experiments are divided into three parts. In the first part, the global-local attention model is compared with channel, spatial, and mixed-domain attention mechanisms to verify that the proposed model reduces the missed detection rate; its robustness and generalization are then validated on the VOC2007 dataset. In the second part, the multi-branch feature pyramid model is compared with the standard feature pyramid and the global-local attention model to verify that it reduces the false detection rate. In the third part, ablation experiments are conducted on the proposed model, which is then compared on the Chinese Academy of Sciences aerial photography dataset with classical aerial vehicle detection algorithms such as YOLOv5s, SSD, DSSD, MobileNetv3, and HRDNet.
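
As a rough illustration of the global-local idea described above, i.e. reweighting features by both a global statistic and a local-neighbourhood statistic, consider the following pure-Python sketch. It is a toy reconstruction under assumed details (mean statistics, sigmoid gates, a k x k window), not the model's actual attention module.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def global_local_attention(fmap, k=3):
    """Reweight a 2-D feature map with a global gate (sigmoid of the mean
    over the whole map) multiplied by a local gate (sigmoid of the mean
    over the k x k neighbourhood of each position)."""
    h, w = len(fmap), len(fmap[0])
    g = sum(sum(row) for row in fmap) / (h * w)  # global statistic
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # local statistic: mean over the k x k window centred at (i, j)
            vals = [fmap[y][x]
                    for y in range(max(0, i - k // 2), min(h, i + k // 2 + 1))
                    for x in range(max(0, j - k // 2), min(w, j + k // 2 + 1))]
            local = sum(vals) / len(vals)
            row.append(fmap[i][j] * sigmoid(g) * sigmoid(local))
        out.append(row)
    return out
```

In a real network these gates would be learned (e.g. from pooled features passed through small convolutions) rather than fixed means, but the sketch shows how a per-position weight can combine global context with local neighbourhood evidence.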

The experimental results show that the proposed method improves the recall, precision, mAP, and F1 score of aerial vehicle detection, which verifies its advantages. It alleviates the missed and false detections currently seen in UAV aerial vehicle detection and provides a useful direction for the field of UAV aerial photography detection.
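
The metrics reported above (precision, recall, F1, and the per-class average precision underlying mAP) can be computed as in the following sketch; the matching of detections to ground truth that yields the TP/FP labels is assumed to have been done already (typically by an IoU threshold).

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from true/false positive and false
    negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(scored_hits, n_gt):
    """AP as the area under the precision-recall curve.
    scored_hits: list of (confidence, is_true_positive) per detection;
    n_gt: number of ground-truth objects of this class."""
    scored_hits = sorted(scored_hits, key=lambda t: -t[0])
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, hit in scored_hits:
        if hit:
            tp += 1
        else:
            fp += 1
        recall = tp / n_gt
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

mAP is then the mean of `average_precision` over all classes; evaluation protocols differ in how they interpolate the curve (e.g. VOC's 11-point scheme), so this raw-area version is only one common choice.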

References:

[1] Fahlstrom P, Gleason T. Introduction to UAV Systems[M]. Translated by 吴汉平, 施自胜, 丁亚非, et al. 2nd ed. Beijing: Publishing House of Electronics Industry, 2003.

[2] Jaimes A. Computer vision startups tackle AI[J]. IEEE MultiMedia, 2016, 23(4): 94-96.

[3] Kumar A, Kaur A, Kumar M. Face detection techniques: a review[J]. Artificial Intelligence Review, 2019, 52: 927-948.

[4] Shao L, Zhang E, Ma Q, et al. Pixel-wise semisupervised fabric defect detection method combined with multitask mean teacher[J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71: 1-11.

[5] Wang H, Xu Y, He Y, et al. YOLOv5-Fog: A multi objective visual detection algorithm for fog driving scenes based on improved YOLOv5[J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71: 1-12.

[6] Pustokhina I V, Pustokhin D A, Vaiyapuri T, et al. An automated deep learning based anomaly detection in pedestrian walkways for vulnerable road users safety[J]. Safety science, 2021, 142: 105356.

[7] Liu H, Tian H, Li Y, et al. Comparison of four Adaboost algorithm based artificial neural networks in wind speed predictions[J]. Energy Conversion and Management, 2015, 92: 67-81.

[8] Rokach L. Decision forest: Twenty years of research[J]. Information Fusion, 2016, 27: 111-125.

[9] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005: 886-893.

[10] Zhong S, Liu Y, Chen Q. Visual orientation in homogeneity based scale-invariant feature transform[J]. Expert Systems with Applications, 2015, 42(13): 5658-5667.

[11] Alwan H B, Mahamud K R. Cancellable face template algorithm based on speeded-up robust features and winner-takes-all[J]. Multimedia Tools and Applications, 2020, 79(39-40): 28675-28693.

[12] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.

[13] Redmon J, Farhadi A. YOLO9000: better, faster, stronger[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6517-6525.

[14] Redmon J, Farhadi A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv:1804.02767, 2018.

[15] Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: Optimal speed and accuracy of object detection[J]. arXiv preprint arXiv:2004.10934, 2020.

[16] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision, 2016: 21-37.

[17] Fu C Y, Liu W, Ranga A, et al. DSSD: Deconvolutional single shot detector[J]. arXiv preprint arXiv:1701.06659, 2017.

[18] Shen Z, Liu Z, Li J, et al. DSOD: Learning deeply supervised object detectors from scratch[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 1919-1927.

[19] Li Z, Zhou F. FSSD: Feature fusion single shot multibox detector[J]. arXiv preprint arXiv:1712.00960, 2017.

[20] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. Advances in Neural Information Processing Systems, 2015, 28: 91-99.

[21] Dai J, Li Y, He K, et al. R-fcn: Object detection via region-based fully convolutional networks[J]. Advances in neural information processing systems, 2016, 29.

[22] Cai Z, Fan Q, Feris R S, et al. A unified multi-scale deep convolutional neural network for fast object detection[C]// European conference on computer vision, 2016: 354-370.

[23] Shrivastava A, Gupta A, Girshick R. Training region-based object detectors with online hard example mining[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 761-769.

[24] Bodla N, Singh B, Chellappa R, et al. Soft-NMS: Improving object detection with one line of code[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 5561-5569.

[25] Cai Z, Vasconcelos N. Cascade r-cnn: Delving into high quality object detection[C]// Proceedings of the IEEE conference on computer vision and pattern recognition, 2018: 6154-6162.

[26] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]// Proceedings of the IEEE conference on computer vision and pattern recognition, 2017: 2117-2125.

[27] He K, Gkioxari G, Dollár P, et al. Mask r-cnn[C]// Proceedings of the IEEE international conference on computer vision, 2017: 2961-2969.

[28] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]// Proceedings of the IEEE international conference on computer vision, 2017: 2980-2988.

[29] Liu S, Qi L, Qin H, et al. Path aggregation network for instance segmentation[C]// Proceedings of the IEEE conference on computer vision and pattern recognition, 2018: 8759-8768.

[30] Pang J, Chen K, Shi J, et al. Libra r-cnn: Towards balanced learning for object detection[C]// Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019: 821-830.

[31] Law H, Deng J. CornerNet: Detecting objects as paired keypoints[C]// Proceedings of the European conference on computer vision (ECCV), 2018: 734-750.

[32] Duan K, Bai S, Xie L, et al. CenterNet: keypoint triplets for object detection[C]// Proceedings of the IEEE/CVF international conference on computer vision, 2019: 6569-6578.

[33] Li Z, Peng C, Yu G, et al. DetNet: A backbone network for object detection[J]. arXiv preprint arXiv:1804.06215, 2018.

[34] Wang J, Chen K, Yang S, et al. Region proposal by guided anchoring[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[35] 戴媛, 易本顺, 肖进胜, et al. Remote sensing image object detection based on an improved rotated region proposal network[J]. Acta Optica Sinica, 2020, 40(1): 0111020.

[36] Singh B, Davis L S. An analysis of scale invariance in object detection snip[C]// Proceedings of the IEEE conference on computer vision and pattern recognition, 2018: 3578-3587.

[37] Liu Y. Dense multiscale feature fusion pyramid networks for object detection in UAV-captured images[J]. arXiv preprint arXiv:2012.10643, 2020.

[38] Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks[J]. arXiv preprint arXiv:1608.06993, 2016.

[39] Yang X, Yang J, Yan J, et al. Scrdet: Towards more robust detection for small, cluttered and rotated objects[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 8232-8241.

[40] Sun K, Xiao B, Liu D, et al. Deep high-resolution representation learning for human pose estimation[C]// Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019: 5693-5703.

[41] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.

[42] Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.

[43] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 770-778.

[44] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2017: 1492-1500.

[45] Gao S H, Cheng M M, Zhao K, et al. Res2net: A new multi-scale backbone architecture[J]. IEEE transactions on pattern analysis and machine intelligence, 2019, 43(2): 652-662.

[46] Zhao L, Wang L. A new lightweight network based on MobileNetv3[J]. KSII Transactions on Internet and Information Systems (TIIS), 2022, 16(1): 1-15.

[47] Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

[48] Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition, 2018: 4510-4520.

[49] 唐东凯, 王红梅, 胡明, et al. Improved K-means algorithm with optimized initial cluster centers[J]. Journal of Chinese Computer Systems, 2018, 39(8): 1819-1823.

[50] 段玉倩, 贺家李. Genetic algorithm and its improvements[J]. Journal of Electric Power Systems and Automation, 1998, 10: 39-52.

CLC classification number:

 TP391    

Open-access date:

 2024-06-14    
