Thesis Information

Thesis title (Chinese): 基于残差收缩网络与注意力机制的遥感图像目标检测算法

Author: 郭松宜

Student ID: 19208207030

Confidentiality level: Public

Thesis language: Chinese (chi)

Discipline code: 085211

Discipline name: Engineering - Engineering - Computer Technology

Student type: Master's

Degree level: Master of Engineering

Degree year: 2022

Degree-granting institution: Xi'an University of Science and Technology

School: College of Computer Science and Technology

Major: Computer Technology

Research direction: Image Processing

First supervisor: 高晔

First supervisor's institution: Xi'an University of Science and Technology

Thesis submission date: 2022-06-22

Thesis defense date: 2022-06-07

Thesis title (English): Remote sensing image target detection algorithm based on residual shrinkage network and attention mechanism

Keywords (Chinese): 遥感图像; 残差收缩网络; 注意力机制; 目标检测; 轻量化网络

Keywords (English): Remote sensing images; Residual shrinkage network; Attention mechanism; Target detection; Lightweight network

Abstract (Chinese):

Target detection in remote sensing images is an important part of remote sensing image analysis and its applications, and it can be used in fields such as military strikes, urban planning, and maritime monitoring. At present, deep-learning-based algorithms are the mainstream approach to target detection in natural images, but because of the particular characteristics of remote sensing images, their accuracy drops when they are applied to remote sensing data, and the models carry a large number of parameters, which makes lightweight deployment difficult. To address these problems, the main work of this thesis is as follows:

(1) To address the low detection accuracy caused by the particular characteristics of remote sensing images, a remote sensing image target detection algorithm based on a residual shrinkage network is proposed. The algorithm adopts a residual shrinkage network as the feature extraction network to reduce the influence of useless background information on detection. Besides conventional remote sensing image augmentation methods such as cropping and rotation, Mosaic augmentation is added to strengthen the detection of small targets. A spatial pyramid pooling module that combines max pooling and average pooling is designed to fuse features fully, and it is combined with a channel attention mechanism to select effective features, improving the detection of rotated and multi-scale targets. The CIoU loss is used to optimize the target candidate regions so that they are localized more accurately, which improves the detection of densely arranged targets. Experiments show that the improved algorithm raises the overall mAP from 89.2% to 92.2% compared with the original algorithm, achieving better performance.
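For reference, the CIoU loss named above is the standard Complete-IoU formulation; the equation below is that generic definition rather than a formula taken from the thesis:

$$
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^2\left(\mathbf{b}, \mathbf{b}^{gt}\right)}{c^2} + \alpha v,
\quad
v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2},
\quad
\alpha = \frac{v}{(1-\mathrm{IoU}) + v}
$$

where ρ(·) is the Euclidean distance between the centers of the predicted box b and the ground-truth box b^gt, c is the diagonal length of their smallest enclosing box, and v penalizes aspect-ratio mismatch. The extra center-distance and aspect-ratio terms are what drive the more accurate localization claimed above.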

(2) To address the problem that deep learning target detection models have many parameters and are too complex to be deployed in a lightweight form on devices such as UAVs, a lightweight remote sensing image target detection algorithm combined with a hybrid attention mechanism is proposed. The algorithm builds a shallow, lightweight network model to minimize the number of parameters and raise the detection speed. To keep accuracy and speed in balance, the downsampling module is improved, a feature fusion module based on dilated convolutions is used, and a hybrid attention mechanism is incorporated. In addition, K-means clustering is applied to the predicted bounding-box priors to adjust them precisely in advance, reducing the loss of accuracy while cutting the number of parameters. Experiments show that the model file of the improved algorithm is only 3.5 MB, the detection time is 0.022 s, and the mAP reaches 82.9%, so accuracy and real-time performance are preserved while the model is made lightweight.
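The dilated-convolution feature fusion mentioned above can be illustrated with a small PyTorch module. This is only a minimal sketch of the general idea, assuming an ASPP-like layout of parallel branches; the class name DilatedFusion, the dilation rates, and the channel widths are illustrative choices, not the thesis's actual module:

```python
import torch
import torch.nn as nn

class DilatedFusion(nn.Module):
    """Illustrative sketch (not the thesis code): fuse features through parallel
    3x3 convolutions with different dilation rates, enlarging the receptive
    field without extra downsampling."""
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True))
            for r in rates])
        # 1x1 convolution squeezes the concatenated branches back to `channels`
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))

# Example: a 128-channel feature map keeps its spatial size
feat = torch.randn(1, 128, 40, 40)
print(DilatedFusion(128)(feat).shape)  # torch.Size([1, 128, 40, 40])
```

Because each branch keeps the spatial resolution, such a block can widen the receptive field for large targets without discarding the fine detail that small targets need.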

Abstract (English):

Target detection in remote sensing images is an important part of remote sensing image analysis and its applications, and it can be applied in fields such as military strikes, urban planning, and maritime monitoring. At present, deep-learning-based algorithms are the mainstream of natural image target detection. However, because of the particular characteristics of remote sensing images, the accuracy of deep learning target detection algorithms drops when they are applied to remote sensing images, and the models have a large number of parameters, which makes lightweight deployment difficult. To address these problems, the main work of this thesis is as follows:

(1) To address the poor detection accuracy caused by the specific characteristics of remote sensing images, a target detection algorithm for remote sensing images based on a residual shrinkage network is proposed. The algorithm adopts the residual shrinkage network as the feature extraction network to reduce the influence of useless background information on the detection results. In addition to conventional remote sensing image augmentation methods such as cropping and rotation, the Mosaic augmentation method is added to enhance the detection of small targets. A spatial pyramid pooling module combining max pooling and average pooling is designed to fully fuse the features, and it is coupled with a channel attention mechanism to select effective features and enhance the detection of rotated and multi-scale targets. The CIoU loss is used to optimize the target candidate regions for more accurate localization and to improve the detection of densely arranged targets. Experiments demonstrate that the overall mAP of the improved algorithm rises from 89.2% to 92.2% compared with the original algorithm, yielding better performance.
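To make the pooling-and-attention design concrete, the following PyTorch sketch shows one plausible realization: max- and average-pooling branches at several kernel sizes are concatenated, fused by a 1x1 convolution, and re-weighted by an SE-style channel attention gate. The class name MaxAvgSPP, the kernel sizes, and the reduction ratio are assumptions for illustration rather than the thesis code:

```python
import torch
import torch.nn as nn

class MaxAvgSPP(nn.Module):
    """Illustrative sketch (not the thesis code): spatial pyramid pooling that
    concatenates max- and average-pooled features at several kernel sizes,
    then re-weights channels with an SE-style attention gate."""
    def __init__(self, channels, pool_sizes=(5, 9, 13), reduction=16):
        super().__init__()
        self.max_pools = nn.ModuleList(
            [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in pool_sizes])
        self.avg_pools = nn.ModuleList(
            [nn.AvgPool2d(k, stride=1, padding=k // 2) for k in pool_sizes])
        fused = channels * (1 + 2 * len(pool_sizes))
        self.fuse = nn.Conv2d(fused, channels, kernel_size=1)  # restore channel width
        # Channel attention: global average pool -> bottleneck -> sigmoid weights
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        feats = [x] + [p(x) for p in self.max_pools] + [p(x) for p in self.avg_pools]
        y = self.fuse(torch.cat(feats, dim=1))  # fuse multi-scale pooled features
        return y * self.attn(y)                 # suppress uninformative channels

# Example: a 512-channel backbone feature map
out = MaxAvgSPP(512)(torch.randn(1, 512, 20, 20))  # -> torch.Size([1, 512, 20, 20])
```

The average-pooling branches preserve smooth context while the max-pooling branches keep salient responses; the channel gate then down-weights channels dominated by background clutter.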

(2) To address the problem that deep learning target detection models have a large number of parameters and are too complex to be deployed in a lightweight manner on UAVs and other devices, a lightweight remote sensing image target detection algorithm combining a hybrid attention mechanism is proposed. The algorithm constructs a shallow, lightweight network model to minimize the number of parameters and improve the detection speed. To maintain the balance between accuracy and speed, the downsampling module is improved, a feature fusion module based on dilated convolution is used, and a hybrid attention mechanism is incorporated. Meanwhile, the predicted bounding-box priors are precisely adjusted in advance with K-means clustering, which reduces the loss of accuracy while reducing the number of parameters. Experiments show that the model file of the improved algorithm is only 3.5 MB, the detection time is 0.022 s, and the mAP reaches 82.9%, so accuracy and real-time performance are still guaranteed while the model is kept lightweight.
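The K-means adjustment of the bounding-box priors can be read as YOLO-style anchor clustering on ground-truth box sizes with 1 - IoU as the distance. The NumPy function below is a minimal sketch under that assumption; the function name kmeans_anchors, the value of k, and the synthetic data are illustrative, not the thesis's implementation:

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Illustrative sketch (not the thesis code): cluster ground-truth
    (width, height) pairs into k anchor priors, assigning each box to the
    anchor with the highest IoU."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        # IoU between every box and every anchor, assuming aligned top-left corners
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = (wh[:, None, 0] * wh[:, None, 1] +
                 anchors[None, :, 0] * anchors[None, :, 1] - inter)
        assign = np.argmax(inter / union, axis=1)
        new = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted by area

# Example with synthetic normalized box sizes
boxes = np.abs(np.random.randn(2000, 2)) * 0.2 + 0.05
print(kmeans_anchors(boxes, k=6))
```

Fitting the priors to the dataset in this way lets a small detection head start from box shapes close to the targets, which is how accuracy can be retained after the parameter count is cut.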


CLC number: TP751

Open access date: 2022-06-22
