Thesis Information

Chinese Title: 注意力引导卷积神经网络的高分辨率遥感影像分类方法研究 (Research on Attention-Guided Convolutional Neural Networks for High-Resolution Remote Sensing Image Classification)
Author: 侯沙沙 (Hou Shasha)
Student ID: 18210063041
Confidentiality Level: Public
Thesis Language: Chinese
Discipline Code: 081602
Discipline: Engineering - Surveying and Mapping Science and Technology - Photogrammetry and Remote Sensing
Student Type: Master's candidate
Degree: Master of Engineering
Degree Year: 2021
Degree-Granting Institution: Xi'an University of Science and Technology
School: College of Surveying and Mapping Science and Technology
Major: Photogrammetry and Remote Sensing
Research Direction: Intelligent interpretation of remote sensing imagery
First Supervisor: 黄远程 (Huang Yuancheng)
First Supervisor's Institution: Xi'an University of Science and Technology
Second Supervisor: 张过 (Zhang Guo)
Submission Date: 2021-06-11
Defense Date: 2021-05-31
English Title: Attention-Guided Convolutional Neural Network for High-Resolution Remote Sensing Image Classification
Chinese Keywords: 遥感影像分类; 地物精细化分割; 注意力机制; 点渲染; 门控卷积
English Keywords: Remote sensing image classification; Fine segmentation of ground features; Attention mechanism; PointRend; Gated convolution

Chinese Abstract:

ABSTRACT

As a means of monitoring the Earth's surface over large areas, remote sensing is of great significance for geographic national-conditions monitoring, environmental-change research, military target recognition, and sustainable-development planning. Compared with low- and medium-resolution imagery, high-resolution remote sensing imagery contains more detail about ground objects, with richer geometric information such as texture, shape, topology, and adjacency relations, and can serve as an objective and reliable information source for intelligent interpretation. Although CNN-based classification of high-resolution remote sensing images and segmentation of typical ground objects have advanced considerably in recent years, several problems remain to be solved. First, severe sample imbalance, "large intra-class variance with small inter-class variance", and the difficulty of capturing dense multi-scale information, all common in wide-swath imagery, lead to low overall classification accuracy, most visibly for small objects and under-represented classes. Second, complex object boundaries and insufficient use of spatial structure, global context, and boundary information produce incoherent, heavily jagged segmentation boundaries. This thesis therefore investigates methods that address these problems in CNN-based high-resolution remote sensing image classification. The main research contents and results are as follows:

To address sample imbalance, "large intra-class variance with small inter-class variance", and the difficulty of extracting dense multi-scale features from high-resolution imagery, this thesis proposes an attention-guided classification method that jointly aggregates multi-scale spatial and channel information (Multi-Scale Dense Feature Extraction Network Module Composed of Position Attention, Channel Attention and Atrous Spatial Pyramid Pooling, PCASPPNet). The method uses a parallel structure built from a channel attention module, a spatial attention module, and atrous spatial pyramid pooling (ASPP); it mitigates ASPP's low utilization of input features, in which part of the useful information is ignored, while guiding the aggregation of multi-scale spatial and channel information to obtain dense multi-scale features. On the Vaihingen and GID datasets, experiments show a clear accuracy gain over several classic methods, with the largest advantage on small objects and under-represented classes.

We verify the response mechanisms of different attention modules on remote sensing imagery and design ablation experiments to probe their information-aggregation abilities. The results show that the criss-cross attention module (CCAM) responds only to semantic information along the "criss-cross path" through a marked point; the recurrent criss-cross attention module (RCCAM), by applying CCAM twice recurrently, gathers information about that category from the whole image; and the position attention module (PAM) directly relates each pixel to every other pixel in the image, capturing semantic similarity within a category together with its long-range dependencies. In global-context aggregation ability, therefore, PAM > RCCAM > CCAM. In addition, the channel attention module (CAM) responds distinctly to different categories by modeling the dependencies between channels.

To address the misclassification of edge pixels caused by insufficient mining of spatial structure, global context, and boundary information, this thesis designs a fine semantic segmentation network for typical ground objects based on gated convolution and attention modules (Fine Segmentation Network Based on Gated Convolution and Attention Module, GAFSNet). The semantic segmentation branch extracts discriminative features that establish what a ground object is, while the edge detection branch extracts accurate position and boundary features that establish where it is; this "what-where" joint learning improves the network's fine-grained representation and thereby overcomes incoherent, heavily jagged segmentation boundaries. In addition, the PointRend module is used to improve the baseline methods DeepLabV3+ and FPN, raising their classification accuracy at boundaries through an iterative subdivision strategy. On the WHU dataset, the improved baselines adaptively render anti-aliased, high-quality segmentation results, and GAFSNet is shown to outperform both the original and the improved baselines in fine building segmentation.

English Abstract:

ABSTRACT

As a large-area surface monitoring technique, remote sensing is of great significance for geographic national-conditions monitoring, environmental change research, military target identification, and sustainable development planning. Compared with low- and medium-resolution remote sensing images, high-resolution remote sensing images contain more details of ground objects, such as richer texture, shape, topology, and adjacency information, and can provide objective and reliable information for intelligent interpretation tasks. Although the classification of high-resolution remote sensing images and the segmentation of typical ground objects based on convolutional neural networks have made great progress in recent years, some urgent problems remain. First, large wide-swath remote sensing images commonly suffer from severe sample imbalance, "large intra-class differences, small inter-class differences", and difficulty in obtaining dense multi-scale information, which lead to low overall classification accuracy, especially for small objects and under-represented classes. Second, the complex boundaries of ground objects and the lack of spatial structure, global context, and boundary information lead to incoherent segmentation boundaries and a serious sawtooth phenomenon. This paper therefore studies these problems in high-resolution remote sensing image classification based on convolutional neural networks. The main research contents and results are as follows:

Aiming at the problems of imbalanced high-resolution remote sensing image samples, "large intra-class differences, small inter-class differences", and difficulty in acquiring dense multi-scale features, a high-resolution remote sensing image classification method based on attention-guided joint multi-scale spatial and channel information, named PCASPPNet, is proposed in this paper. The method includes a parallel structure composed of a channel attention module, a spatial attention module, and atrous spatial pyramid pooling (ASPP), which alleviates the low utilization of input features and the neglect of some useful information in the ASPP module, and helps guide the aggregation of multi-scale spatial and channel information to obtain dense multi-scale features. On the Vaihingen and GID datasets, experimental results show that the classification accuracy of PCASPPNet is significantly improved compared with multiple classic methods, especially for small targets and under-represented classes.
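The thesis does not reproduce its implementation here; as a minimal PyTorch sketch of the parallel structure just described, the block below runs a position attention module, a channel attention module, and ASPP side by side on the same feature map and fuses their outputs. All class names, channel sizes, and dilation rates are illustrative assumptions, not the author's code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: a 1x1 conv plus parallel dilated 3x3 convs."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
             for r in rates])
        self.project = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1, bias=False)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class PositionAttention(nn.Module):
    """PAM: every position attends to all other positions (pixel-pixel affinity)."""
    def __init__(self, ch):
        super().__init__()
        self.query = nn.Conv2d(ch, ch // 8, 1)
        self.key = nn.Conv2d(ch, ch // 8, 1)
        self.value = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)     # B x HW x C'
        k = self.key(x).flatten(2)                       # B x C' x HW
        attn = F.softmax(q @ k, dim=-1)                  # B x HW x HW affinity
        v = self.value(x).flatten(2)                     # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    """CAM: models dependencies between feature channels."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        f = x.flatten(2)                                 # B x C x HW
        attn = F.softmax(f @ f.transpose(1, 2), dim=-1)  # B x C x C affinity
        out = (attn @ f).view(b, c, h, w)
        return self.gamma * out + x

class PCASPPBlock(nn.Module):
    """Run PAM, CAM and ASPP in parallel, then fuse their outputs with a 1x1 conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pam = PositionAttention(in_ch)
        self.cam = ChannelAttention()
        self.aspp = ASPP(in_ch, out_ch)
        self.fuse = nn.Conv2d(in_ch * 2 + out_ch, out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.pam(x), self.cam(x), self.aspp(x)], dim=1))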

We verify the response mechanisms of different attention modules on remote sensing images and design ablation experiments to explore their information-aggregation abilities. The results show that CCAM responds only to the semantic information on the "criss-cross path" of a marked point, whereas RCCAM obtains information about the marked point's category across the whole image by applying CCAM recurrently twice, and PAM captures the similar semantic information and long-distance dependencies of the same category by directly establishing the relationship between each pixel and all other pixels in the image. Therefore, in terms of global context aggregation, PAM is the strongest, RCCAM second, and CCAM the weakest. In addition, CAM responds clearly to different categories by modeling the dependencies between channels.
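To make the path restriction concrete, here is a readable sketch of criss-cross attention, assuming a boolean mask over a full affinity matrix is acceptable (CCNet's actual implementation is memory-efficient and never materializes the HW x HW matrix): each pixel may attend only to pixels in its own row or column, and applying the module twice plays the role of RCCAM, since any two pixels' criss-cross paths intersect within two hops.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    """CCAM sketch: attention restricted to each pixel's row and column."""
    def __init__(self, ch):
        super().__init__()
        self.query = nn.Conv2d(ch, ch // 8, 1)
        self.key = nn.Conv2d(ch, ch // 8, 1)
        self.value = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # B x HW x C'
        k = self.key(x).flatten(2)                     # B x C' x HW
        v = self.value(x).flatten(2)                   # B x C x HW
        energy = q @ k                                 # B x HW x HW
        # Keep only pairs that share a row or a column (the criss-cross path).
        idx = torch.arange(h * w, device=x.device)
        same_row = (idx[:, None] // w) == (idx[None, :] // w)
        same_col = (idx[:, None] % w) == (idx[None, :] % w)
        energy = energy.masked_fill(~(same_row | same_col), float('-inf'))
        attn = F.softmax(energy, dim=-1)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

def rccam(ccam, x):
    """RCCAM: two recurrent passes spread information from row/column
    neighbours to the whole image."""
    return ccam(ccam(x))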

Aiming at the misclassification of edge pixels caused by insufficient spatial structure and boundary information of ground objects, we design a fine segmentation network for typical ground objects, named GAFSNet, based on two parallel branches for boundary extraction and semantic segmentation. The semantic segmentation branch obtains discriminative features of ground objects, clarifying what they are; the edge detection branch obtains their accurate position and boundary features, clarifying where they are. This "what-where" joint learning improves the fine-grained representation of the network and overcomes incoherent segmentation boundaries and the serious sawtooth phenomenon. Meanwhile, the PointRend module is used to improve the classification accuracy of the baseline methods DeepLabV3+ and FPN at boundaries through an iterative subdivision strategy. On the WHU dataset, the improved baselines can adaptively render anti-aliased, high-quality segmentation results. Moreover, compared with both the original and the improved baselines, GAFSNet achieves better results in the fine segmentation of buildings.
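As a rough sketch of the two mechanisms named above, the PyTorch code below shows a gated convolution layer in the spirit of Gated-SCNN, in which segmentation-branch features gate the edge stream, followed by PointRend's top-2-margin rule for selecting ambiguous boundary points. Class names, channel sizes, and the fusion details are assumptions for illustration, not GAFSNet's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv(nn.Module):
    """Segmentation features produce a per-pixel gate in [0, 1] that keeps
    only boundary-relevant activations in the edge stream."""
    def __init__(self, edge_ch, seg_ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(edge_ch + seg_ch, edge_ch, 1),
            nn.BatchNorm2d(edge_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(edge_ch, 1, 1),
            nn.Sigmoid())
        self.conv = nn.Conv2d(edge_ch, edge_ch, 3, padding=1, bias=False)

    def forward(self, edge_feat, seg_feat):
        # Bring segmentation features to the edge stream's resolution.
        seg_feat = F.interpolate(seg_feat, size=edge_feat.shape[2:],
                                 mode='bilinear', align_corners=False)
        alpha = self.gate(torch.cat([edge_feat, seg_feat], dim=1))
        return self.conv(edge_feat * alpha)

def uncertain_points(logits, k):
    """PointRend-style point selection: the k most ambiguous pixels are those
    whose top-2 class scores are closest, typically lying on boundaries."""
    top2 = logits.topk(2, dim=1).values            # B x 2 x H x W
    uncertainty = -(top2[:, 0] - top2[:, 1])       # higher = more ambiguous
    return uncertainty.flatten(1).topk(k, dim=1).indices  # B x k flat indices

During iterative subdivision, only the points returned by a rule like uncertain_points are re-classified at higher resolution, which is what lets the improved baselines render sharp, anti-aliased boundaries without densely re-computing the whole map.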

CLC Number: TP751
Open Access Date: 2021-06-11
