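Thesis title (Chinese): | 注意力引导卷积神经网络的高分辨率遥感影像分类方法研究 |
Name: | |
Student ID: | 18210063041 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 081602 |
Discipline name: | Engineering - Surveying and Mapping Science and Technology - Photogrammetry and Remote Sensing |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2021 |
Degree-granting institution: | Xi'an University of Science and Technology |
School/Department: | |
Major: | |
Research direction: | Intelligent interpretation of remote sensing images |
First supervisor name: | |
First supervisor affiliation: | |
Second supervisor name: | |
Thesis submission date: | 2021-06-11 |
Thesis defense date: | 2021-05-31 |
Thesis title (foreign language): | Attention guided convolutional neural network for High resolution remote sensing image classification |
Keywords (Chinese): | |
Keywords (foreign language): | Remote sensing image classification ; Fine segmentation of ground features ; Attention mechanism ; PointRend ; Gated convolution |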
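Abstract (Chinese): |
ABSTRACT As a means of monitoring the Earth's surface over large areas, remote sensing is of great significance for geographic national conditions monitoring, environmental change research, military target identification, and sustainable development planning. Compared with low- and medium-resolution remote sensing images, high-resolution remote sensing images contain more ground-object detail and richer geometric information such as texture, shape, topological structure, and adjacency relations, and can therefore provide an objective and reliable information source for the intelligent interpretation of remote sensing images. Although high-resolution remote sensing image classification and typical ground-object segmentation based on convolutional neural networks have advanced considerably in recent years, some problems still urgently need to be solved. First, severe sample imbalance, "large intra-class differences and small inter-class differences", and the difficulty of acquiring dense multi-scale information, all common in wide-swath remote sensing images, lead to low overall classification accuracy, which is most evident for small objects and for classes with few samples. Second, complex ground-object boundaries, together with insufficient acquisition of spatial structure information, global context information, and boundary information, cause incoherent segmentation boundaries and severe jagged (sawtooth) artifacts. This thesis therefore studies methods that address these problems in high-resolution remote sensing image classification based on convolutional neural networks. The main research contents and results are as follows:
To address sample imbalance, "large intra-class differences and small inter-class differences", and the difficulty of acquiring dense multi-scale features in high-resolution remote sensing images, this thesis proposes an attention-guided classification method that jointly uses multi-scale spatial and channel information (Multi-Scale Dense Feature Extraction Network Module Composed of Position Attention, Channel Attention and Atrous Spatial Pyramid Pooling, PCASPPNet). The method designs a parallel structure consisting of a channel attention module, a spatial attention module, and Atrous Spatial Pyramid Pooling (ASPP). This structure alleviates the low utilization of input features and the neglect of some useful information in ASPP, and at the same time helps guide the aggregation of multi-scale spatial and channel information to obtain dense multi-scale features. Experimental results on the Vaihingen and GID datasets show that, compared with several classic methods, the method clearly improves classification accuracy, with a particular advantage for small objects and for ground-object classes with few samples.
The response mechanisms of different attention modules on remote sensing images are verified, and ablation experiments are designed to explore the information aggregation ability of each module. The results show that the Criss-Cross Attention Module (CCAM) responds only to semantic information on the "criss-cross path" of the marked point; the Recurrent Criss-Cross Attention Module (RCCAM), by applying CCAM recursively twice, can obtain the relevant information for that category over the whole image; and the Position Attention Module (PAM) directly establishes connections between a given pixel and all other pixels in the remote sensing image, capturing similar semantic information within the same category as well as long-range dependencies. The ranking of global context aggregation ability is therefore PAM > RCCAM > CCAM. In addition, the Channel Attention Module (CAM) models the dependencies between different channels and responds distinctly to different categories.
To address the tendency of edge pixels to be misclassified owing to insufficient mining of ground-object spatial structure information, global context information, and boundary information, this thesis designs a fine semantic segmentation network for typical ground objects based on gated convolution and attention modules (Fine Segmentation Network Based on Gated Convolution and Attention Module, GAFSNet). The network uses a semantic segmentation branch to obtain discriminative ground-object features, establishing what the object is, and an edge detection branch to obtain accurate position and boundary features, establishing where the object is. This "what-where" joint learning improves the network's fine-grained representation and thereby overcomes incoherent segmentation boundaries and severe jagged artifacts. In addition, the PointRend module is used to improve the baseline methods DeepLabV3+ and FPN, raising their classification accuracy at boundaries through an iterative subdivision strategy. On the WHU dataset, the improved baseline methods can adaptively render anti-aliased, high-quality segmentation results. Furthermore, compared with the baseline methods both before and after improvement, GAFSNet is shown to achieve better results in the fine segmentation of buildings. |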
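The coverage argument behind the CCAM and RCCAM comparison can be illustrated with a small reachability sketch. It is not the thesis's implementation: it ignores learned attention weights entirely and only tracks which pixels can contribute information to a marked pixel after a given number of criss-cross steps. The function name `reachable_after` and the 5 x 7 grid are hypothetical choices made for illustration.

```python
import numpy as np

def reachable_after(h, w, row, col, steps):
    """Boolean mask of pixels linked to (row, col) after `steps` criss-cross passes.

    One pass links each pixel to every pixel in its own row and column.
    This tracks connectivity only, not attention weights (an assumption
    made to keep the illustration small).
    """
    reached = np.zeros((h, w), dtype=bool)
    reached[row, col] = True
    for _ in range(steps):
        rows_hit = reached.any(axis=1)  # rows that already contain a reached pixel
        cols_hit = reached.any(axis=0)  # columns that already contain a reached pixel
        # A pixel is reached if it shares a row or a column with a reached pixel.
        reached = rows_hit[:, None] | cols_hit[None, :]
    return reached

one_pass = reachable_after(5, 7, 2, 3, steps=1)   # cross through the marked pixel
two_pass = reachable_after(5, 7, 2, 3, steps=2)   # whole image
print(one_pass.sum(), two_pass.sum())  # 11 35
```

After one pass only the cross through the marked pixel is covered (5 + 7 - 1 = 11 pixels); after two passes every pixel is covered, which matches the statement that applying CCAM twice gathers information from the whole image.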
Abstract (foreign language): |
ABSTRACT As a means of large-scale surface monitoring, remote sensing is of great significance for geographic national conditions monitoring, environmental change research, military target identification, and sustainable development planning. Compared with low- and medium-resolution remote sensing images, high-resolution remote sensing images contain more ground-object detail, including richer texture, shape, topology, and adjacency information, and can therefore provide objective and reliable information for intelligent interpretation tasks. Although the classification of high-resolution remote sensing images and the segmentation of typical ground objects based on convolutional neural networks have made great progress in recent years, some urgent problems remain. First, wide-swath remote sensing images commonly suffer from severe sample imbalance, "large intra-class differences, small inter-class differences", and difficulty in obtaining dense multi-scale information, which lead to low overall classification accuracy, especially for small objects and for classes with few samples. Second, complex ground-object boundaries and insufficient spatial structure information, global context information, and boundary information lead to incoherent segmentation boundaries and severe sawtooth artifacts. This thesis therefore carries out research on these problems in high-resolution remote sensing image classification based on convolutional neural networks. The main research contents and results are as follows:
To address imbalanced samples, "large intra-class differences, small inter-class differences", and the difficulty of acquiring dense multi-scale features in high-resolution remote sensing images, this thesis proposes a classification method based on attention-guided joint use of multi-scale spatial and channel information, named PCASPPNet. The method includes a parallel structure composed of a channel attention module, a spatial attention module, and atrous spatial pyramid pooling (ASPP). This structure alleviates the low utilization of input features and the neglect of some useful information in the ASPP module, and helps guide the aggregation of multi-scale spatial and channel information to obtain dense multi-scale features. Experimental results on the Vaihingen and GID datasets show that PCASPPNet significantly improves classification accuracy over several classic methods, especially for small objects and for classes with few samples.
The response mechanisms of different attention modules on remote sensing images are verified, and ablation experiments are designed to explore the information aggregation ability of each module. The results show that CCAM responds only to semantic information on the "criss-cross path" of the marked point, whereas RCCAM can obtain the relevant information for the marked point over the whole image by applying CCAM recursively twice, and PAM captures similar semantic information and long-range dependencies within the same category by directly establishing relationships between a given pixel and all other pixels in the remote sensing image. In terms of global context aggregation capability, PAM is therefore the best, RCCAM the second, and CCAM the weakest. In addition, CAM responds clearly to different categories by modeling the dependencies between channels.
To address the misclassification of edge pixels caused by insufficient spatial structure information and boundary information of ground features, we design a fine segmentation network for typical ground features based on two parallel branches, boundary extraction and semantic segmentation, named GAFSNet. In this network, the semantic segmentation branch obtains discriminative features of the ground features, clarifying what they are, and the edge detection branch obtains their accurate position and boundary features, clarifying where they are. This "what-where" joint learning improves the fine-grained representation of the network and thereby overcomes incoherent segmentation boundaries and severe sawtooth artifacts. Meanwhile, the PointRend module is used to improve the boundary classification accuracy of the baseline methods DeepLabV3+ and FPN through an iterative subdivision strategy. On the WHU dataset, the improved baseline methods can adaptively render anti-aliased, high-quality segmentation results. In addition, compared with both the original and the improved baseline methods, GAFSNet is shown to achieve better results in the fine segmentation of buildings. |
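As a rough illustration of how a position attention module relates every pixel to every other pixel, and how a channel attention module relates every channel to every other channel, the following NumPy sketch computes both attention maps from raw feature similarities. It is a simplified sketch under stated assumptions, not the thesis's modules: the learned projections and learnable scaling factors that practical implementations usually include are omitted, and the names `position_attention`, `channel_attention`, and the random 4 x 3 x 5 feature map are hypothetical.

```python
import numpy as np

def _softmax(z, axis):
    # Numerically stable softmax along one axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feat):
    """feat: (C, H, W). Each pixel attends to all H*W pixels."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # (C, N) with N = H*W
    attn = _softmax(x.T @ x, axis=1)      # (N, N) pixel-to-pixel weights
    out = x @ attn.T                      # out[:, i] = sum_j attn[i, j] * x[:, j]
    return out.reshape(c, h, w) + feat, attn   # residual connection

def channel_attention(feat):
    """feat: (C, H, W). Each channel attends to all C channels."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # (C, N)
    attn = _softmax(x @ x.T, axis=1)      # (C, C) channel-to-channel weights
    out = attn @ x                        # re-mix channels by their similarity
    return out.reshape(c, h, w) + feat, attn

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 3, 5))     # hypothetical feature map
pa_out, pa = position_attention(feat)
ca_out, ca = channel_attention(feat)
```

The position attention map has one row per pixel and spans the whole image, which is why its context aggregation is global in a single step, at the cost of an N x N matrix that grows quadratically with image size.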
Chinese Library Classification number: | TP751 |
Open-access date: | 2021-06-11 |