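Thesis title (Chinese): | 结合光谱信息增强的高分辨率遥感影像建筑物语义分割研究 |
Name: | |
Student ID: | 21210061016 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 0816 |
Discipline name: | Engineering - Surveying and Mapping Science and Technology |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2024 |
Institution: | 西安科技大学 (Xi'an University of Science and Technology) |
School/Department: | |
Major: | |
Research direction: | Remote sensing image processing and applications |
First supervisor name: | |
First supervisor affiliation: | |
Thesis submission date: | 2024-06-11 |
Thesis defense date: | 2024-06-01 |
Thesis title (foreign language): | Research on building semantic segmentation in high-resolution remote sensing images combined with spectral information enhancement |
Keywords (Chinese): | 高分辨率遥感影像 ; 建筑物提取 ; 深度卷积神经网络 ; Transformer ; 光谱增强 |
Keywords (foreign language): | high-resolution remote sensing imagery ; building extraction ; deep convolutional neural network (DCNN) ; transformer ; spectral enhancement |
Abstract (Chinese): |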
高分辨率遥感影像可为像素级的建筑物提取工作提供更高质量的数据源。近年来,深度学习技术高速发展,也为遥感影像的建筑物提取工作带来了新的分割方法。然而,受不同自然、文化以及社会发展程度的影响,建筑物的形态与分布存在一定差异,这使得网络在提取影像中不同区域的建筑物时,难以取得稳定的分割效果。此外,因遥感影像中地物光谱信息复杂,其对多尺度建筑物的提取细节也会产生不同程度的影响。为此,本文利用高分辨率遥感影像作为数据源,同时结合语义分割技术以及遥感图像处理技术,来探究具有更高精度和分割效果的建筑物提取方法。具体研究工作如下: (1)本文选取了中国陕西省西安市的部分区域作为研究区,并采用国产高分二号卫星影像构建了一套地物背景信息复杂,且建筑物形态、分布以及尺度多样的西安建筑物数据集。该数据集影像中包含了红、绿、蓝以及近红外四个波段,可为多种光谱信息处理方式提供数据支持。此外,通过与武汉大学建筑物数据集、马萨诸塞州建筑物数据集开展的质量分析实验可知,本文自建的西安建筑物数据集具有一定的可行性,且能够让语义分割网络有效地学习影像中的建筑物特征信息,丰富了用于建筑物提取的数据集多样性。 (2)构建了一种融合多种注意力机制的并行编码建筑物提取网络。该网络在总体上由编码器和解码器构成。在编码器中,采用了深度卷积神经网络与Transformer的并行编码结构,以发挥它们对局部和全局特征信息的提取优势;在网络的不同深度位置,针对性的引入了坐标注意力机制和空间通道注意力机制来桥接编码器和解码器,以保留编码过程中更为丰富的空间和语义特征信息;引入密集型空洞特征金字塔池化,从而在解码过程中的各层上采样部分捕获更多尺度的上下文信息。在西安、武汉大学和马萨诸塞州建筑物数据集上开展了消融实验,验证了本文提出网络中各结构的有效性。通过在三种实验数据集上开展所提出网络与U-Net、ResUNet++、DeepLabv3+、Swin-Unet和UNetFormer的对比实验,证明了该网络的优势性与泛化性。本研究所提出网络的F1分数得分在三种建筑物数据集上分别为93.34%、94.52%和82.59%,边界交并比得分分别为80.57%、60.39%和67.06%。 (3)设计了一种增强高分辨率遥感影像建筑物光谱信息的模块。首先,该模块计算了比值运算,并利用ReliefF算法筛选了影像的红、绿、蓝、近红外波段以及比值运算结果中特征权值最大的几组。而后,利用最小噪声比率变换将形态学建筑物指数赋予给ReliefF筛选出的结果。通过上述操作强调了影像中多波段间的信息关联,并赋予了它们更强的建筑物形态学特征,以此来提升语义分割网络的建筑物提取效果。在西安数据集上对所提出模块开展了性能实验,验证了该模块的有效性和泛化性。采用该模块进行影像处理后,本文所提出网络的F1分数和边界交并比指标分别提升至94.10%和82.13%。 |
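The F1 scores quoted in the abstract are a standard measure for binary building masks. As a minimal sketch, assuming the usual pixel-wise definition F1 = 2TP / (2TP + FP + FN), the function below computes it for two small masks. The name `f1_score`, the nested-list mask format, and the convention for two empty masks are illustrative assumptions, not the thesis's evaluation code.

```python
def f1_score(pred, truth):
    """Pixel-wise F1 for binary building masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # building predicted and present
            elif p and not t:
                fp += 1          # building predicted but absent
            elif t and not p:
                fn += 1          # building present but missed
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


# Toy 2x3 masks (hypothetical data, 1 = building pixel).
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = f1_score(pred, truth)
```

Here TP = 2, FP = 1, and FN = 1, so the score is 4/6. Real evaluations would normally run on whole image arrays with an array library, but the counting logic is the same.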
Abstract (foreign language): |
High-resolution remote sensing images provide a higher-quality data source for pixel-level building extraction, and the rapid development of deep learning in recent years has brought new segmentation methods to this task. However, building shape and distribution vary with natural conditions, culture, and level of social development, which makes it difficult for a network to segment buildings consistently across different regions of an image. In addition, the complex spectral information of ground objects in remote sensing images affects, to varying degrees, how well the details of multi-scale buildings are extracted. To address these problems, this thesis uses high-resolution remote sensing images as the data source and combines semantic segmentation with remote sensing image processing techniques to explore building extraction methods with higher accuracy and better segmentation quality. The specific research work is as follows:
(1) Part of Xi'an City, Shaanxi Province, China, is selected as the study area, and a Xi'an building dataset is constructed from domestic Gaofen-2 satellite images. The dataset has complex ground-object backgrounds and buildings of diverse form, distribution, and scale. Its images contain red, green, blue, and near-infrared bands, which supports a variety of spectral information processing methods. Quality analysis experiments against the WHU building dataset and the Massachusetts building dataset indicate that the self-built Xi'an building dataset is feasible to use, enables semantic segmentation networks to learn building features in the images effectively, and enriches the diversity of datasets available for building extraction.
(2) A parallel-encoding building extraction network that fuses multiple attention mechanisms is constructed. The network consists of an encoder and a decoder. The encoder adopts a parallel structure of a deep convolutional neural network and a Transformer to exploit their respective strengths in extracting local and global features. At different depths of the network, Coordinate Attention and the Convolutional Block Attention Module are introduced to bridge the encoder and decoder, retaining richer spatial and semantic feature information from the encoding process. Dense Atrous Spatial Pyramid Pooling is introduced so that the upsampling stage of each decoder layer captures contextual information at more scales. Ablation experiments on the Xi'an, WHU, and Massachusetts building datasets verify the effectiveness of each component of the proposed network. Comparison experiments against U-Net, ResUNet++, DeepLabv3+, Swin-Unet, and UNetFormer on the three datasets demonstrate its superiority and generalization. On the three building datasets, the proposed network achieves F1 scores of 93.34%, 94.52%, and 82.59% and Boundary IoU scores of 80.57%, 60.39%, and 67.06%, respectively.
(3) A module is designed to enhance the spectral information of buildings in high-resolution remote sensing images. First, the module computes band ratios and uses the ReliefF algorithm to select, from the red, green, blue, and near-infrared bands and the band-ratio results, the features with the largest weights. Then, the minimum noise fraction (MNF) transformation is used to assign the morphological building index (MBI) to the features selected by ReliefF. These operations emphasize the correlations among the image bands and give the selected features stronger morphological building characteristics, which improves the building extraction performance of the semantic segmentation network. Performance experiments on the Xi'an dataset verify the effectiveness and generalization of the module. After images are processed with the module, the F1 score and Boundary IoU of the proposed network improve to 94.10% and 82.13%, respectively. |
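The ratio step in item (3) of the abstract can be illustrated with a small sketch. The abstract does not say which ratios are computed, so the code below assumes every ordered pair of the four bands; the name `band_ratios`, the dictionary pixel format, and the `eps` guard against division by zero are likewise hypothetical.

```python
from itertools import permutations

# The four bands the abstract lists for the Xi'an dataset.
BANDS = ("red", "green", "blue", "nir")


def band_ratios(pixel, eps=1e-6):
    """Return every ordered band ratio a/b for one pixel (dict of band -> value).

    Assumption: all 12 ordered pairs are candidates; eps avoids division by zero.
    """
    return {
        f"{a}/{b}": pixel[a] / (pixel[b] + eps)
        for a, b in permutations(BANDS, 2)
    }


# Hypothetical pixel values, for illustration only.
pixel = {"red": 120.0, "green": 100.0, "blue": 80.0, "nir": 200.0}
ratios = band_ratios(pixel)

# Candidate feature set: the 4 original bands plus the 12 ratios.
candidates = {**pixel, **ratios}
```

Under this assumption each pixel yields 16 candidate features. A feature-ranking method such as ReliefF would then score these candidates and keep the highest-weighted ones; that ranking is not implemented here.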
Chinese Library Classification number: | P237 |
Open-access date: | 2024-06-11 |