Thesis information

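Thesis title (Chinese):	基于深度学习的高分辨率遥感影像建筑物提取研究
Name:	任乐宽
Student ID:	20210226081
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	085215
Discipline name:	Engineering - Engineering - Surveying and Mapping Engineering
Student type:	Master's
Degree level:	Master of Engineering
Degree year:	2023
Degree-granting institution:	西安科技大学 (Xi'an University of Science and Technology)
College:	College of Surveying and Mapping Science and Technology
Major:	Surveying and Mapping Engineering
Research direction:	Remote sensing image processing and applications
First supervisor name:	胡荣明
First supervisor affiliation:	西安科技大学 (Xi'an University of Science and Technology)
Thesis submission date:	2023-06-15
Thesis defense date:	2023-06-09
Thesis title (foreign language):	Deep learning based high resolution remote sensing image building extraction research
Keywords (translated from Chinese):	High-resolution remote sensing imagery ; Deep learning ; Building extraction ; Swin Transformer ; Residual network
Keywords (foreign language):	High-resolution Remote Sensing Imagery ; Deep Learning ; Building Extraction ; Swin Transformer ; Residual Network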
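Abstract (translated from Chinese):	With the development of and breakthroughs in remote sensing technology, high-resolution remote sensing imagery has become available in explosive volumes, and the ground-object information it contains is widely used in mapping, change detection, resource surveys, and other fields. Buildings, as important sites of human activity, reflect local social and economic development, so extracting buildings from high-resolution remote sensing imagery is important for supplying basic geographic information to urban planning and socio-economic development. Manual visual interpretation is time-consuming, and traditional machine learning methods have limited accuracy and poor applicability, so neither meets the demands of today's rapidly changing information. In recent years, deep learning has brought remote sensing image interpretation into a new stage, and many researchers have applied it to building extraction from high-resolution remote sensing imagery with some success. On this basis, this thesis takes high-resolution remote sensing imagery as the data source and buildings as the extraction target, and uses current deep learning techniques to explore more accurate building extraction methods. The main research contents and conclusions are as follows: (1) The basic principles of deep neural networks are described, and the convolutional neural network and Transformer architectures are studied and analysed. To address the shortage of building datasets built from China's high-resolution remote sensing satellite imagery, a high-resolution building dataset covering the area inside the Third Ring of Changchun was produced from Gaofen-2 (GF-2) satellite imagery and its quality was assessed, adding to the available high-resolution satellite building datasets. (2) For the building extraction task, a deep learning network with a parallel dual-encoder structure, SRF-Net, was built on Swin Transformer and a residual network. To deal with complex image content, diverse building features and varying scales, blurred edges, and interference from ground objects that resemble buildings, the model combines the global, long-range modelling strength of Swin Transformer with the local feature representation strength of convolutional neural networks, introduces and adapts the RFB module, and computes the loss with a joint loss function. IoU reaches 87.00% on the self-built building dataset and 89.22% on the WHU building dataset, higher than the other models compared, with clear improvements at building edges, for buildings at multiple scales, and in areas where the background resembles buildings. (3) A pre-enhancement module is proposed that can be attached to any deep learning model for building extraction from high-resolution remote sensing imagery. The module enhances building features with the morphological building index (MBI), enhances edge features with Canny edge detection, and strengthens inter-band relationships with an improved weighted inter-band ratio; the enhanced results are then combined with the original image into a multi-band tensor that is passed to the deep learning network. Attaching the pre-enhancement module to SRF-Net raises the F1-score on the Changchun3 building dataset by 3.31%, brings IoU to 92.97%, and improves extraction in environmentally complex areas. Attaching the module to U-Net, U-Net++, and DeepLabV3+ and testing on the Changchun3 building dataset improves the extraction accuracy of all three models, which verifies that the module can be extended to other deep learning models. On the WHU and Massachusetts building datasets, each model with the pre-enhancement module achieves higher building extraction accuracy than the same model without it, which verifies that the module generalises across datasets. In addition, SRF-Net with the pre-enhancement module has the highest extraction accuracy and the best extraction results on all three datasets. By building its own building dataset, constructing the SRF-Net model, and constructing the pre-enhancement module, this thesis achieves higher-accuracy building extraction from high-resolution remote sensing imagery, and verifies the advantages of SRF-Net and the feasibility of the pre-enhancement module on two different open-source building datasets.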
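The abstract reports accuracy as IoU and F1-score. For readers unfamiliar with these metrics, the following is a minimal sketch of how they are commonly computed for a binary building mask. The function name `building_metrics` and the toy masks are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np

def building_metrics(pred, truth):
    """Return (IoU, F1) for the building class of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()    # building predicted, building in truth
    fp = np.logical_and(pred, ~truth).sum()   # building predicted, background in truth
    fn = np.logical_and(~pred, truth).sum()   # background predicted, building in truth
    union = tp + fp + fn
    # Convention assumed here: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return float(iou), float(f1)

# Toy 2x3 masks (hypothetical data): 2 true positives, 1 false positive, 1 false negative.
pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
iou, f1 = building_metrics(pred, truth)
print(iou, f1)
```

For the same prediction F1 is never lower than IoU, which is worth keeping in mind when comparing the two figures quoted in the abstract.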
Abstract (foreign language):	With the development of and breakthroughs in remote sensing technology, the volume of high-resolution remote sensing image data has exploded, and the information contained in the images has been widely used in map making, change detection, and resource surveys. As one of the most important places for human activity, buildings reflect the development of local society and economy, so it is essential to extract buildings from high-resolution remote sensing images to provide basic geographic information for city planning and social and economic development. Manual interpretation requires a great deal of time and cost, and the accuracy and applicability of traditional machine learning methods are low, so neither is suited to today's rapidly changing information needs. In recent years, the development of deep learning has brought the interpretation of remote sensing image information to a new stage, and many scholars have applied it to the task of building extraction from high-resolution remote sensing images and achieved certain results. Based on this, with building extraction as the target, this thesis takes high-resolution remote sensing images as the data source and uses state-of-the-art deep learning techniques to explore a more accurate building extraction method. The main research contents and conclusions are as follows: (1) The basic principles of deep learning neural networks are explained, and the convolutional neural network architecture and the Transformer network architecture are studied and analysed. Based on GF-2 remote sensing satellite image data, the Changchun3 building dataset, a high-resolution remote sensing image dataset covering the area within the third ring of Changchun City, was produced. Its quality was assessed, and it supplements the available high-resolution remote sensing satellite image building datasets. (2) A deep learning network model, SRF-Net, with a parallel dual-encoder structure based on Swin Transformer and a residual network is constructed for the task of building extraction from high-resolution remote sensing images. It addresses the problems of complex remote sensing image information, diverse building features and variable scales, blurred edge information, and interference from ground objects that resemble buildings. The model combines the superior global and long-range information correlation of Swin Transformer with the advantages of convolutional neural networks in local feature representation, introduces and adapts the RFB module, and computes the loss with a joint loss function. IoU reached 87.00% on the self-built building dataset and 89.22% on the WHU building dataset, both higher than the other models, with significant improvements in extraction at building edges, in multi-scale building situations, and in areas where the background is similar to buildings. (3) A pre-enhancement module is proposed as a general-purpose module for deep learning models that extract buildings from high-resolution remote sensing images. The pre-enhancement module enhances building features using the morphological building index (MBI), enhances edge features using Canny edge detection, and enhances the relationship between image bands using an improved weighted band ratio. The enhanced results and the original image are combined into a multi-band tensor and input into the deep learning network. Loading the pre-enhancement module onto SRF-Net improved the F1-score on the Changchun3 building dataset by 3.31%, brought the IoU to 92.97%, and improved the extraction of buildings in environmentally complex areas. Meanwhile, loading the pre-enhancement module onto the U-Net, U-Net++, and DeepLabV3+ models and experimenting on the Changchun3 building dataset improved the extraction accuracy of all three models, which verified that the pre-enhancement module can be extended to other deep learning models. On the WHU and Massachusetts building datasets, each model with the pre-enhancement module achieved higher building extraction accuracy than the same model without it, which verified the generality of the module across datasets. In addition, the SRF-Net model with the pre-enhancement module has the highest building extraction accuracy and the best extraction results on all three datasets. In this study, we build our own building dataset and construct the SRF-Net model and the pre-enhancement module to achieve higher-accuracy building extraction, and we validate the advantages of SRF-Net and the feasibility of the pre-enhancement module on two different open-source building datasets.
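The pre-enhancement step described in the abstract (derive enhancement layers, then stack them with the original image into a multi-band tensor) can be sketched roughly as follows. This is only an illustration under stated assumptions: the thesis's MBI, Canny, and improved weighted band-ratio computations are not given in this record, so each is replaced by a simple, clearly labelled placeholder, and the function name `pre_enhance`, the 4-band tile, and the channel order are hypothetical.

```python
import numpy as np

def pre_enhance(image, eps=1e-6):
    """Stack placeholder enhancement layers onto an (H, W, C) image.

    Returns a channels-first array of shape (C + 3, H, W).
    """
    img = np.asarray(image, dtype=np.float32)
    # Placeholder for MBI: per-pixel maximum over bands (a brightness image only,
    # without the morphological profile that a real MBI computes).
    brightness = img.max(axis=2)
    # Placeholder for Canny: gradient magnitude of the brightness image.
    gy, gx = np.gradient(brightness)
    edges = np.hypot(gx, gy)
    # Placeholder for the weighted band ratio: normalized difference
    # between the first and last band (an assumption, not the thesis formula).
    first, last = img[..., 0], img[..., -1]
    ratio = (first - last) / (first + last + eps)
    extra = np.stack([brightness, edges, ratio], axis=0)
    # Original bands first, enhancement layers appended after them.
    return np.concatenate([np.moveaxis(img, 2, 0), extra], axis=0)

rng = np.random.default_rng(0)
tile = rng.random((64, 64, 4))   # hypothetical 4-band image tile
stacked = pre_enhance(tile)
print(stacked.shape)             # (7, 64, 64)
```

Because the enhancement layers only add input channels, a segmentation network would need its first layer widened to accept `C + 3` bands; the rest of the model can stay unchanged, which is consistent with the abstract's claim that the module can be attached to different models.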
Chinese Library Classification (CLC) number:	P237
Open-access date:	2025-06-19
