Thesis Information

Chinese title:

 增强空间及语义特征感知的路面结构层提取研究与应用

Name:

 Zhang Xiaoxiao (张宵宵)

Student ID:

 22210226072

Confidentiality level:

 Public

Thesis language:

 Chinese

Discipline code:

 085700

Discipline name:

 Engineering - Resources and Environment

Student type:

 Master's candidate

Degree level:

 Master of Engineering

Degree year:

 2025

Degree-granting institution:

 Xi'an University of Science and Technology

School/Department:

 College of Surveying and Mapping Science and Technology (测绘科学与技术学院)

Major:

 Surveying and Mapping Engineering

Research direction:

 Remote sensing image processing and applications

First advisor:

 Hu Rongming (胡荣明)

First advisor's institution:

 Xi'an University of Science and Technology

Thesis submission date:

 2025-06-16

Thesis defense date:

 2025-06-08

English title:

 Research and Application of Pavement Structure Layer Extraction for Enhancing Spatial and Semantic Feature Perception

Chinese keywords:

 Feature perception; Pavement structural layers; Multi-scale fusion; Transformer; Lightweight network

English keywords:

 Feature perception; Multi-scale fusion; Transformer; Pavement structural layers; Lightweight network

Chinese abstract:

With the accelerating digital transformation of highway construction in China, accurate acquisition of pavement structural layer information plays a vital role in road construction, maintenance, and quality assessment. The combination of unmanned aerial vehicle (UAV) technology and deep learning models offers a new technical path for the intelligent extraction of this information. Traditional inspection methods suffer from low efficiency, high cost, and insufficient semantic feature perception, while deep learning models facing complex construction scenes are challenged by background interference, limited spectral information, and inadequate multi-scale feature fusion. Taking roads under construction in Xi'an as the study object and using UAV visible-light imagery, this thesis proposes a pavement structural layer extraction method with enhanced spatial and semantic feature perception (Enhanced Spatial-Spectral Feature Network, ESSF-Net). Through feature engineering optimization and a hybrid network architecture, it significantly improves segmentation accuracy and robustness in complex scenes. The main research contents are as follows:

(1) Multi-feature fusion and optimization strategy. To address the single band and limited spectral information of UAV visible-light imagery, a multi-dimensional feature space is constructed. First, gray-level co-occurrence matrix (GLCM) texture features are extracted and combined with visible-band vegetation indices to improve vegetation discrimination. Second, the ReliefF algorithm, an adaptive band selection algorithm, and principal component analysis are used to screen the optimal feature combination, forming a 3-band input of spectral, texture, and index features that effectively reduces the interference of redundant information on model training. Experiments show that the optimized feature set improves segmentation accuracy on the different structural layers by 4.2%-6.8% and lowers feature redundancy, laying the foundation for efficient training of the subsequent deep learning model.

(2) Hybrid network design and optimization. The proposed ESSF-Net integrates a ConvNeXt backbone, a convolutional additive self-attention vision transformer (CAS-ViT), an atrous spatial pyramid pooling (ASPP) module, and a dual attention (DA) module. ConvNeXt, with its hierarchical design and depthwise separable convolutions, balances local texture extraction with lightweight requirements; CAS-ViT combines self-attention with convolution to strengthen global context modeling; the ASPP module captures the heterogeneous features of pavement structural layers through multi-scale dilated convolutions; the DA module suppresses noise through channel attention and, together with position attention, sharpens the localization of key regions. Ablation experiments show that the modules jointly raise the model's mean intersection over union (MIoU) by 5.48%, with the DA module contributing 2.1% to the F1 score of greenery segmentation.

(3) Model validation and engineering application. A dataset is built from multi-temporal UAV imagery of urban and county roads in Xi'an, and model performance is verified through comparative experiments. On the basic dataset, ESSF-Net reaches an overall accuracy of 97.73% and an MIoU of 93.46%, a clear improvement over mainstream models such as U-Net and DeepLabv3+; on the spatial-feature-enhanced dataset, the MIoU further rises to 96.26% and the background false-detection rate drops by 2.3%. In practical application, the model achieves fine-grained segmentation of each structural layer of roads under construction and, through multi-temporal area-change statistics, accurately monitors construction progress. Case analyses show strong adaptability in straight, curved, and multi-material overlay scenes, providing efficient and reliable technical support for the full life-cycle management of roads.

By combining multi-feature fusion optimization with an innovative hybrid architecture, the proposed ESSF-Net effectively addresses the accuracy and robustness of pavement structural layer segmentation in complex construction scenes. Experiments and engineering applications show that the method outperforms existing approaches in segmentation accuracy, interference resistance, and adaptability, offering a new technical means for the digital management of highway construction. Future work may explore multi-modal data fusion and lightweight deployment to extend its value in intelligent transportation.
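The overall accuracy and MIoU figures reported above are standard semantic-segmentation metrics. As an illustrative sketch of how they are computed from a per-class confusion matrix (the 3-class matrix below is invented toy data, not a result from the thesis):

```python
def overall_accuracy(cm):
    """Fraction of pixels on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def mean_iou(cm):
    """MIoU: average over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                       # row i = ground truth
        fp = sum(cm[r][i] for r in range(n)) - tp  # column i = prediction
        ious.append(tp / (tp + fp + fn))
    return sum(ious) / n

# Rows: ground truth, columns: prediction (illustrative pixel counts).
cm = [[50, 2, 1],
      [3, 40, 2],
      [0, 1, 30]]
print(round(overall_accuracy(cm), 4), round(mean_iou(cm), 4))
```

Overall accuracy rewards the dominant classes, while MIoU weights every structural layer equally, which is why the thesis reports both.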

English abstract:

With the accelerated digital transformation of highway construction in China, the accurate acquisition of pavement structural layer information plays a crucial role in road construction, maintenance, and quality assessment. The combination of unmanned aerial vehicle (UAV) technology and deep learning models provides a new technical path for the intelligent extraction of pavement structural layer information. Traditional detection methods suffer from low efficiency, high cost, and insufficient semantic feature perception, while existing deep learning models face challenges such as background interference, limited spectral information, and insufficient multi-scale feature fusion in complex construction scenarios. Taking roads under construction in Xi'an as the research object, this paper proposes a pavement structural layer extraction method with enhanced spatial and semantic feature perception (ESSF-Net) based on UAV visible-light images. Through feature engineering optimization and hybrid network architecture design, the accuracy and robustness of pavement structural layer segmentation in complex scenarios are significantly improved. The main research contents are as follows:

(1) Multi-feature fusion and optimization strategy: To address the single band and insufficient spectral information of UAV visible-light images, a multi-dimensional feature space is constructed. First, texture features are extracted with the gray-level co-occurrence matrix (GLCM) and combined with visible-band vegetation indices to enhance vegetation discrimination. Second, the ReliefF algorithm, an adaptive band selection (ABS) algorithm, and principal component analysis are used to select the optimal feature combination, forming a 3-band input of spectral, texture, and index features that effectively reduces the interference of redundant information on model training.
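The visible-band vegetation indices mentioned above are computed directly from the RGB channels. As a minimal sketch, one commonly used index, the visible-band difference vegetation index (VDVI), is shown below; the thesis's exact index set and any classification threshold are not specified here, so the values are illustrative assumptions:

```python
def vdvi(r, g, b):
    """Visible-band difference vegetation index: (2G - R - B) / (2G + R + B).

    Inputs are per-pixel band values; the output lies in [-1, 1], with
    vegetation pixels tending toward positive values and pavement near zero."""
    denom = 2 * g + r + b
    if denom == 0:
        return 0.0
    return (2 * g - r - b) / denom

# Illustrative pixels (R, G, B): grass is green-dominant, asphalt is gray.
grass = vdvi(60, 140, 50)    # clearly positive
asphalt = vdvi(90, 92, 95)   # near zero
print(grass > 0.2, abs(asphalt) < 0.1)
```

Stacking such an index band with selected spectral and texture bands is what yields the 3-band model input described above.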
Experiments show that the optimized feature set improves the segmentation accuracy of the different structural layers by 4.2% to 6.8%, with the IoU in the overlap area of greenery and asphalt layers increasing by 9.3%.

(2) Hybrid network model design and optimization: The ESSF-Net model integrates a ConvNeXt backbone, a convolutional additive self-attention vision transformer (CAS-ViT), an atrous spatial pyramid pooling (ASPP) module, and a dual attention (DA) module. ConvNeXt, through its hierarchical design and depthwise separable convolutions, balances the extraction of local texture details with the need for a lightweight model. CAS-ViT combines self-attention with convolution to enhance global context modeling. The ASPP module captures the heterogeneous features of pavement structural layers through multi-scale dilated convolutions. The DA module suppresses noise interference through channel attention and sharpens the localization of key regions through position attention. Ablation experiments show that the modules jointly increase the model's mean intersection over union (MIoU) by 5.48%, with the DA module contributing 2.1% to the F1 score of greenery segmentation.

(3) Model validation and engineering application: A dataset is constructed from multi-temporal UAV images of urban and county roads in Xi'an, and model performance is verified through comparative experiments. On the basic dataset, ESSF-Net reaches an overall accuracy of 97.73% and an MIoU of 93.46%, which is 2.39% to 5.65% higher than mainstream models such as U-Net and DeepLabv3+. On the spatial-feature-enhanced dataset, the MIoU further increases to 96.26%, and the background false detection rate is reduced by 8%.
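As a minimal 1-D illustration of the dilated (atrous) convolution underlying the ASPP module: with kernel size k and dilation rate d, the receptive field grows to k + (k - 1)(d - 1) without adding parameters, which is how ASPP gathers multi-scale context. The toy function below is a generic sketch, not ESSF-Net's actual layer:

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D convolution (correlation form) with a gap of `dilation`
    between kernel taps, so a larger rate samples a wider context for free."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # effective receptive field
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(kernel[j] * signal[start + j * dilation]
                       for j in range(k)))
    return out

x = [1, 2, 3, 4, 5, 6, 7]
avg = [1 / 3, 1 / 3, 1 / 3]
print(dilated_conv1d(x, avg, dilation=1))  # averages adjacent samples
print(dilated_conv1d(x, avg, dilation=2))  # averages every other sample
```

An ASPP head runs several such branches with different rates in parallel and concatenates their outputs, letting one layer see both fine joints and broad layer boundaries.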
In practical applications, the model achieves fine-grained segmentation of each structural layer of roads under construction and accurately monitors construction progress through multi-temporal area change statistics. Case analyses show that the model adapts well to straight, curved, and multi-material overlay scenarios, providing efficient and reliable technical support for the full life-cycle management of roads.

Through multi-feature fusion optimization and an innovative hybrid network architecture, the proposed ESSF-Net effectively solves the accuracy and robustness problems of pavement structural layer segmentation in complex construction scenarios. Experiments and engineering applications show that the method outperforms existing ones in segmentation accuracy, anti-interference ability, and adaptability, providing a new technical means for digital management in highway construction. Future research may explore multi-modal data fusion and lightweight deployment to extend its application value in intelligent transportation.
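The multi-temporal progress monitoring described above reduces to counting class pixels in each epoch's segmentation mask and scaling by the squared ground sampling distance (GSD). A hedged sketch, where the class code, GSD value, and tiny masks are invented for illustration:

```python
def class_area_m2(mask, class_id, gsd_m=0.05):
    """Area of one class in a segmentation mask: pixel count x GSD^2."""
    pixels = sum(row.count(class_id) for row in mask)
    return pixels * gsd_m ** 2

def area_change(mask_t0, mask_t1, class_id, gsd_m=0.05):
    """Signed area change of a structural layer between two survey epochs."""
    return (class_area_m2(mask_t1, class_id, gsd_m)
            - class_area_m2(mask_t0, class_id, gsd_m))

ASPHALT = 2  # hypothetical class code for the asphalt surface layer
t0 = [[0, 2, 2],
      [0, 0, 2]]
t1 = [[2, 2, 2],
      [0, 2, 2]]
print(area_change(t0, t1, ASPHALT, gsd_m=0.05))  # paving progress in m^2
```

Repeating this per layer across flight epochs yields the area-change series from which construction progress is tracked.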


CLC number:

 P237

Open access date:

 2025-06-17
