Thesis Information

Title (Chinese):

 面向无人驾驶的目标检测与分割算法研究 (Research on Object Detection and Segmentation Algorithms for Unmanned Driving)

Name:

 郜振威

Student ID:

 21206223051

Confidentiality:

 Public

Language:

 Chinese (chi)

Discipline Code:

 085400

Discipline:

 Engineering - Electronic Information

Student Type:

 Master's

Degree Level:

 Master of Engineering

Degree Year:

 2024

Institution:

 Xi'an University of Science and Technology

School:

 College of Electrical and Control Engineering

Major:

 Control Engineering

Research Area:

 Computer Vision

First Supervisor:

 黄梦涛

First Supervisor's Institution:

 Xi'an University of Science and Technology

Submission Date:

 2024-06-24

Defense Date:

 2024-06-06

Title (English):

 Research on Road Object Detection and Segmentation Algorithms for Unmanned Driving    

Keywords (Chinese):

 无人驾驶; 目标检测; 可行驶区域分割; 多任务学习; 环境感知

Keywords (English):

 Unmanned driving; Object detection; Drivable area segmentation; Multi-task learning; Environment perception

Abstract (Chinese, translated):

Amid the rapid development of science and technology, continuous progress in artificial intelligence and computer vision has made unmanned driving a focal point of the industry. Road object detection and drivable area segmentation are key to environment perception for unmanned driving. This thesis focuses on drivable area segmentation for urban roads: it extracts image features with deep neural networks and combines object detection, semantic segmentation, and multi-task learning to optimize pedestrian and vehicle detection and drivable area segmentation algorithms. Based on the idea of multi-task learning, the road object detection and drivable area segmentation models are fused into a single multi-task road drivable area detection model.

The main research content is as follows. (1) Images for road object detection often contain many small objects and mutually occluded objects, which easily cause false and missed detections; on the basis of YOLOv7, a residual-structure-based object detection algorithm, YOLO-RM, is constructed. Building on the ELAN module, the ELAN_R and ELAN_Es modules are designed around residual structures to strengthen the extraction of spatial feature information and avoid the learning error introduced by deeper networks. Building on the FPN structure, a multi-scale adaptive feature fusion structure, MFPN, is designed: a small-object feature output layer is added to the feature fusion network, improving the extraction of pedestrian and vehicle features across scales and reducing the missed detection rate for small objects. (2) To raise the inference speed of the semantic segmentation algorithm, a real-time drivable area segmentation algorithm, DeepLab-SF, is constructed on the basis of DeepLabv3+. The dilated convolutions in ASPP are replaced with strided convolutions to form a strided spatial pyramid pooling module, reducing the computation of the pooling stage; a feature fusion network in the decoder effectively combines shallow and deep features from the semantic and spatial branches. (3) A vision-based multi-task road drivable area detection model is designed that unifies object detection and drivable area segmentation: by sharing one efficient feature extraction network, it simultaneously achieves precise perception of the drivable space ahead of the vehicle and efficient recognition of pedestrian and vehicle objects.
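The ELAN_R and ELAN_Es modules are defined only in the thesis itself, so the following is a minimal PyTorch sketch of the underlying idea in item (1): an ELAN-style block that aggregates parallel convolution branches and adds a residual shortcut so that extra depth does not introduce additional learning error. The module name, channel widths, and branch layout are illustrative assumptions, not the thesis design.

```python
# Hypothetical sketch of an ELAN-style aggregation block with a residual
# shortcut. The real ELAN_R / ELAN_Es modules are defined in the thesis;
# names and channel choices here are assumptions for illustration only.
import torch
import torch.nn as nn

def conv_bn_silu(c_in, c_out, k=3):
    """Convolution followed by BatchNorm and SiLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

class ResidualELANBlock(nn.Module):
    """ELAN-like block: parallel branches are concatenated, fused by a 1x1
    convolution, and a residual shortcut carries the input past the stacked
    convolutions to limit the error added by extra depth."""

    def __init__(self, channels):
        super().__init__()
        mid = channels // 2
        self.branch1 = conv_bn_silu(channels, mid, k=1)  # shortcut-side branch
        self.branch2 = conv_bn_silu(channels, mid, k=1)  # entry to the conv stack
        self.stack1 = conv_bn_silu(mid, mid)
        self.stack2 = conv_bn_silu(mid, mid)
        # fuse the four aggregated feature maps back to `channels`
        self.fuse = conv_bn_silu(4 * mid, channels, k=1)

    def forward(self, x):
        b1 = self.branch1(x)
        b2 = self.branch2(x)
        s1 = self.stack1(b2)
        s2 = self.stack2(s1)
        out = self.fuse(torch.cat([b1, b2, s1, s2], dim=1))
        return out + x  # residual shortcut

# quick shape check
feats = torch.randn(1, 64, 80, 80)
print(ResidualELANBlock(64)(feats).shape)  # torch.Size([1, 64, 80, 80])
```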

The proposed multi-task road drivable area detection model meets the requirements of unmanned driving environment perception: pedestrian and vehicle detection reaches 78.1% mAP, drivable area segmentation reaches 91.1% mIoU, and inference runs at 36.4 frames per second, satisfying the accuracy and real-time requirements of unmanned driving environment perception.
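For reference, the mIoU figure quoted above is the mean over classes of the per-class intersection over union, IoU_c = TP_c / (TP_c + FP_c + FN_c). A small self-contained illustration of the computation (not code from the thesis):

```python
# Worked illustration of the mIoU metric: per-class IoU = TP / (TP + FP + FN),
# averaged over the classes that appear in prediction or ground truth.
import numpy as np

def mean_iou(pred, target, num_classes):
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (target == c))
        fp = np.sum((pred == c) & (target != c))
        fn = np.sum((pred != c) & (target == c))
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return float(np.mean(ious))

pred = np.array([[0, 0, 1, 1], [0, 1, 1, 1]])
target = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
print(round(mean_iou(pred, target, 2), 3))  # 0.775
```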

Abstract (English):

With the rapid development of science and technology, continuous progress in artificial intelligence and computer vision has made unmanned driving a focus of the industry. In particular, visual detection of road pedestrians and vehicles and drivable area segmentation are key to unmanned driving environment perception. This paper focuses on the segmentation of drivable areas on urban roads. It extracts image features through deep neural networks and combines object detection, semantic segmentation, and multi-task learning to optimize pedestrian and vehicle detection and drivable area segmentation algorithms. Based on the idea of multi-task learning, the road object detection and drivable area segmentation models are integrated into a multi-task road drivable area detection model.

The primary research components are as follows: (1) Given the challenges posed by mutual occlusion and significant scale variation in pedestrian and vehicle objects, which often lead to false and missed detections, we enhance YOLOv7 with a residual-structure-based object detection algorithm called YOLO-RM. Building on the ELAN module, we design the ELAN_R and ELAN_Es modules, leveraging residual structures to better capture spatial feature information and mitigate the learning error caused by increasing network depth. We also devise a multi-scale adaptive feature fusion structure called MFPN, inspired by the FPN structure, and add a small-object feature output layer to the feature fusion network. This strengthens the network's ability to extract pedestrian and vehicle features at varying scales, reducing the missed detection rate for small objects. (2) To raise the inference speed of the semantic segmentation algorithm, we construct a real-time drivable area segmentation algorithm named DeepLab-SF, building upon DeepLabv3+. We replace the dilated convolutions in ASPP with strided convolutions to form a strided spatial pyramid pooling module, minimizing computational cost during the pooling phase. A feature fusion network in the decoder integrates shallow and deep features from the semantic and spatial branches. (3) We design a vision-based multi-task detection scheme for urban road drivable areas that integrates road object detection with road segmentation. The scheme runs a single shared feature extraction network per forward pass, enabling simultaneous perception of the drivable area ahead and recognition of vehicle and pedestrian objects.
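As a concrete reading of item (2), the sketch below swaps the dilated convolutions of an ASPP-style module for strided convolutions, then bilinearly upsamples each branch back to the input resolution before concatenation; downsampled branches touch fewer spatial positions, which is where the computational saving comes from. The `StridedPyramidPooling` name, stride set, and channel sizes are assumptions for illustration, not the DeepLab-SF implementation.

```python
# Minimal sketch (assumptions, not the thesis code) of replacing dilated
# convolutions in an ASPP-style module with strided convolutions: each branch
# downsamples with a different stride to enlarge the receptive field cheaply,
# then upsamples back before the branches are concatenated.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StridedPyramidPooling(nn.Module):
    def __init__(self, c_in, c_out, strides=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, stride=s, padding=1, bias=False),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            )
            for s in strides
        )
        self.project = nn.Conv2d(c_out * len(strides), c_out, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        # run each strided branch, then restore the original resolution
        feats = [
            F.interpolate(b(x), size=(h, w), mode="bilinear", align_corners=False)
            for b in self.branches
        ]
        return self.project(torch.cat(feats, dim=1))

x = torch.randn(1, 256, 32, 32)
print(StridedPyramidPooling(256, 128)(x).shape)  # torch.Size([1, 128, 32, 32])
```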

The proposed multi-task road drivable area detection model fulfills the demands of driverless environment perception: pedestrian and vehicle detection accuracy reaches 78.1% mAP, drivable area segmentation accuracy reaches 91.1% mIoU, and inference runs at 36.4 frames per second. The performance of the multi-task model meets the accuracy and real-time requirements of driverless environment perception.
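To make the shared-backbone layout of the multi-task model concrete, here is a hedged sketch in which one feature extractor is run once per image and feeds both a detection head and a drivable-area segmentation head; the stand-in backbone and head shapes are assumptions, since the thesis builds on its own YOLO-RM and DeepLab-SF networks.

```python
import torch
import torch.nn as nn

class MultiTaskPerception(nn.Module):
    """Shared backbone feeding a detection head and a segmentation head."""

    def __init__(self, num_det_outputs=18, num_seg_classes=2):
        # num_det_outputs is a placeholder, e.g. 3 anchors x (4 box + 1 obj + 1 cls)
        super().__init__()
        # stand-in backbone (assumption); runs once per image for both tasks
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # detection head: dense per-cell predictions on the shared feature map
        self.det_head = nn.Conv2d(128, num_det_outputs, 1)
        # segmentation head: per-pixel class scores at input resolution
        self.seg_head = nn.Sequential(
            nn.Conv2d(128, num_seg_classes, 1),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
        )

    def forward(self, x):
        feats = self.backbone(x)  # shared computation, executed once
        return self.det_head(feats), self.seg_head(feats)

img = torch.randn(1, 3, 256, 256)
det, seg = MultiTaskPerception()(img)
print(det.shape, seg.shape)  # (1, 18, 32, 32) and (1, 2, 256, 256)
```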

References:

[1] 张天培. Motor vehicle ownership reached 435 million in 2023 [N]. People's Daily, 2024-02-13(1). (in Chinese)

[2] China Automotive Technology and Research Center, Tongji University, Baidu Apollo. White paper on traffic safety of autonomous vehicles [R]. Beijing: China Automotive Technology and Research Center Co., Ltd., 2021. (in Chinese)

[3] 蒋凯伟. Research on object detection technology for intelligent driving based on deep learning [D]. Beijing: Beijing Jiaotong University, 2023. (in Chinese)

[4] 龚建伟, 龚乘, 林云龙, et al. A survey of learning methods for intelligent vehicle planning and control strategies [J]. Transactions of Beijing Institute of Technology, 2022, 42(7): 665-674. (in Chinese)

[5] 彭湃, 耿可可, 王子威, et al. A survey of environment perception methods for intelligent vehicles [J]. Journal of Mechanical Engineering, 2023, 59(20): 281-303. (in Chinese)

[6] 彭育辉, 江铭, 马中原, et al. Research progress on key technologies of autonomous driving [J]. Journal of Fuzhou University (Natural Science Edition), 2021, 49(5): 691-703. (in Chinese)

[7] 段续庭, 周宇康, 田大新, et al. A survey of deep learning applications in autonomous driving [J]. Unmanned Systems Technology, 2021, 4(6): 1-27. (in Chinese)

[8] 张凯祥, 朱明. A multi-task environment perception algorithm for autonomous driving based on YOLOv5 [J]. Computer Systems & Applications, 2022, 31(9): 226-232. (in Chinese)

[9] Gao C, Zhao F, Zhang Y, et al. Research on multitask model of object detection and road segmentation in unstructured road scenes[J]. Measurement Science and Technology, 2024, 35(6): 91-113.

[10] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. Columbus: CVPR, 2014: 580-587.

[11] Girshick R. Fast R-CNN[C]//Proceedings of the IEEE international conference on computer vision. Santiago: ICCV, 2015: 1440-1448.

[12] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE transactions on pattern analysis and machine intelligence, 2016, 39(6): 1137-1149.

[13] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//Proceedings of the European Conference on Computer Vision. Amsterdam: ECCV, 2016: 21-37.

[14] Redmon J, Farhadi A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv: 1804.02767, 2018.

[15] Ge Z, Liu S, Wang F, et al. YOLOX: Exceeding YOLO series in 2021[J]. arXiv preprint arXiv: 2107.08430, 2021.

[16] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. Vancouver: CVPR, 2023: 7464-7475.

[17] Wang C Y, Yeh I H, Liao H Y M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information[J]. arXiv preprint arXiv: 2402.13616, 2024.

[18] 申铉京, 李涵宇, 黄永平, et al. A vehicle detection method based on an adaptive multi-scale feature fusion network [J]. Acta Electronica Sinica, 2023, 31(3): 1-9. (in Chinese)

[19] 冉险生, 苏山杰, 陈俊豪, et al. An object detection algorithm with adaptive feature fusion for complex road scenes [J]. Computer Engineering and Applications, 2023, 59(24): 216-226. (in Chinese)

[20] Park J, Woo S, Lee J Y, et al. BAM: Bottleneck attention module[J]. arXiv preprint arXiv: 1807.06514, 2018.

[21] 李国进, 胡洁, 艾矫燕. Vehicle detection based on an improved SSD algorithm [J]. Computer Engineering, 2022, 48(1): 266-274. (in Chinese)

[22] 杜娟, 崔少华, 晋美娟, et al. An improved YOLOv7 object detection algorithm for complex road scenes [J]. Computer Engineering and Applications, 2024, 60(1): 96-103. (in Chinese)

[23] 李安达, 吴瑞明, 李旭东. Research on an improved YOLOv7 algorithm for small object detection [J]. Computer Engineering and Applications, 2024, 60(1): 122-134. (in Chinese)

[24] Shao X, Wang Q, Yang W, et al. Multi-scale feature pyramid network: A heavily occluded pedestrian detection network based on ResNet[J]. Sensors, 2021, 21(5): 1820-1836.

[25] Zhang S, Chen D, Yang J, et al. Guided attention in CNNs for occluded pedestrian detection and re-identification[J]. International Journal of Computer Vision, 2021, 129(10): 1875-1892.

[26] Huang X, Ge Z, Jie Z, et al. NMS by representative region: Towards crowded pedestrian detection by proposal pairing[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. Seattle: CVPR, 2020: 10750-10759.

[27] 厍向阳, 李蕊心, 叶鸥. Person re-identification fusing random erasing and a residual attention network [J]. Computer Engineering and Applications, 2022, 58(3): 215-221. (in Chinese)

[28] 高昂, 梁兴柱, 夏晨星, et al. An improved YOLOv8 algorithm for dense pedestrian detection [J]. Journal of Graphics, 2023, 44(5): 890-898. (in Chinese)

[29] 王泽宇, 徐慧英, 朱信忠, et al. MER-YOLO: a dense pedestrian detection algorithm based on improved YOLOv8 [J]. Computer Engineering & Science, 2023, 42(11): 1-17. (in Chinese)

[30] 丁俊进. Research on road recognition technology based on machine vision [D]. Wuhan: Wuhan University of Technology, 2007. (in Chinese)

[31] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. Boston: CVPR, 2015: 3431-3440.

[32] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation[C]//MICCAI 2015: 18th international conference. Munich: MICCAI, 2015: 234-241.

[33] Badrinarayanan V, Kendall A, Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 39(12): 2481-2495.

[34] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE international conference on computer vision. Venice: ICCV, 2017: 2961-2969.

[35] Gao G, Xu G, Li J, et al. FBSNet: A fast bilateral symmetrical network for real-time semantic segmentation[J]. IEEE Transactions on Multimedia, 2023, 25: 3273-3283.

[36] Zhou Y, Zheng X, Yang Y, et al. Multi-directional feature refinement network for real-time semantic segmentation in urban street scenes[J]. IET Computer Vision, 2023, 17(4): 431-444.

[37] He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(9): 1904-1916.

[38] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. Hawaii: CVPR, 2017: 2881-2890.

[39] Chen L C, Papandreou G, Kokkinos I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[J]. arXiv preprint arXiv: 1412.7062, 2015.

[40] Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 40(4): 834-848.

[41] Chen L C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation[J]. arXiv preprint arXiv: 1706.05587, 2017.

[42] Chen L C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European conference on computer vision. Munich: ECCV, 2018: 801-818.

[43] Paszke A, Chaurasia A, Kim S, et al. ENet: A deep neural network architecture for real-time semantic segmentation[J]. arXiv preprint arXiv: 1606.02147, 2016.

[44] Romera E, Alvarez J M, Bergasa L M, et al. ErfNet: Efficient residual factorized convnet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 19(1): 263-272.

[45] Yu C, Wang J, Peng C, et al. BiSeNet: Bilateral segmentation network for real-time semantic segmentation[C]//Proceedings of the European conference on computer vision. Munich: ECCV, 2018: 325-341.

[46] Tan S X. Feature reuse and fusion for real-time semantic segmentation[J]. arXiv preprint arXiv: 2105.12964, 2021.

[47] Song Q, Li J, Li C, et al. Fully attentional network for semantic segmentation[C]// Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2022: 2280-2288.

[48] 王等准. Research on driving environment perception and recognition based on computer vision [D]. Guiyang: Guizhou University, 2023. (in Chinese)

[49] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]// Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: CVPR, 2016: 770-778.

[50] Gevorgyan Z. SIoU loss: More powerful learning for bounding box regression[J]. arXiv preprint arXiv: 2205.12740, 2022.

[51] Han J, Liang X, Xu H, et al. SODA10M: A large-scale 2D self/semi-supervised object detection dataset for autonomous driving[J]. arXiv preprint arXiv: 2106.11118, 2021.

[52] Yu F, Xian W, Chen Y, et al. BDD100K: A diverse driving video database with scalable annotation tooling[J]. arXiv preprint arXiv: 1805.04687, 2018.

[53] Liu S, Huang D. Receptive field block net for accurate and fast object detection[C]// Proceedings of the European conference on computer vision. Munich: ECCV, 2018: 385-400.

[54] Wandell B A, Winawer J. Computational neuroimaging and population receptive fields[J]. Trends in cognitive sciences, 2015, 19(6): 349-357.

[55] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[J]. IEEE transactions on pattern analysis and machine intelligence, 2020, 42(2): 318-327.

[56] Crawshaw M. Multi-task learning with deep neural networks: A survey[J]. arXiv preprint arXiv: 2009.09796, 2020.

[57] Kingma D P, Ba J. Adam: A method for stochastic optimization[J]. arXiv preprint arXiv: 1412.6980, 2015.

[58] 刘超. Research on multi-object perception algorithms for unmanned vehicles based on deep learning [D]. Guilin: Guilin University of Electronic Technology, 2023. (in Chinese)

CLC Number:

 TP391    

Open Access Date:

 2024-06-24    
