Thesis Information

Title (Chinese):

 洗煤厂环境下的计算机视觉行人定位技术研究

Author:

 沈树建

Student ID:

 20207223106

Confidentiality level:

 Public

Thesis language:

 Chinese

Discipline code:

 085400

Discipline name:

 Engineering - Electronic Information

Student type:

 Master's student

Degree level:

 Master of Engineering

Degree year:

 2023

Degree-granting institution:

 Xi'an University of Science and Technology

School:

 College of Communication and Information Engineering

Major:

 Electronics and Communication Engineering

Research direction:

 Computer vision

First supervisor:

 蔺丽华

First supervisor's institution:

 Xi'an University of Science and Technology

Thesis submission date:

 2023-06-15

Thesis defense date:

 2023-05-30

Title (English):

 Research on Computer Vision Pedestrian Localization Technology in the Coal Washing Plant Environment

Keywords (Chinese, translated):

 Deep learning; Object detection; YOLOv5s; Visual localization

Keywords (English):

 Deep learning; Object detection; YOLOv5s; Visual localization

Abstract (Chinese, translated):

Surveillance cameras in a coal washing plant provide only a monitoring function; they cannot extract the spatial position of the pedestrians appearing in the video images. With deep learning and computer vision methods, pedestrian position information can be recovered, so that in an emergency an accurate location can be reported quickly and an alarm raised. This thesis studies computer vision pedestrian localization technology in the coal washing plant environment from two aspects: pedestrian object detection and monocular visual localization. The specific work is as follows:

In the pedestrian detection part, to address the low detection accuracy caused by complex background interference in the coal washing plant environment, this thesis improves the YOLOv5s detection algorithm in two ways. First, the ECA channel attention module is integrated into the backbone feature extraction network, strengthening the network's ability to extract local features of pedestrian targets. Second, a new bounding box regression loss function, SIoU, is introduced, which further improves model performance. Experimental results show that the improved YOLOv5s raises recall and mAP by 2.21% and 2.13%, respectively.

In the monocular visual localization part, this thesis constructs and optimizes a coordinate mapping localization model. A LiDAR is used to build a 3D point cloud, and the camera's intrinsic and extrinsic parameters are obtained through Zhang Zhengyou's calibration method and an auxiliary point method. Based on computer vision, a coordinate mapping localization model is established between the camera's monitored area and the 3D point cloud, so that any pixel of the 2D image can be localized in the 3D point cloud space. To address the low accuracy of monocular visual localization, an improved particle swarm optimization algorithm is used to optimize the extrinsic parameters of the model. Experimental results show that the mean absolute error (MAE) of static test points is reduced from 58.64 cm to 15.88 cm, and the root mean square error (RMSE) from 63.41 cm to 18.29 cm.

Finally, the pixel coordinates produced by pedestrian detection are fed into the optimized coordinate mapping localization model to compute the pedestrians' coordinates in the 3D point cloud, realizing computer vision pedestrian localization in the coal washing plant environment. Experimental results show that the MAE of dynamic pedestrian localization is reduced from 61.84 cm to 20.20 cm, and the RMSE from 69.60 cm to 22.06 cm. The proposed algorithm achieves visualized 3D localization of pedestrians in the point cloud space, which is of practical significance for safety monitoring in coal washing plants.

Abstract (English):

Surveillance cameras in coal washing plants provide only a monitoring function and cannot extract the spatial location of pedestrians in the video images. With deep learning and computer vision methods, pedestrian location information can be obtained, so that in an emergency an accurate location can be provided quickly and an alarm issued. This thesis studies computer vision pedestrian localization technology in the coal washing plant environment from two aspects: pedestrian object detection and monocular visual localization. The specific work is as follows:

In the pedestrian detection part, to address the low detection accuracy caused by complex background interference in the coal washing plant environment, this thesis improves the YOLOv5s detection algorithm in two ways. First, the ECA channel attention module is integrated into the backbone feature extraction network to enhance the network's ability to extract local features of pedestrian targets. Second, a new bounding box regression loss function, SIoU, is introduced to improve model performance. Experimental results show that the improved YOLOv5s raises recall and mAP by 2.21% and 2.13%, respectively.
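The channel attention idea used above can be illustrated with a minimal NumPy sketch of an ECA-style module. This is not the thesis's implementation: the thesis integrates a learned ECA module into the YOLOv5s backbone, whereas here the 1-D convolution kernel is a fixed averaging filter chosen purely for illustration.

```python
import numpy as np

def eca_attention(feat, k=3):
    """ECA-style channel attention on a (C, H, W) feature map.

    Squeeze: global average pooling gives one descriptor per channel.
    A 1-D convolution of kernel size k over the channel axis models local
    cross-channel interaction without dimensionality reduction; a sigmoid
    turns the result into per-channel weights that rescale the input.
    """
    C = feat.shape[0]
    y = feat.mean(axis=(1, 2))                  # squeeze -> (C,)
    w = np.ones(k) / k                          # illustrative fixed kernel (learned in practice)
    y_pad = np.pad(y, k // 2)                   # zero-pad so output length stays C
    conv = np.array([y_pad[i:i + k] @ w for i in range(C)])  # 1-D conv across channels
    gate = 1.0 / (1.0 + np.exp(-conv))          # sigmoid -> weights in (0, 1)
    return feat * gate[:, None, None]           # recalibrate each channel

feat = np.random.rand(8, 4, 4)
out = eca_attention(feat)
print(out.shape)  # (8, 4, 4)
```

The point of ECA over SE-style attention is that the 1-D convolution avoids the fully connected bottleneck, so the module adds only k parameters per attention block.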

In the monocular visual localization part, this thesis constructs and optimizes a coordinate mapping localization model. A LiDAR is used to build a 3D point cloud, and the camera's intrinsic and extrinsic parameters are obtained through Zhang Zhengyou's calibration method and an auxiliary point method. Based on computer vision, a coordinate mapping localization model is established between the camera's monitored area and the 3D point cloud, so that any pixel in the 2D image can be localized in the 3D point cloud space. To address the low accuracy of monocular visual localization, an improved particle swarm optimization algorithm is used to optimize the extrinsic parameters of the model. Experimental results show that the mean absolute error (MAE) of the static test points is reduced from 58.64 cm to 15.88 cm, and the root mean square error (RMSE) from 63.41 cm to 18.29 cm.
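A common way to realize such a 2D-to-3D coordinate mapping with a single calibrated camera is to back-project each pixel onto a known plane, e.g. the ground plane on which pedestrians stand. The sketch below assumes this plane-constrained formulation and hypothetical intrinsic/extrinsic values; the thesis's actual model maps into a LiDAR point cloud and its parameters come from calibration.

```python
import numpy as np

def pixel_to_ground(u, v, K, R, t):
    """Back-project pixel (u, v) onto the world ground plane Z_w = 0.

    The pinhole projection s*[u, v, 1]^T = K(R X_w + t) reduces, for points
    with Z_w = 0, to a homography H = K [r1 r2 t] between the ground plane
    and the image, which can be inverted for (X_w, Y_w).
    """
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))
    Xw = np.linalg.solve(H, np.array([u, v, 1.0]))  # (X_w, Y_w, 1) up to scale
    Xw /= Xw[2]                                     # normalize homogeneous coordinate
    return np.array([Xw[0], Xw[1], 0.0])

# Hypothetical setup: camera 5 m above the ground, looking straight down
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]])
t = -R @ np.array([0.0, 0.0, 5.0])   # camera center at world (0, 0, 5)
p = pixel_to_ground(70, 10, K, R, t)
print(p)  # approximately [1. 2. 0.]
```

Because the recovered position is very sensitive to the extrinsics (R, t), refining them against surveyed reference points, as the thesis does with an improved particle swarm optimizer, directly reduces the localization error.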

Finally, the pixel coordinates obtained from pedestrian detection are fed into the optimized coordinate mapping localization model to compute the pedestrians' coordinates in the 3D point cloud, realizing computer vision pedestrian localization in the coal washing plant environment. Experimental results show that the MAE of dynamic pedestrian localization is reduced from 61.84 cm to 20.20 cm, and the RMSE from 69.60 cm to 22.06 cm. The proposed algorithm achieves visualized 3D localization of pedestrians in the point cloud space, which is of practical significance for safety monitoring in coal washing plants.
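The MAE and RMSE figures quoted above are standard summaries of per-point Euclidean position error. A small sketch, with toy data rather than the thesis's measurements:

```python
import numpy as np

def mae_rmse(pred, truth):
    """MAE and RMSE of per-point Euclidean position error (same units as input)."""
    err = np.linalg.norm(np.asarray(pred) - np.asarray(truth), axis=1)
    return err.mean(), np.sqrt(np.mean(err ** 2))

# Toy example: two estimated 3D positions vs. ground truth, in cm
pred  = np.array([[103.0, 0.0, 0.0], [0.0, 204.0, 0.0]])
truth = np.array([[100.0, 0.0, 0.0], [0.0, 200.0, 0.0]])
mae, rmse = mae_rmse(pred, truth)
print(round(mae, 2), round(rmse, 2))  # 3.5 3.54
```

RMSE weights large errors more heavily than MAE, so reporting both (as the thesis does) shows that the optimization reduced not only the average error but also the spread of outliers.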


CLC number:

 TP391

Open access date:

 2023-06-15
