Thesis title (Chinese): | Research on Personnel Positioning Technology Based on Point Cloud Space |
Name: | |
Student ID: | 20207223059 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 085400 |
Discipline: | Engineering - Electronic Information |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2023 |
Degree-granting institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Computer Vision |
First supervisor: | |
First supervisor's institution: | |
Submission date: | 2023-06-15 |
Defense date: | 2023-05-30 |
Thesis title (English): | Research on personnel positioning technology based on point cloud space |
Keywords (Chinese): | YOLOv5 ; personnel positioning ; 2D3D-MatchNet ; point cloud space |
Keywords (English): | YOLOv5 ; Personnel positioning ; 2D3D-MatchNet ; Point cloud space |
Abstract (Chinese): |
Intelligent personnel positioning in video surveillance areas is of great significance, and many researchers have begun to focus on this problem. This thesis takes personnel in the video surveillance areas of a coal preparation plant as its research object, addressing the heavy environmental interference in the plant and the low accuracy of existing personnel positioning. The main research contents are as follows: (1) An improved object detection algorithm based on YOLOv5 and an improved keypoint matching algorithm based on 2D3D-MatchNet are designed for accurate personnel positioning. First, the YOLOv5 algorithm detects the bounding box of each person in the video, and the bottom center point of the box is taken as the person's pixel coordinate. Since detection accuracy directly determines visual positioning accuracy, the Swin Transformer and the SimAM module are introduced to improve the YOLOv5 algorithm. Second, the 2D3D-MatchNet network automatically matches keypoints between the query image and the point cloud, and the camera pose is solved from the resulting image-point cloud keypoint pairs with the EPnP algorithm. To raise the matching accuracy, the 2D3D-MatchNet network is improved by introducing a spatial transformer network and BN layers for effective image feature extraction, enabling accurate camera pose estimation and personnel positioning. Finally, combined with the camera imaging model, the person's pixel coordinate and the camera pose are converted into a precise position in the point cloud space, realizing digital personnel positioning. The improved YOLOv5 algorithm raises mAP by 4.5%; the improved 2D3D-MatchNet network improves positioning accuracy by 12 cm on average and reduces processing time by 0.3 s per frame compared with the classical method. (2) A personnel positioning method based on homography transformation is designed. In some indoor environments, such as empty corridors, images and point clouds offer few feature descriptors, so it is difficult to solve the camera pose with the EPnP algorithm and thus to position personnel. To solve this problem, a homography transformation is introduced to establish the mapping between images. With the camera's initial extrinsic parameters known, images captured after the camera pose changes are rectified by a homography matrix, so that the initial extrinsic parameters can still be used for positioning. The positioning errors of this method are all below 50 cm, and processing one frame takes 0.042 s. The method meets indoor positioning accuracy requirements at low computational cost, providing a feasible solution for practical applications. |
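The final localization step described above (converting a person's pixel coordinate plus the estimated camera pose into a position in the point-cloud space) can be sketched with the standard pinhole camera model. This is a minimal illustration, not code from the thesis; the function name and the calibration values are hypothetical, and the ground plane is taken as Z = 0 in the world frame:

```python
import numpy as np

# Pinhole back-projection: with intrinsics K and pose (R, t) defined by
# X_cam = R @ X_world + t, intersect the viewing ray through pixel (u, v)
# with the ground plane Z = 0 of the world (point-cloud) frame.
def locate_person_on_ground(u, v, K, R, t):
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction, camera frame
    d_world = R.T @ d_cam                             # ray direction, world frame
    c_world = -R.T @ t                                # camera center, world frame
    s = -c_world[2] / d_world[2]                      # scale at which Z reaches 0
    return c_world + s * d_world

# Illustrative calibration: camera looking along +Z, 3 m from the ground plane
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 3.0])

# A box bottom-center pixel 80 px right of the principal point maps to a
# point 0.3 m along X on the ground plane
person_xyz = locate_person_on_ground(400, 240, K, R, t)
```

In the thesis's pipeline, (R, t) would come from the EPnP solution over the 2D-3D keypoint matches rather than being fixed as here.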
Abstract (English): |
Intelligent personnel positioning in video surveillance areas is of great significance, and many researchers have begun to focus on this problem. This thesis takes personnel in the video surveillance areas of a coal preparation plant as its research object, addressing the heavy environmental interference in the plant and the low accuracy of existing personnel positioning. The main research contents are as follows: (1) An improved object detection algorithm based on YOLOv5 and an improved keypoint matching algorithm based on 2D3D-MatchNet are designed to position personnel accurately. First, the YOLOv5 algorithm detects the bounding box of each person in the video, and the bottom center point of the box is taken as the person's pixel coordinate. Because detection accuracy directly determines visual positioning accuracy, the Swin Transformer and the SimAM module are introduced to improve the YOLOv5 algorithm. Second, the 2D3D-MatchNet network automatically matches keypoints between the query image and the point cloud, and the camera pose is solved from the resulting image-point cloud keypoint pairs with the EPnP algorithm. To improve the matching accuracy, the 2D3D-MatchNet network is modified by introducing a spatial transformer network and batch normalization (BN) layers for effective image feature extraction, enabling accurate camera pose estimation and personnel positioning. Finally, combined with the camera imaging model, the person's pixel coordinate and the camera pose are converted into a precise position in the point cloud space, realizing digital personnel positioning.
The improved YOLOv5 algorithm increases mAP by 4.5%, and the improved 2D3D-MatchNet network improves positioning accuracy by 12 cm on average while reducing processing time by 0.3 s per frame compared with the classical method. (2) A personnel positioning method based on homography transformation is designed. In some indoor environments, such as empty corridors, images and point clouds provide few feature descriptors, so it is difficult to solve the camera pose with the EPnP algorithm and thus to position personnel. To solve this problem, a homography transformation is introduced to establish the mapping between images. With the camera's initial extrinsic parameters known, images captured after the camera pose changes are rectified by a homography matrix, so that the initial extrinsic parameters can still be used for positioning. The positioning errors of this method are all below 50 cm, and processing one frame takes 0.042 s. The method meets indoor positioning accuracy requirements with low computational cost, providing a feasible solution for practical applications. |
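The homography-based method above rectifies pixels observed after a pose change back to the reference view, so the initially calibrated extrinsics remain valid. A minimal sketch (not from the thesis; the function name and the matrix values are illustrative, and in practice H would be estimated from point correspondences, e.g. with `cv2.findHomography`):

```python
import numpy as np

# Map a pixel detected in the current frame back to the reference frame
# (taken at the calibrated initial camera pose) via homography H.
def correct_pixel_with_homography(u, v, H):
    p = H @ np.array([u, v, 1.0])   # apply H in homogeneous coordinates
    return p[:2] / p[2]             # divide out the scale factor

# Illustrative H: a pure in-plane shift of 5 px right and 2 px down
# between the current and reference views
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])

rectified = correct_pixel_with_homography(100, 50, H)  # pixel in the reference view
```

The rectified pixel can then be fed to the same ground-plane back-projection used with the initial extrinsics, avoiding any per-frame PnP solve.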
References: |
CLC number: | TP391 |
Release date: | 2023-06-15 |