Thesis Information

Chinese title:

 基于深度学习的行人检测与跟踪研究 (Research on Pedestrian Detection and Tracking Based on Deep Learning)

Name:

 宋滋苗

Student ID:

 20208223076

Confidentiality level:

 Public

Language:

 Chinese

Discipline code:

 0854

Discipline:

 Engineering - Electronic Information

Student type:

 Master's candidate

Degree:

 Master of Engineering

Degree year:

 2023

Degree-granting institution:

 Xi'an University of Science and Technology

School:

 College of Computer Science and Technology

Major:

 Computer Technology

Research direction:

 Graphics and Image Processing

First supervisor:

 李占利

First supervisor's institution:

 Xi'an University of Science and Technology

Submission date:

 2023-06-14

Defense date:

 2023-06-05

English title:

 Research on Pedestrian Detection and Tracking Based on Deep Learning

Chinese keywords:

 Deep learning; YOLOv5; pedestrian detection; DeepSort; pedestrian tracking; people flow counting and analysis system

English keywords:

 Deep learning; YOLOv5; Pedestrian detection; DeepSort; Pedestrian tracking; People flow counting and analysis system

Chinese abstract:

As video surveillance is deployed in an ever-growing range of settings, complex and densely populated public places such as schools and parks all require the supervision and protection it provides. Most surveillance systems, however, can only statically store and display video, without detecting or analyzing the objects in it, which places higher demands on the intelligent application of video surveillance. Pedestrians are the most common objects in surveillance video, and identifying and locating them is of great value for maintaining safety in public places. This thesis therefore proposes improved algorithms for the missed-detection and false-detection problems in pedestrian detection and for the ID-switch problem in pedestrian multi-object tracking. The main work is as follows.

To address missed and false detections caused by scale variation and by occlusion between pedestrians, this thesis proposes YOLO-WC, a pedestrian detection method with multi-scale weighted feature fusion built on YOLOv5. First, a Res-spp module is designed to better fuse local and global pedestrian features, and the CIoU loss is introduced as the bounding-box regression loss, effectively alleviating missed detections caused by occlusion between pedestrians. Second, a Neck structure suited to pedestrian target features is designed, and a weight-based cross-layer connection mechanism is proposed to fuse multi-scale features, improving missed and false detections under scale variation. Finally, comparative and ablation experiments on the VOC pedestrian dataset and the WiderPerson dataset reach accuracies of 87.9% and 86.7% respectively, 4.2% and 4.8% higher than YOLOv5; the method effectively reduces missed and false detections in pedestrian detection and provides a solid basis for subsequent pedestrian multi-object tracking.
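The weight-based cross-layer fusion described above can be illustrated as a normalized weighted sum of same-shaped feature maps, in the spirit of fast normalized fusion; the function name, shapes, and weights below are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def weighted_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature maps with non-negative, normalized weights.

    Each input feature map contributes in proportion to a learnable
    scalar weight; clamping and normalization keep training stable.
    """
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # clamp to >= 0
    w = w / (w.sum() + eps)                                # normalize to ~1
    fused = np.zeros_like(features[0], dtype=float)
    for f, wi in zip(features, w):
        fused += wi * f
    return fused

# Two toy 4x4 "feature maps" standing in for different pyramid levels
f1 = np.ones((4, 4))
f2 = 3 * np.ones((4, 4))
out = weighted_fusion([f1, f2], weights=[1.0, 1.0])  # ~= elementwise mean
```

In the real detector the weights would be learned parameters, letting the network decide how much each pyramid level contributes per fused output.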

To address ID switches caused by occlusion and by changes in pedestrian speed during multi-object tracking, this thesis proposes an improved DeepSort-based pedestrian multi-object tracking algorithm. First, the feature extraction network of DeepSort is optimized: a ResNet34-based pedestrian appearance feature extraction network is designed, a triplet loss function is introduced, and adaptive average pooling is added for efficient image representation, strengthening the network's ability to extract deep pedestrian appearance features and improving tracking accuracy. Second, a multi-model Kalman filter is designed for the state prediction stage, better adapting to changes in target speed during tracking and effectively mitigating ID switches. Finally, experiments on the MOT16 dataset show that the algorithm reaches MOTA and MOTP of 53.4% and 81.6%, improvements of 3.8% and 1.5% over DeepSort, while the number of ID switches is reduced by 276. To verify the generalization of the tracking algorithm, tracking experiments were also conducted in a real campus environment, where the improved method tracks pedestrians more effectively and performs well in practical tasks.
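The triplet loss added to the appearance network pulls embeddings of the same pedestrian together and pushes different pedestrians apart. A minimal sketch of the standard formulation, with toy 2-D embeddings and an assumed margin:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: max(d(a, p) - d(a, n) + margin, 0),
    with Euclidean distances between embedding vectors."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor <-> same identity
    d_neg = np.linalg.norm(anchor - negative)  # anchor <-> other identity
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([1.0, 0.0])
p = np.array([1.1, 0.0])   # same pedestrian: close to the anchor
n = np.array([0.0, 5.0])   # different pedestrian: far from the anchor
loss_easy = triplet_loss(a, p, n)   # negative already far -> loss 0
loss_hard = triplet_loss(a, n, p)   # roles swapped -> positive loss
```

During training, only triplets that violate the margin produce gradient, which is why hard-example mining matters for re-identification embeddings.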

On the basis of the above work, a people flow counting and analysis system is implemented. The system not only detects and tracks pedestrians in video but also counts and analyzes the number of people, visualizing changes in people flow as line charts; this has practical value for pedestrian early-warning analysis in public places such as campuses.

English abstract:

With the increasing prevalence of video surveillance in complex and densely populated public areas such as schools and parks, there is a growing demand for intelligent applications that go beyond static storage and display of video information. Specifically, detecting and analyzing objects within surveillance videos, particularly pedestrians, is of crucial significance for ensuring public safety. This paper addresses the challenges of missed and false detections in pedestrian detection and of identity switches in multi-object tracking, proposing improved algorithms to tackle these issues. The main contributions of this research are as follows:

To address missed detections and false alarms caused by variations in pedestrian scale and by occlusions, a novel pedestrian detection method named YOLO-WC is introduced, building upon the YOLOv5 algorithm. First, a Res-spp module is designed to effectively integrate both local and global pedestrian features, and the CIoU loss is adopted as the bounding-box regression loss, mitigating missed detections caused by occlusion between pedestrians. Second, a Neck structure tailored to pedestrian target features is designed, and a weight-based cross-layer connection mechanism is proposed to fuse multi-scale features, thereby addressing missed detections and false alarms arising from scale variation. Finally, comparative and ablation experiments on the VOC pedestrian dataset and the WiderPerson dataset yield accuracies of 87.9% and 86.7% respectively, surpassing YOLOv5 by 4.2% and 4.8%. The proposed method effectively reduces missed detections and false alarms in pedestrian detection, establishing a solid foundation for subsequent pedestrian multi-object tracking.
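The CIoU regression loss adopted above augments IoU with a center-distance term and an aspect-ratio term. The following is a minimal sketch of the published CIoU formulation, not code from the thesis:

```python
import math

def ciou_loss(box1, box2, eps=1e-9):
    """CIoU loss for axis-aligned boxes given as (x1, y1, x2, y2).

    loss = 1 - IoU + rho^2 / c^2 + alpha * v, where rho is the distance
    between box centers, c is the diagonal of the smallest enclosing box,
    and v penalizes aspect-ratio mismatch.
    """
    x1, y1, x2, y2 = box1
    X1, Y1, X2, Y2 = box2
    # intersection and union areas
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (X2 - X1) * (Y2 - Y1) - inter
    iou = inter / (union + eps)
    # squared center distance over squared enclosing-box diagonal
    rho2 = ((x1 + x2 - X1 - X2) ** 2 + (y1 + y2 - Y1 - Y2) ** 2) / 4.0
    cw = max(x2, X2) - min(x1, X1)
    ch = max(y2, Y2) - min(y1, Y1)
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan((x2 - x1) / (y2 - y1))
                              - math.atan((X2 - X1) / (Y2 - Y1))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

perfect = ciou_loss((0, 0, 10, 20), (0, 0, 10, 20))   # identical boxes -> ~0
shifted = ciou_loss((0, 0, 10, 20), (5, 5, 15, 25))   # offset box -> larger loss
```

Unlike plain IoU loss, the center-distance term still yields a useful gradient when predicted and ground-truth boxes do not overlap at all, which helps with heavily occluded pedestrians.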

To address identity switches caused by occlusions and by variations in pedestrian speed during multi-object tracking, an enhanced pedestrian multi-object tracking algorithm based on DeepSort is presented. First, the feature extraction network of DeepSort is optimized by incorporating a ResNet34-based pedestrian appearance feature extraction network; a triplet loss function is introduced, and adaptive average pooling is employed for efficient image representation, enhancing the network's ability to extract deep appearance features and thus improving tracking accuracy. Second, a multi-model Kalman filter is devised for the state prediction stage, effectively adapting to changes in target speed during tracking and alleviating identity switches. Experimental results on the MOT16 dataset show that the proposed algorithm achieves MOTA and MOTP of 53.4% and 81.6%, respectively, outperforming DeepSort by 3.8% and 1.5%, while the number of identity switches is reduced by 276. Furthermore, to validate the generalizability of the proposed tracking algorithm, tracking experiments are conducted in a campus environment, demonstrating the improved method's efficacy in practical pedestrian tracking.
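DeepSort predicts each track's state with a Kalman filter; the thesis replaces the single motion model with a multi-model bank to cope with speed changes. The constant-velocity building block that such a bank is built from can be sketched in one dimension as follows (the matrices and noise values are illustrative assumptions, not the thesis's settings):

```python
import numpy as np

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity state transition
H = np.array([[1.0, 0.0]])              # we observe position only
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.1]])                   # measurement noise covariance

x = np.array([[0.0], [0.0]])            # state: [position, velocity]
P = np.eye(2)                           # state covariance

def predict(x, P):
    """Propagate the state one frame forward under the motion model."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the prediction with a position measurement z."""
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Track a target moving at a constant 1 unit per frame
for k in range(1, 21):
    x, P = predict(x, P)
    x, P = update(x, P, np.array([[float(k)]]))
# x now estimates position ~20 and velocity ~1
```

A multi-model variant would run several such filters with different motion assumptions (e.g. different process noise or acceleration terms) and blend their estimates by measurement likelihood.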

Building upon the aforementioned research, a people counting and analysis system is developed. The system not only facilitates the detection and tracking of pedestrians in videos but also enables people counting and analysis. Moreover, the system visualizes information on people flow variations using line graphs, which holds practical value for pedestrian early warning analysis in public places such as campuses.
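One common way to turn per-frame track positions into a people count is a virtual counting line: a track ID is counted when its centroid crosses the line. The abstract does not specify the system's exact counting rule, so the crossing logic below is an assumption, shown as a minimal sketch:

```python
def count_crossings(tracks, line_y):
    """Count how many track IDs cross a horizontal line, by direction.

    tracks: dict mapping track ID -> list of per-frame centroid y values.
    Returns (downward_count, upward_count); each ID is counted at most once.
    """
    down = up = 0
    for ys in tracks.values():
        for prev, cur in zip(ys, ys[1:]):
            if prev < line_y <= cur:      # moved downward across the line
                down += 1
                break
            if prev >= line_y > cur:      # moved upward across the line
                up += 1
                break
    return down, up

tracks = {
    1: [10, 40, 80, 120],   # moves down across y=100
    2: [150, 110, 90, 60],  # moves up across y=100
    3: [10, 20, 30, 40],    # never crosses
}
down, up = count_crossings(tracks, line_y=100)
```

Aggregating such counts per time window yields the time series that the system's line charts visualize.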


CLC number:

 TP391

Open access date:

 2023-06-14
