Thesis Information

Chinese title:

 基于多特征和UKF融合的行人跟踪算法研究

Name:

 王雨田 (Wang Yutian)

Student ID:

 20207223066

Confidentiality level:

 Public

Thesis language:

 chi (Chinese)

Discipline code:

 085400

Discipline name:

 Engineering - Electronic Information

Student type:

 Master's candidate

Degree level:

 Master of Engineering

Degree year:

 2023

Institution:

 西安科技大学 (Xi'an University of Science and Technology)

School:

 School of Communication and Information Engineering

Major:

 Electronics and Communication Engineering

Research direction:

 Computer vision

First supervisor:

 韩晓冰 (Han Xiaobing)

First supervisor's institution:

 西安科技大学 (Xi'an University of Science and Technology)

Second supervisor:

 师文 (Shi Wen)

Submission date:

 2023-06-15

Defense date:

 2023-05-30

English title:

 Research on Pedestrian Tracking Algorithm Based on Multi-feature and UKF Fusion    

Chinese keywords:

 Pedestrian detection; attention mechanism; feature pyramid; pedestrian tracking; unscented Kalman filter

English keywords:

 Pedestrian detection; Attention mechanism; Feature pyramid; Pedestrian tracking; Unscented Kalman filter

Chinese abstract:

Pedestrian detection and tracking is an active research direction within object detection and tracking. In crosswalk scenes, pedestrians may be small, densely packed, or occluded, which severely limits detection and tracking performance. To address these problems, this thesis improves the YOLOv5 detection algorithm and the DeepSORT tracking algorithm; the main contributions are summarized as follows:

(1) Improvement of the YOLOv5 object detection algorithm. This thesis analyzes the problems YOLOv5 exhibits in crosswalk scenes. First, the CBAM attention mechanism is introduced to strengthen the network's feature extraction and focus it on important features, improving detection performance. Then, the feature pyramid network is improved with skip connections and weighted feature fusion to reduce missed detections caused by changes in target scale. Finally, the SIoU bounding-box loss function replaces the CIoU loss to accelerate bounding-box regression and improve localization accuracy. Comparative experiments on the CrowdHuman dataset show that the improved YOLOv5 raises precision by 5.2%, recall by 2.1%, and average precision by 2.8% over the original algorithm.

(2) Improvement of the DeepSORT object tracking algorithm. First, to address insufficient feature extraction in the appearance-feature module of the original DeepSORT, the module is improved to extract both HOG features and deep appearance features of pedestrians; these features are used in the subsequent matching stage, improving tracking accuracy. Second, in the state-prediction stage of the original DeepSORT, the Kalman filter applies only to simple linear settings and struggles to predict pedestrian states accurately under nonlinear motion; the unscented Kalman filter (UKF) is therefore adopted in the pedestrian state-prediction stage.

Finally, the improved YOLOv5 is combined with the improved DeepSORT and tested on the MOT16 dataset and a self-collected dataset. On MOT16, the improved algorithm raises tracking accuracy by 5.2% and reduces identity switches by 58 compared with the original DeepSORT; during the pedestrian peak period in the collected data, it raises tracking accuracy by 4.7% and tracking precision by 4.3%, with 16 fewer identity switches. The improved tracker achieves 60.3% tracking accuracy in a real crosswalk scene while meeting real-time requirements, and can serve as a theoretical and methodological reference for intelligent transportation and intelligent surveillance.

English abstract:

Pedestrian detection and tracking is a popular research direction in object detection and tracking. In crosswalk scenarios, pedestrians may be small, densely packed, or occluded, which greatly limits detection and tracking. To solve these problems, this thesis improves the YOLOv5 detection algorithm and the DeepSORT tracking algorithm; the main contributions and innovations are summarized as follows:

(1) Improvement of the YOLOv5 object detection algorithm. This thesis analyzes the problems of YOLOv5 in the crosswalk scenario. First, the CBAM attention mechanism is introduced to enhance the network's feature extraction and focus on important features, improving detection performance. Then, the feature pyramid network is improved with skip connections and weighted feature fusion to address missed detections caused by changes in target scale. Finally, the SIoU bounding-box loss function replaces the CIoU loss to accelerate bounding-box regression and improve localization accuracy. Comparative experiments on the CrowdHuman dataset show that the improved YOLOv5 improves precision by 5.2%, recall by 2.1%, and average precision by 2.8% over the original algorithm.
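The weighted feature fusion mentioned above can be illustrated with BiFPN-style fast normalized fusion, where each input feature map carries a learnable non-negative weight. A minimal numpy sketch under that assumption (function name and shapes are illustrative, not the thesis's code):

```python
import numpy as np

def weighted_fusion(features, weights, eps=1e-4):
    # BiFPN-style fast normalized fusion:
    # out = sum(w_i * f_i) / (sum(w_i) + eps), with w_i clipped to >= 0
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)
    fused = sum(wi * f for wi, f in zip(w, features))
    return fused / (w.sum() + eps)

# Two same-shape feature maps, e.g. a top-down input and a skip connection
f1 = np.ones((4, 4))
f2 = np.full((4, 4), 3.0)
out = weighted_fusion([f1, f2], [1.0, 2.0])  # every entry ~ (1*1 + 2*3) / 3 = 2.333
```

The small `eps` keeps the division stable when all weights shrink toward zero during training.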

(2) Improvement of the DeepSORT object tracking algorithm. First, to address inadequate feature extraction in the appearance-feature module of the original DeepSORT, the module is improved to extract both HOG features and deep appearance features of pedestrians; these features are used in the subsequent matching stage, improving tracking accuracy. Second, in the target state-prediction stage of the original DeepSORT, the Kalman filter applies only to simple linear settings, struggles to predict pedestrian states accurately under nonlinear motion, and is prone to identity switches; the unscented Kalman filter (UKF) is therefore adopted in the pedestrian state-prediction stage.
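The UKF avoids linearizing the motion model: it propagates a small set of sigma points through the nonlinear function and recovers the predicted mean and covariance from weighted statistics. A minimal numpy sketch of that unscented transform (Merwe scaling parameters and names are illustrative; this is not the thesis's implementation):

```python
import numpy as np

def unscented_transform(x, P, f, alpha=1e-3, beta=2.0, kappa=0.0):
    # Propagate mean x (n,) and covariance P (n, n) through a
    # nonlinear function f using 2n+1 sigma points.
    n = x.size
    lam = alpha ** 2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * P)        # matrix square root of scaled P
    sigmas = np.vstack([x, x + L.T, x - L.T])    # (2n+1, n) sigma points
    Wm = np.full(2 * n + 1, 0.5 / (n + lam))     # mean weights
    Wc = Wm.copy()                               # covariance weights
    Wm[0] = lam / (n + lam)
    Wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)
    Y = np.array([f(s) for s in sigmas])         # transformed sigma points
    y_mean = Wm @ Y
    d = Y - y_mean
    y_cov = (Wc[:, None] * d).T @ d              # sum of Wc_i * d_i d_i^T
    return y_mean, y_cov

# Constant-velocity-style step: position += velocity (linear here only to
# keep the check simple; a nonlinear f is handled the same way)
step = lambda s: np.array([s[0] + s[1], s[1]])
mean, cov = unscented_transform(np.array([1.0, 2.0]), 0.1 * np.eye(2), step)
# mean ~ [3, 2]; cov ~ [[0.2, 0.1], [0.1, 0.1]]
```

For a linear step the transform reproduces the Kalman prediction exactly, which makes it easy to sanity-check before substituting a nonlinear pedestrian motion model.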

Finally, the improved YOLOv5 was combined with the improved DeepSORT and tested on the MOT16 dataset and a self-collected dataset. On MOT16, the improved algorithm improves tracking accuracy by 5.2% and reduces the number of identity switches by 58 compared with the original DeepSORT; during the pedestrian peak period in the collected data, it improves tracking accuracy by 4.7% and tracking precision by 4.3%, with 16 fewer identity switches. The improved pedestrian tracker achieves 60.3% tracking accuracy in the actual crosswalk scenario and meets real-time requirements, providing a theoretical basis and methodological reference for intelligent transportation and intelligent surveillance.
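The tracking accuracy and identity-switch counts quoted above presumably follow the CLEAR-MOT convention (MOTA and IDSW). A minimal sketch of how MOTA aggregates the three error types; the numbers below are made up for illustration, not the thesis's results:

```python
def mota(fn, fp, idsw, num_gt):
    # MOTA = 1 - (FN + FP + IDSW) / total ground-truth objects;
    # higher is better, and it can go negative when errors exceed GT.
    return 1.0 - (fn + fp + idsw) / num_gt

# Illustrative numbers: 300 misses, 150 false alarms, 20 ID switches
# over 1000 ground-truth boxes across the sequence
score = mota(300, 150, 20, 1000)  # 0.53, i.e. 53% tracking accuracy
```

Because identity switches enter the numerator directly, the 58 fewer switches reported on MOT16 contribute to the MOTA gain as well as to more stable track IDs.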


CLC number:

 TP391

Release date:

 2023-06-16
