Thesis Information

Thesis Title (Chinese):

Research on Deep Learning-Based Person Detection and Cross-Modality Person Re-identification Algorithms

Name:

 官志斌    

Student ID:

 21207223096    

Confidentiality Level:

Public

Thesis Language:

Chinese (chi)

Discipline Code:

 085400    

Discipline Name:

Engineering - Electronic Information

Student Type:

Master's

Degree Level:

Master of Engineering

Degree Year:

 2024    

Degree-Granting Institution:

Xi'an University of Science and Technology

School/Department:

College of Communication and Information Engineering

Major:

Electronics and Communication Engineering

Research Direction:

Computer Vision

First Supervisor:

 马莉    

First Supervisor's Institution:

Xi'an University of Science and Technology

Thesis Submission Date:

 2024-06-11    

Thesis Defense Date:

 2024-06-05    

Thesis Title (English):

 Deep Learning-based Person Detection and Cross-modality Person Re-identification Algorithm Research    

Keywords (Chinese):

Person detection ; Cross-modality person re-identification ; Middle modality ; Deep learning ; Cross-modality person search system

Keywords (English):

 Person detection ; Cross-modality person re-identification ; Middle modality ; Deep Learning ; Cross-modality person search system    

Abstract (Chinese):

Person search aims to detect and identify persons in unprocessed images or videos, and involves two processes: person detection and person re-identification.

In the person detection stage, false and missed detections caused by occlusion and overly small person targets degrade the performance of person re-identification. In the person re-identification stage, the modality differences and intra-class differences between infrared and visible-light images limit the accuracy of existing algorithms. This thesis therefore studies the false- and missed-detection problems of the detection stage and the modality and intra-class differences of the cross-modality re-identification stage, in order to further improve the accuracy of existing algorithms. From the perspective of public security, the technology is of significant value for finding missing persons and tracking suspects.

To address the false and missed detections caused by occlusion and small person targets, this thesis proposes a person detection algorithm based on an improved YOLOv5s. An improved hybrid attention module, PSAM, is added to YOLOv5s; by extracting finer-grained multi-scale spatial information, it helps the model attend to the important information in the image and fully extract effective features of the target person regions, thereby reducing false and missed detections. To further reduce missed detections, a 160×160 detection feature map is added to the detection head to detect small person targets of 4×4 pixels and above. The method is validated on a mixed person dataset built from three public datasets (WiderPerson, CrowdHuman, and CityPersons); experimental results show that it reaches 86.4% AP at 60 FPS, achieving better performance.
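The stride arithmetic behind the extra detection head can be illustrated with a short sketch (a hedged illustration based on the standard YOLOv5 stride conventions, not code from the thesis): YOLOv5s detects at strides 8, 16, and 32, so a 640×640 input yields 80×80, 40×40, and 20×20 feature maps; an additional stride-4 head yields the 160×160 map mentioned above, in which each grid cell corresponds to a 4×4-pixel region of the input.

```python
# Detection-scale arithmetic for YOLOv5s (illustrative, not thesis code).
input_size = 640                       # a typical YOLOv5 input resolution
default_strides = [8, 16, 32]          # YOLOv5s default detection strides

default_maps = [input_size // s for s in default_strides]
print(default_maps)                    # [80, 40, 20]

# An extra shallow detection head at stride 4:
extra_map = input_size // 4
print(extra_map)                       # 160
print(input_size / extra_map)          # 4.0 -> each cell covers a 4×4-pixel area
```

This is why a 160×160 feature map is naturally suited to person targets of roughly 4×4 pixels and larger.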

To address the modality differences and intra-class differences in cross-modality person re-identification, this thesis uses a middle-modality generator to map images of different modalities into a unified feature space and generate middle-modality images, and then extracts features with a two-stream parameter-sharing network to reduce the modality differences. To reduce them further, a multi-granularity pooling strategy that combines global and local features is adopted to strengthen the model's representation learning ability. To reduce intra-class differences, the model is further optimized with a joint objective of a distribution consistency loss, a label-smoothing cross-entropy loss, and a heterogeneous center triplet loss. Experimental results show that in the all-search mode of the SYSU-MM01 dataset the method reaches 68.11% mAP, an improvement of 3.29%.
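As an illustration of the heterogeneous center triplet loss idea, here is a minimal NumPy sketch (a simplification under assumed conventions, not the thesis's implementation): per-identity feature centers are computed separately for the visible and infrared modalities; same-identity cross-modality centers are pulled together, while the nearest center of any other identity is pushed away by a margin.

```python
import numpy as np

def hetero_center_triplet_loss(feats, labels, modals, margin=0.3):
    """Simplified heterogeneous-center triplet loss (illustrative sketch).

    feats:  (N, D) array of feature vectors
    labels: (N,) array of identity ids
    modals: (N,) array of modality flags, 0 = visible, 1 = infrared
    """
    ids = sorted(set(labels.tolist()))
    # one center per (identity, modality) pair
    centers = {(p, m): feats[(labels == p) & (modals == m)].mean(axis=0)
               for p in ids for m in (0, 1)}
    total = 0.0
    for p in ids:
        # pull: distance between the two modality centers of the same identity
        pos = np.linalg.norm(centers[(p, 0)] - centers[(p, 1)])
        # push: closest center belonging to any other identity
        neg = min(np.linalg.norm(centers[(p, m)] - centers[(q, n)])
                  for q in ids if q != p for m in (0, 1) for n in (0, 1))
        total += max(0.0, margin + pos - neg)
    return total / len(ids)
```

When the visible and infrared centers of each identity coincide and identities are well separated, the hinge is inactive and the loss is zero, which is the behavior the joint objective drives toward.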

To present and evaluate the recognition results of the improved algorithms intuitively, an end-to-end cross-modality person search system is developed that can perform person re-identification in both the visible-light and infrared modalities. The results show that the algorithm provides a useful reference for person search tasks.
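The retrieval step of such a search system can be sketched as follows (a minimal illustration of the usual re-identification convention of ranking gallery features against a query feature by cosine similarity; the function and names are hypothetical, not the system's actual code):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats, gallery_ids, top_k=5):
    """Return the gallery ids most similar to the query, by cosine similarity."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                        # cosine similarity per gallery entry
    order = np.argsort(-sims)[:top_k]   # best matches first
    return [gallery_ids[i] for i in order]
```

In a full pipeline, the detector crops person regions, the re-identification network embeds each crop, and a routine like this ranks gallery identities for every query crop.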

Abstract (English):

Person search aims to detect and identify persons in unprocessed images or videos, and involves two processes: person detection and person re-identification.

In the person detection stage, false and missed detections caused by person occlusion and small person targets degrade the performance of person re-identification. In the person re-identification stage, the accuracy of existing algorithms is limited by the modality differences and intra-class differences between infrared and visible-light images. This thesis therefore studies the false- and missed-detection problems of the detection stage and the modality and intra-class differences of the cross-modality re-identification stage, so as to further improve the accuracy of existing algorithms. From the perspective of public security, this technology has important value in finding missing persons and tracking suspects.

To address the false and missed detections caused by person occlusion and small person targets, this thesis proposes a person detection algorithm based on an improved YOLOv5s. On top of YOLOv5s, an improved hybrid attention module, PSAM, is added; by extracting finer-grained multi-scale spatial information, the model attends better to the important information in the image and fully extracts effective features of the target person regions, reducing false and missed detections. To further reduce missed detections, a 160×160 detection feature map is introduced into the detection head to detect small person targets of 4×4 pixels and above. Finally, the method is verified on a hybrid person dataset built from three public datasets: WiderPerson, CrowdHuman, and CityPersons. Experimental results show that the proposed method reaches 86.4% AP and 60 FPS, achieving better performance.

To address the modality differences and intra-class differences in cross-modality person re-identification, this thesis uses a middle-modality generator to map images of different modalities into a unified feature space and generate middle-modality images, and then extracts features through a two-stream parameter-sharing network to reduce the modality differences. To reduce them further, a multi-granularity pooling strategy combining global and local features is used to improve the model's representation learning ability. To reduce intra-class differences, the model is further optimized by jointly applying a distribution consistency loss, a label-smoothing cross-entropy loss, and a heterogeneous center triplet loss. Experimental results show that in the all-search mode of the SYSU-MM01 dataset, the method reaches 68.11% mAP, improving performance by 3.29%.

To visually display and evaluate the recognition results of the improved algorithms, an end-to-end cross-modality person search system was developed that can perform person re-identification in both the visible-light and infrared modalities. The results show that the algorithm provides a reference for person search tasks.


CLC Number:

 TP391.41    

Open Access Date:

 2024-06-11    
