Thesis Information

Thesis title (Chinese):

 基于改进卷积神经网络的视频行人重识别算法研究 (Research on Video-based Person Re-identification Algorithms Based on an Improved Convolutional Neural Network)

Name:

 Wang Aiping (王爱萍)

Student ID:

 20208223061    

Confidentiality level:

 Public

Thesis language:

 Chinese (chi)

Discipline code:

 085400    

Discipline name:

 Engineering - Electronic Information

Student type:

 Master's

Degree level:

 Master of Engineering

Degree year:

 2023    

Degree-granting institution:

 Xi'an University of Science and Technology

School/Department:

 College of Computer Science and Technology

Major:

 Computer Technology

Research direction:

 Image Recognition

First supervisor:

 She Xiangyang (厍向阳)

First supervisor's institution:

 Xi'an University of Science and Technology

Submission date:

 2023-06-14    

Defense date:

 2023-06-05    

Thesis title (English):

 Research on Video-based Person Re-identification using Improved Convolutional Neural Network    

Keywords (Chinese):

 Video-based person re-identification; convolutional neural network; attention mechanism; spatial-temporal features; sub-pixel convolution

Keywords (English):

 Video-based Person Re-identification; Convolutional Neural Network; Attention Mechanism; Spatial-Temporal Features; Sub-pixel Convolution

Abstract (Chinese, translated):

In recent years, the field of video-based person re-identification has made considerable progress, but recognition results are still constrained by several factors. For example, occlusion of pedestrian body parts in the video, noise interference, and differences between capture devices all reduce re-identification accuracy. This thesis therefore proposes two video-based person re-identification algorithms based on improved convolutional neural networks. The main research work is as follows:

(1) To address the low recognition accuracy in video-based person re-identification caused by low-quality frames (occlusion, noise) and by the inability to effectively learn cross-channel local features, a video-based person re-identification algorithm combining multiple attention mechanisms is proposed. The algorithm improves on the standard ResNet50 residual network: ① a channel-interaction attention module is added to the ResNet50 structure to capture local features between each channel and its adjacent channels and strengthen local feature representation; ② temporal attention is introduced to assign weights to image frames and fuse their features, effectively extracting more discriminative feature descriptors; ③ a joint loss function constrains model training and improves generalization. Experimental results show that, compared with the original algorithm, the improved algorithm raises Rank-1 and mAP by 2.5% and 1% on the MARS dataset, and by 2.1% and 1.8% on the DukeMTMC-VideoReID dataset. The algorithm effectively strengthens the robustness of pedestrian feature information and improves recognition accuracy.
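The channel-interaction idea in ① can be illustrated with a minimal NumPy sketch in the spirit of ECA-style attention: global average pooling per channel, a size-k 1D convolution across neighboring channels (no dimensionality reduction), and a sigmoid gate. The fixed averaging kernel below is a toy stand-in for a learned kernel; this is not the thesis's actual module.

```python
import numpy as np

def channel_interaction_attention(x, k=3):
    """x: (C, H, W) feature map. Gate each channel using a size-k 1D
    convolution over the pooled descriptors of its neighboring channels."""
    c = x.shape[0]
    d = x.mean(axis=(1, 2))                       # (C,) global average pooling
    pad = k // 2
    dp = np.pad(d, pad, mode='edge')              # pad so every channel has k neighbors
    kernel = np.ones(k) / k                       # toy fixed kernel (learned in practice)
    z = np.array([dp[i:i + k] @ kernel for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-z))               # sigmoid gate per channel
    return x * gate[:, None, None]                # rescale each channel
```

Because the gate for each channel depends only on its k pooled neighbors, the module captures cross-channel local interaction at negligible cost compared with full channel attention.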

(2) To address the underuse of temporal information in videos and the impact of low-resolution image frames on feature extraction, a video-based person re-identification algorithm based on improved 3D convolution and resolution reconstruction is proposed. In this algorithm: ① some Bottleneck blocks in the ResNet50 structure are replaced with an improved 3D convolutional structure that extracts temporal and spatial features simultaneously; ② a sub-pixel convolution layer rearranges the pixels of low-resolution image frames, raising the resolution of the feature maps so the model can learn from them more effectively. Experimental results show Rank-1 and mAP of 89.0% and 83.2% on the MARS dataset, and 96.3% and 95.5% on the DukeMTMC-VideoReID dataset. The algorithm effectively learns the spatio-temporal features of video sequences and enhances the feature representation of low-resolution image frames, improving the accuracy of video-based person re-identification.

Abstract (English):

In recent years, significant progress has been made in the field of video-based person re-identification, but recognition results are still constrained by various factors. For instance, occlusion, noise interference, and differences in capture devices can all affect the accuracy of person re-identification. To address these issues, this paper proposes two video-based person re-identification algorithms based on improved convolutional neural networks. The main research work is as follows:

(1) To address the low recognition accuracy in video-based person re-identification caused by low-quality frames (occlusion, noise) and by the inability to effectively learn local features across channels, a video-based person re-identification algorithm combining multiple attention mechanisms is proposed. The algorithm improves on the standard ResNet50 residual network by: ① adding a channel-interaction attention module to the ResNet50 structure, which captures local features between each channel and its adjacent channels to enhance the representation of local features; ② introducing temporal attention to allocate weights across image frames and fuse their features, effectively extracting more discriminative feature descriptors; ③ using a joint loss function to constrain model training and improve the generalization performance of the model. Experimental results show that, compared with the original algorithm, the improved algorithm increases Rank-1 and mAP on the MARS dataset by 2.5% and 1% respectively, and on the DukeMTMC-VideoReID dataset by 2.1% and 1.8% respectively. The algorithm effectively enhances the robustness of pedestrian feature information and improves recognition accuracy.
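The temporal-attention step in ② (score each frame, normalize the scores, fuse the frames) can be sketched as follows; the scoring vector `w` stands in for a learned parameter and is an assumption for illustration, not the thesis's actual parameterization.

```python
import numpy as np

def temporal_attention_fuse(frame_feats, w):
    """frame_feats: (T, D) per-frame descriptors; w: (D,) scoring vector.
    Softmax the per-frame scores and return the weighted clip descriptor."""
    scores = frame_feats @ w                        # (T,) frame quality scores
    scores = scores - scores.max()                  # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # (T,) attention weights
    fused = alpha @ frame_feats                     # (D,) fused descriptor
    return fused, alpha
```

Low-quality (occluded, noisy) frames receive low scores and therefore contribute little to the fused clip-level descriptor.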

(2) An improved video-based person re-identification algorithm is proposed to address the inefficient utilization of temporal information in videos and the impact of low-resolution image frames on feature extraction. The algorithm includes the following improvements: ① some of the Bottleneck blocks in the ResNet50 network structure are replaced with an improved 3D convolutional structure, which extracts temporal and spatial features from the video simultaneously; ② a sub-pixel convolutional layer is introduced to reconstruct the low-resolution image frames into higher-resolution feature maps, which facilitates better learning by the model. Experimental results show that the proposed algorithm achieves a Rank-1 accuracy of 89.0% and mAP of 83.2% on the MARS dataset, and a Rank-1 accuracy of 96.3% and mAP of 95.5% on the DukeMTMC-VideoReID dataset. The algorithm effectively learns the spatio-temporal features of video sequences and enhances the feature representation of low-resolution image frames, thereby improving the accuracy of video-based person re-identification.
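The sub-pixel rearrangement in ② maps a (C·r², H, W) feature map to (C, H·r, W·r) by redistributing channel groups into spatial positions. A minimal NumPy sketch of that standard pixel-shuffle operation (an illustration, not the thesis's exact layer):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) array into (C, H*r, W*r)."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (c, i, j) sub-pixel groups
    x = x.transpose(0, 3, 1, 4, 2)    # interleave: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r) # each r*r channel group becomes an r*r patch
```

In a super-resolution head, a convolution first expands the channels by r² and this rearrangement then produces the upscaled map, avoiding the checkerboard artifacts of naive deconvolution.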


CLC number:

 TP391.41    

Open access date:

 2023-06-14    

