Thesis title (Chinese): | 基于改进的3DCNN与LSTM的人体动作识别研究 |
Name: | |
Student ID: | 21207223068 |
Confidentiality level: | Public |
Thesis language: | Chinese (chi) |
Discipline code: | 085400 |
Discipline: | Engineering - Electronic Information |
Student type: | Master's student |
Degree: | Master of Engineering |
Degree year: | 2024 |
Degree-granting institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Computer Vision |
First supervisor: | |
First supervisor's institution: | |
Submission date: | 2024-06-12 |
Defense date: | 2024-06-06 |
Thesis title (English): | Research on Human Action Recognition Based on Improved 3DCNN and LSTM |
Keywords (Chinese): | |
Keywords (English): | Deep learning; Human action recognition; Action-time perception; 3DCNN; LSTM |
Abstract (Chinese, translated): |
Human action recognition is an important research direction in computer vision that has attracted wide attention. Action recognition operates on video, so unlike image classification it must extract temporal features as well as spatial ones. This thesis therefore builds on the 3D convolutional neural network (3DCNN) and the LSTM network to study how to effectively extract spatiotemporal information and capture long-term action features.
To address the redundant information in video and the sparse distribution of the feature channels that carry action information, the 3DCNN is improved with a motion-time perception module, which consists of a motion perception module and a temporal attention module. The motion perception module computes feature-level temporal differences to excite motion-sensitive channels and thereby extract motion features; the temporal attention module applies temporal convolution along the time dimension to compute an attention weight matrix, which is multiplied with the feature map for adaptive feature learning, yielding temporal features. Inserting the motion-time perception module into the 3DCNN produces the motion-time-perception-based network (3DCNN based on Action-Time Perception, ATMNet). Experiments on the public datasets UCF101 and HMDB51 show that ATMNet improves human action recognition accuracy over each corresponding baseline network; the largest gain is over 3DResNeXt-101, with accuracy improvements of 1.6% and 0.6%, respectively, demonstrating that the proposed improvement to the 3DCNN is feasible and effective.
To address ATMNet's inability to fully capture long-term action features, an LSTM network is introduced to model the dependencies between different sequences. Cascading ATMNet with the LSTM forms the ATMNet-LSTM network, which obtains richer action feature information: ATMNet captures the short-term action features of individual clips, while the LSTM captures the dependencies among the clip features. To improve the generalization of the network, a center loss with a tunable weighting parameter is combined with the cross-entropy loss as the model's training objective. Experiments show that on UCF101 and HMDB51, ATMNet-LSTM improves recognition accuracy over ATMNet by 0.3% and 3.5%, respectively, indicating that modeling the dependencies between sequence features further improves human action recognition accuracy. |
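The two stages of the motion-time perception module (feature-level temporal differencing that excites motion-sensitive channels, then temporal attention that re-weights frames) can be sketched as follows. This is a minimal illustrative numpy sketch with all spatial dimensions pooled away; the function names, tensor shapes, and the smoothing kernel are assumptions for illustration, not the thesis's actual 3DCNN implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def motion_excitation(x):
    """Motion perception: feature-level temporal differences excite
    motion-sensitive channels. x has shape (T, C): T frames, C channels
    (spatial dimensions pooled away for simplicity)."""
    diff = np.diff(x, axis=0)               # (T-1, C) frame-to-frame differences
    motion = np.abs(diff).mean(axis=0)      # (C,) per-channel motion energy
    gate = sigmoid(motion - motion.mean())  # channel-wise excitation weights
    return x * gate                         # re-weight motion-sensitive channels

def temporal_attention(x, kernel):
    """Temporal attention: a 1D convolution along the time axis produces
    per-frame attention weights that re-scale the feature map."""
    scores = np.convolve(x.mean(axis=1), kernel, mode="same")  # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over the time dimension
    return x * weights[:, None]             # (T, C) attention-weighted features

def motion_time_perception(x, kernel=np.array([0.25, 0.5, 0.25])):
    """Motion-time perception module: motion excitation followed by
    temporal attention (order assumed; the thesis does not fix it here)."""
    return temporal_attention(motion_excitation(x), kernel)
```

In the thesis the module is inserted inside a 3D residual network (e.g. 3DResNeXt-101); here the residual wiring is omitted to keep the sketch focused on the two attention mechanisms.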
Abstract (English): |
Human action recognition is an important research direction in computer vision that has attracted broad attention. The action recognition task deals with video, so unlike image classification it must extract temporal features as well as spatial ones. This thesis therefore builds on the 3D convolutional neural network (3DCNN) and the LSTM network to study how to effectively extract spatiotemporal information and capture long-term action features. To address the redundant information in video and the sparse distribution of the feature channels that carry action information, the 3DCNN is improved with a motion-time perception module, which consists of a motion perception module and a temporal attention module. The motion perception module computes feature-level temporal differences to excite motion-sensitive channels and thereby extract motion features. The temporal attention module uses temporal convolution along the time dimension to compute an attention weight matrix, which is multiplied with the feature map for adaptive feature learning, yielding temporal features. Adding the motion-time perception module to the 3DCNN yields the 3DCNN based on Action-Time Perception (ATMNet). Experimental results on the public datasets UCF101 and HMDB51 show that ATMNet improves human action recognition accuracy over each corresponding baseline network; the largest gain is over 3DResNeXt-101, with improvements of 1.6% and 0.6%, respectively, indicating that the proposed improvement to the 3DCNN is feasible and effective.
To address the problem that ATMNet cannot fully capture long-term action features, an LSTM network is introduced to model the dependencies between different sequences. Cascading ATMNet with the LSTM forms the ATMNet-LSTM network, which obtains richer action feature information: ATMNet captures the short-term action features of individual clips, while the LSTM captures the dependencies among the clip features. At the same time, to improve the generalization of the network, a center loss with a tunable weighting parameter is combined with the cross-entropy loss as the model's training objective. Experimental results show that on UCF101 and HMDB51, ATMNet-LSTM improves recognition accuracy over ATMNet by 0.3% and 3.5%, respectively, indicating that modeling the dependencies between sequence features further improves human action recognition accuracy. |
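The combined training objective mentioned above, cross-entropy plus a parameter-weighted center loss, can be sketched per sample as follows. The weighting value `lam`, the function signatures, and the per-sample formulation are illustrative assumptions; the thesis does not specify these details in the abstract.

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stable)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def center_loss(feature, centers, label):
    """Center loss: squared distance between a sample's feature vector
    and its class center, pulling same-class features together."""
    return 0.5 * np.sum((feature - centers[label]) ** 2)

def joint_loss(logits, feature, centers, label, lam=0.01):
    """Joint objective L = L_ce + lam * L_center, where lam is the
    tunable weighting parameter (value here is illustrative)."""
    return cross_entropy(logits, label) + lam * center_loss(feature, centers, label)
```

In practice the class centers are learned alongside the network (updated from mini-batch feature means), and the cross-entropy term drives inter-class separation while the center term reduces intra-class variance.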
CLC number: | TP391.41 |
Release date: | 2024-06-12 |