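Thesis title (translated from Chinese): | Rehabilitation Action Recognition Combining Alphapose and Spatio-Temporal Graph Convolutional Networks |
Name: | |
Student ID: | 19207040012 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 081002 |
Discipline name: | Engineering - Information and Communication Engineering - Signal and Information Processing |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2022 |
Degree-granting institution: | Xi'an University of Science and Technology |
School/Department: | |
Major: | |
Research direction: | Computer vision |
First supervisor name: | |
First supervisor affiliation: | |
Thesis submission date: | 2022-06-21 |
Thesis defense date: | 2022-06-06 |
Thesis title (English): | Rehabilitation Action Recognition Based on Alphapose and Spatio-Temporal Graph Convolutional Networks |
Keywords (Chinese): | |
Keywords (English): | Pose estimation; Spatio-temporal graph convolutional network; Hierarchical residual; Attention mechanism; Rehabilitation action recognition |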
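Abstract (translated from Chinese): |
Illness and accidents can leave elderly people with motor impairments, so home rehabilitation training after illness is especially important for their health. Intelligent rehabilitation training can guide and supervise home rehabilitation by recognizing a patient's movements and comparing them with standard movements. This thesis therefore studies rehabilitation action recognition: it designs a spatio-temporal graph convolutional network with a hierarchical residual structure and an attention mechanism, then combines it with Alphapose pose estimation, object detection, and tracking algorithms to recognize multi-person actions. To address insufficient feature extraction and the reliance on joint features alone in existing models, the thesis proposes Res2-STGCN, a spatio-temporal graph convolutional network with a hierarchical residual structure. The 7-layer sequential spatio-temporal graph convolution modules (GT) of the original network are rebuilt as a hierarchical residual structure, GT-Res2Net, with the aim of extracting multi-scale features at a finer granularity to improve accuracy without increasing the load. While Res2-STGCN extracts multi-scale features from skeleton data, its multi-layer mixed convolutions fuse the channel and spatial information within the receptive field, and the "grouping" mechanism of the hierarchical residual reduces the correlation among channels. To address this, a spatio-temporal graph module with an attention mechanism, GT-Attention, is appended after GT-Res2Net to form a new module that adaptively adjusts channel features. The improved modules and the original modules together form the new model Res2SC-STGCN. Because bone (limb) features also carry a large amount of action-related information, a two-stream model, Res2SCs-STGCN, is built to extract joint and bone features simultaneously, making full use of the skeleton data; the two streams are fused by weighting. These improved models recognize only single-person actions; for multi-person action recognition in real scenes, the thesis combines object detection, tracking, and pose estimation with the improved model. Experimental results show that, under the two split protocols of the public NTU-RGB+D dataset, the best model reaches Top-1 accuracies of 88.60% and 95.11% for the joint stream, 90.58% and 96.12% for the bone stream, and 91.66% and 97.12% after fusion, all considerably higher than the baseline network (ST-GCN); recognition rates on the self-built rehabilitation dataset all exceed 97%. The combined algorithm performs well on multi-person action recognition across different situations in rehabilitation scenes. |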
Abstract (English): |
Illness and accidents can lead to motor impairment in the elderly, and home rehabilitation training after illness is particularly important for their health. Intelligent rehabilitation training can guide and supervise home rehabilitation by recognizing the patient's movements and comparing them with standard movements. Therefore, this thesis investigates rehabilitation action recognition: it designs a spatio-temporal graph convolutional network model with a hierarchical residual structure and an attention mechanism, and then fuses it with Alphapose pose estimation, object detection, and tracking algorithms to achieve multi-person action recognition. To address the inadequate feature extraction of existing models and their reliance on joint features alone, a spatio-temporal graph convolutional network model with a hierarchical residual structure (Res2-STGCN) is proposed. To extract multi-scale features at a finer granularity and improve model accuracy without increasing the load, the 7-layer sequential structure of the spatio-temporal graph convolution module GT in the original network is rebuilt as a hierarchical residual structure, GT-Res2Net. When Res2-STGCN extracts multi-scale features from skeleton data, its multi-layer mixed convolutions fuse the channel and spatial information of the receptive field, and the "grouping" mechanism of the hierarchical residual reduces the correlation among channels. To address this problem, a spatio-temporal graph module with an attention mechanism (GT-Attention) is added after GT-Res2Net to adaptively adjust channel features. The improved module and the original module form the new model Res2SC-STGCN. Bone features also contain a large amount of action-related information, so a two-stream model is established that extracts joint and bone features simultaneously, making full use of the skeleton data; the two streams are fused using a weighted approach. The improved model above recognizes only single-person actions, so this thesis fuses object detection, tracking, and pose estimation with the improved model to achieve multi-person action recognition in real scenes. The experimental results show that, under the two split protocols of the public dataset NTU-RGB+D, the best model reaches Top-1 accuracies of 88.60% and 95.11% for the joint stream, 90.58% and 96.12% for the bone stream, and 91.66% and 97.12% after fusion. Compared with the baseline network (ST-GCN), all of these accuracies are considerably higher. Recognition accuracy on the self-built rehabilitation dataset exceeds 97% in all cases. The fused algorithm achieves good results for multi-person action recognition in different situations in rehabilitation scenarios. |
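The two-stream idea in the abstract (bone features derived from the skeleton, then a weighted fusion of the joint-stream and bone-stream outputs) can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract gives neither the fusion weights nor the exact bone definition, so the child-minus-parent bone vector, the default weight of 0.5, and the names `bone_vectors`, `fuse_scores`, and `top1` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def bone_vectors(joints: Sequence[Vec3],
                 edges: Sequence[Tuple[int, int]]) -> List[Vec3]:
    """Return one bone vector per (parent, child) edge.

    Assumption: a bone is the child joint position minus the parent joint
    position, a common convention that the abstract does not confirm.
    """
    bones = []
    for parent, child in edges:
        px, py, pz = joints[parent]
        cx, cy, cz = joints[child]
        bones.append((cx - px, cy - py, cz - pz))
    return bones


def fuse_scores(joint_scores: Sequence[float],
                bone_scores: Sequence[float],
                joint_weight: float = 0.5) -> List[float]:
    """Weighted sum of per-class scores from the joint and bone streams.

    joint_weight is a hypothetical parameter; the bone stream receives
    the remaining weight, 1 - joint_weight.
    """
    if len(joint_scores) != len(bone_scores):
        raise ValueError("both streams must score the same number of classes")
    if not 0.0 <= joint_weight <= 1.0:
        raise ValueError("joint_weight must lie in [0, 1]")
    bone_weight = 1.0 - joint_weight
    return [joint_weight * j + bone_weight * b
            for j, b in zip(joint_scores, bone_scores)]


def top1(scores: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])
```

In practice the weight would be chosen on a validation split; a fixed value is used here only to keep the sketch short.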
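References: |
[1] Wang Qian. Integrating medical care and elderly care: care is the foundation, medicine is the support [J]. China Health, 2018(12): 66-67. [53] Guan Shanshan, Zhang Yinong. 3D human action recognition based on residual spatio-temporal graph convolutional networks [J]. Computer Applications and Software, 2020, 37(03): 198-201+250. [64] Hu Jinlin, Qi Yongfeng, Wang Jiaying. Recognition of students' online classroom behavior based on spatio-temporal graph convolutional networks [J]. Journal of Optoelectronics·Laser, 2022, 33(02): 149-156. |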
Chinese Library Classification number: | TP391.413 |
Open-access date: | 2022-06-21 |