
Thesis title (Chinese):

 Rehabilitation Action Recognition Combining AlphaPose and Spatial-Temporal Graph Convolutional Networks

Name:

 白凡    

Student ID:

 19207040012    

Confidentiality level:

 Public

Thesis language:

 chi    

Discipline code:

 081002    

Discipline:

 Engineering - Information and Communication Engineering - Signal and Information Processing

Student type:

 Master's student

Degree level:

 Master of Engineering

Degree year:

 2022    

Degree-granting institution:

 Xi'an University of Science and Technology

School:

 College of Communication and Information Engineering

Major:

 Information and Communication Engineering

Research direction:

 Computer vision

First supervisor:

 吴冬梅    

First supervisor's affiliation:

 College of Communication and Information Engineering, Xi'an University of Science and Technology

Submission date:

 2022-06-21    

Defense date:

 2022-06-06    

Thesis title (English):

 Rehabilitation Action Recognition Based on AlphaPose and Spatial-Temporal Graph Convolutional Networks

Keywords (Chinese, translated):

 Pose estimation; spatial-temporal graph convolution; hierarchical residual; attention mechanism; rehabilitation action recognition

Keywords (English):

 Pose estimation; spatial-temporal graph convolutional network; hierarchical residual; attention mechanism; rehabilitation action recognition

Abstract (Chinese, translated):

Illness and accidents can cause motor impairment in the elderly, and post-illness home rehabilitation training is particularly important for their health. Intelligent rehabilitation training can guide and supervise home rehabilitation by recognizing the patient's movements and comparing them with standard movements. This thesis therefore studies rehabilitation action recognition: it designs a spatial-temporal graph convolutional network model with a hierarchical residual structure and an attention mechanism, and then fuses it with the AlphaPose pose estimator and object detection and tracking algorithms to achieve multi-person action recognition.

To address the insufficient feature extraction and the single-scale modeling of individual joint features in existing models, this thesis proposes Res2-STGCN, a spatial-temporal graph convolutional network with a hierarchical residual structure. The seven sequential spatial-temporal graph convolution modules (GT) of the original network are restructured into the hierarchical residual structure GT-Res2Net, which extracts multi-scale features at a finer granularity to improve accuracy without increasing the computational load. When Res2-STGCN extracts multi-scale features from skeleton data, the multi-layer hybrid convolutions mix the channel and spatial information of the receptive field, and the "grouping" mechanism of the hierarchical residuals weakens the correlation between channels; to address this, a spatial-temporal graph module with an attention mechanism, GT-Attention, is appended after GT-Res2Net so that channel features can be recalibrated adaptively. The improved module and the original module together form the new model Res2SC-STGCN. Because bone (limb) features also carry a large amount of action-related information, a two-stream model, Res2SCs-STGCN, is built to extract joint and bone features simultaneously, making full use of the skeleton data; the two streams are fused with a weighted scheme. The improved models above recognize only single-person actions; for multi-person action recognition in real scenes, this thesis combines object detection, tracking, and pose estimation with the improved model.

Experimental results show that, under the two evaluation protocols of the public NTU-RGB+D dataset, the final optimal model achieves Top-1 accuracies of 88.60% and 95.11% for the joint stream, 90.58% and 96.12% for the bone stream, and 91.66% and 97.12% after fusion, all clearly higher than the baseline network (ST-GCN). On the self-built rehabilitation dataset the recognition rate exceeds 97%. The fused pipeline also performs well on multi-person action recognition under various conditions in rehabilitation scenarios.

Abstract (English):

Illness and accidents can lead to motor impairment in the elderly, and home rehabilitation training after illness is particularly important for their health. Intelligent rehabilitation training can guide and supervise home rehabilitation by recognizing the patient's movements and comparing them with standard movements. Therefore, this paper investigates rehabilitation action recognition: it designs a hierarchical residual spatial-temporal graph convolutional network model that incorporates an attention mechanism, and then fuses it with the AlphaPose pose estimator and object detection and tracking algorithms to achieve multi-person action recognition.

To address the inadequate feature extraction and the single-scale modeling of individual joint features in existing models, a spatial-temporal graph convolutional network with a hierarchical residual structure, Res2-STGCN, is proposed. In order to extract multi-scale features at a finer granularity without increasing the computational load, the seven sequential spatial-temporal graph convolution modules (GT) of the original network are restructured into the hierarchical residual structure GT-Res2Net. When Res2-STGCN extracts multi-scale features from skeleton data, the multi-layer hybrid convolutions mix the channel and spatial information of the receptive field, and the "grouping" mechanism of the hierarchical residuals reduces the correlation between channels; to address this, a spatial-temporal graph module with an attention mechanism, GT-Attention, is added after GT-Res2Net to allow adaptive recalibration of channel features. The improved module and the original module form the new model Res2SC-STGCN. Bone features also contain a large amount of action-related information, so a two-stream model, Res2SCs-STGCN, is established: joint and bone features are extracted simultaneously, making full use of the skeleton data, and the two streams are fused with a weighted scheme. The improved models above recognize only single-person actions; for multi-person action recognition in real scenes, this paper fuses object detection, tracking, and pose estimation with the improved model.
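The abstract describes the GT-Res2Net mechanism only in prose. As an illustration, the NumPy sketch below shows the Res2Net-style hierarchical residual split the text assumes: channels are divided into `scale` groups, the first group passes through unchanged, and each later group is filtered together with the previous group's output before all groups are concatenated. The function name, the toy linear-map "convolution", and the group count are illustrative assumptions, not the thesis's actual layers.

```python
import numpy as np

def gt_res2net_split(x, scale=4, seed=0):
    """Illustrative Res2Net-style hierarchical residual split.

    x     : (channels, frames) feature map for one skeleton joint
    scale : number of channel groups; channels must divide evenly
    """
    rng = np.random.default_rng(seed)
    channels, _ = x.shape
    assert channels % scale == 0, "channels must be divisible by scale"
    groups = np.split(x, scale, axis=0)

    outputs = [groups[0]]          # first group: identity shortcut
    prev = None
    for g in groups[1:]:           # later groups reuse the previous output
        inp = g if prev is None else g + prev
        w = rng.standard_normal((inp.shape[0], inp.shape[0])) * 0.1
        prev = w @ inp             # stand-in for the group's convolution
        outputs.append(prev)
    # concatenation restores the original channel count
    return np.concatenate(outputs, axis=0)
```

Because group i receives group i-1's filtered output, its effective receptive field grows with i, which is the finer-grained multi-scale property the abstract refers to; a channel-attention module such as GT-Attention would then reweight the concatenated channels.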

Experimental results show that, under the two evaluation protocols of the public NTU-RGB+D dataset, the final optimal model achieves Top-1 accuracies of 88.60% and 95.11% for the joint stream, 90.58% and 96.12% for the bone stream, and 91.66% and 97.12% after fusion, all clearly higher than the baseline network (ST-GCN). On the self-built rehabilitation dataset the recognition rate exceeds 97%. The fused algorithm also achieves good results for multi-person action recognition under various conditions in rehabilitation scenarios.
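As a concrete reading of the two-stream design, the NumPy sketch below derives bone features as differences between each joint and its parent along the kinematic tree, and fuses the two streams' class scores by a weighted sum of their softmax outputs. The 0.5/0.5 weights and the toy three-joint skeleton are assumptions for illustration; the thesis's actual fusion weights are not stated in this abstract.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def bone_features(joints, parents):
    """Bone vector for joint i = joints[i] - joints[parents[i]].
    joints : (num_joints, 3) coordinates; the root's parent is itself."""
    return np.stack([joints[i] - joints[p] for i, p in enumerate(parents)])

def fuse_streams(joint_logits, bone_logits, w_joint=0.5, w_bone=0.5):
    """Weighted late fusion of the joint- and bone-stream class scores."""
    scores = w_joint * softmax(joint_logits) + w_bone * softmax(bone_logits)
    return scores, int(np.argmax(scores))
```

On NTU-RGB+D-style skeletons the `parents` array encodes the kinematic tree, and the fused score is compared across classes exactly as a single-stream score would be.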


CLC number:

 TP391.413    

Open access date:

 2022-06-21    
