Title (Chinese): | 基于深度学习的连续动作识别研究 |
Name: | |
Student ID: | 18207042034 |
Confidentiality: | Public |
Thesis language: | chi |
Discipline code: | 081002 |
Discipline: | Engineering - Information and Communication Engineering - Signal and Information Processing |
Student type: | Master's |
Degree: | Master of Engineering |
Degree year: | 2021 |
Institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Video action recognition |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2021-06-18 |
Defense date: | 2021-06-03 |
Title (English): | Research on Continuous Action Recognition Based on Deep Learning |
Keywords (Chinese): | |
Keywords (English): | Action recognition ; Deep learning ; Pyramid pooling ; Attention mechanism ; Smoothing window |
Abstract (Chinese): |
Recognizing continuous human actions through intelligent video surveillance at infrastructure construction sites is of great significance for ensuring worker safety. A continuous action is composed of multiple actions and is therefore complex, while existing deep learning networks suffer from high structural complexity and low accuracy, leaving them deficient for continuous human action recognition. This thesis therefore studies continuous action recognition: starting from single actions, it designs a G-ResNet network model with an attention mechanism and then completes continuous action recognition with a sliding window.

To address the problem that existing models cannot adequately extract the spatio-temporal features of video, this thesis proposes a human action recognition model based on the G-ResNet network. The model first uses the residual network ResNet34 to extract deep spatial features, solving the degradation problem of deep networks; it then uses a GRU network to capture temporal information between video frames and model long-term dependencies across frame sequences; finally, a three-step training strategy is adopted to optimize the network model and improve the accuracy of action recognition.

To address the insufficient feature extraction of the G-ResNet network, this thesis proposes a human action recognition model based on the FSAG-ResNet network. Building on G-ResNet, it first introduces spatial pyramid pooling into the ResNet34 network, extracting features with multi-scale windows to enrich them; it then fuses a temporal attention mechanism into the GRU network, assigning different weights to video frames according to their importance, strengthening the GRU network's ability to capture key features and further improving recognition accuracy.

To realize continuous action recognition at construction sites, this thesis proposes combining a sliding window with the FSAG-ResNet network. First, video datasets of single and continuous actions in different construction-site scenes are built; next, using transfer learning, the FSAG-ResNet network is applied to the construction site, and the transferred model is trained on all single-action clips together with clips segmented from part of the continuous-action videos; finally, a smoothing window is applied to the continuous-action videos to remove abrupt mis-predictions, completing continuous action recognition.

Experimental results show that the FSAG-ResNet model reaches 96.2% accuracy on UCF101 and 64.3% on HMDB51, a clear improvement over other mainstream networks. Combining the sliding window with the FSAG-ResNet model for continuous action recognition detects each action in a continuous-action video in real time, with an average recognition rate of 88.79%, verifying the effectiveness of the proposed algorithm. |
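The two FSAG-ResNet additions named above, spatial pyramid pooling over the ResNet34 feature map and temporal attention over GRU outputs, can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the thesis's implementation: the pyramid levels, feature sizes, and the random scoring vector are placeholders for demonstration.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map over 1x1, 2x2 and 4x4 grids
    and concatenate the bins into one fixed-length vector."""
    c, h, w = fmap.shape
    pooled = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                ys = slice(i * h // n, (i + 1) * h // n)
                xs = slice(j * w // n, (j + 1) * w // n)
                pooled.append(fmap[:, ys, xs].max(axis=(1, 2)))
    return np.concatenate(pooled)           # length C * (1 + 4 + 16)

def temporal_attention(frame_feats):
    """Score each frame feature, softmax the scores into weights,
    and return the weighted sum over time (T x D -> D)."""
    t, d = frame_feats.shape
    w_score = np.random.randn(d) * 0.01     # toy scoring vector (assumption)
    scores = frame_feats @ w_score          # one scalar per frame
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over the T frames
    return weights @ frame_feats            # attention-pooled feature

fmap = np.random.randn(512, 7, 7)           # one frame's conv feature map
print(spatial_pyramid_pool(fmap).shape)     # (10752,) = 512 * 21 bins
seq = np.random.randn(16, 256)              # 16 GRU outputs of size 256
print(temporal_attention(seq).shape)        # (256,)
```

Multi-scale pooling yields a fixed-length descriptor regardless of the input resolution, and the softmax weights let important frames dominate the pooled temporal feature, which is the intent of both mechanisms described in the abstract.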
Abstract (English): |
It is of great significance to recognize continuous human actions through intelligent video surveillance at construction sites in order to ensure the safety of workers. A continuous action is composed of multiple actions, each of uncertain duration, and is therefore complex, while existing deep learning networks suffer from high structural complexity and low accuracy, leaving them deficient for continuous human action recognition. This thesis therefore studies continuous action recognition: starting from single actions, a G-ResNet network model with an attention mechanism is designed, and continuous action recognition is then completed with a sliding window.

To address the problem that existing models cannot adequately extract the spatio-temporal features of video, this thesis proposes a human action recognition model based on the G-ResNet network. First, the model uses the ResNet34 network to extract deep spatial features, solving the degradation problem of deep networks. Second, a GRU network is used to capture temporal information between video frames and to model long-term dependencies across frame sequences. Finally, a three-step training strategy is adopted to optimize the network model and improve the accuracy of action recognition.

To address the insufficient feature extraction of the G-ResNet network, this thesis proposes a human action recognition model based on the FSAG-ResNet network. Building on G-ResNet, spatial pyramid pooling is first introduced into the ResNet34 network, extracting features with multi-scale windows to enrich them. A temporal attention mechanism is then fused into the GRU network: frames are assigned different weights according to their importance, strengthening the GRU network's ability to capture key features and further improving recognition accuracy.

To realize continuous action recognition at construction sites, a method combining a sliding window with the FSAG-ResNet network is proposed. First, video datasets of single and continuous actions in different construction-site scenes are established. Next, using transfer learning, the FSAG-ResNet network is applied to the construction site, and the transferred model is trained on all single-action clips together with clips segmented from part of the continuous-action videos. Finally, a smoothing window is applied to the continuous-action videos to remove abrupt mis-predictions, completing continuous action recognition.

The experimental results show that the FSAG-ResNet model achieves 96.2% accuracy on UCF101 and 64.3% on HMDB51, a clear improvement over other mainstream networks. Combining the sliding window with the FSAG-ResNet model for continuous action recognition detects each action in a continuous-action video in real time, with an average recognition rate of 88.79%, verifying the effectiveness of the proposed algorithm. |
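The sliding-window recognition and smoothing steps described above can be illustrated with plain Python. In this sketch, per-frame labels stand in for the clip-level predictions the FSAG-ResNet classifier would produce, and the window size, stride, and smoothing width are illustrative assumptions rather than the thesis's actual settings.

```python
from collections import Counter

def recognize_continuous(frame_labels, win=16, stride=8, smooth=3):
    """Slide a window over per-frame predictions, label each window by
    majority vote, then smooth the window labels with a majority filter
    over `smooth` consecutive windows to remove abrupt mis-predictions."""
    # 1) window-level prediction (stand-in for the clip classifier)
    windows = [Counter(frame_labels[s:s + win]).most_common(1)[0][0]
               for s in range(0, len(frame_labels) - win + 1, stride)]
    # 2) smoothing: replace each label by the majority in a small
    #    neighbourhood of consecutive windows
    half = smooth // 2
    smoothed = []
    for i in range(len(windows)):
        lo, hi = max(0, i - half), min(len(windows), i + half + 1)
        smoothed.append(Counter(windows[lo:hi]).most_common(1)[0][0])
    return smoothed

# 80 frames of "walk" followed by 80 frames of "lift"
frames = ["walk"] * 80 + ["lift"] * 80
print(recognize_continuous(frames))   # ['walk'] * 10 + ['lift'] * 9
```

Each window yields one action label, so a continuous video is segmented into a label sequence per window position; the majority filter suppresses isolated spurious labels at action boundaries, which is the role of the smoothing window in the abstract.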
CLC number: | TP391.4 |
Release date: | 2021-06-18 |