Thesis Information

Chinese title:

 Research on an Unsafe Action Recognition Method for Construction Workers Based on an Improved Two-Stream CNN (基于改进双流CNN的施工人员不安全行为识别方法研究)

Name:

 王卓    

Student ID:

 19207107005    

Confidentiality level:

 Public

Thesis language:

 Chinese (chi)

Discipline code:

 080902    

Discipline:

 Engineering - Electronic Science and Technology (Engineering or Science degree conferrable) - Circuits and Systems

Student type:

 Master's

Degree level:

 Master of Engineering

Degree year:

 2022    

Institution:

 西安科技大学 (Xi'an University of Science and Technology)

School:

 通信与信息工程学院 (College of Communication and Information Engineering)

Major:

 Circuits and Systems

Research direction:

 Computer Vision

First supervisor:

 马莉    

First supervisor's institution:

 西安科技大学 (Xi'an University of Science and Technology)

Submission date:

 2022-06-22    

Defense date:

 2022-06-06    

English title:

 Research on Risky Action Recognition Method of Constructors Based on Improved Two-Stream CNN    

Chinese keywords:

 Unsafe action recognition ; Two-stream CNN ; Attention mechanism ; Lightweight model ; Edge computing

English keywords:

 Unsafe action recognition ; Two-stream CNN ; Attention mechanism ; Lightweight model ; Edge computing

Chinese abstract:

Construction is a typical high-risk production activity. Because construction workers commonly have weak safety awareness and work in non-standard ways, most construction safety accidents are triggered by workers' unsafe actions. Two-stream CNNs and model lightweighting methods have been applied to the intelligent recognition and analysis of construction workers' unsafe actions, but existing recognition models suffer from large size, low recognition speed, and limited accuracy. Improving the processing speed and accuracy of unsafe-action recognition models for construction workers is therefore important.

Building on attention-based feature-vector extraction and deep-learning model lightweighting, this thesis designs an unsafe-action recognition model for construction workers based on an improved two-stream CNN. First, dense optical-flow images are extracted from videos of workers' unsafe actions using the TV-L1 method; key frames are selected with the inter-frame difference method and then deduplicated in a second pass to reduce the similarity between adjacent key frames, improving processing efficiency. Second, the efficient ShuffleNetV2 architecture combined with CBAM extracts the spatial- and temporal-stream features of the video data; a Bi-LSTM is introduced to fuse the spatio-temporal features in view of the videos' long temporal extent, and an attention mechanism adaptively optimizes action classification. Experiments show that the proposed model reaches a recognition accuracy of 94.3% on the test set with a model size of 41 MB. Compared with the traditional two-stream recognition model, accuracy is improved while model size and computational complexity are both substantially reduced.
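A minimal sketch of this preprocessing stage, assuming opencv-contrib-python (which exposes the TV-L1 implementation as cv2.optflow.DualTVL1OpticalFlow_create); the difference threshold and the function name are illustrative placeholders rather than the thesis's actual parameters:

```python
import cv2

def extract_keyframes_and_flow(video_path, diff_thresh=12.0):
    """Select key frames by inter-frame difference, then compute
    TV-L1 dense optical flow between consecutive key frames."""
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()   # TV-L1 dense flow
    cap = cv2.VideoCapture(video_path)
    keyframes, prev_kept = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Inter-frame difference test: keep a frame only if it differs
        # enough from the last kept frame, which also suppresses
        # near-duplicate adjacent key frames.
        if prev_kept is None or cv2.absdiff(gray, prev_kept).mean() > diff_thresh:
            keyframes.append(gray)
            prev_kept = gray
    cap.release()
    # Dense flow between consecutive key frames feeds the temporal stream.
    flows = [tvl1.calc(a, b, None) for a, b in zip(keyframes, keyframes[1:])]
    return keyframes, flows
```

The thesis describes key-frame selection and a separate second de-duplication pass; the sketch collapses both into a single difference test against the last kept frame.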

Finally, a minimal hardware circuit system for an edge recognition device based on the RK3399Pro core board is designed, comprising the power, reset, and clock circuits together with peripheral interface circuits such as the USB, video-display, and UART circuits. To process video in place at the edge, the recognition model must first undergo model conversion and quantization. After 16-bit integer conversion, the model size is 20.9 MB and the recognition accuracy is 91.1%. Although the lightweight model's accuracy drops slightly on the resource-constrained edge device, it still exceeds the common 90% requirement, indicating that the proposed model has practical reference value.
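As a rough illustration of the conversion and quantization step, here is a sketch assuming Rockchip's rknn-toolkit (1.x) for the RK3399Pro NPU and a model already exported to ONNX; the file names and calibration list are hypothetical, and the exact config arguments vary across toolkit versions:

```python
from rknn.api import RKNN

rknn = RKNN()
# 'dynamic_fixed_point-16' corresponds to the 16-bit integer conversion
# reported above; calibration settings are placeholders.
rknn.config(quantized_dtype='dynamic_fixed_point-16',
            target_platform=['rk3399pro'])
rknn.load_onnx(model='unsafe_action.onnx')               # hypothetical file
rknn.build(do_quantization=True, dataset='./calib.txt')  # calibration images
rknn.export_rknn('./unsafe_action.rknn')
rknn.release()
```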

English abstract:

Construction is a typical high-risk production activity. Because construction workers often have weak safety awareness and work in non-standard ways, most construction safety accidents are caused by workers' unsafe actions. Two-stream CNNs and model lightweighting methods have been applied to the intelligent recognition and analysis of construction workers' unsafe actions, but existing models suffer from large size, low recognition speed, and limited accuracy. It is therefore important to improve both the recognition speed and the accuracy of unsafe-action recognition models for construction workers.

Building on an attention-based feature-vector extraction method and a deep-learning model lightweighting method, this thesis designs an unsafe-action recognition model for construction workers based on an improved two-stream CNN. The model first extracts dense optical-flow images with the TV-L1 method, selects video key frames with the inter-frame difference method, and deduplicates the extracted key frames to further reduce the similarity of adjacent key frames and improve processing efficiency. Within the two-stream CNN framework, it then combines the efficient ShuffleNetV2 structure with CBAM to extract spatio-temporal features from the video data, introduces a Bi-LSTM to fuse the spatio-temporal features given the videos' long temporal extent, and uses an attention mechanism to adaptively optimize the recognition results. The results show that the trained model reaches recognition accuracies of 94.3% and 94.8% on the public and self-built datasets, respectively, with a model size of 41 MB. Compared with the traditional two-stream CNN recognition model, accuracy is improved while model size and computational complexity are greatly reduced.
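A compact PyTorch sketch of one stream of this architecture (per-frame ShuffleNetV2 features refined by CBAM, a Bi-LSTM over the frame sequence, and temporal attention before classification), assuming a recent torchvision; the layer sizes, CBAM details, and fusion scheme are illustrative assumptions rather than the thesis's exact configuration, and the temporal stream would further adapt the first convolution to accept stacked optical-flow fields:

```python
import torch
import torch.nn as nn
from torchvision.models import shufflenet_v2_x1_0

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        gate = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                             self.mlp(x.amax(dim=(2, 3))))
        x = x * gate.view(b, c, 1, 1)                          # channel attention
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))              # spatial attention

class StreamNet(nn.Module):
    """One stream: ShuffleNetV2 per frame -> CBAM -> Bi-LSTM over time
    -> temporal attention -> class scores."""
    def __init__(self, num_classes, hidden=256):
        super().__init__()
        net = shufflenet_v2_x1_0(weights=None)
        self.features = nn.Sequential(net.conv1, net.maxpool, net.stage2,
                                      net.stage3, net.stage4, net.conv5)
        self.cbam = CBAM(1024)                 # conv5 of x1.0 outputs 1024 ch
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(1024, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, clip):                   # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        f = self.features(clip.flatten(0, 1))  # per-frame backbone features
        f = self.pool(self.cbam(f)).flatten(1).view(b, t, -1)
        h, _ = self.lstm(f)                    # (B, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1) # temporal attention weights
        return self.fc((w * h).sum(dim=1))     # attention-weighted fusion
```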

Finally, the minimal hardware circuit system of an edge recognition device for construction workers' unsafe actions, based on the RK3399Pro core board, is designed; it mainly includes the power, reset, and clock circuits, as well as peripheral interface circuits such as the USB, video-display, and UART circuits. To let the model process video in place at the edge, the lightweight recognition model must undergo model conversion and quantization. The results show that the model size after 16-bit integer conversion is 20.9 MB, with a recognition accuracy of 91.1%. Although the lightweight model's accuracy decreases slightly on performance-limited edge devices, it still exceeds the common 90% requirement, indicating that the proposed model has practical reference value.


CLC number:

 TP391.4    

Open access date:

 2022-06-22    

