Thesis Information

Title (Chinese): 基于视频的矿工行为识别算法研究
Author: 权锦成
Student ID: 20208223040
Confidentiality level: Public
Language: Chinese
Discipline code: 085400
Discipline: Engineering - Electronic Information
Student type: Master's student
Degree level: Master of Engineering
Degree year: 2023
Institution: Xi'an University of Science and Technology
School: College of Computer Science and Technology
Major: Software Engineering
Research area: Graphics and image processing
Primary supervisor: 李占利
Supervisor's institution: Xi'an University of Science and Technology
Submission date: 2023-12-13
Defense date: 2023-12-04
Title (English): Research on a Video-Based Miner Behavior Recognition Algorithm
Keywords (Chinese): 矿工行为识别; 行为数据集; 图像增强; 多尺度卷积; 注意力机制
Keywords (English): miner behavior recognition; behavioral datasets; image enhancement; multi-scale convolution; attention mechanism

Abstract (Chinese):

With the development of the coal mining industry, the safety of miners during coal extraction has drawn increasing attention. Improper operation by miners is one of the main causes of coal mine safety accidents, so recognizing and analyzing miner behavior is of great significance. Against the background of coal mine safety, this thesis studies existing miner behavior recognition methods. The main research work is as follows:

To address the poor image quality caused by uneven illumination and low light in underground mines, the thesis proposes a low-light image enhancement method based on HSV-RNet. First, the low-light image is converted to HSV space and the illumination channel V is extracted for enhancement, which reduces the model's interference with hue and saturation; a color loss function is also added to the enhancement network to improve color fidelity. Second, a denoising loss function is added to the decomposition network, and image sharpening is applied in the enhancement network to improve the texture of the reflectance image. Experiments show that, compared with RetinexNet and other existing image enhancement algorithms, the HSV-RNet-based method performs better in subjective evaluation as well as in peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and other image quality metrics. Relative to RetinexNet, the proposed method improves PSNR and SSIM by 0.86 dB and 0.11 on the public LOL dataset, and by 9.2 dB and 0.33 on the coal mine dataset.
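As an illustration of the HSV pre-processing step and the color-fidelity term described above, the sketch below shows one plausible way to split off the V channel and define a color loss with OpenCV and PyTorch. The function names and the cosine-similarity form of the loss are illustrative assumptions, not the thesis implementation.

import cv2
import numpy as np
import torch.nn.functional as F

def split_hsv(bgr_image):
    # Convert a low-light BGR frame (uint8) to HSV and return H, S, and the V (illumination) channel.
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV).astype(np.float32) / 255.0
    h, s, v = cv2.split(hsv)
    return h, s, v          # only V is fed to the enhancement network

def merge_hsv(h, s, v_enhanced):
    # Recombine the enhanced V channel with the untouched hue and saturation.
    hsv = cv2.merge([h, s, np.clip(v_enhanced, 0.0, 1.0)])
    return cv2.cvtColor((hsv * 255.0).astype(np.uint8), cv2.COLOR_HSV2BGR)

def color_loss(enhanced, reference):
    # Color-fidelity term: penalize the angle between the RGB vectors of the enhanced
    # output and the reference (one common formulation; the thesis may use another).
    e = enhanced.flatten(2).transpose(1, 2)     # (B, H*W, 3)
    r = reference.flatten(2).transpose(1, 2)
    return (1.0 - F.cosine_similarity(e, r, dim=-1)).mean()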

To address the low accuracy of miner behavior recognition caused by dynamic changes in underground video and occlusion by underground equipment, the thesis proposes a behavior recognition method based on 3D attention and multi-scale convolution. First, a 3D multi-scale feature fusion module is added to the C3D model: the video is fed into multi-scale convolution branches that learn features at different scales, improving the model's generalization. Second, a 3D attention mechanism is added so that the model focuses more on the regions relevant to recognition, strengthening feature extraction and improving recognition accuracy. Experiments show that, relative to R3D, R(2+1)D, ConvLSTM, and SlowFast, the recognition accuracy of the proposed method on the UCF-101 dataset changes by +4.1%, +4.3%, +1.37%, and -0.7% respectively, and on the KTH dataset improves by 3.5%, 2.2%, 14.56%, and 1.2%. On the miner behavior dataset, recognition accuracy improves by 6.85% on average compared with the above algorithms and C3D.
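A minimal sketch of what a 3D multi-scale convolution block and a channel-style 3D attention module could look like in PyTorch is given below. The kernel sizes, channel counts, and the squeeze-and-excitation form of the attention are illustrative assumptions rather than the exact design used in the thesis.

import torch
import torch.nn as nn

class MultiScale3D(nn.Module):
    # Parallel 3D convolutions with different kernel sizes, concatenated along the channel axis.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch = out_ch // 3
        self.b1 = nn.Conv3d(in_ch, branch, kernel_size=1)
        self.b3 = nn.Conv3d(in_ch, branch, kernel_size=3, padding=1)
        self.b5 = nn.Conv3d(in_ch, out_ch - 2 * branch, kernel_size=5, padding=2)

    def forward(self, x):                       # x: (B, C, T, H, W)
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

class Attention3D(nn.Module):
    # Squeeze-and-excitation style channel attention extended to 3D feature maps.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c = x.shape[:2]
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        return x * w

# Example: insert both blocks after an early C3D stage.
feat = torch.randn(2, 64, 16, 56, 56)           # (batch, channels, frames, H, W)
feat = Attention3D(96)(MultiScale3D(64, 96)(feat))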

To address the scarcity of miner behavior datasets at home and abroad, the thesis constructs a miner behavior dataset in a simulated mine environment. The dataset was captured with a Kinect V2.0 RGB camera and contains five target actions (running underground, climbing over a fence, sitting on the track, walking, and waving, labeled run, jump, sit, walk, and wave) together with three distractor actions: talking, bending over, and working. It was recorded by 10 workers of different heights and weights in 8 different coal mine scenes, with each worker repeating each action at least 5 times, for a total of 640 video clips.
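To make the clip-level structure of such a dataset concrete, the sketch below shows a hypothetical folder layout (one directory per action class) and a minimal PyTorch Dataset that samples fixed-length clips. The paths, file extension, frame size, and 16-frame clip length are assumptions, not the dataset's actual organization.

import glob
import os
import random
import cv2
import numpy as np
import torch
from torch.utils.data import Dataset

class MinerClipDataset(Dataset):
    # Expects a layout such as root/run/*.avi, root/jump/*.avi, and so on.
    def __init__(self, root, classes=("run", "jump", "sit", "walk", "wave"), clip_len=16):
        self.samples = [(p, label)
                        for label, name in enumerate(classes)
                        for p in glob.glob(os.path.join(root, name, "*.avi"))]
        self.clip_len = clip_len

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        cap = cv2.VideoCapture(path)
        frames = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(cv2.resize(frame, (112, 112)))
        cap.release()
        # Assumes every video has at least clip_len frames; sample a random temporal window.
        start = random.randint(0, max(0, len(frames) - self.clip_len))
        clip = np.stack(frames[start:start + self.clip_len])             # (T, H, W, 3)
        clip = torch.from_numpy(clip).permute(3, 0, 1, 2).float() / 255  # (3, T, H, W)
        return clip, label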

Finally, using software development techniques, the above methods are integrated into a miner behavior recognition system that recognizes and records miner behavior. The system consists of four modules: user management, model management, miner behavior recognition, and logging. Practical testing shows that the system runs stably, the page layout is reasonable, operation is simple and smooth, and all functions work as intended, meeting the software development requirements.
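As a sketch of how the recognition and logging modules might interact, the hypothetical function below runs one clip through a trained model and appends the result to a log file. The module boundaries, field names, and JSON-lines log format are illustrative assumptions rather than the system's actual implementation.

import datetime
import json
import torch

def recognise_and_log(model, clip, class_names, log_path="behavior_log.jsonl"):
    # Run one (3, T, H, W) clip through the behavior-recognition model and record the result.
    model.eval()
    with torch.no_grad():
        scores = model(clip.unsqueeze(0))       # (1, num_classes)
        label = int(scores.argmax(dim=1))
    entry = {
        "time": datetime.datetime.now().isoformat(timespec="seconds"),
        "behavior": class_names[label],
        "confidence": float(torch.softmax(scores, dim=1)[0, label]),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry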

Abstract (English):

With the development of the coal mining industry, more and more attention is being paid to the safety of miners during coal mining. Improper operation by miners is one of the main causes of coal mine safety accidents, so recognizing and analyzing miner behavior is of great significance. Against the background of coal mine safety, this thesis studies existing miner behavior recognition methods. The main research work is as follows:

Aiming at the poor image quality caused by uneven illumination and low light in underground mines, this thesis proposes a low-light image enhancement method based on HSV-RNet. First, the low-light image is converted to HSV space and the illumination channel V is extracted for enhancement, reducing interference with hue and saturation; in addition, a color loss function is added to the enhancement network to improve color fidelity. Second, a denoising loss function is added to the decomposition network, and image sharpening is used in the enhancement network to improve the texture of the reflectance image. Experimental results show that, compared with RetinexNet and other existing image enhancement algorithms, the HSV-RNet-based method is superior in subjective evaluation as well as in peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and other image quality metrics. Relative to RetinexNet, PSNR and SSIM improve by 0.86 dB and 0.11 on the public LOL dataset, and by 9.2 dB and 0.33 on the coal mine dataset.
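For reference, the PSNR and SSIM figures quoted above are standard full-reference image quality metrics. The short sketch below shows how they are typically computed with scikit-image (version 0.19 or later for the channel_axis argument); it reproduces the metrics, not the thesis's evaluation code.

from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(enhanced, reference):
    # Both inputs are uint8 RGB images of identical shape.
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
    return psnr, ssim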

Aiming at the low accuracy of miner behavior recognition caused by dynamic changes in underground video and occlusion by underground equipment, this thesis proposes a behavior recognition method based on 3D attention and multi-scale convolution. First, 3D multi-scale feature fusion is added to the C3D model: the video is fed into a multi-scale convolution module to learn features at different scales, which improves the model's generalization. Second, a 3D attention mechanism is added so that the model pays more attention to the regions relevant to recognition, strengthening feature extraction and improving accuracy. Experimental results show that, compared with the R3D, R(2+1)D, and ConvLSTM algorithms, the recognition accuracy of the proposed method improves by 4.1%, 4.3%, and 1.37% on the UCF-101 dataset and by 3.5%, 2.2%, and 14.56% on the KTH dataset, respectively. When the miner behavior dataset is used for experiments, recognition accuracy improves by 6.85% on average compared with the above algorithms and the C3D algorithm.

Aiming at the lack of miner behavior datasets at home and abroad, this thesis constructs a miner behavior dataset in a simulated mine environment. The dataset was captured with a Kinect V2.0 RGB camera and contains five target actions (running underground, climbing over a fence, sitting on the track, walking, and waving, labeled run, jump, sit, walk, and wave) together with three distractor actions: talking, bending over, and working. It was recorded by 10 workers of different heights and weights in 8 different coal mine scenes, with each worker repeating each action at least 5 times, for a total of 640 video clips.

Finally, using software development techniques, the above methods are applied in a miner behavior recognition system that recognizes and records miner behavior. The system consists of four modules: user management, model management, miner behavior recognition, and logging. Practical testing shows that the system runs well, the page layout is reasonable, operation is simple and smooth, and all functions work normally, meeting the software development requirements.


CLC number: TP391.41
Open-access date: 2023-12-14
