Thesis Information

Thesis Title (Chinese):

 Research on the Detection Algorithm of Illegal Crossing Behavior in Underground Belt Transportation Systems Based on YOWO

Name:

 卢愿萌    

Student ID:

 21207040031    

Confidentiality Level:

 Public

Thesis Language:

 Chinese

Discipline Code:

 081002    

Discipline:

 Engineering - Information and Communication Engineering - Signal and Information Processing

Student Type:

 Master's

Degree Level:

 Master of Engineering

Degree Year:

 2024    

Institution:

 西安科技大学    

Department:

 College of Communication and Information Engineering

Major:

 Information and Communication Engineering

Research Direction:

 Deep Learning

First Supervisor:

 马莉    

First Supervisor's Institution:

 西安科技大学    

Submission Date:

 2024-06-13    

Defense Date:

 2024-06-05    

Thesis Title (English):

 Research on the Detection Algorithm of Illegal Crossing Behavior in Underground Belt Transportation Systems Based on YOWO

Keywords (Chinese):

 Illegal belt crossing; spatio-temporal behavior detection; YOWO; lightweight; model deployment

Keywords (English):

 Illegal crossing of belts; spatio-temporal behavior detection; YOWO; lightweight; model deployment

Abstract (Chinese):

The underground belt transportation system is not only a crucial link in coal mine transportation but also a key working area where personnel accidents are likely to occur, so research on detecting illegal belt-crossing behavior in the belt transportation system is of great importance. However, existing behavior detection algorithms have limited accuracy, and few detection algorithms are dedicated to illegal crossing behavior in belt transportation systems. This thesis therefore studies a high-precision, lightweight detection algorithm for illegal crossing behavior in belt transportation systems.

To address the problem that existing spatio-temporal behavior detection algorithms decouple localization and classification into two stages, preventing unified optimization of the model, this thesis improves on the single-stage spatio-temporal behavior detection algorithm YOWO and proposes the YOWO-CCAA algorithm. First, YOWO's spatio-temporal feature extraction network is improved: a 3D CNN branch extracts the motion features of the action, and a 2D CNN branch extracts the spatial features of the acting subject. Second, since YOWO fuses the dual-branch features only in the channel dimension, a multi-level feature fusion module, CCAA, is proposed to fuse features at both the channel and spatial levels, effectively improving detection accuracy. Finally, SIoU is adopted as the bounding-box regression loss function to accelerate model convergence. Experimental results show that the improved algorithm reaches an F-mAP of 93.83% on the self-built miner belt-crossing dataset, and outperforms YOWO by 2.51% and 1.17% on the public UCF101-24 and JHMDB-51 datasets, respectively.

To address the problem that the YOWO-CCAA algorithm has too many parameters to be deployed on embedded devices for real-time detection, this thesis designs the lightweight Ghost-YOWO-CCAA algorithm based on the GhostNetV2 lightweight network. First, 2D-GhostNetV2 is extended to 3D-GhostNetV2 to rebuild the motion branch of the spatio-temporal feature extraction network. Second, a lightweight feature extraction structure, C3GhostV2, is designed to rebuild the target branch of the spatio-temporal feature extraction network. The lightweight model is 80 MB, a 4.5-fold reduction in size compared with the original model; its inference speed on the server side reaches 41.4 frames per second, a 1.62-fold improvement, with an F-mAP of 91.70%.

Finally, the lightweight model is converted and quantized, then deployed for application testing on an embedded device with an RK3399Pro as the main processor. The detection frame rate on the embedded device is about 26 FPS, meeting the real-time detection requirement of 25 frames per second in practical applications. Although detection accuracy drops slightly, it remains above 90%, meeting the general application requirement. The results show that the proposed algorithm provides a useful reference for detecting illegal crossing behavior in underground belt transportation systems.

Abstract (English):

The underground belt transportation system is not only a crucial part of coal mine transportation but also a key operating area where personnel are prone to accidents, so research on detecting illegal crossing behavior in belt transportation systems is very important. However, existing behavior detection algorithms have low accuracy, and few detection algorithms are dedicated to the illegal crossing behavior of belt transportation systems. This thesis therefore investigates a high-precision, lightweight detection algorithm for illegal crossing behavior in belt transportation systems.

Aiming at the problem that existing spatio-temporal behavior detection algorithms decouple the localization and classification tasks into two stages, preventing the model from being optimized in a unified way, this thesis improves on the single-stage spatio-temporal behavior detection algorithm YOWO and proposes the YOWO-CCAA algorithm. First, the spatio-temporal feature extraction network of YOWO is improved: a 3D CNN branch extracts the motion features of the action, and a 2D CNN branch extracts the spatial features of the acting subject. Second, since the YOWO algorithm fuses the dual-branch features only in the channel dimension, a multi-level feature fusion module, CCAA, is proposed to perform feature fusion in both the channel and spatial dimensions, effectively improving detection accuracy. Finally, SIoU is adopted as the bounding-box regression loss function to accelerate model convergence. Experimental results show that the improved algorithm reaches an F-mAP of 93.83% on the self-built miner belt-crossing dataset and improves on the YOWO algorithm by 2.51% and 1.17% on the public UCF101-24 and JHMDB-51 datasets, respectively.
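
The abstract does not spell out the internal structure of the CCAA module, so the following is only a minimal PyTorch sketch of the general idea it describes: concatenating the 2D-branch appearance features with the 3D-branch motion features, then applying channel attention followed by spatial attention (a CBAM-style design). The class name, channel counts, and layer choices here are illustrative assumptions, not the thesis's actual implementation.

import torch
import torch.nn as nn

class ChannelSpatialFusion(nn.Module):
    """Hypothetical two-level fusion: channel attention, then spatial attention."""

    def __init__(self, c2d, c3d, reduction=16):
        super().__init__()
        c = c2d + c3d
        # Channel attention: squeeze H x W away, re-weight each channel.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(c // reduction, c, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: pool over channels, re-weight each location.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, f2d, f3d):
        # f2d: (B, c2d, H, W) appearance features from the 2D branch.
        # f3d: (B, c3d, H, W) motion features from the 3D branch,
        #      with the temporal dimension already collapsed.
        x = torch.cat([f2d, f3d], dim=1)   # channel-level fusion, as in YOWO
        x = x * self.channel_gate(x)       # channel attention
        avg = x.mean(dim=1, keepdim=True)
        mx = x.max(dim=1, keepdim=True).values
        x = x * self.spatial_gate(torch.cat([avg, mx], dim=1))  # spatial attention
        return x

# Toy shapes only; the real branch widths depend on the backbones used.
fused = ChannelSpatialFusion(c2d=256, c3d=512)(
    torch.randn(1, 256, 7, 7), torch.randn(1, 512, 7, 7))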

Aiming at the problem that the YOWO-CCAA algorithm has a large number of parameters and cannot be deployed on embedded devices for real-time detection, this thesis designs the lightweight Ghost-YOWO-CCAA algorithm based on the GhostNetV2 lightweight network. First, 2D-GhostNetV2 is extended to 3D-GhostNetV2 to reconstruct the motion branch of the spatio-temporal feature extraction network. Second, a lightweight feature extraction structure, C3GhostV2, is designed to reconstruct the target branch of the spatio-temporal feature extraction network. The lightweight model is 80 MB, a 4.5-fold reduction in size compared with the original model; its inference speed on the server side reaches 41.4 frames per second, a 1.62-fold improvement, with an F-mAP of 91.70%.
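
The abstract does not give the layer-level details of 3D-GhostNetV2 or C3GhostV2. The sketch below only illustrates the underlying Ghost-module idea carried over to 3D convolutions: a thin primary convolution produces the intrinsic feature maps, and cheap depthwise operations generate the remaining "ghost" maps (GhostNetV2 additionally adds DFC attention, omitted here). Names and hyperparameters are assumptions.

import torch
import torch.nn as nn

class GhostModule3D(nn.Module):
    """Hypothetical 3D Ghost module: thin primary conv + cheap depthwise conv."""

    def __init__(self, c_in, c_out, ratio=2, kernel_size=1, dw_size=3):
        super().__init__()
        c_primary = c_out // ratio           # intrinsic feature maps
        c_cheap = c_out - c_primary          # "ghost" feature maps
        self.primary = nn.Sequential(
            nn.Conv3d(c_in, c_primary, kernel_size,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm3d(c_primary),
            nn.ReLU(inplace=True),
        )
        # Cheap operation: depthwise 3D conv (groups == input channels).
        self.cheap = nn.Sequential(
            nn.Conv3d(c_primary, c_cheap, dw_size,
                      padding=dw_size // 2, groups=c_primary, bias=False),
            nn.BatchNorm3d(c_cheap),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y = self.primary(x)                       # (B, c_primary, T, H, W)
        return torch.cat([y, self.cheap(y)], 1)   # (B, c_out, T, H, W)

# Toy shapes only: 8 frames of 28x28 feature maps, 64 -> 128 channels.
out = GhostModule3D(64, 128)(torch.randn(1, 64, 8, 28, 28))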

Finally, the lightweight model is converted and quantized, and then deployed on an embedded device with an RK3399Pro as the main processor for application testing. The detection frame rate on the embedded device is about 26 FPS, which meets the real-time detection requirement of 25 frames per second in practical applications. Although detection accuracy decreases slightly, it remains above 90%, meeting the general application requirement. The results show that the algorithm proposed in this thesis provides a useful reference for detecting illegal crossing behavior in underground belt transportation systems.
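
As a rough illustration of this conversion-and-quantization step, here is a hedged sketch using the RKNN-Toolkit Python API (the 1.x toolkit targets the RK3399Pro). The model file name, calibration list, and preprocessing values are placeholders, and the exact config arguments vary between toolkit versions.

from rknn.api import RKNN

rknn = RKNN()

# Input preprocessing baked into the converted model (placeholder values).
rknn.config(channel_mean_value='0 0 0 255',   # mean R, G, B and scale
            reorder_channel='0 1 2',
            target_platform='rk3399pro')

# Load an exported ONNX model (hypothetical file name).
ret = rknn.load_onnx(model='ghost_yowo_ccaa.onnx')
assert ret == 0, 'model load failed'

# Quantize using a calibration list: a text file of sample image paths.
ret = rknn.build(do_quantization=True, dataset='./calib_images.txt')
assert ret == 0, 'build/quantization failed'

rknn.export_rknn('./ghost_yowo_ccaa.rknn')
rknn.release()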

References:

[1]赵亚军, 张志男, 贾廷贵. Analysis of coal mine safety accidents in China from 2010 to 2021 and research on safety countermeasures[J]. Coal Technology, 2023, 42(08): 128-131.

[2]Köpüklü O, Wei X, Rigoll G. You only watch once: A unified CNN architecture for real-time spatiotemporal action localization[J]. arXiv preprint arXiv:1911.06644, 2019.

[3]Wang P, Zeng F, Qian Y. A Survey on Deep Learning-based Spatio-temporal Action Detection[J]. arXiv preprint arXiv:2308.01618, 2023.

[4]Gu C, Sun C, Ross D A, et al. AVA: A video dataset of spatio-temporally localized atomic visual actions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 6047-6056.

[5]Chen S, Sun P, Xie E, et al. Watch only once: An end-to-end video action detection framework[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 8178-8187.

[6]Sui L, Zhang C L, Gu L, et al. A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023: 5999-6008.

[7]Wu C Y, Feichtenhofer C, Fan H, et al. Long-term feature banks for detailed video understanding[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 284-293.

[8]Liu Y, Yang F, Ginhac D. ACDnet: An action detection network for real-time edge computing based on flow-guided feature approximation and memory aggregation[J]. Pattern Recognition Letters, 2021, 145: 118-126.

[9]Zhao J, Snoek C G M. Dance with flow: Two-in-one stream action detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 9935-9944.

[10]Chen L, Tong Z, Song Y, et al. Efficient Video Action Detection with Token Dropout and Context Refinement[J]. arXiv preprint arXiv:2304.08451, 2023.

[11]Pan J, Chen S, Shou M Z, et al. Actor-context-actor relation network for spatio-temporal action localization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 464-474.

[12]Ning Z, Xie Q, Zhou W, et al. Person-Context Cross Attention for Spatio-Temporal Action Detection[J]. Technical report, Huawei Noah’s Ark Lab, and University of Science and Technology of China, 2021.

[13]Faure G J, Chen M H, Lai S H. Holistic interaction transformer network for action detection[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023: 3340-3350.

[14]Song L, Zhang S, Yu G, et al. Tacnet: Transition-aware context network for spatio-temporal action detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 11987-11995.

[15]Li Y, Lin W, See J, et al. CFAD: Coarse-to-fine action detector for spatiotemporal action localization[C]//European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 510-527.

[16]Girdhar R, Gkioxari G, Torresani L, et al. Detect-and-track: Efficient pose estimation in videos[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 350-359.

[17]Li Y, Wang Z, Wang L, et al. Actions as moving points[C]//Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVI 16. Springer International Publishing, 2020: 68-84.

[18]Liu Y, Yang F, Ginhac D. Accumulated micro-motion representations for lightweight online action detection in real-time[J]. Journal of Visual Communication and Image Representation, 2023: 103879.

[19]Carion N, Massa F, Synnaeve G, et al. End-to-end object detection with transformers[C]//European conference on computer vision. Cham: Springer International Publishing, 2020: 213-229.

[20]Zhao J, Zhang Y, Li X, et al. Tuber: Tubelet transformer for video action detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 13598-13607.

[21]Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.

[22]Gschwend D. ZynqNet: An FPGA-accelerated embedded convolutional neural network[J]. arXiv preprint arXiv:2005.06892, 2020.

[23]Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

[24]Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.

[25]Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 1314-1324.

[26]Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 6848-6856.

[27]Ma N, Zhang X, Zheng H T, et al. Shufflenet v2: Practical guidelines for efficient cnn architecture design[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 116-131.

[28]Han K, Wang Y, Tian Q, et al. Ghostnet: More features from cheap operations[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 1580-1589.

[29]Tang Y, Han K, Guo J, et al. GhostNetv2: enhance cheap operation with long-range attention[J]. Advances in Neural Information Processing Systems, 2022, 35: 9969-9982.

[30]Yang L, Jiang H, Cai R, et al. Condensenet v2: Sparse feature reactivation for deep networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 3569-3578.

[31]Terreran M, Tramontano A G, Lock J C, et al. Real-time object detection using deep learning for helping people with visual impairments[C]//2020 IEEE 4th International Conference on Image Processing, Applications and Systems (IPAS). IEEE, 2020: 89-95.

[32]Wong A, Shafiee M J, Li F, et al. Tiny SSD: A tiny single-shot detection deep convolutional neural network for real-time embedded object detection[C]//2018 15th Conference on Computer and Robot Vision (CRV). IEEE, 2018: 95-101.

[33]Sumit S S, Awang Rambli D R, Mirjalili S, et al. ReSTiNet: On improving the performance of Tiny-YOLO-based CNN architecture for applications in human detection[J]. Applied Sciences, 2022, 12(18): 9331.

[34]程叶群, 王艳, 范裕莹, et al. Lightweight object detection network based on convolutional neural networks[J]. Laser & Optoelectronics Progress, 2021, 58(16): 1610023.

[35]Qin Z, Li Z, Zhang Z, et al. ThunderNet: Towards real-time generic object detection on mobile devices[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 6718-6727.

[36]Tan M, Pang R, Le Q V. Efficientdet: Scalable and efficient object detection[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 10781-10790.

[37]Yu E, Aggarwal J K. Detection of fence climbing from monocular video[C]//18th International Conference on Pattern Recognition (ICPR'06). IEEE, 2006, 1: 375-378.

[38]Kolekar M H, Bharti N, Patil P N. Detection of fence climbing using activity recognition by support vector machine classifier[C]//2016 IEEE Region 10 Conference (TENCON). IEEE, 2016: 398-402.

[39]张泰, 张为, 刘艳艳. Detection algorithm for personnel climbing-over behavior in perimeter video surveillance[J]. Journal of Xi'an Jiaotong University, 2016, 50(06): 47-53.

[40]倪焱. Machine vision-based detection of dangerous human behavior on terraces[D]. Changchun University of Science and Technology, 2019.

[41]李瑞. Real-time recognition of abnormal behavior for intelligent surveillance[D]. Harbin Engineering University, 2021.

[42]杨源. Design and implementation of a recognition system for museum visitors' rule-violating behavior[D]. Harbin Institute of Technology, 2021.

[43]李逸辰. Kinect-based detection of abnormal behavior of subway passengers[D]. China University of Mining and Technology, 2021.

[44]Suo F, Li G, Zhu C, et al. Analysis of illegal behavior in power station based on video surveillance[C]//2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC). IEEE, 2022, 10: 381-385.

[45]王志鹏, 王涛. Detection of illegal fence-crossing behavior based on Faster RCNN[J]. Computer Systems & Applications, 2022, 31(04): 346-351.

[46]周巧瑜, 曹扬, 詹瑾瑜, et al. Recognition of tourists' climbing-over behavior in scenic spots based on YOLO and GOTURN[J]. Computer Technology and Development, 2022, 32(01): 134-140.

[47]Zou Z, Chen K, Shi Z, et al. Object detection in 20 years: A survey[J]. Proceedings of the IEEE, 2023.

[48]Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer International Publishing, 2016: 21-37.

[49]Li Y, Ren F. Light-weight retinanet for object detection[J]. arXiv preprint arXiv:1905.10011, 2019.

[50]He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE international conference on computer vision. 2017: 2961-2969.

[51]Tan M, Pang R, Le Q V. Efficientdet: Scalable and efficient object detection[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 10781-10790.

[52]Guo H, Yang X, Wang N, et al. A rotational libra R-CNN method for ship detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 58(8): 5772-5781.

[53]Wei J, Wang H, Yi Y, et al. P3D-CTN: Pseudo-3D convolutional tube network for spatio-temporal action detection in videos[C]//2019 IEEE international conference on image processing (ICIP). IEEE, 2019: 300-304.

[54]Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.

[55]Feichtenhofer C, Fan H, Malik J, et al. Slowfast networks for video recognition[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 6202-6211.

[56]Xia X, Xu C, Nan B. Inception-v3 for flower classification[C]//2017 2nd international conference on image, vision and computing (ICIVC). IEEE, 2017: 783-787.

[57]Chen P, Liu S, Zhao H, et al. Gridmask data augmentation[J]. arXiv preprint arXiv:2001.04086, 2020.

[58]Su R, Ouyang W, Zhou L, et al. Improving action localization by progressive cross-stream cooperation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 12016-12025.

[59]Li Z, Wang T, Zhu A, et al. STD-TR: End-to-End Spatio-Temporal Action Detection with Transformers[C]//2021 China Automation Congress (CAC). IEEE, 2021: 7615-7620.

CLC Number:

 TP391.41    

Release Date:

 2024-06-14    

