Thesis Title (Chinese): | 基于无监督学习的视频异常事件检测方法研究 |
Author Name: | |
Student ID: | 21208088025 |
Confidentiality Level: | Public |
Thesis Language: | chi (Chinese) |
Discipline Code: | 083500 |
Discipline Name: | Engineering - Software Engineering |
Student Type: | Master's |
Degree Level: | Master of Engineering |
Degree Year: | 2024 |
Degree-Granting Institution: | Xi'an University of Science and Technology |
School/Department: | |
Major: | |
Research Area: | Computer Vision |
Primary Supervisor: | |
Primary Supervisor's Institution: | |
Submission Date: | 2024-06-19 |
Defense Date: | 2024-05-30 |
Thesis Title (English): | Research on Video Anomaly Event Detection Method Based on Unsupervised Learning |
Keywords (Chinese): | |
Keywords (English): | Video anomaly detection; Deep learning; Autoencoder; Unsupervised learning; Contrastive learning |
Abstract (Chinese): |
Video anomaly event detection plays a vital role in application domains such as video surveillance and intelligent security. With the rapid development of deep unsupervised learning, using it for intelligent analysis of video content has become a research focus. At present, autoencoder-based methods attend excessively to low-level details of video features, so the reconstruction errors of normal and anomalous events are similar and anomalies are hard to detect effectively. Prediction tasks that rely on a single frame cannot fully exploit the spatiotemporal context of video content in complex and changing scenes, which limits detection performance. To address these problems, this thesis proposes two unsupervised video anomaly event detection methods. The main research content is as follows:

(1) To address the problem that autoencoders attend excessively to low-level details of video features and therefore struggle to distinguish normal from anomalous events, this thesis proposes a video anomaly event detection method based on a memory-augmented spatiotemporal masked autoencoder. First, video events are represented as spatiotemporal cubes so that the video's spatiotemporal relationships can be analyzed in depth. Second, a spatiotemporal masked autoencoder is used to extract high-level semantic features of the video. In addition, multiple memory modules with skip connections are introduced to strengthen the model's ability to memorize normal features and to compensate for the loss of key information, ensuring complete reconstruction. Finally, an anomaly score is computed from the difference between the reconstructed and input data to perform detection. The proposed method achieves AUCs of 99.9%, 94.8%, and 78.9% on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, respectively, outperforming mainstream methods such as the memory-guided autoencoder (MNAD) and the residual autoencoder (AR-AE).

(2) To address the cost and time of data annotation, and the mismatch between predicted and actual results that arises because prediction methods rely on a single task and cannot fully exploit the spatiotemporal context of video content, this thesis proposes a video anomaly event detection method based on deep unsupervised contrastive learning. First, within a dual-branch contrastive learning architecture, a C3D (Convolutional 3D) network fused with channel and spatial attention mechanisms serves as the encoder to extract spatiotemporal video features. Second, a projection network with a two-layer MLP (multilayer perceptron) structure reduces the dimensionality of those features. Finally, contrastive losses over multiple unsupervised tasks are computed and combined with the LOF (Local Outlier Factor) algorithm to identify anomalous events in video. On the UCSD Ped2 and ShanghaiTech datasets the proposed method improves performance by 2.6% and 3.4% over the dual-discriminator generative adversarial method (CT-D2GAN), and on the Avenue dataset it improves performance by 6.6% over the multi-path frame prediction method (ROADMAP). |
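To make the first method's scoring step concrete, here is a minimal PyTorch sketch of a cosine-similarity memory read and a reconstruction-error anomaly score, in the spirit of the memory-augmented autoencoder summarized above. The module names, feature sizes, and the min-max normalization are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryModule(nn.Module):
    """Learned bank of 'normal' feature prototypes with soft addressing."""
    def __init__(self, num_slots: int = 10, feat_dim: int = 128):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(num_slots, feat_dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, feat_dim) features from the encoder bottleneck.
        # Cosine-similarity addressing -> softmax attention over slots.
        attn = F.softmax(
            F.normalize(z, dim=1) @ F.normalize(self.slots, dim=1).T, dim=1)
        read = attn @ self.slots          # (batch, feat_dim) memory readout
        # Concatenate query and readout so the decoder sees both; anomalous
        # inputs are pulled toward normal prototypes, which inflates their
        # reconstruction error.
        return torch.cat([z, read], dim=1)

def anomaly_score(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    """Per-sample MSE between input and reconstruction, min-max scaled."""
    err = ((x - x_hat) ** 2).flatten(1).mean(dim=1)
    return (err - err.min()) / (err.max() - err.min() + 1e-8)

# Toy usage: 4 spatiotemporal cubes with 128-dim bottleneck features.
mem = MemoryModule()
decoder_input = mem(torch.randn(4, 128))          # (4, 256), fed to a decoder
scores = anomaly_score(torch.rand(4, 3, 16, 32, 32),
                       torch.rand(4, 3, 16, 32, 32))
print(decoder_input.shape, scores.shape)
```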
Abstract (English): |
Video anomaly event detection plays a critical role in application domains such as video surveillance and intelligent security. With the rapid development of deep unsupervised learning techniques, using them for intelligent video content analysis has become a research focus. Currently, autoencoder-based methods tend to focus excessively on low-level details of video features, yielding similar reconstruction errors for normal and anomalous events and hindering effective detection. Moreover, relying on single-frame prediction fails to fully exploit the spatiotemporal context of video content, limiting performance in complex and dynamic scenes. To address these challenges, this thesis proposes two unsupervised learning-based video anomaly event detection methods. The main contributions are as follows:

(1) To overcome the problem of autoencoders overemphasizing low-level details of video features and struggling to differentiate normal from abnormal events, we introduce a memory-enhanced spatiotemporal masked autoencoder for video anomaly detection. First, we represent video events as spatiotemporal cubes to analyze their temporal and spatial relationships comprehensively. Then, we use the spatiotemporal masked autoencoder to extract high-level semantic features from the videos. Furthermore, we integrate multiple memory modules and skip connections to strengthen the model's capacity to memorize normal features and compensate for the loss of crucial information, ensuring the integrity of the reconstruction. Finally, we compute anomaly scores by contrasting the reconstructed data with the input data. The proposed method outperforms mainstream approaches such as the memory-guided autoencoder (MNAD) and the residual autoencoder (AR-AE), achieving AUC scores of 99.9%, 94.8%, and 78.9% on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, respectively.

(2) To tackle the cost and time of data annotation, as well as the disparity between predicted and actual outcomes caused by single-task prediction methods that overlook the spatiotemporal context of video content, we introduce a video anomaly detection approach grounded in deep unsupervised contrastive learning. First, we employ a dual-branch contrastive learning architecture whose encoder is a C3D (Convolutional 3D) network incorporating channel and spatial attention mechanisms, extracting spatiotemporal features from the video. Second, we use a projection network with a two-layer MLP (multilayer perceptron) structure to reduce the dimensionality of those features. Finally, we compute contrastive losses over multiple unsupervised tasks and combine them with the LOF (Local Outlier Factor) algorithm to identify abnormal events. The proposed method improves over the dual-discriminator generative adversarial method (CT-D2GAN) by 2.6% and 3.4% on the UCSD Ped2 and ShanghaiTech datasets, respectively, and over the multi-path frame prediction method (ROADMAP) by 6.6% on the Avenue dataset. |
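Likewise, for the second method, the sketch below shows a two-layer MLP projection head, a standard NT-Xent contrastive loss (the common two-view formulation; the thesis computes losses over multiple unsupervised tasks), and scoring with scikit-learn's LocalOutlierFactor. All dimensions and hyperparameters are illustrative assumptions, and the attention-augmented C3D encoder is left abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.neighbors import LocalOutlierFactor

class ProjectionHead(nn.Module):
    """Two-layer MLP mapping encoder features to the contrastive space."""
    def __init__(self, in_dim: int = 4096, hidden: int = 512, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, out_dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(h), dim=1)   # unit-norm embeddings

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """NT-Xent loss: the two augmented views of each clip are positives;
    every other view in the batch is a negative."""
    z = torch.cat([z1, z2], dim=0)               # (2N, d)
    sim = z @ z.T / tau                          # cosine sims (z is unit-norm)
    sim.fill_diagonal_(float('-inf'))            # mask self-similarity
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage: projected embeddings of two augmented views of 8 clips.
head = ProjectionHead()
h1, h2 = torch.randn(8, 4096), torch.randn(8, 4096)
loss = nt_xent(head(h1), head(h2))

# Scoring: fit LOF on embeddings of (assumed normal) training clips, then
# treat low local density of test-clip embeddings as anomalous.
lof = LocalOutlierFactor(n_neighbors=5, novelty=True)
lof.fit(head(h1).detach().numpy())
scores = -lof.score_samples(head(h2).detach().numpy())  # higher = more anomalous
```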
References: |
[4] Wu Guangli, Guo Zhenzhou, Li Leiting, et al. Video anomaly event detection fusing FCN and LSTM [J]. Journal of Shanghai Jiao Tong University, 2021, 55(5): 607-614. (in Chinese)
[24] Yu Xiaosheng, Xu Ming, Wang Ying, et al. Anomaly event detection method based on a convolutional variational autoencoder [J]. Chinese Journal of Scientific Instrument, 2023(5): 151-158. (in Chinese)
[42] LeCun Y, Bengio Y, Hinton G. Deep learning [J]. Nature, 2015, 521(7553): 436-444.
[54] Tax D M J, Duin R P W. Support vector data description [J]. Machine Learning, 2004, 54: 45-66.
[55] Cortes C, Vapnik V. Support-vector networks [J]. Machine Learning, 1995, 20: 273-297.
[66] Li Shijing, Qing Linbo, He Xiaohai, et al. Road scene segmentation based on NVIDIA Jetson TX2 [J]. Computer Systems & Applications, 2019(1): 239-244. (in Chinese) |
CLC Number: | TP391.41 |
Open Access Date: | 2024-06-19 |