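Thesis information

Thesis title (Chinese):

基于半监督学习的加密恶意流量识别方法研究

Author:

Wang Xinqi (王新琦)

Student ID:

21207223077

Confidentiality level:

Public

Thesis language:

chi

Discipline code:

085400

Discipline:

Engineering - Electronic Information

Student type:

Master's

Degree level:

Master of Engineering

Degree year:

2024

Degree-granting institution:

Xi'an University of Science and Technology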

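College:

College of Communication and Information Engineering

Major:

Electronic Information

Research direction:

Network security

First supervisor:

Yao Jun (姚军)

First supervisor's institution:

Xi'an University of Science and Technology

Thesis submission date:

2024-06-13

Thesis defense date:

2024-06-01

Thesis title (English):

Research on Encrypted Malicious Traffic Identification Methods Based on Semi-Supervised Learning

Keywords (Chinese):

加密恶意流量 (encrypted malicious traffic); 多模态 (multimodal); 半监督学习 (semi-supervised learning); 深度学习 (deep learning); 注意力机制 (attention mechanism)

Keywords (English):

Encrypted malicious traffic; Multimodality; Semi-supervised learning; Deep learning; Attention mechanism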

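Abstract (translated from Chinese):

Network traffic encryption has become an important means of protecting user privacy and sensitive information. Attackers, however, exploit the same technology to evade security detection and to steal data, spread malware, and tamper with data, which poses a major threat to Internet security. Accurately identifying encrypted malicious traffic within network traffic has therefore become a key problem in urgent need of study in the network security field.

Most current encrypted malicious traffic identification methods take only single-modality features of the traffic as input, so feature information is extracted incompletely and identification accuracy is low; these methods also use only labeled traffic data. This thesis therefore designs a multimodal semi-supervised method for identifying encrypted malicious traffic. First, a multimodal neural network model is built that uses the multimodal features of encrypted malicious traffic to perform the identification and classification task, improving identification accuracy. The model combines a raw-byte modality with a statistical-feature modality and generates the multimodal features by feature concatenation. Specifically, the raw-byte modality extracts spatial features with a convolutional neural network fused with the convolutional block attention module (CBAM), while the statistical-feature modality refines and compresses key features with a bidirectional gated recurrent unit (BiGRU) fused with a multi-head attention mechanism and a convolutional neural network. Second, because the multimodal neural network model cannot use unlabeled traffic data for training, the multimodal model is further combined with a semi-supervised model. On the basis of the multimodal neural network model, two structurally identical multimodal networks are built; random Gaussian noise is applied as data augmentation to both the raw-byte and statistical-feature modalities, and, combined with the Mean Teacher model, the consistency learning strategy of Mean Teacher makes effective use of large amounts of unlabeled traffic data.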
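The random Gaussian noise augmentation mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the noise level `sigma`, and the assumption that both modalities are already normalized numeric vectors are all hypothetical choices made here for illustration.

```python
import random


def add_gaussian_noise(features, sigma, rng):
    """Return a copy of `features` with zero-mean Gaussian noise added per element."""
    return [x + rng.gauss(0.0, sigma) for x in features]


def make_two_views(byte_features, stat_features, sigma=0.05, seed=0):
    """Build two independently perturbed views of one flow.

    Each view is a (raw-byte vector, statistical-feature vector) pair.
    In a Mean Teacher setup one view would feed the student network and
    the other the teacher network. `sigma` and `seed` are illustrative
    assumptions, not values taken from the thesis.
    """
    rng = random.Random(seed)
    student_view = (
        add_gaussian_noise(byte_features, sigma, rng),
        add_gaussian_noise(stat_features, sigma, rng),
    )
    teacher_view = (
        add_gaussian_noise(byte_features, sigma, rng),
        add_gaussian_noise(stat_features, sigma, rng),
    )
    return student_view, teacher_view
```

Because the two views draw their noise independently, the same unlabeled flow yields two slightly different inputs, which is what gives a consistency objective something to compare.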

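Experimental results show that, with only 1% of the samples labeled, the multimodal semi-supervised identification method reaches a maximum accuracy of 97.42% and a maximum F1-Score of 94.60%, both better than classic identification methods. The results indicate that the method effectively improves the ability to identify encrypted malicious traffic when only a small number of labeled samples are available.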

Abstract (English):

Network traffic encryption technology has become an essential means of protecting user privacy and the security of sensitive information. However, cyber attackers exploit this technology to evade security detection and achieve objectives such as data theft, malware distribution, and data tampering, posing a significant threat to Internet security. Accurately identifying encrypted malicious traffic within network traffic has therefore become an urgent research focus in the field of cybersecurity.

Currently, most encrypted malicious traffic identification methods take only single-modality features of the traffic as input, which leads to incomplete feature extraction and low identification accuracy. Furthermore, these methods use only labeled traffic data. This thesis therefore proposes a multimodal semi-supervised method for identifying encrypted malicious traffic. First, a multimodal neural network model is constructed that uses the multimodal features of encrypted malicious traffic for the classification task, improving identification accuracy. The model combines the raw-byte modality with the statistical-feature modality and generates multimodal features through feature concatenation. Specifically, the raw-byte modality extracts spatial features through a convolutional neural network integrated with the convolutional block attention module (CBAM), while the statistical-feature modality refines and compresses key features using a bidirectional gated recurrent unit combined with multi-head attention and a convolutional neural network. Second, because the multimodal neural network model alone cannot be trained on unlabeled traffic data, the thesis combines the multimodal model with a semi-supervised model. Building on the multimodal neural network model, two structurally identical multimodal networks are established. Random Gaussian noise is applied as data augmentation to the raw-byte and statistical-feature modalities and, in conjunction with the Mean Teacher model, the consistency learning strategy of Mean Teacher makes effective use of a large volume of unlabeled traffic data.
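The two core ingredients of the general Mean Teacher scheme, a consistency loss between student and teacher predictions and an exponential moving average (EMA) update of the teacher's weights, can be sketched as follows. This is a generic illustration in plain Python, not the thesis's code: the mean-squared-error form of the loss, the flat weight lists, and the decay value `alpha` are assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(student_logits, teacher_logits):
    """Mean squared error between student and teacher class probabilities.

    No labels are needed, so this term can be computed on unlabeled flows.
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    squared = [(a - b) ** 2 for a, b in zip(p_student, p_teacher)]
    return sum(squared) / len(squared)


def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Move each teacher weight toward the matching student weight.

    teacher <- alpha * teacher + (1 - alpha) * student
    `alpha` is an illustrative decay value, not one reported in the thesis.
    """
    return [
        alpha * t + (1.0 - alpha) * s
        for t, s in zip(teacher_weights, student_weights)
    ]
```

In a full training loop the student would typically be optimized on a supervised loss over the labeled samples plus a weighted consistency term over all samples, and `ema_update` would be applied to the teacher after each student step; the teacher itself receives no gradient updates.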

The experimental results show that, with only 1% of the samples labeled, the multimodal semi-supervised encrypted malicious traffic identification method reaches an accuracy of up to 97.42% and an F1-Score of up to 94.60%, both outperforming classic identification methods. The results indicate that the method designed in this thesis effectively improves the identification of encrypted malicious traffic when only a small number of labeled samples are available.
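For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and F1-Score for a binary (malicious versus benign) labeling. It is only an illustration of the standard definitions: the abstract does not say whether the thesis uses a binary, macro-averaged, or weighted F1-Score, so the binary form and the function name are assumptions.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as malicious."""
    tp = fp = fn = correct = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct += 1
        if pred == positive and truth == positive:
            tp += 1  # malicious flow correctly flagged
        elif pred == positive and truth != positive:
            fp += 1  # benign flow wrongly flagged
        elif pred != positive and truth == positive:
            fn += 1  # malicious flow missed
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1
```

Accuracy alone can look high when benign traffic dominates the data set, which is one reason an F1-Score is usually reported alongside it.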

Chinese Library Classification number:

TP393.08

Open-access date:

2024-06-14
