Thesis Information

Thesis title (Chinese):

 基于对抗样本生成的Android恶意程序检测器重训练方法研究    

Name:

 Qin Zilin (秦子琳)

Student ID:

 22208223057    

Confidentiality level:

 Public

Thesis language:

 chi    

Subject code:

 085400    

Subject name:

 Engineering - Electronic Information

Student type:

 Master's

Degree level:

 Master of Engineering

Degree year:

 2025    

Training institution:

 Xi'an University of Science and Technology (西安科技大学)

School/Department:

 College of Artificial Intelligence and Computer Science (人工智能与计算机学院)

Major:

 Computer Technology

Research direction:

 Software Security

First supervisor name:

 Liu Xiaojian (刘晓建)

First supervisor affiliation:

 Xi'an University of Science and Technology (西安科技大学)

Thesis submission date:

 2025-06-16    

Thesis defense date:

 2025-05-27    

Thesis title (English):

 Research on Retraining Method for Android Malware Detector Using Adversarial Sample Generation    

Keywords (Chinese):

 Android恶意程序检测 ; 对抗样本生成 ; 程序变异 ; 模拟退火算法 ; 约束组合优化    

Keywords (English):

 Android malware detection ; Adversarial sample generation ; Program mutation ; Simulated annealing ; Constrained combinatorial optimization

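Abstract (translated from Chinese):

With the rapid development of machine learning and the continuing accumulation of Android malware samples, machine learning-based detection has become one of the mainstream approaches to Android malware detection. However, such methods depend heavily on the quality and quantity of existing malware samples and struggle to identify new malware variants developed with sophisticated evasion techniques or zero-day exploits. Adversarial training offers an effective way to improve robustness against such variants, but in Android malware detection it still faces a key problem: how to generate adversarial samples that are both valid and hard to detect. To address this challenge, this study proposes a retraining method for Android malware detectors based on adversarial sample generation. The main research contents are as follows:

(1) To apply effective perturbations to seed samples so that they can evade detection, a set of atomic mutation methods based on program transformation is proposed. These mutations are semantically equivalent and do not delete or modify essential program features, so the basic functionality of the seed program is preserved and the generated samples remain valid. Prompt engineering with large language models is used to guide the implementation of the atomic mutations, improving development efficiency and code quality. Experimental results show that introducing atomic mutations helps malware samples evade security detection.

(2) To find perturbations that evade detection at the lowest cost, the generation of adversarial malware is modeled as a constrained combinatorial optimization problem: an adversarial sample should evade detection while incurring the minimum generation cost. To this end, an optimization strategy is designed that combines the strengths of generative adversarial networks and the simulated annealing algorithm. It treats adversarial sample generation as a game between a generator and a substitute detector and balances exploration and exploitation to select the best adversarial samples. Experimental results show that, compared with existing methods, the strategy converges faster and produces adversarial samples with a lower detection rate, 95% of which successfully evade detection.

(3) To improve the robustness of Android malware detectors, an enhanced classifier is obtained through adversarial training. A set of third-party detectors is treated as black-box reference detectors, and a substitute detector is trained to fit their performance. The dataset is then augmented with the generated adversarial malware samples, and the substitute detector is retrained to produce an enhanced malware classifier with better detection of new malware variants. Experiments show that the enhanced classifier is strongly robust in detecting newly emerging and evolving malware variants and outperforms state-of-the-art malware detectors.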

Abstract (English):

With the rapid development of machine learning technologies and the continuous accumulation of Android malware samples, machine learning-based methods have become one of the mainstream approaches to Android malware detection. However, these methods rely heavily on the quality and quantity of the available malware samples, which makes it hard to identify new malware variants developed using sophisticated evasion techniques or zero-day exploits. To tackle this problem, adversarial training offers a promising approach to improving the resilience of detection against newly emerging malware variants. In the setting of Android malware detection, however, adversarial training still faces a critical challenge: how to craft valid and hard-to-detect adversarial samples. In response to this challenge, this study proposes a retraining method for Android malware detectors based on adversarial sample generation. The primary research content is as follows:

(1) To introduce suitable perturbations to the seed samples and enable them to evade detection, a set of atomic mutation methods based on program transformation is proposed. These program mutations are semantically equivalent and do not remove or modify essential program features. As a result, the atomic mutations preserve the core functions of the seed malware, so the generated samples can still be installed and run. Furthermore, the mutations generally act on distinct locations in the program's Smali code, which keeps them mutually independent. Prompt engineering based on large language models is employed to guide the implementation of the atomic mutations, improving development efficiency and code quality.

(2) To find perturbations that evade detection at minimal cost, the problem of crafting adversarial malware is formulated as a constrained combinatorial optimization problem: adversarial samples should evade detection while consuming minimal crafting effort. An optimization strategy is devised for this problem. The strategy combines the strengths of Generative Adversarial Networks and the Simulated Annealing algorithm, treating the adversarial sample generation process as a game between a generator and a substitute detector. It balances exploration and exploitation to select the optimal adversarial samples.

(3) To enhance the robustness of Android malware detectors, an enhanced classifier is obtained through a retraining method. In this study, a set of third-party detectors is regarded as black-box reference detectors, and a substitute detector is trained to closely approximate their performance. Subsequently, the dataset is augmented with the generated adversarial malware samples and the substitute detector is retrained to yield an enhanced malware classifier with improved detection of new malware variants.
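
The retraining loop in (3) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two-dimensional feature space, the three threshold-based reference detectors, the majority-vote labeling, and the use of a 1-nearest-neighbor classifier as the substitute detector. The abstract does not specify the thesis's actual features or model.

```python
# Hypothetical black-box reference detectors over a toy 2-feature space.
# Each returns True when it flags the sample as malicious.
REFERENCE_DETECTORS = [
    lambda x: x[0] > 0.5,
    lambda x: x[1] > 0.5,
    lambda x: x[0] + x[1] > 1.0,
]


def reference_label(x):
    """Label a sample by majority vote of the black-box detectors (1 = malicious)."""
    votes = sum(1 for detect in REFERENCE_DETECTORS if detect(x))
    return 1 if votes * 2 > len(REFERENCE_DETECTORS) else 0


class NearestNeighborDetector:
    """Toy 1-nearest-neighbor stand-in for the substitute detector."""

    def fit(self, samples, labels):
        self.data = list(zip(samples, labels))
        return self

    def predict(self, x):
        def dist2(sample):
            return sum((a - b) ** 2 for a, b in zip(sample, x))
        # Return the label of the closest stored sample.
        return min(self.data, key=lambda pair: dist2(pair[0]))[1]


def train_substitute(samples):
    """Fit the substitute on labels obtained by querying the reference detectors."""
    labels = [reference_label(x) for x in samples]
    return NearestNeighborDetector().fit(samples, labels), labels


def retrain_with_adversarial(samples, labels, adversarial):
    """Augment the dataset with adversarial malware (label 1) and refit."""
    return NearestNeighborDetector().fit(
        samples + adversarial, labels + [1] * len(adversarial)
    )


# Illustrative data: three benign seeds, three malware seeds, two adversarial variants.
seeds = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
adversarial = [(0.3, 0.3), (0.35, 0.25)]

substitute, labels = train_substitute(seeds)
enhanced = retrain_with_adversarial(seeds, labels, adversarial)
```

In this toy setting, a new variant close to the adversarial samples, such as `(0.32, 0.28)`, is classified as benign by `substitute` and as malicious by `enhanced`, which is the effect the retraining step aims for.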

Extensive experimental evaluation shows that the optimization solution strategy generates adversarial samples with faster convergence and a lower detection rate than existing leading approaches; 95% of the generated adversarial samples successfully evade detection. The enhanced classifier obtained through adversarial training is strongly robust in detecting newly emerging and evolving malware variants and outperforms state-of-the-art malware detectors.
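
As a rough illustration of the constrained combinatorial optimization described in (2), the sketch below runs plain simulated annealing over subsets of atomic mutations. It is a sketch under stated assumptions, not the thesis's algorithm: the mutation names, their costs, the toy detector score, and the penalty term used to handle the evasion constraint are all hypothetical, and the generative adversarial component is omitted.

```python
import math
import random

# Hypothetical atomic mutations: (name, crafting cost, reduction in toy detector score).
MUTATIONS = [
    ("rename_identifiers", 1.0, 0.15),
    ("insert_junk_code", 2.0, 0.25),
    ("reorder_methods", 1.5, 0.10),
    ("encrypt_strings", 3.0, 0.35),
    ("add_reflection", 4.0, 0.40),
]
THRESHOLD = 0.5   # the toy detector flags a sample when its score is >= THRESHOLD
PENALTY = 100.0   # added to the objective when the sample is still detected


def detector_score(selection):
    """Toy stand-in for a substitute detector: score of the mutated seed."""
    score = 0.95 - sum(m[2] for m, on in zip(MUTATIONS, selection) if on)
    return max(score, 0.0)


def cost(selection):
    """Total crafting cost of the selected mutations."""
    return sum(m[1] for m, on in zip(MUTATIONS, selection) if on)


def objective(selection):
    """Cost plus a penalty if the evasion constraint is violated."""
    evades = detector_score(selection) < THRESHOLD
    return cost(selection) + (0.0 if evades else PENALTY)


def simulated_annealing(steps=2000, t0=10.0, alpha=0.995, seed=0):
    rng = random.Random(seed)
    current = [False] * len(MUTATIONS)
    best = current[:]
    temp = t0
    for _ in range(steps):
        # Neighbor: toggle one randomly chosen mutation.
        neighbor = current[:]
        i = rng.randrange(len(MUTATIONS))
        neighbor[i] = not neighbor[i]
        delta = objective(neighbor) - objective(current)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = neighbor
        if objective(current) < objective(best):
            best = current[:]
        temp = max(temp * alpha, 1e-6)  # geometric cooling
    return best


best = simulated_annealing()
chosen = [m[0] for m, on in zip(MUTATIONS, best) if on]
```

The high initial temperature lets the search accept costlier mutation sets early on (exploration), while the cooling schedule gradually restricts it to improvements (exploitation). `chosen` holds the names of the selected mutations.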

Chinese Library Classification (CLC) number:

 TP309.2    

Open access date:

 2025-06-17    
