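Thesis title (Chinese): | 基于对抗样本生成的Android恶意程序检测器重训练方法研究 |
Name: | |
Student ID: | 22208223057 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 085400 |
Discipline name: | Engineering - Electronic Information |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2025 |
Degree-granting institution: | Xi'an University of Science and Technology |
School/department: | |
Major: | |
Research direction: | Software security |
First supervisor name: | |
First supervisor affiliation: | |
Thesis submission date: | 2025-06-16 |
Thesis defense date: | 2025-05-27 |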
Thesis title (English): | Research on Retraining Method for Android Malware Detector Using Adversarial Sample Generation |
Keywords (Chinese): | Android恶意程序检测 ; 对抗样本生成 ; 程序变异 ; 模拟退火算法 ; 约束组合优化 |
Keywords (English): | Android malware detection ; Adversarial sample generation ; Program mutation ; Simulated annealing ; Constrained combinatorial optimization |
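Abstract (translated from Chinese): |
With the rapid development of machine learning and the continued accumulation of Android malware samples, machine learning-based detection has become one of the mainstream approaches to Android malware detection. However, such methods depend heavily on the quality and quantity of existing malware samples and struggle to identify new malware variants developed with sophisticated evasion techniques or zero-day exploits. Adversarial training offers an effective way to improve robustness in detecting malware variants, but in Android malware detection it still faces a key problem: how to generate adversarial samples that are both valid and hard to detect. To address this challenge, this study proposes a retraining method for Android malware detectors based on adversarial sample generation. The main research content is as follows: (1) To apply effective perturbations to seed samples so that they can evade detection, a set of atomic mutation methods based on program transformation is proposed. These mutations make only semantically equivalent changes to the program and do not delete or modify essential program features, which preserves the basic functionality of the seed program and ensures the validity of the generated samples. Prompt engineering with large language models is used to guide the implementation of the atomic mutations, improving development efficiency and code quality. Experimental results show that introducing atomic mutations helps malware samples evade security detection. (2) To find perturbations that evade detection at the lowest cost, the problem of generating adversarial malware is modeled as a constrained combinatorial optimization problem: an adversarial sample should evade detection while incurring the minimum generation cost. To this end, an optimization strategy is designed that combines the strengths of generative adversarial networks and the simulated annealing algorithm. It treats adversarial sample generation as a game between a generator and a substitute detector and balances exploration and exploitation to select the optimal adversarial samples. Experimental results show that, compared with existing methods, the optimization strategy generates adversarial samples with faster convergence and lower detection rates, 95% of which successfully evade detection. (3) To improve the robustness of Android malware detectors, an enhanced classifier is obtained through adversarial training. This study treats a set of third-party detectors as black-box reference detectors and trains a substitute detector to fit their performance. The generated adversarial malware samples are then used to augment the dataset, and the substitute detector is retrained to obtain an enhanced malware classifier that performs better at detecting new malware variants. Experiments show that the enhanced classifier obtained through adversarial training is highly robust in detecting newly emerging and evolving malware variants and outperforms state-of-the-art malware detectors. |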
Abstract (English): |
With the rapid development of machine learning technologies and the continuous accumulation of Android malware samples, machine learning-based detection has become one of the mainstream approaches to Android malware detection. However, such methods rely heavily on the quality and quantity of the available malware samples, which makes it hard for them to identify new malware variants developed using sophisticated evasion techniques or zero-day exploits. To tackle this problem, adversarial training offers a promising way to improve the resilience of detection against newly emerging malware variants. In the setting of Android malware detection, however, adversarial training still faces a critical challenge: how to craft valid and hard-to-detect adversarial samples. In response, this study proposes a retraining method for Android malware detectors based on adversarial sample generation. The primary research content is as follows: (1) To introduce suitable perturbations to the seed samples and enable them to evade detection, a set of atomic mutation methods based on program transformation is proposed. These program mutations are semantically equivalent and do not remove or modify essential program features, so they preserve the core functions of the seed malware and ensure that the generated samples can still be installed and run. Furthermore, the mutations generally act on distinct locations in the program's Smali code, so that they are mutually independent. Prompt engineering with large language models is employed to guide the implementation of the atomic mutations, improving development efficiency and code quality. (2) To find the perturbations that evade detection at the lowest cost, the problem of crafting adversarial malware is formulated as a constrained combinatorial optimization problem: adversarial samples should evade detection while consuming minimal crafting effort. 
For this problem, an optimization strategy is devised. It combines the strengths of generative adversarial networks and the simulated annealing algorithm, treating adversarial sample generation as a game between a generator and a substitute detector, and balances exploration and exploitation to select the optimal adversarial samples. (3) To enhance the robustness of Android malware detectors, an enhanced classifier is obtained through adversarial retraining. A set of third-party detectors is regarded as black-box reference detectors, and a substitute detector is trained to closely approximate their performance. The substitute detector is then retrained on the dataset augmented with the generated adversarial malware samples, yielding an enhanced malware classifier that performs better against new malware variants. Extensive experimental evaluation shows that the optimization strategy generates adversarial samples with faster convergence and a lower detection rate than existing leading approaches; 95% of the generated adversarial samples successfully evade detection. The enhanced classifier obtained through adversarial training is highly robust in detecting newly emerging and evolving malware variants and outperforms state-of-the-art malware detectors. |
Chinese Library Classification (CLC) number: | TP309.2 |
Open access date: | 2025-06-17 |
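
Illustrative note on item (2) of the abstract: the constrained combinatorial optimization it describes (evade detection while spending the least crafting effort) can be sketched with plain simulated annealing. The sketch below is not the thesis's implementation. It omits the generative adversarial network component entirely, and the function name `anneal_mutations`, the per-mutation `costs` and `gains`, the linear toy detector score, and all parameter defaults are assumptions made up for illustration.

```python
import math
import random


def anneal_mutations(costs, gains, base_score, threshold=0.5,
                     penalty=100.0, steps=5000, t0=5.0, cooling=0.999, seed=0):
    """Choose a subset of atomic mutations by simulated annealing.

    costs[i] -- hypothetical crafting cost of mutation i
    gains[i] -- hypothetical, non-negative drop in the toy detector score
    A sample is treated as evading detection when its score is below threshold.
    """
    rng = random.Random(seed)
    n = len(costs)

    def score(mask):
        # Toy stand-in for a substitute detector: a linear score floored at 0.
        return max(0.0, base_score - sum(g for g, on in zip(gains, mask) if on))

    def cost(mask):
        return sum(c for c, on in zip(costs, mask) if on)

    def energy(mask):
        # Crafting cost plus a penalty that grows while the sample is detected.
        return cost(mask) + penalty * max(0.0, score(mask) - threshold)

    # Start with every mutation applied; with non-negative gains this is the
    # lowest score reachable, so if it is detected no subset can evade.
    current = [True] * n
    if score(current) >= threshold:
        raise ValueError("no mutation subset evades the toy detector")
    best = current[:]

    temp = t0
    for _ in range(steps):
        # Neighbor move: toggle one randomly chosen mutation.
        candidate = current[:]
        i = rng.randrange(n)
        candidate[i] = not candidate[i]

        delta = energy(candidate) - energy(current)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature cools (exploration to exploitation).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if score(current) < threshold and cost(current) < cost(best):
                best = current[:]
        temp = max(temp * cooling, 1e-6)

    return best, cost(best), score(best)


if __name__ == "__main__":
    mask, total_cost, final_score = anneal_mutations(
        costs=[3, 1, 2, 5], gains=[0.3, 0.1, 0.25, 0.4], base_score=0.9)
    print(mask, total_cost, final_score)
```

Only subsets that evade the toy detector are ever recorded as `best`, so the returned selection is always feasible; whether it is the cheapest possible one depends on the cooling schedule and the number of steps.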