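Title (Chinese): | 电子贸易中异常交易检测方法研究 |
Name: | |
Student ID: | 20208223055 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 085400 |
Discipline name: | Engineering - Electronic Information |
Student type: | Master's student |
Degree level: | Master of Engineering |
Degree year: | 2023 |
Degree-granting institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research area: | Data mining |
First supervisor: | |
First supervisor's affiliation: | |
Submission date: | 2023-06-19 |
Defense date: | 2023-06-05 |
Title (English): | Research on abnormal transaction detection methods in electronic trade |
Keywords (Chinese): | |
Keywords (English): | Abnormal Transaction Detection; Mixed Dimensionality Reduction; Parameter Adaptation; Generative Adversarial Network; Minkowski Distance |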
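Abstract (Chinese): |
The rapid development of electronic trade has brought great convenience to daily life, but abnormal behaviors such as fraud and fake transactions seriously threaten consumers' interests and the healthy development of electronic trade, so abnormal transaction detection is of great importance in this field. Electronic trade data are usually high-dimensional, discrete, and unlabeled, and they exhibit extreme class imbalance. Moreover, traditional unsupervised abnormal transaction detection methods train a single algorithm, which leads to low detection accuracy and poor model generalization. This thesis proposes a new two-stage method for detecting abnormal transactions in electronic trade. The first stage improves the DBSCAN clustering algorithm to pre-classify the data and screen out the majority-class data; the second stage proposes a new single-class abnormal transaction detection algorithm based on a generative adversarial network, which is trained stably on the majority-class data alone. The specific research content is as follows:

(1) To address the curse of dimensionality and the difficulty of choosing parameters when the original DBSCAN algorithm is used to pre-classify electronic trade data, this thesis proposes a DBSCAN clustering algorithm that determines its parameters adaptively. The algorithm first applies a mixed dimensionality reduction method to reduce the feature dimension of the raw data, addressing the curse of dimensionality; it then determines the DBSCAN parameters through Sigmoid self-decay and performs clustering. The Sigmoid self-decay parameter determination consists of the following steps: a. randomly select m records, build a KD tree, and search for the k-average nearest neighbors of the selected m records to obtain a distance matrix; b. average the distance matrix by row and sort the results to obtain a list of candidate Eps values; c. apply Sigmoid self-decay to the candidate list to obtain the Sigmoid parameter list; d. determine the MinPts parameter list with the mathematical expectation method. Experimental results show that the proposed algorithm has certain advantages in clustering two-dimensional synthetic datasets and UCI datasets. In addition, to verify its pre-classification effect, a majority-class screening comparison experiment was conducted on a self-built dataset; the results show that the proposed algorithm extracts the majority class with higher accuracy.

(2) To address the effect of extreme class imbalance in electronic trade data on the accuracy of abnormal transaction detection models, this thesis proposes a generative adversarial network based on LSTM and aMLP (aLMGAN), which is trained stably in a single-class setting on the pre-classified majority-class data. During training, the generator continually produces data resembling real data in order to deceive, and thereby train, the discriminator, so that the discriminator eventually learns a feature pattern centered on the majority-class data, which is then used to detect abnormal transactions. On this basis, to address training instability and mode collapse in generative adversarial networks, this thesis designs a GAN loss function based on the Minkowski distance. Experiments first validate aLMGAN on two Kaggle credit card transaction datasets, showing good performance in credit card fraud detection. Then, to verify the effectiveness of the proposed unsupervised abnormal transaction detection method, it is compared with existing methods on a self-built dataset; the results show that the proposed method achieves higher detection accuracy and a lower false alarm rate.

(3) An abnormal commodity transaction detection system is designed and implemented. The system accepts uploaded commodity transaction datasets and trained weight files, runs the proposed algorithms in the background to detect abnormal commodity transactions, and visualizes the detection results. |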
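Steps a to d of the Sigmoid self-decay parameter procedure can be illustrated with a short Python sketch. It assumes NumPy and SciPy (`scipy.spatial.cKDTree`) and starts from features that have already been reduced in dimension; the function names `candidate_eps`, `sigmoid_decay`, and `expected_minpts`, the defaults `m=200` and `K=20`, and the `alpha` parameter are hypothetical. Two parts are interpretations, not details taken from the thesis: the exact Sigmoid decay formula is not given in this record, so `sigmoid_decay` uses an assumed factor 2 * (1 - sigmoid(alpha * t)) that equals 1 for the first candidate and shrinks later ones; and the "mathematical expectation method" is read as taking MinPts to be the mean number of points inside each Eps-neighbourhood.

```python
import numpy as np
from scipy.spatial import cKDTree


def candidate_eps(X, m=200, K=20, seed=0):
    """Steps a-b: K-average nearest-neighbour distances on a random subsample.

    Returns the subsample and a sorted array of K candidate Eps values, where
    entry k-1 is the mean distance from a sampled point to its k-th neighbour.
    """
    X = np.asarray(X, dtype=float)
    m = min(m, len(X))
    if m <= K:
        raise ValueError("need more sampled points than neighbours (m > K)")
    rng = np.random.default_rng(seed)
    sample = X[rng.choice(len(X), size=m, replace=False)]  # step a: random m records
    tree = cKDTree(sample)                                 # step a: KD tree on the sample
    dist, _ = tree.query(sample, k=K + 1)                  # column 0 is the zero self-distance
    dist_matrix = dist[:, 1:].T                            # shape (K, m): row k-1 = k-th NN distances
    eps = np.sort(dist_matrix.mean(axis=1))                # step b: row means, then sort
    return sample, eps


def sigmoid_decay(eps, alpha=2.0):
    """Step c, HYPOTHETICAL form: scale candidate t in [0, 1] by 2 * (1 - sigmoid(alpha * t))."""
    t = np.linspace(0.0, 1.0, len(eps))
    factor = 2.0 * (1.0 - 1.0 / (1.0 + np.exp(-alpha * t)))  # 1 for the first candidate, then shrinks
    return eps * factor


def expected_minpts(sample, eps_list):
    """Step d: MinPts = expected (mean) number of points in each Eps-neighbourhood."""
    tree = cKDTree(sample)
    minpts = []
    for eps in eps_list:
        # counts include the query point itself, as in the usual DBSCAN convention
        counts = tree.query_ball_point(sample, r=eps, return_length=True)
        minpts.append(int(round(float(np.mean(counts)))))
    return np.array(minpts)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.3, (150, 2)), rng.normal(3, 0.3, (150, 2))])
    sample, eps = candidate_eps(X)
    eps_decayed = sigmoid_decay(eps)
    for e, p in zip(eps_decayed[:5], expected_minpts(sample, eps_decayed)[:5]):
        print(f"Eps={e:.3f}  MinPts={p}")
```

Each (Eps, MinPts) pair could then be passed to an ordinary DBSCAN implementation; how the final pair is selected from the two lists is not stated in this record.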
Abstract (English): |
The rapid development of electronic trade has brought great convenience to people's daily lives, but abnormal behaviors such as fraud and false transactions seriously threaten the interests of consumers and the healthy development of electronic trade. Abnormal transaction detection is therefore of great significance in this field. Electronic trade data are usually high-dimensional, discrete, and unlabeled, with extreme class imbalance. Moreover, traditional unsupervised abnormal transaction detection methods train a single algorithm, which results in low detection accuracy and insufficient model generalization. This thesis proposes a new two-stage method for detecting abnormal transactions in electronic trade. In the first stage, the DBSCAN clustering algorithm is improved to pre-classify the data and screen out the majority-class data; in the second stage, a new single-class abnormal transaction detection algorithm based on generative adversarial networks is proposed, which performs stable single-class training on the majority-class data. The specific research content is as follows:

(1) To address the curse of dimensionality and the difficulty of determining parameters when the original DBSCAN algorithm is used to pre-classify electronic trade data, this thesis proposes a DBSCAN clustering algorithm that determines its parameters adaptively. The algorithm first uses a mixed dimensionality reduction method to reduce the feature dimension of the raw data, addressing the curse of dimensionality; it then determines the DBSCAN parameters by Sigmoid self-decay and performs clustering. The Sigmoid self-decay parameter determination consists of the following steps: a. randomly select m records, build a KD tree, and search for the k-average nearest neighbors of the selected m records to obtain a distance matrix; b. average the distance matrix by row and sort the results to obtain the candidate Eps parameter list; c. apply Sigmoid self-decay to the candidate list to obtain the Eps parameter list; d. determine the MinPts parameter list with the mathematical expectation method. Experimental results show that the proposed algorithm has certain advantages in clustering two-dimensional synthetic datasets and UCI datasets. In addition, to verify the algorithm's pre-classification effect, a majority-class screening comparison experiment was conducted on a self-built dataset; the results show that the proposed algorithm extracts the majority class with higher accuracy.

(2) To address the influence of extreme class imbalance on the accuracy of abnormal transaction detection models in electronic trade, this thesis proposes a generative adversarial network based on LSTM and aMLP (aLMGAN), which uses the pre-classified majority-class data for stable single-class training. During training, the generator continuously produces data resembling real data to deceive, and thereby train, the discriminator, so that the discriminator ultimately forms a feature pattern around the majority-class data that is used to detect abnormal transactions. On this basis, to address the unstable training and mode collapse of generative adversarial networks, this thesis designs a GAN loss function based on the Minkowski distance. Experiments first verify aLMGAN on two Kaggle credit card transaction datasets, and the results show good performance in credit card fraud detection. Second, to verify the effectiveness of the proposed unsupervised abnormal transaction detection method, it is compared with existing methods on a self-built dataset. The results show that the proposed method achieves higher detection accuracy and a lower false alarm rate.

(3) An abnormal commodity transaction detection system is designed and implemented. The system accepts uploaded commodity transaction datasets and trained weight files, detects abnormal commodity transactions by calling the proposed algorithms in the background, and visualizes the detection results. |
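This record does not give the Minkowski-distance loss itself, so the following NumPy sketch only illustrates one plausible way such a term could enter GAN training; it is an assumption, not the aLMGAN loss. The Minkowski distance of order p >= 1 is d_p(u, v) = (sum_i |u_i - v_i|^p)^(1/p), which reduces to the Manhattan distance for p = 1 and the Euclidean distance for p = 2. The hypothetical `generator_loss` adds to the usual non-saturating adversarial term a Minkowski distance between the batch means of real and generated samples (a feature-matching style penalty weighted by an assumed `lam`), while `discriminator_loss` is the standard binary cross-entropy discriminator loss. In a single-class detector of this kind, a transaction that the trained discriminator scores as unlikely to be "real" would presumably be flagged as a candidate anomaly.

```python
import numpy as np


def minkowski(u, v, p=2.0, axis=-1):
    """Minkowski distance of order p (p >= 1): (sum_i |u_i - v_i|^p)^(1/p)."""
    if p < 1:
        raise ValueError("p must be >= 1 for a proper metric")
    return np.sum(np.abs(np.asarray(u) - np.asarray(v)) ** p, axis=axis) ** (1.0 / p)


def discriminator_loss(d_real, d_fake, tiny=1e-8):
    """Standard GAN discriminator loss (binary cross-entropy).

    d_real: D(x) on real majority-class samples; d_fake: D(G(z)) on generated ones.
    """
    return -np.mean(np.log(d_real + tiny)) - np.mean(np.log(1.0 - d_fake + tiny))


def generator_loss(d_fake, real_batch, fake_batch, p=1.5, lam=1.0, tiny=1e-8):
    """HYPOTHETICAL generator loss: adversarial term + Minkowski feature matching.

    Illustrative assumption only, not the formula used in the thesis.
    """
    adversarial = -np.mean(np.log(d_fake + tiny))  # non-saturating: push D(G(z)) towards 1
    matching = minkowski(real_batch.mean(axis=0), fake_batch.mean(axis=0), p=p)
    return adversarial + lam * matching


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(64, 8))
    fake = rng.normal(0.5, 1.0, size=(64, 8))
    d_fake = np.full(64, 0.3)  # stand-in discriminator outputs for the generated batch
    print(generator_loss(d_fake, real, fake))
```

Choosing p between 1 and 2 trades the outlier robustness of the Manhattan distance against the smoother behavior of the Euclidean distance; the value actually used in the thesis is not stated here.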
Chinese Library Classification number: | TP391 |
Release date: | 2023-06-19 |