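Title (Chinese):

Research on abnormal transaction detection methods in electronic trade (电子贸易中异常交易检测方法研究)

Name:

Tang Cheng (唐成)

Student ID:

20208223055

Confidentiality level:

Public

Language:

chi (Chinese)

Discipline code:

085400

Discipline:

Engineering - Electronic Information

Student type:

Master's student

Degree:

Master of Engineering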

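Degree year:

2023

Institution:

Xi'an University of Science and Technology

School:

College of Computer Science and Technology

Major:

Computer Technology

Research area:

Data Mining

First supervisor:

Li Zhanli (李占利)

First supervisor's affiliation:

Xi'an University of Science and Technology

Submission date:

2023-06-19

Defense date:

2023-06-05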

Title (English):

Research on abnormal transaction detection methods in electronic trade

Keywords (Chinese):

Abnormal transaction detection; mixed dimensionality reduction; parameter adaptation; generative adversarial network; Minkowski distance

Keywords (English):

Abnormal Transaction Detection; Mixed Dimensionality Reduction; Parameter Adaptation; Generative Adversarial Network; Minkowski Distance

Abstract (Chinese):

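The rapid development of electronic trade has brought great convenience to daily life, but abnormal behaviors such as fraud and fake transactions seriously threaten consumers' interests and the healthy development of electronic trade, so abnormal transaction detection is of great importance in this field. Electronic trade data, however, are usually high-dimensional, discrete, and unlabeled, and exhibit extreme class imbalance. Moreover, traditional unsupervised abnormal transaction detection methods train a single algorithm, which leads to low detection accuracy and poor model generalization. This thesis proposes a new abnormal transaction detection method for electronic trade that consists of two stages. In the first stage, an improved DBSCAN clustering algorithm pre-classifies the data and screens out the majority-class data; in the second stage, a new one-class abnormal transaction detection algorithm based on a generative adversarial network performs stable one-class training on the majority-class data. The specific research contents are as follows: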

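(1) To address the curse of dimensionality and the difficulty of choosing parameters when the original DBSCAN clustering algorithm is used to pre-classify electronic trade data, this thesis proposes a DBSCAN clustering algorithm with adaptively determined parameters. The algorithm first reduces the feature dimensionality of the raw data with a mixed dimensionality reduction method, addressing the curse of dimensionality; it then determines the DBSCAN parameters by Sigmoid self-decay and performs clustering. The proposed Sigmoid self-decay procedure consists of four steps: a. randomly select m records, build a KD tree, and search it to compute the k-average nearest neighbors of the m selected records, yielding a distance matrix; b. average the distance matrix by row and sort the averages to obtain the preselected Eps parameter list; c. apply Sigmoid self-decay to the preselected list to obtain the Sigmoid-decayed Eps parameter list; d. determine the MinPts parameter list by the mathematical expectation method. Experimental results show that the proposed algorithm has advantages in clustering two-dimensional synthetic datasets and UCI datasets. In addition, a majority-class screening comparison experiment on a self-built dataset, conducted to verify the pre-classification effect, shows that the proposed algorithm extracts the majority class with higher accuracy.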

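(2) To address the effect of extreme class imbalance in electronic trade data on the accuracy of abnormal transaction detection models, this thesis proposes aLMGAN, a generative adversarial network based on LSTM and aMLP, which performs stable one-class training on the pre-classified majority-class data. During training, the generator continually produces data resembling the real data in order to deceive, and thereby train, the discriminator, so that the discriminator ultimately forms a feature pattern around the majority-class data, which enables abnormal transactions to be detected. On this basis, to address the unstable training and mode collapse of generative adversarial networks, this thesis designs a GAN loss function based on the Minkowski distance. Experiments first validate aLMGAN on two Kaggle credit card transaction datasets, and the results show good performance in credit card fraud detection. Then, to verify the effectiveness of the proposed unsupervised abnormal transaction detection method, it is compared with existing methods on a self-built dataset; the results show that the proposed method achieves higher detection accuracy and a lower false alarm rate.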

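(3) An abnormal commodity transaction detection system is designed and implemented. Users can upload commodity transaction datasets and trained weight files; the system calls the proposed algorithm in the backend to detect abnormal commodity transactions and visualizes the detection results.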
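The parameter-estimation steps a, b and d of item (1) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the helper names are hypothetical; a brute-force neighbour search stands in for the KD tree (it returns the same distances, only more slowly); the distance matrix is assumed to hold one row per neighbour rank k, so that the row means of step b give one Eps candidate per k; and MinPts (step d) is assumed to follow the common "mathematical expectation" formulation, the mean number of points inside each Eps-neighbourhood. Step c is omitted because the abstract does not give the Sigmoid self-decay formula.

```python
import math
import random

# Hypothetical helpers sketching steps a, b and d of item (1).
# Assumption: brute-force neighbour search replaces the KD tree.


def knn_distance_matrix(sample, K):
    """Step a: dist[k][i] = distance from sample[i] to its (k+1)-th nearest
    neighbour among the other sampled points (one row per neighbour rank)."""
    m = len(sample)
    if not 1 <= K < m:
        raise ValueError("K must satisfy 1 <= K < m")
    dist = [[0.0] * m for _ in range(K)]
    for i, p in enumerate(sample):
        # Distances to the other sampled points, nearest first (self excluded).
        d = sorted(math.dist(p, q) for j, q in enumerate(sample) if j != i)
        for k in range(K):
            dist[k][i] = d[k]
    return dist


def candidate_eps(data, m, K, seed=0):
    """Steps a-b: sample m records, average each row of the distance matrix
    and sort, giving one preselected Eps value per neighbour rank k."""
    sample = random.Random(seed).sample(data, m)
    dist = knn_distance_matrix(sample, K)
    return sorted(sum(row) / m for row in dist)


def minpts_by_expectation(data, eps_list):
    """Step d (assumed formulation): MinPts for each Eps is the mean number of
    points inside an Eps-neighbourhood, each point counting itself, rounded."""
    n = len(data)
    minpts = []
    for eps in eps_list:
        total = sum(1 for p in data for q in data if math.dist(p, q) <= eps)
        minpts.append(round(total / n))
    return minpts


# Usage on the four corners of a unit square.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
eps_list = candidate_eps(square, m=4, K=3)  # approximately [1.0, 1.0, 1.414]
print(eps_list, minpts_by_expectation(square, eps_list))
```

Each (Eps, MinPts) pair can then be tried as DBSCAN parameters; in scikit-learn these correspond to the `eps` and `min_samples` arguments of `sklearn.cluster.DBSCAN`, whose `min_samples` also counts the point itself.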

Abstract (English):

The rapid development of electronic trade has brought great convenience to daily life, but abnormal behaviors such as fraud and fake transactions seriously threaten the interests of consumers and the healthy development of electronic trade. Abnormal transaction detection is therefore of great significance in this field. Electronic trade data are usually high-dimensional, discrete, and unlabeled, and exhibit extreme class imbalance. Moreover, traditional unsupervised abnormal transaction detection methods train a single algorithm, which results in low detection accuracy and insufficient model generalization. This thesis proposes a new two-stage method for detecting abnormal transactions in electronic trade. In the first stage, an improved DBSCAN clustering algorithm pre-classifies the data and screens out the majority-class data; in the second stage, a new one-class abnormal transaction detection algorithm based on generative adversarial networks performs stable one-class training on the majority-class data. The specific research contents are as follows:

(1) To address the curse of dimensionality and the difficulty of determining parameters when the original DBSCAN clustering algorithm is used to pre-classify electronic trade data, this thesis proposes a DBSCAN clustering algorithm that determines its parameters adaptively. The algorithm first applies a mixed dimensionality reduction method to reduce the feature dimensionality of the raw data, addressing the curse of dimensionality; it then determines the DBSCAN parameters by Sigmoid self-decay and performs clustering. The Sigmoid self-decay procedure consists of four steps: a. randomly select m records, build a KD tree, and search it to compute the k-average nearest neighbors of the m selected records, yielding a distance matrix; b. average the distance matrix by row and sort the averages to obtain the preselected Eps parameter list; c. apply Sigmoid self-decay to the preselected list to obtain the Eps parameter list; d. determine the MinPts parameter list by the mathematical expectation method. Experimental results show that the proposed algorithm has advantages in clustering two-dimensional synthetic datasets and UCI datasets. In addition, a majority-class screening comparison experiment on a self-built dataset, conducted to verify the pre-classification effect, shows that the proposed algorithm extracts the majority class with higher accuracy.

(2) To address the effect of extreme class imbalance on the accuracy of abnormal transaction detection models in electronic trade, this thesis proposes aLMGAN, a generative adversarial network based on LSTM and aMLP, which performs stable one-class training on the pre-classified majority-class data. During training, the generator continually produces data resembling the real data to deceive, and thereby train, the discriminator, which ultimately forms a feature pattern around the majority-class data that allows abnormal transactions to be detected. To address the unstable training and mode collapse of generative adversarial networks, this thesis further designs a GAN loss function based on the Minkowski distance. Experiments first validate aLMGAN on two Kaggle credit card transaction datasets, where it performs well in credit card fraud detection. Then, to verify the effectiveness of the proposed unsupervised abnormal transaction detection method, it is compared with existing methods on a self-built dataset; the results show that the proposed method achieves higher detection accuracy and a lower false alarm rate.

(3) An abnormal commodity transaction detection system is designed and implemented. Users can upload commodity transaction datasets and trained weight files; the system calls the proposed algorithm in the backend to detect abnormal commodity transactions and visualizes the detection results.
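Item (2) builds the GAN loss on the Minkowski distance, but the abstract does not state how the distance enters the loss. The sketch below therefore shows only the distance itself, applied to two hypothetical feature vectors standing in for a real and a generated sample; the function name is illustrative and this is not the aLMGAN loss.

```python
def minkowski(x, y, p=2.0):
    """Minkowski distance d_p(x, y) = (sum_i |x_i - y_i|^p)^(1/p).

    p = 1 gives the Manhattan distance and p = 2 the Euclidean distance;
    p must be >= 1 for the result to be a metric.
    """
    if p < 1:
        raise ValueError("p must be >= 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


real_sample = [0.0, 3.0]       # hypothetical real feature vector
generated_sample = [4.0, 0.0]  # hypothetical generator output
print(minkowski(real_sample, generated_sample, p=1))  # 7.0
print(minkowski(real_sample, generated_sample, p=2))  # 5.0
```

The exponent p is the free design choice: p = 1 treats all coordinate gaps linearly, larger p weights the largest gap more heavily, and as p grows the distance approaches the Chebyshev (maximum-coordinate) distance.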

CLC number (Chinese Library Classification):

TP391

Release date:

2023-06-19
