Thesis Information

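Chinese title:	基于混合策略的个人信用风险预测算法研究 (Research on a Personal Credit Risk Prediction Algorithm Based on a Hybrid Strategy)
Author:	Chen Gong (陈巩)
Student ID:	20208049007
Confidentiality level:	Public
Language:	chi (Chinese)
Discipline code:	0812
Discipline:	Engineering - Computer Science and Technology (Engineering or Science degrees may be conferred)
Student type:	Master's
Degree:	Master of Engineering
Degree year:	2023
Institution:	Xi'an University of Science and Technology
School:	College of Computer Science and Technology
Major:	Computer Science and Technology
Research area:	Intelligent Information Processing
First supervisor:	Li Zhanli (李占利)
First supervisor's institution:	Xi'an University of Science and Technology
Submission date:	2023-06-19
Defense date:	2023-06-06
English title:	Research on Individual Credit Risk Prediction Algorithm Based on Hybrid Strategy
Chinese keywords:	信用风险预测 (credit risk prediction); 深度学习 (deep learning); 特征融合 (feature fusion); 集成学习 (ensemble learning)
English keywords:	Credit risk forecasting; Deep learning; Feature fusion; Ensemble learning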
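Chinese abstract (translated):	With rapid economic development and complex changes in the financial environment, credit risk in bank lending has become difficult to quantify and measure. Credit risk prediction is important for lowering the non-performing loan ratio and the risk of credit default. This thesis analyzes current credit risk prediction, builds a deep feature extractor, WENET, and designs a three-layer credit risk prediction algorithm. The specific work is as follows:
1. Financial credit data have many features, and different features may be linked by complex intrinsic relationships. The effectiveness of traditional machine learning and ensemble learning methods depends on feature selection and ignores these intrinsic relationships. To address this, a one-dimensional convolutional neural network is used to build the WENET deep feature extractor, which extracts abstract features that are useful for prediction. First, convolution kernels with different receptive fields are applied on different channels for multi-scale feature extraction, capturing the intrinsic relationships between features; then, an attention-based fusion strategy makes local and global features complement each other, yielding more informative features. On a public credit default dataset, the high-dimensional features extracted by WENET are fed into machine learning classifiers to obtain the final predictions. Combined with WENET, the LR, DT, SVM, and RF models improve AUC over the original classifiers by 0.175, 0.013, 0.213, and 0.023, respectively.
2. A single model usually evaluates and predicts only according to its own preferences, and falls short in accuracy, stability, and generalization. To improve generalization and stability, a three-layer prediction model is built with a hybrid strategy based on deep learning and ensemble learning. First, the WENET deep feature extractor extracts features from the raw inputs, forming the first layer; then, four different base learners run in parallel to form the second layer, learning from the WENET features and producing a new dataset; finally, logistic regression forms the third layer and is trained on this updated data, improving generalization and stability. Compared with mfXGBoost, a recent credit risk prediction method, the algorithm improves the AUC and KS values by 0.028 and 0.054, respectively. The algorithm shows good stability and generalization on public datasets.
3. Building on the above research, a personal credit risk prediction system was designed and developed. Given the feature information a customer provides, the system predicts whether the customer poses a credit risk, giving banks an important reference before credit is granted.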
English abstract:	With rapid economic development and complex changes in the financial environment, credit risk in bank lending has become hard to quantify and measure. Credit risk prediction is important for reducing the non-performing loan ratio and the risk of credit default. This thesis analyzes current credit risk prediction, constructs a deep feature extractor called WENET, and designs a three-layer credit risk prediction algorithm. The work is as follows:
1. Financial credit data have many features, and different features may have complex intrinsic relationships. The effectiveness of traditional machine learning and ensemble learning methods depends on feature selection, which ignores these intrinsic relationships and can also discard information. To address this problem, a one-dimensional convolutional neural network is used as the underlying framework of the WENET deep feature extractor, which extracts the key features that are effective for prediction. First, multi-scale feature extraction is performed by applying convolution kernels with different receptive fields on different channels, capturing the intrinsic relationships between features; then, an attention-based fusion strategy makes local and global features complement each other, yielding more informative features. On a public credit default dataset, the high-dimensional features extracted by WENET are fed into machine learning classifiers to obtain the final predictions; combining WENET with the LR, DT, SVM, and RF models improves AUC over the original classifiers by 0.175, 0.013, 0.213, and 0.023, respectively.
2. A single model can often only evaluate predictions according to its own preferences, which limits its prediction accuracy, stability, and generalization ability. To improve generalization and stability, a three-layer prediction model is built using a hybrid strategy based on deep learning and ensemble learning. First, the WENET deep feature extractor extracts features from the original inputs, forming the first layer; then, four different base learners trained in parallel on the WENET features form the second layer and produce a new dataset; finally, logistic regression forms the third layer and is trained on this updated data, reducing the impact of data perturbation and enhancing generalization ability and stability. Compared with mfXGBoost, a recent credit risk prediction method, the algorithm improves the AUC and KS values by 0.028 and 0.054, respectively. The algorithm shows good stability and generalization ability on public datasets.
3. Building on this research, a personal credit risk prediction system has been designed and developed. Based on the feature information a customer provides, the system predicts whether the customer poses a credit risk, giving banks an important reference before credit is granted.
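The abstract describes WENET only at the level of "kernels with different receptive fields on different channels, fused by attention". The NumPy sketch below illustrates that data flow under loudly stated assumptions: the kernel sizes (1, 3, 5), the channel count, the ReLU activations, and the selective-kernel-style per-channel softmax over branches are hypothetical choices, not the thesis's architecture, and the weights are random rather than trained. The names `conv1d_same` and `multiscale_attention_features` are likewise hypothetical.

```python
import numpy as np

def conv1d_same(x, kernel):
    # 'Same'-padded 1-D cross-correlation of sequence x with an odd-length kernel.
    pad = len(kernel) // 2
    xp = np.pad(x, pad)
    return np.array([xp[i:i + len(kernel)] @ kernel for i in range(len(x))])

def softmax(a, axis=0):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multiscale_attention_features(x, kernel_sizes=(1, 3, 5), n_channels=4, seed=0):
    """Untrained forward pass (hypothetical): multi-scale 1-D conv branches fused by attention."""
    rng = np.random.default_rng(seed)
    branches = []
    for k in kernel_sizes:
        # One branch per receptive field: n_channels ReLU feature maps, shape (n_channels, F).
        kernels = rng.standard_normal((n_channels, k))
        branches.append(np.stack([np.maximum(conv1d_same(x, w), 0.0) for w in kernels]))
    U = np.stack(branches)                        # (n_branches, n_channels, F)
    z = U.sum(axis=0).mean(axis=1)                # global context per channel, (n_channels,)
    W = rng.standard_normal((len(kernel_sizes), n_channels, n_channels))
    alpha = softmax(W @ z, axis=0)                # per-channel attention over branches
    fused = (alpha[:, :, None] * U).sum(axis=0)   # weighted mix of local branch maps, (n_channels, F)
    return fused.ravel()

record = np.random.default_rng(1).standard_normal(23)  # hypothetical record with 23 features
features = multiscale_attention_features(record)
print(features.shape)  # (92,)
```

Each credit record is treated as a 1-D sequence of its feature values, so a record with F features yields `n_channels * F` fused features; per the abstract, a high-dimensional output of this kind is what WENET passes to the downstream classifiers.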
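The three-layer structure in item 2 follows the general stacked-generalization pattern. A minimal scikit-learn sketch of layers 2 and 3 follows, assuming the layer-1 (WENET) features are already available; a synthetic matrix from `make_classification` stands in for them here. The abstract does not name the four base learners, so using LR, DT, SVM, and RF (the classifiers from item 1) is an assumption, as are all hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the layer-1 (WENET) output: an imbalanced feature matrix.
X, y = make_classification(n_samples=600, n_features=20, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# Layer 2: four base learners trained in parallel (this choice of four is an assumption).
base_learners = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
]

# Layer 3: logistic regression trained on out-of-fold base-learner probabilities.
model = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
    stack_method="predict_proba",
)
model.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.3f}")
```

`StackingClassifier` fits the meta-learner on out-of-fold predictions (`cv=5`), so the logistic-regression layer learns how far to trust each base learner rather than memorizing base learners that have overfit the training data. For a binary target, each base learner contributes one probability column, giving the meta-learner a four-column "new dataset".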
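The results above are reported as AUC and KS differences. For readers less familiar with the KS statistic used in credit scoring, the following plain-Python sketch computes both metrics from labels (1 = default, an assumed encoding) and predicted scores. It is an illustrative O(n²) implementation with hypothetical function names, not the thesis's evaluation code.

```python
def roc_auc(y_true, scores):
    """AUC: probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(y_true, scores):
    """KS: max over thresholds of |TPR - FPR|, the largest gap between the classes' score CDFs."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        tpr = sum(1 for s, y in zip(scores, y_true) if y == 1 and s >= t) / n_pos
        fpr = sum(1 for s, y in zip(scores, y_true) if y == 0 and s >= t) / n_neg
        best = max(best, abs(tpr - fpr))
    return best

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, s), ks_statistic(y, s))  # 0.75 0.5
```

In this example, three of the four positive/negative pairs are ranked correctly, so AUC = 0.75; at threshold 0.35 all positives but only half the negatives are flagged, so KS = 1.0 - 0.5 = 0.5.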
CLC classification number:	TP391
Open access date:	2023-06-19