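Thesis information

Title (Chinese):

 基于Stacking集成学习的信用评分卡模型

Author:

 Lu Guohui (鲁国慧)

Student ID:

 20201221058

Confidentiality level:

 Public

Thesis language:

 chi

Discipline code:

 025200

Discipline:

 Economics - Applied Statistics

Student type:

 Master's

Degree level:

 Master of Economics

Degree year:

 2023

Institution:

 Xi'an University of Science and Technology

School:

 College of Science

Major:

 Applied Statistics

Research area:

 Financial Statistics

First supervisor:

 Xia Xiaogang (夏小刚)

First supervisor's institution:

 Xi'an University of Science and Technology

Second supervisor:

 Feng Weibing (冯卫兵)

Thesis submission date:

 2023-06-14

Thesis defense date:

 2023-06-01

Title (English):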

 Credit Scoring Card Model Based on Stacking Ensemble Learning    

Keywords (Chinese):

 Stacking集成学习 ; 违约风险 ; 信用评分 ; 不平衡分类 ; 评价指标体系

Keywords (English):

 Stacking ensemble learning ; Default risk ; Credit scoring ; Imbalanced classification ; Evaluation index system

Abstract (Chinese):

近年来,我国互联网金融取得了迅猛发展,与此同时,由于监管力度不够导致互联网金融风险日益增加,信用违约风险已经变得越来越复杂和难以预测。传统的单一模型往往无法全面刻画所有的欺诈场景,因此需要采用更加灵活和多样化的方法来量化和控制信用违约风险。基于这一背景,本文构建一种基于Stacking集成算法的信用评分卡模型(SIML),用于量化和控制信用违约风险。主要研究内容如下:

(1)针对信用评分不平衡样本会影响分类效果,提出了基于SMOTE重采样的信用评分不平衡算法。以Lending Club平台2018年第二季度的数据集为实验对象,将本文的算法与Borderline-SMOTE、ADASYN以及具有代表性的过采样方法进行效果对比;实验结果表明,与其他过采样算法相比,SMOTE能够降低分类器错分的概率;在使用随机森林分类时,平均AUC值比Borderline-SMOTE、ADASYN分别提高了11.4和9.4个百分点,说明SMOTE算法能提升分类器的平均分类能力。

(2)构建了一个信息均衡且具有显著风险评估能力的信用评价指标体系,为信用评分模型做准备。根据信用5C分析法确立了信用评价指标体系的一级指标层;将变量相关性分析和IV-WOE框架相结合,逐层对指标进行筛选;选出27个指标确立了最终信用评价指标体系,并给出了其与信用5C标准的对应关系。该方法不仅避免了人为主观误删的问题,也保证了选中指标具有较强的风险评估能力。

(3)构建了一种基于SLRX-Stacking集成算法的信用评分卡模型,并对模型的有效性进行了验证。首先采用SMOTE算法处理Lending Club平台信用评分数据集;其次按照违约样本比划分训练集和测试集,训练了5种单一分类器,以ROC-AUC为性能评价标准,选取3种效果较优的分类器作为基分类器;最后构建了以LR、RF和XGBoost模型为基学习器,LR为元学习器的SLRX-Stacking集成分类模型。实验对比结果表明,模型更加适应信用评分数据的非平衡性特点,根据不同模型的AUC值和KS值对比分析,SLRX-Stacking融合模型都取得了比其他模型更好的分类效果。

Abstract (English):

In recent years, internet finance in China has grown rapidly. At the same time, insufficient regulation has driven up internet finance risk, and credit default risk has become increasingly complex and hard to predict. A traditional single model often cannot capture every fraud scenario, so more flexible and diverse methods are needed to quantify and control credit default risk. Against this background, this thesis builds a credit scorecard model (SIML) based on the Stacking ensemble algorithm to quantify and control credit default risk. The main research content is as follows:

(1) Because class imbalance in credit scoring samples degrades classification performance, a SMOTE-based resampling method for imbalanced credit scoring data is proposed. Using the Lending Club dataset for the second quarter of 2018, the method is compared with Borderline-SMOTE, ADASYN, and other representative oversampling methods. The experimental results show that SMOTE lowers the probability of classifier misclassification relative to the other oversampling algorithms. With a random forest classifier, the average AUC is 11.4 and 9.4 percentage points higher than with Borderline-SMOTE and ADASYN, respectively, indicating that SMOTE improves the classifier's average classification ability.

(2) A credit evaluation index system with balanced information and significant risk assessment capability is constructed in preparation for the credit scoring model. The first-level indicators of the system are established according to the 5C credit analysis method. Variable correlation analysis is combined with the IV-WOE framework to screen indicators layer by layer. In total, 27 indicators are selected for the final credit evaluation index system, and their correspondence to the 5C credit criteria is given. This approach avoids subjective, erroneous deletion of indicators and ensures that the selected indicators have strong risk assessment ability.

(3) A credit scorecard model based on the SLRX-Stacking ensemble algorithm is constructed and its effectiveness is verified. First, the SMOTE algorithm is applied to the Lending Club credit scoring dataset. Second, the data are split into training and test sets according to the proportion of default samples, five single classifiers are trained, and, with ROC-AUC as the performance criterion, the three best-performing classifiers are selected as base classifiers. Finally, the SLRX-Stacking ensemble classification model is built with logistic regression (LR), random forest (RF), and XGBoost as base learners and LR as the meta-learner. The comparison experiments show that the model is better suited to the imbalanced nature of credit scoring data, and that, on both AUC and KS values, the SLRX-Stacking fused model achieves better classification performance than the other models.
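
Three quantities named in the items above (SMOTE interpolation, WOE/IV screening, and the KS statistic) can be illustrated with a minimal, standard-library-only Python sketch. It is not the thesis's code. The function names, the toy inputs, the WOE sign convention ln(good share / bad share), and the convention that label 1 means default and a higher score means higher risk are all assumptions made for illustration.

```python
import math
import random


def smote_samples(minority, n_new, k=2, seed=0):
    """Core SMOTE step (illustrative): each synthetic point lies on the
    segment between a minority point and one of its k nearest minority
    neighbours. `minority` is a list of at least two numeric tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def woe_iv(bins):
    """bins: list of (good_count, bad_count) per bin, all counts > 0.
    Returns (per-bin WOE list, total IV) using WOE = ln(good% / bad%).
    The opposite sign convention is also in common use."""
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        good_share = good / total_good
        bad_share = bad / total_bad
        woe = math.log(good_share / bad_share)
        woes.append(woe)
        iv += (good_share - bad_share) * woe
    return woes, iv


def ks_statistic(labels, scores):
    """Largest gap between the cumulative shares of defaulters (label 1)
    and non-defaulters (label 0) when walking scores from high to low.
    Tied scores are processed together before the gap is measured."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
        i = j
    return best
```

For example, `smote_samples([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_new=5)` returns five points on segments joining the given minority points, `woe_iv([(90, 10), (10, 90)])` returns the per-bin WOE values together with the total IV, and `ks_statistic(labels, scores)` can be applied to any classifier's predicted default probabilities. In practice a tested library implementation would normally be preferred over hand-written versions like these.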

Chinese Library Classification (CLC) number:

 F832.4

Release date:

 2023-06-14
