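Thesis title (Chinese): | 基于Stacking集成学习的信用评分卡模型 |
Name: | |
Student ID: | 20201221058 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 025200 |
Discipline name: | Economics - Applied Statistics |
Student type: | Master's |
Degree level: | Master of Economics |
Degree year: | 2023 |
Institution: | Xi'an University of Science and Technology |
School/department: | |
Major: | |
Research direction: | Financial statistics |
First supervisor name: | |
First supervisor affiliation: | |
Second supervisor name: | |
Thesis submission date: | 2023-06-14 |
Thesis defense date: | 2023-06-01 |
Thesis title (foreign language): | Credit Scoring Card Model Based on Stacking Ensemble Learning |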
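Keywords (translated from Chinese): | Stacking ensemble learning ; Default risk ; Credit scoring ; Imbalanced classification ; Evaluation index system |
Keywords (foreign language): | Stacking ensemble learning ; Default risk ; Credit rating ; Unbalanced classification ; Evaluation index system |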
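Thesis abstract (translated from Chinese): |
In recent years, internet finance in China has grown rapidly. At the same time, insufficient regulation has allowed internet finance risk to increase, and credit default risk has become more complex and harder to predict. A traditional single model often cannot capture every fraud scenario, so more flexible and diverse methods are needed to quantify and control credit default risk. Against this background, this thesis builds a credit scorecard model based on the Stacking ensemble algorithm (SIML) to quantify and control credit default risk. The main research content is as follows:
(1) Because imbalanced samples degrade classification performance in credit scoring, an imbalance-handling algorithm for credit scoring based on SMOTE resampling is proposed. Using the Lending Club data set for the second quarter of 2018, the algorithm is compared with Borderline-SMOTE, ADASYN, and other representative oversampling methods. The experimental results show that, compared with the other oversampling algorithms, SMOTE lowers the probability of misclassification by the classifier. With random forest classification, the average AUC is 11.4 and 9.4 percentage points higher than with Borderline-SMOTE and ADASYN, respectively, which indicates that SMOTE improves the classifier's average classification ability.
(2) A credit evaluation index system that is balanced in information and has significant risk assessment ability is constructed in preparation for the credit scoring model. The first-level indicator layer of the system is established according to the 5C credit analysis method. Variable correlation analysis is combined with the IV-WOE framework to screen the indicators layer by layer. Twenty-seven indicators are selected to form the final credit evaluation index system, and their correspondence to the 5C credit criteria is given. This method avoids mistaken deletions caused by subjective human judgment and ensures that the selected indicators have strong risk assessment ability.
(3) A credit scorecard model based on the SLRX-Stacking ensemble algorithm is constructed and its effectiveness is verified. First, the SMOTE algorithm is applied to the Lending Club credit scoring data set. Second, the training and test sets are split according to the proportion of default samples, five single classifiers are trained, and the three with the best ROC-AUC are chosen as base classifiers. Finally, the SLRX-Stacking ensemble classification model is built with LR, RF, and XGBoost as base learners and LR as the meta-learner. The comparative experiments show that the model is better suited to the imbalanced nature of credit scoring data. In the comparison of AUC and KS values across models, the SLRX-Stacking fusion model achieves better classification performance than the other models. |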
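The IV-WOE screening mentioned in item (2) can be illustrated with a small worked example. This is a minimal sketch and not the thesis's implementation: the function name `woe_iv`, the bin counts, and the sign convention (WOE as the log of the good share over the bad share) are assumptions chosen for illustration, and bins with a zero count are rejected rather than smoothed.

```python
import math


def woe_iv(bins):
    """Compute per-bin WOE and the total IV for one binned indicator.

    bins: list of (good_count, bad_count) tuples, one per bin.
    Assumed convention: WOE = ln(good share / bad share).
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes = []
    iv = 0.0
    for good, bad in bins:
        if good == 0 or bad == 0:
            # Zero counts would need smoothing or bin merging; not handled here.
            raise ValueError("each bin needs at least one good and one bad record")
        good_share = good / total_good  # fraction of all goods in this bin
        bad_share = bad / total_bad     # fraction of all bads in this bin
        woe = math.log(good_share / bad_share)
        woes.append(woe)
        # Each bin's IV contribution is non-negative by construction.
        iv += (good_share - bad_share) * woe
    return woes, iv


# Hypothetical counts for one indicator split into three bins.
example_bins = [(400, 20), (350, 50), (250, 130)]
woes, iv = woe_iv(example_bins)
print([round(w, 3) for w in woes], round(iv, 3))
```

An indicator with a higher IV separates good and bad borrowers more strongly, which is why IV can serve as a screening criterion alongside correlation analysis.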
Thesis abstract (foreign language): |
In recent years, China's internet finance has developed rapidly. At the same time, insufficient regulation has increased the risks of internet finance, and credit default risk has become increasingly complex and difficult to predict. A traditional single model often cannot comprehensively describe all fraud scenarios, so more flexible and diversified methods are needed to quantify and control credit default risk. Against this background, this thesis constructs a credit scoring card model (SIML) based on the Stacking ensemble algorithm to quantify and control credit default risk. The main research content is as follows:
(1) An imbalance-handling algorithm for credit scoring based on SMOTE resampling is proposed to address the effect of imbalanced credit scoring samples on classification performance. Using the Lending Club data set for the second quarter of 2018 as the experimental object, the algorithm is compared with Borderline-SMOTE, ADASYN, and other representative oversampling methods. The experimental results show that SMOTE reduces the probability of classifier misclassification compared with the other oversampling algorithms. With random forest classification, the average AUC is 11.4 and 9.4 percentage points higher than with Borderline-SMOTE and ADASYN, respectively, indicating that SMOTE improves the average classification ability of the classifier.
(2) A credit evaluation index system with balanced information and significant risk assessment ability is constructed in preparation for the credit scoring model. The first-level indicator layer of the system is established based on the 5C credit analysis method. Variable correlation analysis is combined with the IV-WOE framework to screen indicators layer by layer. Twenty-seven indicators are selected to form the final credit evaluation index system, and their correspondence to the 5C credit criteria is given. This method avoids mistaken deletions caused by subjective human judgment and ensures that the selected indicators have strong risk assessment ability.
(3) A credit scoring card model based on the SLRX-Stacking ensemble algorithm is constructed and its effectiveness is verified. First, the SMOTE algorithm is used to process the Lending Club credit scoring data set. Second, the training and test sets are divided according to the proportion of default samples, and five single classifiers are trained. Using ROC-AUC as the performance criterion, the three best-performing classifiers are selected as base classifiers. Finally, the SLRX-Stacking ensemble classification model is constructed with LR, RF, and XGBoost as the base learners and LR as the meta-learner. The comparative experiments show that the model is better adapted to the imbalanced nature of credit scoring data, and that the SLRX-Stacking fusion model achieves better classification performance than the other models in terms of both AUC and KS values. |
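The KS value used in item (3) to compare models can be computed directly from predicted scores and default labels. The sketch below is illustrative only and is not taken from the thesis: the function name `ks_statistic` and the sample data are hypothetical, and labels are assumed to be 1 for default and 0 for non-default, with higher scores meaning higher predicted default risk.

```python
def ks_statistic(scores, labels):
    """Largest gap between the cumulative bad and good rates over all thresholds.

    scores: predicted default probabilities (higher = riskier, by assumption).
    labels: 1 for default (bad), 0 for non-default (good).
    """
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one bad and one good record")
    # Walk thresholds from the riskiest score downwards.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every record tied at this score before measuring the gap.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return ks


# Hypothetical scores and labels for six borrowers.
demo_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
demo_labels = [1, 1, 0, 1, 0, 0]
print(round(ks_statistic(demo_scores, demo_labels), 3))
```

A KS value closer to 1 means the score distributions of defaulters and non-defaulters are further apart, so the model separates the two groups more cleanly.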
Chinese Library Classification (CLC) number: | F832.4 |
Release date: | 2023-06-14 |