Thesis information

Thesis title (Chinese):

 多模型融合Stacking集成学习在个人信贷违约预测中的研究    

Name:

 高宇佳

Student ID:

 21201221059

Confidentiality level:

 Public

Thesis language:

 chi    

Discipline code:

 025200

Discipline name:

 Economics - Applied Statistics

Student type:

 Master's

Degree level:

 Master of Economics

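Degree year:

 2024

Training institution:

 Xi'an University of Science and Technology (西安科技大学)

School:

 College of Science

Major:

 Applied Statistics

Research direction:

 Data Mining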

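First supervisor name:

 夏小刚

First supervisor affiliation:

 Xi'an University of Science and Technology (西安科技大学)

Thesis submission date:

 2024-06-14

Thesis defense date:

 2024-06-04

Thesis title (foreign language):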

 Research on Multi-model Fusion Stacking Ensemble Learning in Personal Credit Default Prediction    

Keywords (Chinese):

 信贷违约预测 ; 集成学习 ; 黏菌优化算法 ; Stacking模型融合    

Keywords (foreign language):

 Credit default prediction ; Ensemble learning ; Slime Mould Algorithm ; Stacking model fusion    

Abstract (Chinese):

随着社会经济的高速发展,越来越多的人选择超前消费,个人信贷需求呈增长态势。银行等金融机构的贷款数量和金额增加的同时,伴随而来的是要承担更高的个人信用贷款违约风险。所以,本文利用信贷平台真实贷款数据信息通过机器学习模型,对借贷人是否具备如期偿还的能力进行评估,进而实现有效规避风险、减小损失,保障信贷平台健康长期地发展的目标,具体研究如下。

首先,对信贷数据进行预处理和特征选择。运用基于LightGBM的递归特征消除法对特征变量进行筛选,筛选出30个特征作为最优特征子集进行后续的建模;针对数据不平衡问题,采用欠采样、过采样、欠采样和过采样结合的方法分别对训练数据集进行处理,得出ADASYN过采样和ENN欠采样相结合结果最优的结论。其次,分别用 XGBoost、LightGBM、RF算法构建单一模型。为改善单一模型预测效果,利用黏菌优化算法(SMA)对其超参数进行优化;为提高黏菌优化算法的收敛速度和精度,通过Sinusoidal混沌初始化和柯西变异策略对其进行改进。最后,基于ISMA优化算法选择模型最优的超参数组合,优化后的各单一模型指标有所提升。为提高模型的精度和泛化能力,融合优化后的单一模型构建 Stacking集成模型,同时利用SHAP方法对模型结果进行可解释性分析,为信贷平台分析影响借贷人贷款违约的因素提供了参考。

实验结果表明,对训练数据集进行平衡化处理和利用ISMA算法对单一模型的超参数进行优化可以提高模型的分类性能。其中在各单一模型中LightGBM模型的预测结果AUC值最高,Stacking融合模型的分类及预测效果显著优于单个模型。本文所构建的个人贷款违约预测模型具有良好的泛化效果和预测能力,为信贷平台风险管理提供重要参考价值。

Abstract (foreign language):

With the rapid development of the social economy, more and more people are choosing to consume on credit, and demand for personal credit is rising. As the number and total amount of loans issued by banks and other financial institutions grow, so does their exposure to default risk on personal credit loans. This thesis therefore applies machine learning models to real loan data from a credit platform to assess whether borrowers are able to repay on schedule, with the aim of avoiding risk, reducing losses, and supporting the healthy long-term development of the platform. The specific work is as follows.

First, the credit data are preprocessed and features are selected. Recursive feature elimination based on LightGBM is used to screen the feature variables, and 30 features are retained as the optimal feature subset for subsequent modeling. To address class imbalance, the training set is processed with under-sampling, over-sampling, and combined under- and over-sampling, and the combination of ADASYN over-sampling and ENN under-sampling is found to give the best results. Second, single models are built with XGBoost, LightGBM, and random forest (RF). To improve their predictive performance, the slime mould algorithm (SMA) is used to optimize their hyperparameters; to improve the convergence speed and accuracy of SMA itself, it is modified with Sinusoidal chaotic initialization and a Cauchy mutation strategy, and the resulting improved algorithm is denoted ISMA. Finally, ISMA is used to select the optimal hyperparameter combination for each model, and the metrics of each optimized single model improve. To raise accuracy and generalization further, the optimized single models are fused into a Stacking ensemble, and the SHAP method is used to interpret the model results, providing a reference for credit platforms analyzing the factors that influence borrower default.

The experimental results show that balancing the training set and optimizing the hyperparameters of the single models with the improved SMA (ISMA) can improve classification performance. Among the single models, LightGBM achieves the highest AUC, and the Stacking ensemble classifies and predicts significantly better than any single model. The personal loan default prediction model built in this thesis generalizes and predicts well, and offers a useful reference for risk management on credit platforms.
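
The abstract names two modifications to SMA, Sinusoidal chaotic initialization and Cauchy mutation, without giving their formulas. The sketch below shows one common way such operators are written, not the thesis's actual implementation. It assumes the Sinusoidal map x_{k+1} = a * x_k^2 * sin(pi * x_k) with a = 2.3 and x_0 = 0.7, a mutation of the form best + best * Cauchy(0, 1), and greedy acceptance. The function names, the two-dimensional search box (learning rate, tree depth), and the toy objective are all hypothetical placeholders.

```python
import math
import random


def sinusoidal_init(pop_size, lower, upper, a=2.3, x0=0.7):
    """Build a population from the Sinusoidal chaotic map (assumed form)."""
    dim = len(lower)
    x = x0
    population = []
    for _ in range(pop_size):
        individual = []
        for j in range(dim):
            # One step of the map; for a=2.3, x0=0.7 the value stays in (0, 1).
            x = a * x * x * math.sin(math.pi * x)
            # Scale the chaotic value into the j-th search interval.
            individual.append(lower[j] + x * (upper[j] - lower[j]))
        population.append(individual)
    return population


def cauchy_mutation(best, lower, upper, rng):
    """Perturb the best position with standard Cauchy noise, then clip."""
    mutated = []
    for j, value in enumerate(best):
        # Inverse-CDF sample from the standard Cauchy distribution.
        step = math.tan(math.pi * (rng.random() - 0.5))
        candidate = value + value * step
        mutated.append(min(max(candidate, lower[j]), upper[j]))
    return mutated


def greedy_select(current, candidate, objective):
    """Keep the candidate only if it strictly lowers the objective."""
    return candidate if objective(candidate) < objective(current) else current


# Hypothetical search box: [learning_rate, max_depth] (illustrative only).
LOWER = [0.01, 3.0]
UPPER = [0.30, 10.0]


def toy_objective(position):
    # Stand-in for a cross-validated loss; not taken from the thesis.
    return (position[0] - 0.1) ** 2 + (position[1] - 6.0) ** 2


rng = random.Random(0)
population = sinusoidal_init(pop_size=5, lower=LOWER, upper=UPPER)
best = min(population, key=toy_objective)
improved = greedy_select(best, cauchy_mutation(best, LOWER, UPPER, rng), toy_objective)
```

In a full optimizer the mutation and greedy selection would run once per iteration after the usual SMA position update, and `toy_objective` would be replaced by a validation metric of the model being tuned. The heavy tails of the Cauchy distribution occasionally produce large jumps, which is the usual motivation for using it to escape local optima.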

Chinese Library Classification (CLC) number:

 F832.4

Release date:

 2024-06-14    
