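Thesis title (Chinese): | 基于混合策略的个人信用风险预测算法研究 |
Name: | |
Student ID: | 20208049007 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 0812 |
Discipline name: | Engineering - Computer Science and Technology (degree may be conferred in Engineering or Science) |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2023 |
Degree-granting institution: | Xi'an University of Science and Technology |
School/Department: | |
Major: | |
Research direction: | Intelligent information processing |
First supervisor name: | |
First supervisor affiliation: | |
Thesis submission date: | 2023-06-19 |
Thesis defense date: | 2023-06-06 |
Thesis title (foreign language): | Research on Individual Credit Risk Prediction Algorithm Based on Hybrid Strategy |
Keywords (Chinese): | |
Keywords (foreign language): | Credit risk forecasting; Deep learning; Feature fusion; Ensemble learning |
Abstract (Chinese): |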
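With rapid economic growth and an increasingly complex financial environment, credit risk in bank lending is difficult to quantify and measure. Credit risk prediction is important for lowering the non-performing loan ratio and reducing the risk of credit default. This thesis analyzes current credit risk prediction methods, builds a deep feature extractor named WENET, and designs a three-layer credit risk prediction algorithm. The specific work is as follows:
1. Financial credit data has many features, and different features may have complex intrinsic relationships. The effectiveness of traditional machine learning and ensemble learning methods depends on feature selection and ignores these relationships. To address this, a one-dimensional convolutional neural network is used to build the WENET deep feature extractor, which extracts abstract features that are useful for prediction. First, convolution kernels with different receptive fields are applied on different channels for multi-scale feature extraction, capturing the intrinsic relationships between features. Then, an attention-based fusion strategy makes local and global features complement each other, yielding more critical features. On a public credit default dataset, the high-dimensional features extracted by WENET are fed into machine learning classifiers to obtain the final predictions. The logistic regression (LR), decision tree (DT), support vector machine (SVM), and random forest (RF) models combined with WENET improve AUC over the corresponding original classifiers by 0.175, 0.013, 0.213, and 0.023, respectively.
2. A single model tends to evaluate and predict only according to its own biases, and it falls short in prediction accuracy, stability, and generalization. To improve generalization and stability, a three-layer prediction model is built with a hybrid strategy based on deep learning and ensemble learning. First, the WENET deep feature extractor processes the original features, forming the first layer. Then, four different base learners run in parallel to form the second layer; they learn from the WENET-extracted features and produce a new dataset. Finally, logistic regression forms the third layer and is trained on the updated data, which enhances generalization and stability. Compared with mfXGBoost, a recent credit risk prediction method, the algorithm improves AUC and KS by 0.028 and 0.054, respectively. The algorithm shows good stability and generalization on public datasets.
3. Building on the research above, a personal credit risk prediction system was designed and developed. Given the feature information a customer provides, the system predicts whether the customer poses a credit risk, giving banks an important reference before credit is granted. |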
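The multi-scale extraction and attention-based fusion described in item 1 can be illustrated with a minimal pure-Python sketch. It is not the WENET architecture: the kernel widths (3, 5, 7), the fixed averaging kernels standing in for learned filters, the mean-activation attention score, and all function names are assumptions made for illustration only.

```python
import math


def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel width so the padding is symmetric.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k)) for i in range(len(x))]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_scale_fusion(x, kernels):
    # One branch per kernel: each branch sees a different receptive field.
    branches = [relu(conv1d_same(x, kernel)) for kernel in kernels]
    # Hypothetical attention score: the mean activation of each branch.
    scores = [sum(b) / len(b) for b in branches]
    weights = softmax(scores)
    # Fused feature map: attention-weighted sum of the branch outputs.
    fused = [
        sum(w * b[i] for w, b in zip(weights, branches))
        for i in range(len(x))
    ]
    return fused, weights


# Toy record with 8 numeric credit features (made-up values).
x = [0.2, 0.8, 0.1, 0.5, 0.9, 0.3, 0.7, 0.4]
# Fixed averaging kernels of width 3, 5 and 7 stand in for learned filters.
kernels = [[1.0 / k] * k for k in (3, 5, 7)]
fused, weights = multi_scale_fusion(x, kernels)
print("attention weights:", weights)
print("fused features:", fused)
```

In a trained network the kernels and the attention scores would be learned parameters; here they are fixed so the data flow from branches to weights to the fused vector is easy to follow.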
Abstract (foreign language): |
With rapid economic development and complex changes in the financial environment, credit risk in bank lending has become difficult to quantify and measure. Credit risk prediction is of great significance for reducing the non-performing loan ratio and the risk of credit default. This thesis analyzes current credit risk prediction, constructs a deep feature extractor called WENET, and designs a three-layer credit risk prediction algorithm. The work is as follows:
1. Financial credit data has a large number of features, and there may be complex intrinsic connections between them. The effectiveness of traditional machine learning and ensemble learning methods depends on feature selection and ignores these intrinsic connections, and feature selection can itself cause data loss. To address this problem, a one-dimensional convolutional neural network is used as the underlying framework for the WENET deep feature extractor, which extracts the key features that are effective for prediction. First, multi-scale feature extraction is performed by using convolution kernels with different receptive fields on different channels to capture the intrinsic connections between features. Then, an attention-based fusion strategy makes local and global features complementary, so as to obtain more critical features. On a public credit default dataset, the high-dimensional features extracted by WENET are fed into machine learning classifiers to obtain the final predictions. The logistic regression (LR), decision tree (DT), support vector machine (SVM), and random forest (RF) models combined with WENET improve AUC by 0.175, 0.013, 0.213, and 0.023, respectively, over the corresponding original classifiers.
2. A single model often evaluates and predicts only according to its own preferences, which leaves shortcomings in prediction accuracy, stability, and generalization ability. To improve the generalization ability and stability of the model, a three-layer prediction model is built using a hybrid strategy based on deep learning and ensemble learning. First, the WENET deep feature extractor processes the original features, forming the first layer of the model. Then, the second layer is built from four different base learners in parallel, which learn from the features extracted by WENET to produce a new dataset. Finally, the third layer is built using logistic regression, which is trained on the updated data to reduce the impact of data perturbation and enhance generalization ability and stability. Compared with mfXGBoost, a recent credit risk prediction method, the algorithm improves AUC and KS by 0.028 and 0.054, respectively. The algorithm shows good stability and generalization ability on public datasets.
3. Building on the research above, a personal credit risk prediction system has been designed and developed. Based on the feature information provided by a customer, the system predicts whether that customer poses a credit risk, which gives banks an important reference before credit is granted. |
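The abstract reports gains in AUC and KS. As a reference for how these two metrics are commonly computed from predicted default scores, here is a small pure-Python sketch. The labels and scores are made-up toy data, not results from the thesis, and the function names are hypothetical.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists with one point per distinct score threshold.

    Assumes binary labels (1 = default, 0 = non-default) and that both
    classes are present.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample that shares this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
        for i in range(1, len(fpr))
    )


def ks(fpr, tpr):
    """Kolmogorov-Smirnov statistic: largest gap between TPR and FPR."""
    return max(abs(t - f) for f, t in zip(fpr, tpr))


# Toy labels and predicted default probabilities (illustrative only).
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

fpr, tpr = roc_points(y_true, y_score)
auc_value = auc(fpr, tpr)
ks_value = ks(fpr, tpr)
print("AUC:", auc_value)
print("KS:", ks_value)
```

On this toy data the sketch gives an AUC of 0.75 and a KS of 0.5. Both metrics depend only on how the scores rank defaulters against non-defaulters, which is why they are often reported together in credit scoring.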
Chinese Library Classification number: | TP391 |
Open-access date: | 2023-06-19 |