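Thesis title (Chinese): | 改进的人工蜂群算法及其在贷款违约中的应用研究 (Improved Artificial Bee Colony Algorithm and Its Application to Loan Default) |
Name: | |
Student ID: | 21201221058 |
Confidentiality level: | Public |
Thesis language: | chi (Chinese) |
Discipline code: | 025200 |
Discipline: | Economics - Applied Statistics |
Student type: | Master's student |
Degree: | Master of Economics |
Degree year: | 2024 |
Institution: | Xi'an University of Science and Technology |
School/Department: | |
Major: | |
Research area: | Intelligent optimization algorithms |
First supervisor: | |
First supervisor's institution: | |
Second supervisor: | |
Submission date: | 2024-06-14 |
Defense date: | 2024-06-04 |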
Thesis title (English): | Research on improved artificial bee colony algorithm and its application in loan default |
Keywords (Chinese): | |
Keywords (English): | Artificial bee colony algorithm; Sine cosine algorithm; Loan default; Machine learning |
Abstract (Chinese, translated): |
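The artificial bee colony (ABC) algorithm is a heuristic optimization algorithm inspired by the foraging behavior of honeybees. It solves optimization problems by imitating how bees search for food, using three types of bees: employed bees, onlooker bees, and scout bees. Its simplicity, robustness, and broad applicability make it suitable for many kinds of optimization problems, but it also suffers from slow convergence and limited search ability. As the lending industry has grown, loan default has become a central concern of financial institutions, which makes loan default prediction particularly important. Machine learning is already well established for this task, but the choice of hyperparameters strongly affects the prediction results, and finding optimal hyperparameters remains a challenge. To address this, an improved artificial bee colony algorithm is proposed and used to tune machine learning hyperparameters in order to find the optimal solution to the loan default problem.
To overcome the highly random initialization and weak convergence of the standard ABC algorithm, an artificial bee colony algorithm improved with a sine cosine operator (SCAABC) is proposed. First, the colony is initialized by stratified sampling, which covers the range of initial solutions while preserving population diversity, so that the whole population is used to search for high-quality food sources. Second, the position update formula of the sine cosine operator is introduced, which brings the global best solution into the update and adaptively adjusts the iteration step size to improve search speed and accuracy. Finally, while preserving the randomness of the offspring population, roulette-wheel selection is replaced by tournament selection, which has lower complexity, to improve the overall optimization results.
Simulation experiments on eight standard benchmark functions compare the performance of four optimization algorithms in 20 and 40 dimensions. The results show that SCAABC achieves the best and most stable results across dimensions. Moreover, the optimization cost does not multiply as the dimension increases, which confirms that SCAABC significantly improves search accuracy and convergence speed; this matters for solving high-dimensional optimization problems.
For loan default prediction, four common classification models (decision tree, random forest, support vector machine, and logistic regression) are compared, and random forest, which performs best on this problem, is selected. The optimization algorithms are then used to tune three random forest hyperparameters: the number of decision trees, the tree depth, and the minimum number of samples required to split a node. Comparing the resulting models on six criteria (confusion matrix, accuracy, precision, recall, AUC, and F1 score) shows that SCAABC performs best: it yields the fewest misclassified samples overall and significantly reduces Type II errors, indicating good applicability to loan default prediction. |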
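The abstract names the SCAABC components without giving their formulas, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. Stratified initialization is modeled here as Latin-hypercube-style sampling (each dimension split into as many strata as there are bees), and the position update is the standard sine cosine algorithm (SCA) move relative to the global best, with an amplitude r1 that shrinks linearly from a = 2 to 0. The function names `stratified_init`, `sca_update` and `sphere`, the bounds, and the constant `a` are hypothetical, and the employed, onlooker and scout phases of ABC are deliberately omitted.

```python
import math
import random


def stratified_init(pop_size, dim, lower, upper, rng):
    """Latin-hypercube-style stratified initialization (assumed form).

    Each dimension's range [lower, upper] is cut into pop_size equal strata;
    every stratum is used exactly once per dimension, and strata are paired
    randomly across dimensions.
    """
    width = (upper - lower) / pop_size
    pop = [[0.0] * dim for _ in range(pop_size)]
    for j in range(dim):
        strata = list(range(pop_size))
        rng.shuffle(strata)  # random assignment of strata to bees
        for i, s in enumerate(strata):
            pop[i][j] = lower + (s + rng.random()) * width  # uniform inside stratum s
    return pop


def sca_update(x, best, t, max_iter, lower, upper, rng, a=2.0):
    """Standard SCA move of position x relative to the global best.

    r1 decreases linearly from a to 0, so the step size shrinks over the run.
    """
    r1 = a - t * a / max_iter
    new_x = []
    for xj, pj in zip(x, best):
        r2 = 2.0 * math.pi * rng.random()  # angle in [0, 2*pi)
        r3 = 2.0 * rng.random()            # random weight on the best position
        trig = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
        v = xj + r1 * trig * abs(r3 * pj - xj)
        new_x.append(min(max(v, lower), upper))  # clip to the search bounds
    return new_x


def sphere(x):
    """Benchmark objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    rng = random.Random(0)
    pop = stratified_init(pop_size=10, dim=20, lower=-100.0, upper=100.0, rng=rng)
    best = min(pop, key=sphere)
    for t in range(100):
        for i, x in enumerate(pop):
            cand = sca_update(x, best, t, 100, -100.0, 100.0, rng)
            if sphere(cand) < sphere(x):  # greedy replacement, as in ABC
                pop[i] = cand
        best = min(pop + [best], key=sphere)
    print(sphere(best))
```

Because r1 reaches 0 at t = max_iter, the update leaves x unchanged at the final iteration; this shrinking amplitude is how the sine cosine operator shifts the search from exploration toward exploitation.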
Abstract (English): |
The artificial bee colony algorithm is a heuristic optimization algorithm based on the foraging behavior of bees. It solves optimization problems by imitating how bees search for food, using three types of bees (employed bees, onlooker bees, and scout bees), and its simplicity, robustness, and wide applicability make it suitable for a variety of optimization problems. However, the algorithm also suffers from slow convergence and limited search ability. With the development of the lending industry, loan default has become a focus of attention for financial institutions, which makes loan default prediction particularly important. Machine learning is already mature for this problem, but the choice of hyperparameters has a great impact on the prediction results, and finding optimal hyperparameters has always been a challenge. To address this, an improved artificial bee colony algorithm is proposed and used to optimize machine learning hyperparameters for the loan default problem. To overcome the highly random initialization and weak convergence of the standard artificial bee colony algorithm, an artificial bee colony algorithm improved with a sine cosine operator (SCAABC) is proposed. First, the algorithm initializes the colony by stratified sampling, which covers the space of initial solutions while preserving population diversity, so that the whole population is used to find good food sources. Second, the position update formula of the sine cosine operator is introduced to incorporate the global best solution, and the iteration step size is adjusted adaptively to improve optimization speed and accuracy.
Finally, while preserving the randomness of the offspring population, roulette-wheel selection is replaced by tournament selection, which has lower complexity, to improve the overall optimization results. Simulation experiments on eight standard benchmark functions compare the performance of four optimization algorithms in 20 and 40 dimensions. The results show that SCAABC obtains the best and most stable results across dimensions. Moreover, the optimization cost does not multiply as the dimension increases, which shows that SCAABC significantly improves optimization accuracy and convergence speed, an important property for high-dimensional optimization problems. For loan default prediction, four common classification models (decision tree, random forest, support vector machine, and logistic regression) are compared, and random forest, which performs best on this problem, is selected. The optimization algorithms are used to tune the number of decision trees, the tree depth, and the minimum number of samples required to split a node in the random forest. Compared on six criteria (confusion matrix, accuracy, precision, recall, AUC, and F1 score), SCAABC performs best: it yields the fewest misclassified samples overall, significantly reduces Type II errors, and applies well to loan default prediction. |
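The abstract states that tournament selection replaces roulette-wheel selection and that the optimizer tunes three random forest hyperparameters. A minimal standard-library sketch of both pieces follows. The search ranges in the hypothetical `BOUNDS` table, the tournament size `k`, the fitness values, and the helper names `decode` and `tournament_select` are all assumptions, since the abstract gives none of them.

```python
import random

# Hypothetical search ranges: the abstract names the three tuned
# hyperparameters but not their bounds.
BOUNDS = {
    "n_estimators": (10, 500),     # number of trees
    "max_depth": (2, 30),          # maximum tree depth
    "min_samples_split": (2, 20),  # minimum samples needed to split a node
}


def decode(position):
    """Map a bee position in [0, 1]^3 to integer random-forest hyperparameters."""
    params = {}
    for (name, (lo, hi)), u in zip(BOUNDS.items(), position):
        u = min(max(u, 0.0), 1.0)  # clip to the unit interval
        params[name] = int(round(lo + u * (hi - lo)))
    return params


def tournament_select(fitness, k, rng):
    """Tournament selection: index of the fittest of k distinct random picks.

    Costs O(k) per draw, whereas a naive roulette wheel scans the cumulative
    fitness of the whole population (O(n)) for every draw.
    """
    picks = rng.sample(range(len(fitness)), k)
    return max(picks, key=lambda i: fitness[i])


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical fitness values, e.g. a cross-validated score per candidate.
    fitness = [0.71, 0.78, 0.74, 0.80, 0.69]
    chosen = tournament_select(fitness, k=2, rng=rng)
    print(chosen, fitness[chosen])
    print(decode([0.5, 0.5, 0.5]))
```

The keys of `BOUNDS` were chosen to match the `n_estimators`, `max_depth` and `min_samples_split` parameters of scikit-learn's `RandomForestClassifier`, so a decoded dictionary could be passed to it as keyword arguments and scored, for example, by cross-validated AUC. That wiring is an assumption about how such a search might be set up and is left out so the sketch runs with the standard library alone.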
Chinese Library Classification (CLC) number: | TP18 |
Public release date: | 2024-06-17 |