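Thesis title (Chinese): | 基于粗糙集和BP神经网络的乳腺癌分类诊断预测 (Classification, Diagnosis, and Prediction of Breast Cancer Based on Rough Sets and a BP Neural Network) |
Name: | |
Student ID: | 20201221051 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 025200 |
Discipline name: | Economics - Applied Statistics |
Student type: | Master's |
Degree level: | Master of Economics |
Degree year: | 2023 |
Degree-granting institution: | Xi'an University of Science and Technology (西安科技大学) |
School/department: | |
Major: | |
Research area: | Machine learning |
First supervisor's name: | |
First supervisor's institution: | |
Thesis submission date: | 2023-06-14 |
Thesis defense date: | 2023-06-06 |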
Thesis title (foreign language): | Classification and Diagnosis model of breast cancer based on rough set and BP neural network |
Keywords (Chinese): | |
Keywords (foreign language): | BP neural network ; Rough set ; Attribute reduction ; Genetic algorithm ; Classification diagnosis |
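Abstract (translated from the Chinese): |
Breast cancer has long been the malignant tumor with the highest incidence among women in China, yet fewer than 20% of cases are detected early. Early diagnosis and prediction are therefore of great value for screening and treating patients and, in turn, for improving survival. Existing breast cancer diagnosis methods based on the BP neural network share a weakness: when the initial weights and other parameters are poorly chosen, training converges slowly and easily falls into a local minimum, which lowers diagnostic accuracy. In addition, breast cancer data contain a large number of redundant attributes, which reduce the efficiency of a classifier's diagnostic decisions. Given the strengths of rough sets in attribute reduction and of genetic algorithms in parameter optimization, this thesis combines rough-set attribute reduction with the BP neural network to build a breast cancer diagnosis model based on RS-GA-BP (Rough Set-Genetic Algorithm-BP Neural Network). The main work is as follows.

First, a rough-set attribute reduction algorithm is applied to the 699 breast cancer records in the UCI machine learning repository that were extracted by staff of the Wisconsin Clinical Sciences Center. Building on the existing genetic-algorithm approach to attribute reduction, the fitness function and the attribute dependency are both revised, yielding an attribute reduction algorithm with improved attribute dependency. Beyond the two existing objectives (minimizing the number of feature attributes and maximizing the number of distinguishable attributes in the discernibility matrix), the new fitness function also takes into account the dependency of the decision attribute on the condition attributes. To ensure that the selected attribute set can classify the data completely, a penalty factor is added to the attribute dependency term. Analysis of the 9 attribute indicators in the data set that relate to breast cancer risk factors selects the 7 attributes most closely related to breast cancer, which determines the initial structure of the BP neural network.

Second, the global search ability of the genetic algorithm is used to optimize the initial structural parameters of the BP neural network, giving an optimized network design for breast cancer diagnosis and prediction. The resulting RS-GA-BP classification model reaches a classification accuracy of 98.54% in prediction, which demonstrates the effectiveness of RS-GA-BP for breast cancer classification and diagnosis.

Finally, five algorithms (BP, RS-BP, GA-BP, RS-GA-BP, and SVM) are compared on the breast cancer test data set from the UCI repository in terms of sensitivity, specificity, classification accuracy, and confusion matrix. The results show that the improved model proposed in this thesis achieves higher accuracy and sensitivity, with classification accuracy improved by 6.13%, and has stronger classification ability. |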
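The attribute dependency that the abstract feeds into the fitness function can be illustrated with the standard rough-set definition, gamma = |POS_C(D)| / |U|. The sketch below is only an illustration under stated assumptions: the function names, the toy table, the way the terms are combined, and the penalty value 0.5 are all hypothetical, and the discernibility-matrix term mentioned in the abstract is omitted. It is not the thesis's actual fitness function.

```python
from collections import defaultdict


def dependency_degree(rows, cond_idx, dec_idx):
    """Return gamma = |POS_C(D)| / |U| for the chosen condition attributes."""
    if not rows:
        raise ValueError("rows must be non-empty")
    # Group objects into equivalence classes by their values on cond_idx.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        classes[key].append(row[dec_idx])
    # A class lies in the positive region when all of its decisions agree.
    positive = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return positive / len(rows)


def penalized_fitness(rows, cond_idx, dec_idx, n_attrs, penalty=0.5):
    """Hypothetical fitness: reward small subsets and high dependency."""
    gamma = dependency_degree(rows, cond_idx, dec_idx)
    compactness = (n_attrs - len(cond_idx)) / n_attrs
    # Assumed penalty form: scale down subsets that cannot classify fully.
    factor = 1.0 if gamma == 1.0 else penalty
    return (compactness + gamma) * factor


# Toy decision table: columns 0 and 1 are condition attributes, column 2
# is the decision. Only column 1 determines the decision.
toy = [(1, 1, 0), (1, 2, 1), (2, 1, 0), (2, 2, 1)]

if __name__ == "__main__":
    for subset in ([0], [1], [0, 1]):
        print(subset, penalized_fitness(toy, subset, 2, n_attrs=2))
```

In this toy table the single-attribute subset `[1]` scores highest, because it classifies every row correctly while using fewer attributes than `[0, 1]`.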
Abstract (foreign language): |
Breast cancer has long been the malignant tumor with the highest incidence among women in China, but the early detection rate is less than 20%. Early diagnosis and prediction of breast cancer are therefore of great value for screening and treating patients and for improving their survival rate. Existing breast cancer diagnosis methods based on the BP neural network have a defect: if the initial weights and other parameters are not selected properly, learning converges slowly and easily falls into a local minimum, leading to low diagnostic accuracy. At the same time, breast cancer data contain a large number of redundant attributes, which reduce the efficiency of the classifier's diagnostic decisions. Considering the significant advantages of rough sets in attribute reduction and of genetic algorithms in parameter optimization, this thesis combines the rough-set attribute reduction algorithm with the BP neural network to establish a breast cancer diagnosis model based on RS-GA-BP (Rough Set-Genetic Algorithm-BP Neural Network). The main work is divided into the following aspects.

First, the 699 breast cancer records in the UCI machine learning repository, extracted by staff of the Wisconsin Clinical Science Center, were processed with an attribute reduction algorithm based on rough sets. On the basis of the original genetic-algorithm attribute reduction, the fitness function and the attribute dependency were improved, and an attribute reduction algorithm with improved attribute dependency was proposed. In addition to minimizing the number of feature attributes and maximizing the number of distinguishable attributes in the discernibility matrix, this fitness function also considers the dependency of the decision attribute on the condition attributes. To ensure that the attribute set can achieve complete classification, a penalty factor is added to the attribute dependency to improve the fitness function. By analyzing the 9 attribute indicators related to the influencing factors of breast cancer in the data set, the 7 attributes most closely related to breast cancer were selected, which determines the initial structure of the BP neural network.

Second, the global optimization ability of the genetic algorithm is used to optimize the initial structural parameters of the BP neural network, and an optimal BP network structure is constructed for breast cancer diagnosis and prediction. A breast cancer classification model based on RS-GA-BP is established. The model achieves a classification accuracy of 98.54% in prediction, which demonstrates the effectiveness of RS-GA-BP in breast cancer classification and diagnosis.

Finally, the sensitivity, specificity, classification accuracy, and confusion matrix of five algorithms (BP, RS-BP, GA-BP, RS-GA-BP, and SVM) are compared using the breast cancer test data set in the UCI repository. The results show that the improved breast cancer classification and diagnosis model proposed in this thesis has higher accuracy and sensitivity, with classification accuracy increased by 6.13%, and has stronger classification ability. |
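The evaluation measures named in the abstract (sensitivity, specificity, classification accuracy, and the confusion matrix) follow standard definitions for a two-class problem. The sketch below shows how they relate to one another. The function name `binary_metrics`, the label encoding, and the example predictions are hypothetical and are not taken from the thesis or its data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute a 2x2 confusion matrix and the metrics derived from it."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # malignant case correctly flagged
            else:
                fn += 1  # malignant case missed
        else:
            if pred == positive:
                fp += 1  # benign case wrongly flagged
            else:
                tn += 1  # benign case correctly cleared
    nan = float("nan")
    return {
        # Rows are true classes (negative, positive); columns are predictions.
        "confusion": [[tn, fp], [fn, tp]],
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "accuracy": (tp + tn) / len(y_true) if y_true else nan,
    }


# Made-up labels for illustration only: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

if __name__ == "__main__":
    for name, value in binary_metrics(y_true, y_pred).items():
        print(name, value)
```

Sensitivity is the measure that matters most for screening, since a false negative corresponds to a missed malignant case.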
Chinese Library Classification (CLC) number: | R737.9; O29 |
Public release date: | 2023-06-15 |