Thesis information

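Thesis title (Chinese):

 Classification, diagnosis, and prediction of breast cancer based on rough sets and a BP neural network

Name:

 Li Jiaxin (李佳欣)

Student ID: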

 20201221051    

Confidentiality level:

 Public

Thesis language:

 chi

Discipline code:

 025200    

Discipline name:

 Economics - Applied Statistics

Student type:

 Master's

Degree level:

 Master of Economics

Degree year:

 2023    

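Degree-granting institution:

 Xi'an University of Science and Technology (西安科技大学)

School:

 College of Science (理学院)

Major:

 Applied Statistics

Research direction:

 Machine learning

First supervisor:

 Feng Weibing (冯卫兵)

First supervisor's affiliation:

 Xi'an University of Science and Technology (西安科技大学)

Thesis submission date: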

 2023-06-14    

Thesis defense date:

 2023-06-06    

Thesis title (foreign language):

 Classification and Diagnosis model of breast cancer based on rough set and BP neural network    

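Keywords (Chinese):

 BP neural network ; Rough set ; Attribute reduction ; Genetic algorithm ; Classification and diagnosis

Keywords (foreign language):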

 BP neural network ; Rough set ; Attribute reduction ; Genetic algorithm ; Classification diagnosis    

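Abstract (translated from Chinese):

Breast cancer has long been the malignant tumor with the highest incidence among women in China, yet fewer than 20% of cases are detected early. Early diagnosis and prediction are therefore extremely valuable for screening and treating patients and thereby improving survival. Existing breast cancer diagnosis methods based on BP neural networks have a drawback: when parameters such as the initial weights are poorly chosen, learning converges slowly and is easily trapped in local minima, which lowers diagnostic accuracy. In addition, breast cancer data contain many redundant attributes, which reduce the efficiency of a classifier's diagnostic decisions. Given the clear strengths of rough sets in attribute reduction and of genetic algorithms in parameter optimization, this thesis combines rough-set attribute reduction with a BP neural network to build a breast cancer diagnosis model based on RS-GA-BP (Rough set-Genetic Algorithm-BP Neural Network). The main work is as follows:

First, a rough-set-based attribute reduction algorithm is applied to the 699 breast cancer records in the UCI Machine Learning Repository that were collected by staff of the Clinical Science Center in Wisconsin. Building on existing genetic-algorithm attribute reduction, the fitness function and the attribute dependency are improved, yielding an attribute reduction algorithm with improved attribute dependency. Beyond the existing goals of minimizing the number of feature attributes and maximizing the number of attributes distinguishable in the discernibility matrix, the fitness function also accounts for the dependency between the condition attributes and the decision attribute. To ensure that the attribute set can achieve complete classification, a penalty factor is added to the attribute dependency. Analysis of the 9 attribute indicators related to breast cancer risk factors in the data set selects the 7 attributes most closely related to breast cancer, which determines the initial structure of the BP neural network.

Second, the global search capability of the genetic algorithm is used to optimize the initial structural parameters of the BP neural network, giving an optimized BP network design for breast cancer diagnosis and prediction. The resulting RS-GA-BP breast cancer classification model reaches a classification accuracy of 98.54% in prediction, demonstrating the effectiveness of RS-GA-BP for breast cancer classification and diagnosis.

Finally, five algorithms (BP, RS-BP, GA-BP, RS-GA-BP, and SVM) are compared on the breast cancer test data set from the UCI repository in terms of sensitivity, specificity, classification accuracy, and confusion matrix. The results show that the improved breast cancer classification and diagnosis model proposed in this thesis has higher accuracy and sensitivity, with classification accuracy improved by 6.13%, and stronger classification ability.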

Abstract (foreign language):

Breast cancer has long been the malignant tumor with the highest incidence rate among women in China, but the early detection rate is less than 20%. Early diagnosis and prediction of breast cancer are therefore of great value for screening and treating patients and for improving their survival rate. Existing diagnosis methods for breast cancer based on BP neural networks have a defect: if the initial weights and other parameters are not selected properly, learning converges slowly and easily falls into local minima, leading to low diagnostic accuracy. At the same time, breast cancer data contain a large number of redundant attributes, which reduce the efficiency of the classifier in diagnosing breast cancer data. Considering the significant advantages of rough sets in attribute reduction and of genetic algorithms in parameter optimization, this thesis combines the rough-set attribute reduction algorithm with a BP neural network to establish a breast cancer diagnosis model based on RS-GA-BP (Rough set-Genetic Algorithm-BP Neural Network). The main work is divided into the following aspects:

First, 699 breast cancer records collected at the Wisconsin Clinical Science Center and published in the UCI machine learning repository were processed with a rough-set-based attribute reduction algorithm. Building on the original genetic-algorithm attribute reduction, the fitness function and the attribute dependency were improved, and an attribute reduction algorithm with improved attribute dependency was proposed. In addition to minimizing the number of feature attributes and maximizing the number of distinguishable attributes in the discernibility matrix, this fitness function also considers the dependency between the condition attributes and the decision attribute. To ensure that the attribute set can achieve complete classification, a penalty factor is added to the attribute dependency to improve the fitness function. By analyzing the 9 attribute indicators related to the influencing factors of breast cancer in the data set, the 7 attributes most closely related to breast cancer were selected, which determined the initial structure of the BP neural network.
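As a rough illustration of the attribute-dependency idea in this paragraph, the following Python sketch computes the standard rough-set dependency degree, γ_C(D) = |POS_C(D)| / |U|, and plugs it into a simple fitness score for a candidate attribute subset. The `fitness` function, its `penalty` value, and the toy decision table are assumptions made for illustration only; they are not the thesis's actual formulation, and the discernibility-matrix term mentioned in the abstract is omitted.

```python
from collections import defaultdict


def dependency(rows, cond_idx, dec_idx):
    """Rough-set dependency degree gamma_C(D) = |POS_C(D)| / |U|."""
    # Group objects into equivalence classes by their values on the
    # selected condition attributes.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        classes[key].append(row[dec_idx])
    # A class belongs to the positive region when all of its objects
    # share the same decision value.
    positive = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return positive / len(rows)


def fitness(mask, rows, dec_idx, penalty=0.5):
    """Hypothetical fitness: favour small subsets with high dependency,
    and penalise subsets that classify worse than the full attribute set."""
    n = len(mask)
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0
    full = dependency(rows, list(range(n)), dec_idx)
    gamma = dependency(rows, chosen, dec_idx)
    score = (n - len(chosen)) / n + gamma
    if gamma < full:  # the subset loses classification ability
        score -= penalty
    return score


# Toy decision table (assumed data): columns 0-2 are condition
# attributes, column 3 is the decision, which equals attribute 1.
table = [
    (1, 0, 1, 0),
    (1, 1, 1, 1),
    (0, 0, 1, 0),
    (0, 1, 1, 1),
]

gamma_attr1 = dependency(table, [1], 3)   # attribute 1 alone decides the class
gamma_attr0 = dependency(table, [0], 3)   # attribute 0 carries no information
best_subset_score = fitness([0, 1, 0], table, 3)
full_set_score = fitness([1, 1, 1], table, 3)
```

In this toy table the single-attribute subset `[0, 1, 0]` scores higher than the full set, because it keeps the dependency at 1.0 while using fewer attributes. A genetic algorithm would evolve such bit masks toward high-scoring reducts.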

Second, the global optimization ability of the genetic algorithm is used to optimize the initial structural parameters of the BP neural network, and an optimized BP neural network structure is constructed for breast cancer diagnosis and prediction. A breast cancer classification model based on RS-GA-BP is established. In prediction, the model achieves a classification accuracy of 98.54%, which demonstrates the effectiveness of RS-GA-BP in breast cancer classification and diagnosis.
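The general idea of using a genetic algorithm to pick a starting point for BP training can be sketched as follows. This is a minimal sketch that assumes "initial structural parameters" refers to the initial weights and biases. The 2-2-1 network, the XOR toy data, the population size, the crossover and mutation settings, and all function names are hypothetical and not taken from the thesis.

```python
import math
import random


def forward(w, x):
    """Tiny 2-2-1 sigmoid network; w holds 9 weights and biases."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    h0 = sig(w[0] * x[0] + w[1] * x[1] + w[2])
    h1 = sig(w[3] * x[0] + w[4] * x[1] + w[5])
    return sig(w[6] * h0 + w[7] * h1 + w[8])


def mse(w, data):
    """Mean squared error of the network on (input, target) pairs."""
    return sum((forward(w, x) - y) ** 2 for x, y in data) / len(data)


def ga_initial_weights(data, pop_size=20, generations=30, seed=0):
    """Evolve weight vectors; the best one is a candidate BP starting point."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(9)] for _ in range(pop_size)]
    history = []  # best error seen at each generation
    for _ in range(generations):
        pop.sort(key=lambda w: mse(w, data))
        history.append(mse(pop[0], data))
        elite = pop[: pop_size // 2]  # elitism: the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, 9)          # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(9)
            child[i] += rng.gauss(0, 0.3)      # mutate a single gene
            children.append(child)
        pop = elite + children
    pop.sort(key=lambda w: mse(w, data))
    history.append(mse(pop[0], data))
    return pop[0], history


# XOR toy data (assumed), standing in for the reduced breast cancer attributes.
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
best, history = ga_initial_weights(xor_data)
```

Because the better half of each generation is carried over unchanged, the best error recorded in `history` never increases. Gradient-based BP training would then start from `best` instead of from random weights.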

Finally, five algorithms (BP, RS-BP, GA-BP, RS-GA-BP, and SVM) are compared on the breast cancer test data set from the UCI repository in terms of sensitivity, specificity, classification accuracy, and confusion matrix. The results show that the improved breast cancer classification and diagnosis model proposed in this thesis has higher accuracy and sensitivity, with the classification accuracy increased by 6.13%, and stronger classification ability.
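For reference, the comparison metrics named above follow directly from the confusion matrix of a binary classifier. The sketch below uses the standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and accuracy = (TP + TN) / total. The label vectors are made-up toy values, not results from the thesis, and treating label `1` as the malignant class is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification result."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # share of positive cases detected
        "specificity": tn / (tn + fp),   # share of negative cases recognised
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Toy labels (assumed): 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

counts = confusion_counts(y_true, y_pred)
result = metrics(*counts)
```

In a diagnostic setting, sensitivity is usually weighted most heavily, because a false negative corresponds to a missed malignant case.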

Chinese Library Classification (CLC) number:

 R737.9; O29    

Open-access date:

 2023-06-15    
