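Thesis title (Chinese):	基于FL-CNN-Attention模型的冠心病患病风险预测研究
Author:	Fu Yeye (付叶叶)
Student ID:	22201221063
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	025200
Discipline:	Economics - Applied Statistics
Student type:	Master's student
Degree level:	Master of Economics
Degree year:	2025
Degree-granting institution:	Xi'an University of Science and Technology (西安科技大学)
School:	College of Science (理学院)
Major:	Applied Statistics
Research direction:	Data mining
Primary supervisor:	Feng Weibing (冯卫兵)
Primary supervisor's institution:	Xi'an University of Science and Technology (西安科技大学)
Thesis submission date:	2025-06-18
Thesis defense date:	2025-06-08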
Thesis title (English):	Prediction of coronary heart disease risk based on FL-CNN-Attention modeling
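Keywords (translated from Chinese):	Coronary heart disease prediction ; Feature selection ; Class imbalance ; Convolutional neural network ; Interpretability
Keywords (English):	Coronary heart disease prediction ; Feature selection ; Category imbalance ; Convolutional neural networks ; Interpretability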
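Abstract (translated from Chinese):	The high incidence of coronary heart disease (CHD) and its trend toward younger patients create an urgent need for early screening techniques. Machine learning-based screening methods, however, still face core challenges: feature redundancy, class imbalance, and insufficient model interpretability. Taking CHD as its subject and focusing on the characteristics of medical data, this thesis proposes a systematic solution that runs from data processing to model construction, and verifies its clinical plausibility through interpretability analysis. The specific contributions are as follows. First, the thesis proposes a feature optimization and balancing pipeline for CHD data. The required data are obtained from the NHANES database, then processed for missing values and normalized. To address feature redundancy, a hybrid of mutual information, LASSO regression, and correlation analysis is used for feature selection, yielding 33 key features related to CHD. To alleviate class imbalance, the NearMiss-2 algorithm is used for undersampling, adjusting the positive-to-negative sample ratio to 1:3, which balances the data distribution while avoiding excessive loss of original information. This pipeline gives the subsequent prediction model better input data. Second, so that the model focuses more effectively on the key features in high-dimensional medical data and recognizes minority-class samples better, the thesis builds FL-CNN-Attention, a CHD prediction model that combines an attention mechanism with the Focal Loss function. The model extracts local features with a convolutional neural network, introduces an SE attention module that adaptively weights channel features to highlight the key ones, and uses Focal Loss to dynamically adjust the weights of easy and hard samples, mitigating the recognition bias caused by class imbalance. Experiments show that on the NHANES dataset the model reaches an accuracy of 93.61%, with recall and F1 score of 84.08% and 86.78% respectively, significantly outperforming traditional methods such as XGBoost and random forest; ablation experiments further confirm that the attention mechanism and Focal Loss both improve predictive performance. Third, because deep learning models struggle to explain the basis of their decisions, the thesis applies SHAP and LIME to analyze the model's interpretability at the global and individual levels. The global explanation of the FL-CNN-Attention model identifies age, diabetes, and family history as core risk factors, which is highly consistent with clinical guidelines; the individual-level analysis shows that specific decisions are reasonable. The model therefore achieves good interpretability alongside its performance, providing a trustworthy reference for clinical use. In summary, through a full-chain design of "data optimization, model construction, decision interpretation", the thesis offers new theoretical support and a methodological reference for intelligent CHD screening. The study can also help physicians formulate more precise diagnosis and treatment plans, thereby reducing the risk of CHD.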
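The undersampling step named in the abstract can be illustrated with a minimal sketch. It assumes the standard NearMiss-2 rule (keep the majority-class samples whose mean distance to their k farthest minority-class samples is smallest) and Euclidean distance. The function name `near_miss_2`, the parameter `k`, and the toy data are hypothetical; this record does not state the thesis's own implementation or settings.

```python
import math


def near_miss_2(majority, minority, ratio=3, k=3):
    """Return ratio * len(minority) majority points chosen by the NearMiss-2 rule.

    ratio=3 mirrors the 1:3 positive-to-negative ratio in the abstract;
    k is an assumed neighbourhood size, not a value taken from the thesis.
    """
    n_keep = min(len(majority), ratio * len(minority))
    k = min(k, len(minority))

    def score(point):
        # Mean distance from this majority point to its k FARTHEST minority points.
        dists = sorted((math.dist(point, m) for m in minority), reverse=True)
        return sum(dists[:k]) / k

    # Smallest scores first: points that are close to the minority class as a whole.
    ranked = sorted(majority, key=score)
    return ranked[:n_keep]


# Toy data: 2 minority (positive) points and 10 majority (negative) points
# placed progressively farther away along the x-axis.
minority = [(0.0, 0.0), (1.0, 0.0)]
majority = [(float(i), 0.0) for i in range(2, 12)]

kept = near_miss_2(majority, minority, ratio=3, k=2)
print(len(minority), "positives :", len(kept), "negatives kept")
```

In this toy case the six majority points nearest the minority cluster are retained and the four most distant ones are discarded, giving the 1:3 ratio.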
Abstract (English):	The high incidence of coronary heart disease (CHD) and its growing prevalence among younger people have created an urgent need for early screening techniques. However, machine learning-based screening methods still face core challenges, including feature redundancy, class imbalance, and insufficient model interpretability. This thesis focuses on CHD and, considering the specific characteristics of medical data, proposes a systematic solution from data preprocessing to model construction, and validates its clinical rationale through interpretability analysis. The key contributions are as follows. First, the thesis proposes a feature optimization and balancing procedure for CHD data. The necessary data are extracted from the NHANES database, then processed for missing values and normalized. To address feature redundancy, a hybrid approach combining mutual information, LASSO regression, and correlation analysis is used for feature selection, ultimately identifying 33 key features related to CHD. To alleviate class imbalance, the NearMiss-2 algorithm is applied for undersampling, adjusting the positive-to-negative sample ratio to 1:3. This adjustment balances the data distribution while limiting the loss of original information, providing better input for the subsequent prediction model. Second, to enable the model to focus more effectively on critical features in high-dimensional medical data and to improve its ability to recognize minority-class samples, the thesis proposes the FL-CNN-Attention model, which integrates an attention mechanism with the Focal Loss function. The convolutional neural network extracts local features, while the SE attention module adaptively assigns weights to channel features, emphasizing key information. In addition, Focal Loss dynamically adjusts the weights of easy and hard samples, mitigating the bias caused by class imbalance. Experimental results on the NHANES dataset show that the model achieves an accuracy of 93.61%, with recall and F1 score of 84.08% and 86.78%, respectively, significantly outperforming traditional methods such as XGBoost and Random Forest. Ablation studies further confirm the effectiveness of the attention mechanism and Focal Loss in improving model performance. Third, to address the difficulty of explaining the decisions of deep learning models, the thesis applies SHAP and LIME for both global and local interpretability analysis. The global interpretation of the FL-CNN-Attention model identifies core risk factors such as age, diabetes, and family history, which align closely with clinical guidelines. The local interpretation shows the rationale behind specific predictions. As a result, the model combines strong predictive performance with good interpretability, providing a reliable basis for clinical application. In conclusion, through an end-to-end design of "data optimization, model construction, and decision interpretation", the thesis provides new theoretical support and a methodological reference for intelligent screening of coronary heart disease. The proposed model can also assist healthcare professionals in developing more accurate diagnostic and treatment plans, ultimately helping to reduce the risk of coronary heart disease.
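How Focal Loss reweights easy and hard samples, as the abstract describes, can be shown with a small sketch. It assumes the binary form introduced by Lin et al. (2017), FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults gamma=2.0 and alpha=0.25 are the ones commonly quoted from that paper, not values reported in this record, and the function name `focal_loss` is hypothetical.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    p: predicted probability of the positive class (CHD present).
    y: true label, 1 for positive and 0 for negative.
    gamma, alpha: assumed hyperparameters, not taken from the thesis.
    """
    p = min(max(p, 1e-7), 1 - 1e-7)           # clip to keep log() finite
    p_t = p if y == 1 else 1 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of confidently correct (easy) samples.
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


# An easy positive (predicted 0.95) versus a hard positive (predicted 0.30).
easy = focal_loss(0.95, 1)
hard = focal_loss(0.30, 1)
print(f"easy sample loss: {easy:.6f}")
print(f"hard sample loss: {hard:.6f}")
```

With gamma set to 0 the modulating factor disappears and the expression reduces to class-weighted cross-entropy, which is one way to see what the focusing term adds.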
Chinese Library Classification (CLC) number:	R541.4;TP181
Release date:	2025-06-19
