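Thesis title (Chinese): | 基于FL-CNN-Attention模型的冠心病患病风险预测研究 |
Name: | |
Student ID: | 22201221063 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 025200 |
Discipline name: | Economics - Applied Statistics |
Student type: | Master's |
Degree level: | Master of Economics |
Degree year: | 2025 |
Institution: | Xi'an University of Science and Technology |
School/Department: | |
Major: | |
Research direction: | Data mining |
First supervisor name: | |
First supervisor affiliation: | |
Thesis submission date: | 2025-06-18 |
Thesis defense date: | 2025-06-08 |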
Thesis title (foreign language): | Prediction of coronary heart disease risk based on FL-CNN-Attention modeling |
Keywords (Chinese): | |
Keywords (foreign language): | Coronary heart disease prediction; Feature selection; Class imbalance; Convolutional neural networks; Interpretability |
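Abstract (translated from Chinese): |
The high incidence of coronary heart disease (CHD) and its shift toward younger patients create a pressing need for early screening techniques. However, machine learning-based screening methods still face core challenges such as feature redundancy, class imbalance, and insufficient model interpretability. Taking CHD as its research subject and focusing on the characteristics of medical data, this thesis proposes a systematic solution spanning data processing and model construction, and verifies its clinical plausibility through interpretability analysis. The specific research content is as follows.
First, the thesis proposes a feature optimization and balancing pipeline for CHD data. The required data are obtained from the NHANES database, then missing values are handled and the features are normalized. To address feature redundancy, a hybrid method combining mutual information, LASSO regression, and correlation analysis is used for feature selection, which retains 33 key features related to CHD. To alleviate class imbalance, the NearMiss-2 algorithm is used for undersampling, adjusting the ratio of positive to negative samples to 1:3; this balances the data distribution while avoiding, as far as possible, excessive loss of original information. The pipeline provides better input data for the subsequent prediction model.
Second, so that the model focuses more effectively on the key features in high-dimensional medical data and recognizes minority-class samples better, the thesis builds FL-CNN-Attention, a CHD prediction model that combines an attention mechanism with the Focal Loss function. The model extracts local features with a convolutional neural network, introduces an SE attention module to adaptively weight channel features and highlight the key ones, and uses Focal Loss to dynamically adjust the weights of easy and hard samples, mitigating the recognition bias caused by class imbalance. Experiments on the NHANES dataset show an accuracy of 93.61%, with recall and F1 score of 84.08% and 86.78%, respectively, significantly better than traditional methods such as XGBoost and random forest. Ablation experiments further confirm that the attention mechanism and Focal Loss both improve predictive performance.
Third, because deep learning models struggle to explain the basis of their decisions, the thesis applies SHAP and LIME to analyze the model's interpretability at the global and individual levels. The global explanation of the FL-CNN-Attention model identifies core risk factors such as age, diabetes, and family history, which are highly consistent with clinical guidelines; the individual-level analysis shows that specific decisions are reasonable. The model therefore achieves good interpretability alongside its performance and offers a credible reference for clinical use.
In summary, through an end-to-end design of "data optimization, model construction, decision interpretation", the thesis provides new theoretical support and a methodological reference for intelligent CHD screening. The work can also help physicians develop more precise diagnosis and treatment plans, thereby reducing the risk of coronary heart disease. |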
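The NearMiss-2 undersampling step named in the abstract can be illustrated with a small pure-Python sketch. It keeps the majority-class samples whose mean distance to their k farthest minority-class samples is smallest, and stops at three majority samples per minority sample to mirror the 1:3 ratio. The function name `nearmiss2_undersample`, the value `k=3`, the Euclidean metric, and the toy data are illustrative assumptions; the record does not say how the thesis configured the algorithm.

```python
import math


def nearmiss2_undersample(majority, minority, ratio=3, k=3):
    """Sketch of NearMiss-2 (hypothetical helper, not the thesis code).

    Keeps `ratio` majority samples per minority sample, choosing those with
    the smallest mean Euclidean distance to their k farthest minority samples.
    """
    k = min(k, len(minority))
    n_keep = min(len(majority), ratio * len(minority))
    scored = []
    for idx, x in enumerate(majority):
        dists = sorted(math.dist(x, m) for m in minority)
        mean_far = sum(dists[-k:]) / k  # mean over the k largest distances
        scored.append((mean_far, idx))
    scored.sort()
    keep = sorted(idx for _, idx in scored[:n_keep])  # preserve input order
    return [majority[i] for i in keep]


# Toy data (assumed): 2 positive (minority) and 8 negative (majority) samples.
minority = [(0.0, 0.0), (1.0, 0.0)]
majority = [(float(i), 0.0) for i in range(2, 10)]
kept = nearmiss2_undersample(majority, minority)
print(len(minority), len(kept))
```

In practice a library implementation such as the `NearMiss` sampler in imbalanced-learn would normally be preferred over a hand-written loop; the sketch only makes the selection rule explicit.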
Thesis abstract (foreign language): |
Coronary heart disease (CHD) is highly prevalent and increasingly affects younger people, which creates an urgent need for early screening techniques. However, machine learning-based screening methods still face three core challenges: feature redundancy, class imbalance, and insufficient model interpretability. Focusing on CHD and the specific characteristics of medical data, this thesis proposes a systematic solution that runs from data preprocessing to model construction, and validates its clinical plausibility through interpretability analysis. The key contributions are as follows.
First, this thesis proposes a feature optimization and balancing procedure for CHD data. The required data are extracted from the NHANES database, missing values are handled, and the features are normalized. To address feature redundancy, a hybrid approach combining mutual information, LASSO regression, and correlation analysis is used for feature selection, yielding 33 key features related to CHD. To alleviate class imbalance, the NearMiss-2 algorithm is applied for undersampling, adjusting the positive-to-negative sample ratio to 1:3. This balances the data distribution while limiting the loss of original information, and provides better input for the subsequent prediction model.
Second, to help the model focus on critical features in high-dimensional medical data and improve its recognition of minority-class samples, this thesis proposes the FL-CNN-Attention model, which combines an attention mechanism with the Focal Loss function. A convolutional neural network extracts local features, an SE (Squeeze-and-Excitation) attention module adaptively weights channel features to emphasize key information, and Focal Loss dynamically adjusts the weights of easy and hard samples to mitigate the bias caused by class imbalance. Experimental results on the NHANES dataset show that the model achieves an accuracy of 93.61%, with a recall and F1 score of 84.08% and 86.78%, respectively, significantly outperforming traditional methods such as XGBoost and Random Forest. Ablation studies further confirm that the attention mechanism and Focal Loss each improve model performance.
Third, to address the difficulty of explaining the decisions of deep learning models, this thesis applies SHAP and LIME for both global and local interpretability analysis. The global interpretation of the FL-CNN-Attention model identifies core risk factors such as age, diabetes, and family history, which align closely with clinical guidelines. The local analysis reveals the rationale behind individual predictions. As a result, the model combines strong predictive performance with good interpretability, providing a credible reference for clinical application.
In conclusion, through an end-to-end design of "data optimization, model construction, and decision interpretation", this thesis offers new theoretical support and a methodological reference for intelligent CHD screening. The proposed model can also assist healthcare professionals in developing more accurate diagnostic and treatment plans, ultimately helping to reduce the risk of coronary heart disease. |
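To make concrete how Focal Loss adjusts the weights of easy and hard samples, the following sketch evaluates the binary form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) for single predictions. The values `alpha=0.25` and `gamma=2.0` are commonly used defaults and are assumptions here, since the abstract does not report the thesis's settings; the function name `binary_focal_loss` is likewise hypothetical.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for one sample (illustrative sketch, not the thesis code).

    p is the predicted probability of the positive class, y is the label (0 or 1).
    alpha and gamma are assumed values, not taken from the thesis.
    """
    p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A well-classified positive (easy) versus a poorly classified positive (hard).
easy_fl = binary_focal_loss(0.95, 1)
hard_fl = binary_focal_loss(0.30, 1)
easy_ce = -math.log(0.95)                      # plain cross-entropy, for comparison
hard_ce = -math.log(0.30)

# With gamma = 0 the modulating factor vanishes and only the alpha weight remains.
ce_like = binary_focal_loss(0.30, 1, alpha=0.5, gamma=0.0)
print(easy_fl, hard_fl, ce_like)
```

The modulating factor (1 - p_t)^gamma shrinks the loss of the easy sample far more than that of the hard sample, which is the mechanism the abstract refers to when it says the loss mitigates bias toward the majority class.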
Chinese Library Classification number: | R541.4;TP181 |
Open-access date: | 2025-06-19 |