Thesis Information

Chinese Title:

 CVITLNN: Research on COVID-19 Detection in X-Ray Images Based on Vision Transformers and Liquid Neural Networks

Name:

 WAQAS MUHAMMAD

Student ID:

 19508049006

Confidentiality Level:

 Public

Thesis Language:

 English

Discipline Code:

 081203

Discipline:

 Engineering - Computer Science and Technology (degrees conferrable in Engineering or Science) - Computer Application Technology

Student Type:

 Master's

Degree Level:

 Master of Engineering

Degree Year:

 2025

Degree-Granting Institution:

 Xi'an University of Science and Technology

School:

 College of Artificial Intelligence and Computer Science

Major:

 Computer Science and Technology

Research Direction:

 Artificial Neural Networks and Computing

First Supervisor:

 Yu Zhenhua

First Supervisor's Institution:

 Xi'an University of Science and Technology

Submission Date:

 2025-06-18

Defense Date:

 2025-05-29

English Title:

 CVITLNN: A Hybrid Approach Based on Vision Transformers and Liquid Neural Networks for COVID-19 Detection in X-Ray Images

Chinese Keywords:

 Vision Transformers (ViTs); Liquid Neural Networks (LNNs); COVID-19 detection; medical image analysis; chest X-ray classification; healthcare AI

English Keywords:

 Vision Transformers (ViTs); Liquid Neural Networks (LNNs); COVID-19 Detection; Medical Imaging; Chest X-ray Analysis; Healthcare

Chinese Abstract:

Diagnosing COVID-19 from chest X-ray (CXR) images is difficult because the virus's radiographic appearance closely resembles that of other lung diseases, while traditional testing methods are time-consuming and of limited accuracy. To address this problem, this thesis designs a deep learning model, ViTsLNN, which integrates the technical strengths of Vision Transformers and Liquid Neural Networks to achieve efficient and accurate three-way classification of chest X-rays into COVID-19, pneumonia, and normal cases.
Vision Transformers are used to parse the global structure of an image: the image is split into patches and processed as a sequence, a multi-head self-attention mechanism captures the relationships among all image regions simultaneously, and a feedforward network mines complex features in the data, which makes ViTs especially well suited to visual tasks that require holistic image understanding. Liquid Neural Networks give the model dynamic adaptability, providing greater flexibility and robustness for complex pattern recognition. By integrating these two techniques, the ViTsLNN model not only grasps the overall structure of a chest X-ray but also adapts dynamically to subtle feature variations, focusing on the lung regions of greatest diagnostic value. In addition, before images enter the model, lung-region segmentation and denoising are applied to isolate the target regions and remove irrelevant information, improving the model's focus and performance. The model can also generate visual attention heatmaps that show which image regions drove the diagnostic decision, strengthening clinical trust and transparency. To further improve robustness, AutoAugment is used so that the model automatically applies a variety of image transformations (such as flipping, rotation, and brightness adjustment) during training. This augmentation strategy significantly improves generalization on imbalanced and diverse datasets, yielding better performance on unseen X-ray images in real-world settings. Comparative tests against several established AI models show that ViTsLNN offers clear advantages in accuracy, efficiency, and interpretability.
Experimental results show that the ViTsLNN model achieves 96% accuracy, 94.98% precision, and 94.5% recall in COVID-19 detection, demonstrating that ViTsLNN combined with AutoAugment provides an efficient, reliable, and interpretable intelligent tool for detecting COVID-19 and other lung diseases. By improving diagnostic accuracy and the quality of clinical decisions, this solution offers valuable technical support to healthcare workers confronting global health crises.
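
This record does not reproduce the architecture itself, so the following is a minimal PyTorch sketch of the hybrid wiring the abstract describes: a transformer encoder plays the role of the ViT, its patch tokens are scanned as a sequence by a recurrent cell, and a linear head emits the three class logits. All layer sizes are illustrative assumptions, and an off-the-shelf GRUCell stands in for the liquid cell (a liquid-style cell is sketched under the English abstract below).

```python
import torch
import torch.nn as nn

class ViTsLNNSketch(nn.Module):
    """Illustrative wiring only, not the thesis's exact model: a small
    transformer encoder stands in for the ViT, and nn.GRUCell stands in
    for the liquid cell."""
    def __init__(self, num_classes=3, dim=192, hidden=128, patch=16):
        super().__init__()
        self.patchify = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.cell = nn.GRUCell(dim, hidden)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                                  # x: (B, 1, 224, 224) CXR
        tok = self.patchify(x).flatten(2).transpose(1, 2)  # (B, 196, dim) patch tokens
        tok = self.encoder(tok)                            # global context via attention
        h = tok.new_zeros(tok.size(0), self.cell.hidden_size)
        for t in range(tok.size(1)):                       # scan tokens as a sequence
            h = self.cell(tok[:, t], h)
        return self.head(h)            # logits: COVID-19 / pneumonia / normal

logits = ViTsLNNSketch()(torch.randn(2, 1, 224, 224))
print(logits.shape)                    # torch.Size([2, 3])
```
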

English Abstract:

Diagnosing COVID-19 from chest X-ray (CXR) images poses a significant challenge due to the virus's similarity to other lung conditions, coupled with the limitations of traditional diagnostic tests, which can be slow or prone to inaccuracies. This study introduces ViTsLNN, a model that synergistically combines the spatial attention capability of Vision Transformers (ViTs) with the adaptive temporal dynamics of Liquid Neural Networks (LNNs) to improve classification accuracy under noisy or limited data conditions, yielding more accurate and efficient classification of CXR images into COVID-19, pneumonia, and normal categories.

This study utilizes Vision Transformers (ViTs) to capture long-range dependencies and global contextual information in images by treating image patches as sequences. The network leverages the Multi-Head Self-Attention (MHSA) mechanism, which allows the model to focus on different parts of the image simultaneously, and the feedforward network (MLP), which enables the model to perform complex transformations and capture intricate relationships within the data. These features make ViTs particularly powerful for visual tasks that require a holistic understanding of the image.
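
As a concrete illustration of the MHSA mechanism just described, here is a minimal, self-contained implementation over a sequence of patch tokens; the dimensions are illustrative assumptions, not the thesis's configuration.

```python
import torch
import torch.nn as nn

class MHSA(nn.Module):
    """Minimal multi-head self-attention as used inside a ViT block."""
    def __init__(self, dim=192, num_heads=4):
        super().__init__()
        self.h, self.d = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)   # joint Q, K, V projection
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                    # x: (B, N patches, dim)
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, N, self.h, self.d).transpose(1, 2) for t in (q, k, v))
        attn = (q @ k.transpose(-2, -1) / self.d ** 0.5).softmax(-1)  # (B, H, N, N)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.proj(out), attn          # attention maps can be reused for heatmaps

tokens = torch.randn(1, 196, 192)            # 14x14 patches of a 224x224 CXR
out, attn = MHSA()(tokens)                   # out: (1, 196, 192), attn: (1, 4, 196, 196)
```
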
On the other hand, this study utilizes the adaptability of Liquid Neural Networks (LNNs) to dynamic and non-stationary data, providing enhanced flexibility and robustness in handling complex patterns. By combining these two powerful approaches, the ViTsLNN model can not only capture the overall structure of CXR images but also dynamically adjust to subtle variations, helping the model focus on the most diagnostically relevant regions of the lungs.
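
The abstract does not give the LNN equations. The sketch below shows a simplified liquid time-constant style cell (in the spirit of Hasani et al.'s LTC networks), whose effective time constant depends on the input, which is the kind of dynamic adaptation described above; the cell and all sizes are assumptions, not the thesis's exact formulation.

```python
import torch
import torch.nn as nn

class LiquidCell(nn.Module):
    """Simplified liquid cell: the hidden state h follows
    dh/dt = -h / tau(x, h) + tanh(W x + U h), integrated by an explicit
    Euler step. Because tau depends on the input, the cell's response
    speed adapts to the data (a sketch, not the thesis's exact cell)."""
    def __init__(self, in_dim, hidden_dim, dt=0.1):
        super().__init__()
        self.inp = nn.Linear(in_dim, hidden_dim)
        self.rec = nn.Linear(hidden_dim, hidden_dim)
        self.tau_net = nn.Linear(in_dim + hidden_dim, hidden_dim)
        self.dt = dt

    def forward(self, x, h):
        tau = nn.functional.softplus(self.tau_net(torch.cat([x, h], -1))) + 0.1
        dh = -h / tau + torch.tanh(self.inp(x) + self.rec(h))
        return h + self.dt * dh

# Scan a sequence of ViT patch tokens through the cell:
cell = LiquidCell(in_dim=192, hidden_dim=128)
tokens = torch.randn(1, 196, 192)
h = torch.zeros(1, 128)
for t in range(tokens.size(1)):
    h = cell(tokens[:, t], h)        # h summarizes the whole image adaptively
```
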
Before feeding the CXR images into the model, lung segmentation and noise reduction techniques are employed to isolate the lungs and minimize irrelevant information, thus improving model focus and performance.
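
The specific segmentation and denoising algorithms are not named on this page, so the following OpenCV sketch only illustrates the idea: denoise, boost local contrast, and mask out non-lung area. Otsu thresholding is a crude stand-in for a trained lung-segmentation model, and the file path is hypothetical.

```python
import cv2

def preprocess_cxr(path):
    """Isolate lung regions and suppress noise before classification."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (224, 224))
    img = cv2.medianBlur(img, 3)                          # noise reduction
    img = cv2.createCLAHE(clipLimit=2.0).apply(img)       # local contrast boost
    _, mask = cv2.threshold(img, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.bitwise_and(img, img, mask=mask)           # zero out non-lung area

x = preprocess_cxr("sample_cxr.png")   # hypothetical file path
```
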
Additionally, the model generates visual attention maps, providing interpretable insights into the regions of the X-ray that were most influential in the decision-making process, thus enhancing clinical trust and transparency.
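
Here is a minimal sketch of how such attention maps can be rendered, assuming the (batch, heads, patches, patches) attention tensor from a ViT layer such as the MHSA sketch above; a full implementation would more likely use attention rollout or Grad-CAM.

```python
import torch
import torch.nn.functional as F

def attention_heatmap(attn, image_size=224):
    """attn: (B, heads, N, N) from a ViT layer, N = 14*14 patches.
    Averages heads and source positions, reshapes to the patch grid,
    and upsamples to image resolution."""
    a = attn.mean(dim=1).mean(dim=1)           # (B, N): attention received per patch
    g = int(a.size(1) ** 0.5)                  # patch grid side, e.g. 14
    a = a.view(-1, 1, g, g)
    a = F.interpolate(a, size=(image_size, image_size),
                      mode="bilinear", align_corners=False)
    a = (a - a.amin()) / (a.amax() - a.amin() + 1e-8)   # normalize to [0, 1]
    return a.squeeze(1)                        # overlay on the CXR with any colormap

heat = attention_heatmap(torch.rand(1, 4, 196, 196))    # (1, 224, 224)
```
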
To further boost the model's robustness, an AutoAugment technique is applied, enabling the model to automatically apply various image transformations (e.g., flipping, rotation, brightness adjustments) during training. This augmentation strategy enriches the model's ability to generalize to diverse and imbalanced datasets, improving its performance on real-world, unseen X-ray images.
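
AutoAugment ships with torchvision; a minimal training-transform sketch follows. The IMAGENET policy and the three-channel conversion are illustrative choices, since the abstract does not specify the policy used.

```python
from torchvision import transforms

# Learned augmentation policy applied on the fly during training.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),   # CXRs are single-channel
    transforms.AutoAugment(transforms.AutoAugmentPolicy.IMAGENET),
    transforms.ToTensor(),
])
# e.g. datasets.ImageFolder("cxr_train/", transform=train_tf)  # hypothetical path
```
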
The ViTsLNN model achieves impressive results, with 96% accuracy, 94.98% precision, and 94.5% recall for COVID-19 detection. In comparison to several well-established AI models, ViTsLNN consistently outperforms them in terms of accuracy, efficiency, and explainability. In conclusion, the ViTsLNN model, in combination with AutoAugment, offers a highly effective and interpretable tool for COVID-19 and lung disease detection. This solution offers valuable support to healthcare professionals, particularly in the context of global health crises, by enhancing diagnostic accuracy and facilitating informed clinical decision-making.
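
For reference, headline metrics like these are computed as follows; the labels below are a toy example, not the thesis's test data.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical encoding: 0 = normal, 1 = pneumonia, 2 = COVID-19.
y_true = [2, 0, 1, 2, 2, 0]          # ground-truth classes (toy example)
y_pred = [2, 0, 1, 2, 1, 0]          # model predictions (toy example)
print(accuracy_score(y_true, y_pred))                              # 0.8333
print(precision_score(y_true, y_pred, labels=[2], average="macro"))  # 1.0
print(recall_score(y_true, y_pred, labels=[2], average="macro"))     # 0.6667
```
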

CLC Number:

 TP391

Open Access Date:

 2025-06-18
