Thesis Information

Title (Chinese):

 基于Transformer的牙齿病灶图像分割算法研究 (Research on a Transformer-based Dental Lesion Image Segmentation Algorithm)

Name:

 Qian Jiali (钱家黎)

Student ID:

 21208223045

Confidentiality level:

 Classified (open after 1 year)

Thesis language:

 chi

Discipline code:

 085400

Discipline name:

 Engineering - Electronic Information

Student type:

 Master's

Degree level:

 Master of Engineering

Degree year:

 2024

Degree-granting institution:

 Xi'an University of Science and Technology

School:

 School of Computer Science and Technology

Major:

 Computer Technology

Research direction:

 Image processing

First supervisor:

 Ma Tian (马天)

First supervisor's institution:

 Xi'an University of Science and Technology

Submission date:

 2024-06-19

Defense date:

 2024-05-31

Title (English):

 Research on Lightweight Dental Lesion Image Segmentation Algorithm Based on Transformer

Keywords (Chinese):

 Semantic segmentation ; Medical segmentation ; Dental lesions ; Lightweight Transformer ; Multi-scale algorithm

Keywords (English):

 Semantic Segmentation ; Medical Segmentation ; Dental Lesions ; Lightweight Transformer ; Multi-scale Algorithm

Abstract (Chinese):

With the development of artificial intelligence, researchers have begun applying semantic segmentation to medical image analysis, aiming to improve the efficiency with which doctors diagnose lesions during clinical examinations. In clinical examinations, dental calculus, gingivitis, and wear surfaces are the three most frequently observed lesion types, so proposing solutions to the difficulties in segmenting and working with these three lesion types is of great practical significance. This thesis focuses on two problems: making the dental lesion segmentation model lightweight, and the imbalanced distribution of lesion categories. It proposes a lightweight dental lesion segmentation algorithm based on parallel Transformer and convolutional neural network branches, and a lightweight multi-scale segmentation algorithm based on dental lesion feature enhancement.

The main work and innovations of this thesis are as follows:

(1) To address the high computational complexity and large parameter counts of existing dental lesion segmentation algorithms, a lightweight semantic segmentation algorithm with parallel Transformer and convolutional neural network branches is proposed. First, the Transformer branch uses convolutional attention to extract spatially local features and global semantic features, while the convolutional branch adopts a lightweight convolutional network that downsamples rapidly to obtain a larger receptive field and encode the contextual information of high-level features. A purpose-built efficient residual attention module is then embedded to further refine the edge information of lesion features, and a backbone feature alignment module keeps the channels consistent with the Transformer branch, reducing feature loss during the fusion stage. Experimental results show that the proposed method achieves an average accuracy of 87.35% on the dental lesion dataset with 14.73M parameters, maintaining low computational complexity and a small model size.

(2) To address the imbalanced distribution of lesion categories in the dental lesion dataset and the large appearance variation within the same lesion type, a multi-scale segmentation method based on dental lesion feature enhancement is proposed on top of the above work. First, the parameters of the discrete wavelet transform are optimized to enhance the features of dental lesion images, improving the contrast and recognizability of lesion regions. Second, a multi-scale feature fusion module is designed, which fuses multi-scale features through parallel pyramid pooling and strip convolution, further improving the model's ability to extract features at different scales. Finally, a multi-scale-feature axial attention decoder is adopted; axial attention reduces computational complexity by decomposing the Transformer's self-attention mechanism while preserving multi-scale information and lesion edge details. Experimental results show that, at only a small increase in algorithmic complexity, the proposed method improves the segmentation of gingivitis and wear surfaces as well as the overall segmentation accuracy.

(3) For the task of segmenting dental lesions in the oral cavity, based on the lightweight dual-branch Transformer-CNN semantic segmentation algorithm and the multi-scale segmentation algorithm above, this thesis designs and implements an intelligent oral diagnosis system with a browser/server (B/S) architecture, comprising a lesion segmentation module, a data management module, an algorithm management module, and a system management module. Testing verifies that the system is stable and reliable and can effectively segment dental lesion regions in the oral cavity.

Abstract (English):

With the development of artificial intelligence technology, researchers have begun to apply semantic segmentation to medical image analysis, aiming to improve the efficiency of doctors in diagnosing lesions during clinical examinations. In clinical examinations, dental calculus, gingivitis, and wear surfaces are the three categories of lesions that appear most frequently, and it is of great practical significance to propose solutions to the segmentation and application difficulties of these three categories. This thesis focuses on making the dental lesion segmentation model lightweight and on the imbalanced distribution of lesion categories, and proposes a lightweight dental lesion segmentation algorithm based on the dual-branch parallelism of a Transformer and a convolutional neural network, together with a lightweight multi-scale segmentation algorithm based on dental lesion feature enhancement.

The main contributions and innovations of this thesis are as follows:

(1) To address the high computational complexity and large number of model parameters of existing dental lesion segmentation algorithms, a lightweight semantic segmentation algorithm based on a dual-branch parallel architecture of a Transformer and a convolutional neural network is proposed. The Transformer branch utilizes convolutional attention to extract spatially local features and global semantic features. The convolutional branch employs a lightweight convolutional network that downsamples rapidly to obtain a larger receptive field and encode the contextual information of high-level features. Subsequently, a purpose-built efficient residual attention module further refines the edge information of the lesion features. Finally, a backbone feature alignment module is designed to maintain channel consistency with the Transformer branch, reducing feature loss during the feature fusion stage. The experimental results show that the proposed method achieves an average accuracy of 87.35% on the dental lesion dataset with 14.73M parameters, maintaining low computational complexity and a small number of model parameters.
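The channel-alignment step before fusion can be illustrated with a small sketch. This is not the thesis implementation; the tensor shapes, the 1x1-convolution alignment, and the additive fusion are assumptions based only on the description above.

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise (1x1) convolution: remaps channels independently at each
    spatial location. x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    c_in, h, wd = x.shape
    return (w @ x.reshape(c_in, -1)).reshape(w.shape[0], h, wd)

def fuse_branches(f_transformer, f_cnn, w_align):
    """Align the CNN branch's channel count to the Transformer branch with
    a 1x1 convolution, then fuse additively (a simple, common choice)."""
    f_cnn_aligned = conv1x1(f_cnn, w_align)
    assert f_cnn_aligned.shape == f_transformer.shape
    return f_transformer + f_cnn_aligned

rng = np.random.default_rng(0)
f_t = rng.standard_normal((64, 32, 32))    # Transformer branch: 64 channels
f_c = rng.standard_normal((128, 32, 32))   # CNN branch: 128 channels
w = rng.standard_normal((64, 128)) * 0.01  # 1x1 kernel (learned in practice)
fused = fuse_branches(f_t, f_c, w)
print(fused.shape)  # (64, 32, 32)
```

Aligning channels before fusion avoids padding or truncating feature maps, which is one way the fusion-stage feature loss mentioned above can be kept small.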

(2) To address the imbalanced lesion category distribution and the significant appearance variation within the same lesion type in dental lesion datasets, a multi-scale segmentation method based on dental lesion feature enhancement is proposed, building upon the previous work. First, feature enhancement of dental lesion images is conducted by optimizing the parameters of the discrete wavelet transform, improving the contrast and recognizability of the lesion areas. Second, a multi-scale feature fusion module is designed, which employs parallel pyramid pooling and strip convolution to integrate features at multiple scales, thereby improving the model's ability to extract features of different scales. Lastly, a multi-scale-feature axial attention decoder is utilized: by decomposing the self-attention mechanism of the Transformer, axial attention reduces computational complexity while preserving multi-scale information and lesion edge details. Experimental results demonstrate that, with only a slight increase in algorithmic complexity, the proposed method improves the segmentation of gingivitis and wear surfaces as well as the overall segmentation accuracy.
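The axial-attention factorization mentioned above can be sketched as follows. This is an illustrative single-head version with hypothetical shapes, not the thesis code: full self-attention over an H×W feature map costs O((HW)²), whereas attending along the height axis and then the width axis costs O(HW·(H+W)).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(seq):
    """Plain single-head self-attention over an (L, C) sequence; for
    brevity, queries = keys = values = seq (no learned projections)."""
    scores = seq @ seq.T / np.sqrt(seq.shape[1])
    return softmax(scores) @ seq

def axial_attention(x):
    """x: (H, W, C). Attend along each column (height axis), then along
    each row (width axis), instead of over all H*W positions at once."""
    h, w, c = x.shape
    out = np.stack([attend(x[:, j, :]) for j in range(w)], axis=1)
    out = np.stack([attend(out[i, :, :]) for i in range(h)], axis=0)
    return out

x = np.random.default_rng(1).standard_normal((16, 16, 8))
y = axial_attention(x)
print(y.shape)  # (16, 16, 8)
```

Because each position still exchanges information with its entire row and column, two stacked axial passes give every position an indirect path to every other position at a fraction of the cost of full self-attention.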

(3) For the task of segmenting dental lesions in the oral cavity, this thesis designs and implements a browser/server (B/S) architecture-based intelligent oral diagnosis system, built on the lightweight dual-branch Transformer-CNN semantic segmentation algorithm and the multi-scale segmentation algorithm above. The system comprises a lesion segmentation module, a data management module, an algorithm management module, and a system management module. Testing and verification demonstrate that the system is stable and reliable, and that it effectively segments dental lesion areas in the oral cavity.
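In a B/S architecture like the one described, the browser typically posts an image to a server endpoint and receives the segmentation result back. The following stdlib-only sketch shows how such an endpoint could be wired; the `/segment` route is hypothetical, and the stub thresholding function merely stands in for the trained segmentation models the real system would serve.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def segment_stub(image):
    """Stand-in for the segmentation model: thresholds a toy 'image'
    (a list of pixel rows) into a binary lesion mask."""
    return [[1 if px > 128 else 0 for px in row] for row in image]

class DiagnosisHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/segment":  # hypothetical route name
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        image = json.loads(self.rfile.read(length))
        body = json.dumps({"mask": segment_stub(image)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# To serve for real:
# HTTPServer(("127.0.0.1", 8000), DiagnosisHandler).serve_forever()
```

The data, algorithm, and system management modules would sit behind further routes of the same server; a production system would more likely use a web framework, but the request/response shape stays the same.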

References:

[1] Zou N J. Semantic segmentation of dental images based on deep learning[D]. Zhejiang Gongshang University, 2023. DOI: 10.27462/d.cnki.ghzhc.2023.001442.

[2] Tan M, Le Q. EfficientNet: Rethinking model scaling for convolutional neural networks[C]//International conference on machine learning. PMLR, 2019: 6105-6114.

[3] Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.

[4] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Advances in neural information processing systems, 2012, 25.

[5] Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

[6] Sandler M, Howard A, Zhu M, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.

[7] Zhang X, Zhou X, Lin M, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 6848-6856.

[8] Ma N, Zhang X, Zheng H T, et al. ShuffleNet V2: Practical guidelines for efficient CNN architecture design[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 116-131.

[9] Zhang Y M, Lee C C, Hsieh J W, et al. CSL-YOLO: A new lightweight object detection system for edge computing[A]. 2021.

[10] Wang H, Jiang X, Ren H, et al. Swiftnet: Real-time video object segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 1296-1305.

[11] Yu C, Wang J, Peng C, et al. Bisenet: Bilateral segmentation network for real-time semantic segmentation[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 325-341.

[12] Huang Z, Wei Y, Wang X, et al. Alignseg: Feature-aligned segmentation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(1): 550-557.

[13] Lee J, Kim D, Ponce J, et al. Sfnet: Learning object-aware semantic correspondence[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 2278-2287.

[14] Mehta S, Rastegari M, Caspi A, et al. Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation[C]//Proceedings of the european conference on computer vision (ECCV). 2018: 552-568.

[15] Zhang B, Tian Z, Shen C. Dynamic neural representational decoders for high-resolution semantic segmentation[J]. Advances in Neural Information Processing Systems, 2021, 34: 17388-17399.

[16] Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 1314-1324.

[17] Liu Z, Lin Y, Cao Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 10012-10022.

[18] Touvron H, Cord M, Douze M, et al. Training data-efficient image transformers & distillation through attention[C]//International conference on machine learning. PMLR, 2021: 10347-10357.

[19] Yuan L, Chen Y, Wang T, et al. Tokens-to-token vit: Training vision transformers from scratch on imagenet[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 558-567.

[20] Graham B, El-Nouby A, Touvron H, et al. Levit: a vision transformer in convnet's clothing for faster inference[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 12259-12269.

[21] Chen Y, Dai X, Chen D, et al. Mobile-former: Bridging mobilenet and transformer[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022: 5270-5279.

[22] Mehta S, Rastegari M. Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer[J]. arXiv preprint arXiv:2110.02178, 2021.

[23] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 3431-3440.

[24] Badrinarayanan V, Kendall A, Cipolla R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 39(12): 2481-2495.

[25] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation[C]//Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer International Publishing, 2015: 234-241.

[26] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 2881-2890.

[27] Chen L C, Papandreou G, Kokkinos I, et al. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 40(4): 834-848.

[28] Albahbah A A, El-Bakry H M, Abd-Elgahany S. Detection of caries in panoramic dental X-ray images using back-propagation neural network[J]. International Journal of Electronics Communication and Computer Engineering, 2016, 7(5): 250.

[29] Ali R B, Ejbali R, Zaied M. Detection and classification of dental caries in x-ray images using deep neural networks[C]//International conference on software engineering advances (ICSEA). 2016: 236.

[30] Prajapati S A, Nagaraj R, Mitra S. Classification of dental lesions using CNN and transfer learning[C]//2017 5th International Symposium on Computational and Business Intelligence (ISCBI). IEEE, 2017: 70-74.

[31] Setzer F C, Shi K J, Zhang Z, et al. Artificial intelligence for the computer-aided detection of periapical lesions in cone-beam computed tomographic images[J]. Journal of endodontics, 2020, 46(7): 987-993.

[32] Haghanifar A, Majdabadi M M, Ko S B. Paxnet: Dental caries detection in panoramic x-ray using ensemble transfer learning and capsule classifier[J]. arXiv preprint arXiv:2012.13666, 2020.

[33] Chandran V, Nizar G S, Simon P. Segmentation of dental radiograph images[C]//Proceedings of the Third International Conference on Advanced Informatics for Computing Research. 2019: 1-5.

[34] Majanga V, Viriri S. Dental Images' Segmentation Using Threshold Connected Component Analysis[J]. Computational Intelligence and Neuroscience, 2021, 2021.

[35] Hatamimajoumerd E, Tajeripour F. Developing a Novel Approach for Periapical Dental Radiographs Segmentation[J]. arXiv preprint arXiv:2111.07156, 2021.

[36] Zhao Y, Li P, Gao C, et al. TSASNet: Tooth segmentation on dental panoramic X-ray images by Two-Stage Attention Segmentation Network[J]. Knowledge-Based Systems, 2020, 206: 106338.

[37] Ma T, Zhou X, Yang J, et al. Dental lesion segmentation using an improved ICNet network with attention[J]. Micromachines, 2022, 13(11): 1920.

[38] Woo S, Park J, Lee J Y, et al. Cbam: Convolutional block attention module[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 3-19.

[39] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020.

[40] Ba J L, Kiros J R, Hinton G E. Layer normalization[J]. arXiv preprint arXiv:1607.06450, 2016.

[41] Fu L Y, Yin M X, Yang F. A survey of Transformer-based U-shaped medical image segmentation networks[J]. Journal of Computer Applications, 2023, 43(05): 1584-1595.

[42] Hatamizadeh A, Tang Y, Nath V, et al. Unetr: Transformers for 3d medical image segmentation[C]//Proceedings of the IEEE/CVF winter conference on applications of computer vision. 2022: 574-584.

[43] Cao H, Wang Y, Chen J, et al. Swin-unet: Unet-like pure transformer for medical image segmentation[C]//European conference on computer vision. Cham: Springer Nature Switzerland, 2022: 205-218.

[44] Huang X, Deng Z, Li D, et al. Missformer: An effective medical image segmentation transformer[J]. arXiv preprint arXiv:2109.07162, 2021.

[45] Xie Y, Zhang J, Shen C, et al. Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation[C]//Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part III 24. Springer International Publishing, 2021: 171-180.

[46] Jiang Y, Zhang Y, Lin X, et al. SwinBTS: A method for 3D multimodal brain tumor segmentation using swin transformer[J]. Brain sciences, 2022, 12(6): 797.

[47] Chen J, Lu Y, Yu Q, et al. Transunet: Transformers make strong encoders for medical image segmentation[J]. arXiv preprint arXiv:2102.04306, 2021.

[48] Zhou H Y, Guo J, Zhang Y, et al. nnformer: Interleaved transformer for volumetric segmentation[J]. arXiv preprint arXiv:2109.03201, 2021.

[49] Liu W, Tian T, Xu W, et al. Phtrans: Parallelly aggregating global and local representations for medical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2022: 235-244.

[50] Chollet F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1251-1258.

[51] Wen M. Research on image semantic segmentation technology for unmanned vehicles based on multi-scale feature fusion[D]. Beijing University of Posts and Telecommunications, 2022. DOI: 10.26969/d.cnki.gbydu.2022.000785.

[52] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in neural information processing systems, 2017, 30.

[53] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]//International conference on machine learning. PMLR, 2015: 448-456.

[54] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.

[55] Wang Q, Wu B, Zhu P, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 11534-11542.

[56] Zhao H, Qi X, Shen X, et al. Icnet for real-time semantic segmentation on high-resolution images[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 405-420.

[57] Yu C, Gao C, Wang J, et al. Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129: 3051-3068.

[58] Tan M, Le Q. Efficientnetv2: Smaller models and faster training[C]//International conference on machine learning. PMLR, 2021: 10096-10106.

[59] Guo M H, Lu C Z, Hou Q, et al. Segnext: Rethinking convolutional attention design for semantic segmentation[J]. Advances in Neural Information Processing Systems, 2022, 35: 1140-1156.

[60] Gao L, Zhang L, Liu C, et al. Handling imbalanced medical image data: A deep-learning-based one-class classification approach[J]. Artificial intelligence in medicine, 2020, 108: 101935.

[61] Hossain M S, Betts J M, Paplinski A P. Dual focal loss to address class imbalance in semantic segmentation[J]. Neurocomputing, 2021, 462: 69-87.

[62] Xia T, Huang G, Pun C M, et al. Multi-scale contextual semantic enhancement network for 3D medical image segmentation[J]. Physics in Medicine & Biology, 2022, 67(22): 225014.

[63] Sinha A, Dolz J. Multi-scale self-guided attention for medical image segmentation[J]. IEEE journal of biomedical and health informatics, 2020, 25(1): 121-130.

[64] Tian J, Mithun N C, Seymour Z, et al. Striking the right balance: Recall loss for semantic segmentation[C]//2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022: 5063-5069.

[65] Bhabatosh C. Digital image processing and analysis[M]. PHI Learning Pvt. Ltd., 2011.

[66] Hou Q, Zhang L, Cheng M M, et al. Strip pooling: Rethinking spatial pooling for scene parsing[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 4003-4012.

[67] Huang Z, Wang X, Huang L, et al. CCNet: Criss-cross attention for semantic segmentation[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 603-612.

[68] Cui Y, Jia M, Lin T Y, et al. Class-balanced loss based on effective number of samples[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 9268-9277.

[69] Chen L C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation[J]. arXiv preprint arXiv:1706.05587, 2017.

[70] Xu Z, Wu D, Yu C, et al. SCTNet: Single-Branch CNN with Transformer Semantic Information for Real-Time Segmentation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2024, 38(6): 6378-6386.

CLC number:

 TP391

Open date:

 2025-06-19

