Chinese title: | Intelligent Detection of Dental Lesions Based on Vision Transformer |
Name: | |
Student ID: | 20208049001 |
Confidentiality: | Public |
Thesis language: | chi |
Discipline code: | 0812 |
Discipline: | Engineering - Computer Science and Technology (degrees may be awarded in engineering or science) |
Student type: | Master's |
Degree: | Master of Engineering |
Degree year: | 2023 |
Institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Image recognition |
First supervisor: | |
First supervisor's institution: | |
Submission date: | 2023-06-13 |
Defense date: | 2023-06-06 |
English title: | Intelligent detection of dental lesions based on vision Transformer |
Chinese keywords: | |
English keywords: | Tooth lesion detection ; Vision Transformer ; Multi-scale ; Self-supervised training ; Lightweight ; Semi-supervised learning |
Chinese abstract: |
In recent years, with growing public awareness of oral health and rising clinic visit rates, demand for intelligent oral disease screening in dental practice has expanded, and a variety of deep-learning dental lesion detection algorithms have been proposed. As an important link in diagnosis and treatment, research on these algorithms has strong practical significance for the development of intelligent oral disease screening equipment. To address the limitations of existing algorithms, such as the narrow range of detectable lesion types and poor transferability, this thesis builds an intelligent dental lesion detection algorithm based on a vision Transformer, aiming to detect five types of lesions in RGB images of teeth: caries, gingivitis, dental calculus, tooth wear, and deformed lingual fossa. The research work and results are briefly summarized as follows.

To verify the effectiveness of the Transformer model in learning dental lesion features, this thesis first designs SMVT, a self-supervised vision Transformer feature extraction model based on multi-scale attention, aiming to alleviate the insufficient lesion representation ability and poor transferability caused by the limited number of labeled samples. The model uses an improved Swin Transformer as its backbone and introduces a multi-scale atrous (dilated) pyramid as the embedding layer of the Transformer module, fusing multi-scale lesion features within each layer and improving representation ability. A contrastive self-supervised training framework is adopted to learn feature invariance between positive lesion samples, improving transferability. Finally, a classifier is attached to verify the model's ability to learn dental lesion features on a basic classification task. Experiments show that SMVT outperforms other convolutional and Transformer models, achieving the best classification performance on tooth images.

Building on the SMVT study and the clinical need for lightweight, convenient diagnosis and treatment, the final lightweight dental lesion detection algorithm, NLST, is then proposed to remedy the poor stability of self-supervised training and the high computational complexity of shifted-window attention. To achieve a lightweight design, the sliding-window mechanism of SMVT is removed, and global attention is computed over low-dimensional features that represent local windows. Then, to exploit the research value of unlabeled tooth samples and reduce the impact of domain adaptation on self-supervised training, semi-supervised training combining pseudo-labels and regularization is performed, and adaptive threshold and reverse optimization strategies are designed to suppress incorrect pseudo-labels and mitigate self-training bias. Under the same experimental conditions, NLST achieves the best average detection accuracy for the five types of dental lesions, improving both detection accuracy and speed over SMVT, Swin Transformer, and other models.

Experiments demonstrate that the proposed SMVT model and NLST algorithm successfully accomplish the extraction and detection of dental lesion features, showing the application value of vision Transformer models in dental lesion detection. The SMVT model can be applied to screening massive clinical samples and supports oral health education; the NLST algorithm can be applied to intelligent oral screening equipment and may become an effective aid to future diagnosis and treatment. |
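The contrastive self-supervised training mentioned in the abstract — learning feature invariance between positive sample pairs — is commonly realized with an InfoNCE-style loss. The sketch below is a minimal NumPy illustration of that general technique; the function names, temperature value, and batch shapes are assumptions for illustration, not details taken from the thesis.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit sphere so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE/NT-Xent-style contrastive loss over a batch of positive pairs.

    z_a, z_b: (N, D) embeddings of two augmented views of the same N images.
    Each (z_a[i], z_b[i]) is a positive pair; other rows serve as negatives.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature           # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive for row i sits on the diagonal; minimize its negative log-prob.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))
aligned = info_nce_loss(z, z + 0.05 * rng.normal(size=(8, 32)))  # matched views
mismatched = info_nce_loss(z, rng.normal(size=(8, 32)))          # unrelated views
# Matched views yield a markedly lower loss than mismatched ones.
```

Minimizing this loss pulls the two views of each sample together while pushing apart the other samples in the batch, which is what "learning feature invariance between positive samples" amounts to in practice.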
English abstract: |
In recent years, as people's oral health awareness and clinic visit rates have risen, demand for intelligent oral disease screening in dental practice has grown, and a series of deep-learning dental lesion detection algorithms have been proposed. As a critical link in diagnosis and treatment, research on these algorithms has significant practical value for the development of intelligent oral disease screening equipment. To address the problems of existing algorithms, which detect only a single lesion type and transfer poorly, this thesis builds an intelligent dental lesion detection algorithm based on a vision Transformer, with the goal of detecting the types and locations of five lesions in RGB images of teeth: caries, gingivitis, dental calculus, tooth wear, and deformed lingual fossa. The research work and accomplishments are briefly summarized below.

To validate the feasibility and advantages of the Transformer model for learning dental lesion features, this thesis first designs SMVT, a self-supervised vision Transformer feature extraction model based on multi-scale attention, aiming to alleviate the insufficient lesion representation ability and poor transferability caused by the limited number of labeled tooth samples. The model adopts an improved Swin Transformer backbone and introduces a multi-scale atrous (dilated) pyramid as the embedding layer of the Transformer module, which fuses multi-scale lesion features within each layer and increases representation ability. A contrastive self-supervised training framework improves the model's transferability by learning feature invariance between positive lesion samples. Finally, a classifier is attached, and the model's ability to learn dental lesion characteristics is validated on a basic classification task. Experiments show that SMVT outperforms other convolutional and Transformer models, achieving the best classification performance on tooth images.

Building on the SMVT study and the practical need for lightweight, convenient clinical diagnosis and treatment, the final lightweight tooth lesion detection algorithm, NLST, is then proposed to remedy the poor stability of self-supervised training and the high computational complexity of shifted-window attention. To achieve a lightweight design, the sliding-window mechanism of SMVT is removed, and global attention is computed over low-dimensional features that represent local windows. Then, to fully exploit the large number of unlabeled tooth samples and reduce the influence of domain adaptation on self-supervised training, semi-supervised training is performed by combining pseudo-labels with regularization, and adaptive threshold and reverse optimization strategies are designed to suppress incorrect pseudo-labels and alleviate self-training bias. Under the same experimental settings, NLST achieves the highest average detection accuracy on the five types of dental lesions, improving both detection accuracy and speed over backbones such as SMVT and Swin Transformer.

These experiments show that the SMVT model and NLST algorithm described in this thesis successfully complete the extraction and detection of dental lesion features, demonstrating the application value of vision Transformer models in dental lesion detection tasks. The SMVT model can be used to screen large volumes of clinical samples, supporting oral health education; the NLST algorithm, applied to intelligent oral screening equipment, may become an effective aid to future diagnosis and treatment. |
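The pseudo-labeling step with adaptive per-class thresholds, as described for NLST, can be sketched as follows. This is a generic FlexMatch-style illustration in NumPy under assumed names, values, and threshold rules — it is not the thesis's exact formulation, and the reverse optimization strategy mentioned in the abstract is not shown.

```python
import numpy as np

def adaptive_pseudo_labels(probs, base_threshold=0.95):
    """Assign pseudo-labels to unlabeled samples with per-class adaptive thresholds.

    probs: (N, C) softmax outputs of the current model on unlabeled images.
    Classes the model is currently less confident about receive a proportionally
    lower threshold, so hard or rare lesion classes are not starved of labels.
    Returns (labels, mask): labels are argmax predictions; mask marks samples
    confident enough to be used as training targets.
    """
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    n_classes = probs.shape[1]
    # Per-class "learning status": mean confidence of samples predicted as class c.
    status = np.array([
        conf[preds == c].mean() if np.any(preds == c) else 0.0
        for c in range(n_classes)
    ])
    # Scale the base threshold by each class's status relative to the best class.
    thresholds = base_threshold * status / max(status.max(), 1e-8)
    mask = conf >= thresholds[preds]
    return preds, mask

probs = np.array([
    [0.97, 0.02, 0.01],   # confident class 0  -> kept
    [0.40, 0.35, 0.25],   # ambiguous          -> dropped
    [0.02, 0.96, 0.02],   # confident class 1  -> kept
])
labels, mask = adaptive_pseudo_labels(probs)
# labels -> [0, 0, 1], mask -> [True, False, True]
```

Only the masked samples contribute to the semi-supervised loss; filtering out low-confidence predictions is what limits the accumulation of incorrect pseudo-labels and the resulting self-training bias.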
CLC number: | TP391.4 |
Open access date: | 2023-06-13 |