Thesis Information

Chinese Title:

 基于深度学习的西餐食材识别方法研究与实现

Name:

 张桓瑞

Student ID:

 22208223044

Confidentiality Level:

 Public

Thesis Language:

 Chinese (chi)

Discipline Code:

 085400

Discipline Name:

 Engineering - Electronic Information

Student Type:

 Master's

Degree Level:

 Master of Engineering

Degree Year:

 2025

Degree-Granting Institution:

 Xi'an University of Science and Technology

School:

 College of Artificial Intelligence and Computer Science

Major:

 Computer Technology

Research Direction:

 Computer Vision

First Supervisor:

 张婧

First Supervisor's Institution:

 Xi'an University of Science and Technology

Submission Date:

 2025-06-16

Defense Date:

 2025-05-27

English Title:

 Research and Implementation of Western Food Ingredient Recognition Method Based on Deep Learning

Chinese Keywords:

 目标检测 ; 语义分割 ; 图像识别 ; 食品计算 ; 食品识别

English Keywords:

 Object Detection ; Semantic Segmentation ; Image Recognition ; Food Computing ; Food Recognition

Chinese Abstract:

The emergence of multi-source data from social networks and the Internet of Things has produced massive volumes of Western food images, and against the backdrop of rapid progress in deep learning, this trend has driven the formation of the emerging interdisciplinary field of food computing. As one of the core tasks of food computing, Western food image recognition has become an important foundation for intelligent dining services, smart health, and nutrition analysis systems. However, its two sub-tasks, Western dish recognition and Western ingredient recognition, face two main problems in practical applications: dish recognition methods suffer from large parameter counts and slow inference, and ingredient recognition algorithms struggle with the edge-overlap problem in multi-category ingredient recognition. This thesis therefore studies the two consecutive workflows of Western dish recognition and Western ingredient recognition. The specific contributions are as follows:

(1) To address the large parameter counts and slow inference of existing Western dish recognition models, a lightweight dish recognition algorithm based on DBFMViT-ASDH-YOLO11 (Dual-Branch Fusion MobileViT Alterable Shared Detection Head-YOLO11) is proposed. First, a Dual-Branch Fusion MobileViT (DBFMViT) is designed for the YOLO11 backbone: building on the lightweight MobileViT v3 network, DBFMViT fuses shallow and deep features across stages through a dual-branch skip-connection structure, effectively capturing dish features while reducing the parameter count. Second, an Alterable Shared Detection Head (ASDH) is designed, which introduces alterable-kernel convolutions and shared convolutions to further lighten the model and greatly reduce its parameters. Experimental results on the Food Detection dataset show strong recognition accuracy, with mAP improved by 1.11%, together with favorable computational cost and inference speed: parameters are reduced by about 37.9% and inference speed is increased by 31.25%.

(2) To address the edge-overlap problem in multi-category ingredient recognition, a Western ingredient recognition method based on MEFPN-SC-Mask2Former (Multi-scale Enhanced Feature Pyramid Network Semantic Concern-Mask2Former) is proposed. First, a Multi-scale Enhanced Feature Pyramid Network (MEFPN) is designed that processes the different levels of the multi-scale feature maps separately to obtain global and local features, which are used to extract the irregular shape features of ingredient objects. Second, a Semantic Concern (SC) module is designed to focus attention on semantic information and thereby understand the overlapping relationships among ingredient objects. Experimental results on the FoodSeg103 dataset show that, compared with representative strong ingredient recognition methods, the model improves mIoU, mAcc, and aAcc by 2.16%, 2.37%, and 1.18%, respectively.

(3) Based on the proposed dish recognition and ingredient recognition models, a Western ingredient recognition system is designed and implemented. Built on a B/S (Browser/Server) architecture, the system provides user registration and login, Western dish recognition, Western ingredient recognition, model optimization, and AI-driven nutrition analysis.

English Abstract:

The proliferation of multi-source data from social networks and the Internet of Things has led to the generation of massive Western food image datasets. Against the backdrop of rapid advancements in deep learning, this trend has fostered the emergence of the interdisciplinary field of food computing. As one of its core tasks, Western food image recognition has become a fundamental technology for intelligent dining services, smart health, and nutrition analysis systems. However, current approaches to Western food image recognition—specifically Western dish recognition and Western ingredient recognition—face two major challenges in practical applications: (1) Western dish recognition methods typically suffer from large model sizes and slow inference speeds; (2) Western ingredient recognition algorithms struggle to resolve edge-overlapping issues when recognizing multiple ingredient categories. Therefore, this study focuses on the two consecutive workflows of Western dish recognition and ingredient recognition, with the following contributions:

To address the large parameter counts and slow inference of existing Western dish recognition models, we propose a lightweight algorithm named DBFMViT-ASDH-YOLO11 (Dual-Branch Fusion MobileViT Alterable Shared Detection Head-YOLO11) for Western dish recognition. Specifically, we design a dual-branch fusion MobileViT (DBFMViT) backbone for YOLO11, which integrates shallow and deep features via dual-branch skip connections based on the lightweight MobileViT v3 network, thereby effectively capturing dish features while reducing the number of parameters. In addition, an Alterable Shared Detection Head (ASDH) is introduced, which leverages alterable-kernel convolutions and shared convolutions to further lighten the model and significantly decrease the parameter count. Experimental results on the Food Detection dataset demonstrate that the proposed model achieves superior recognition accuracy, with a 1.11% improvement in mAP. Moreover, the model substantially reduces computational cost and inference latency, decreasing the number of parameters by approximately 37.9% and increasing inference speed by 31.25%.
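The shared-head idea above can be illustrated with a quick parameter count. The sketch below is not the thesis's actual ASDH design; the channel widths, head depth, and class count are illustrative assumptions. It only shows why applying one detection head across all feature-map scales, behind cheap per-scale 1x1 adapters, shrinks a model compared with an independent head per scale.

```python
# Illustrative parameter counting: per-scale heads vs. one shared head.
# All numbers (channel widths, 80 classes, two-conv head) are assumptions.

def conv_params(c_in, c_out, k=3):
    """Parameters of one k x k convolution: weights + biases."""
    return c_in * c_out * k * k + c_out

def separate_heads(channels=(256, 512, 1024), c_mid=256, c_out=80):
    """One independent two-conv head per feature-map scale."""
    return sum(conv_params(c, c_mid) + conv_params(c_mid, c_out)
               for c in channels)

def shared_head(channels=(256, 512, 1024), c_mid=256, c_out=80):
    """A 1x1 conv per scale projects to a common width, then a single
    two-conv head (one weight set) serves every scale."""
    adapters = sum(conv_params(c, c_mid, k=1) for c in channels)
    return adapters + conv_params(c_mid, c_mid) + conv_params(c_mid, c_out)

sep, sh = separate_heads(), shared_head()
print(f"separate: {sep:,}  shared: {sh:,}  saving: {1 - sh / sep:.1%}")
```

The 1x1 adapters exist only to give every scale the same channel width, so the shared 3x3 convolutions can be reused unchanged across scales; that reuse is where the saving comes from.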

 

To tackle the edge-overlapping issue in multi-category ingredient recognition, we propose a novel Western ingredient recognition method based on MEFPN-SC-Mask2Former (Multi-scale Enhanced Feature Pyramid Network Semantic Concern-Mask2Former). First, we design a Multi-scale Enhanced Feature Pyramid Network (MEFPN) that separately processes multi-level feature maps to extract both global and local features, which are crucial for capturing irregular ingredient shapes. Additionally, a Semantic Concern (SC) module is developed to focus on semantic information and better understand overlapping relationships among ingredient objects. Experimental results on the FoodSeg103 dataset show that, compared with representative state-of-the-art methods, our approach achieves improvements of 2.16%, 2.37%, and 1.18% in mIoU, mAcc, and aAcc, respectively.
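The three metrics quoted above follow standard semantic-segmentation definitions: mIoU (mean intersection-over-union per class), mAcc (mean per-class pixel accuracy), and aAcc (overall pixel accuracy). The sketch below computes them from a confusion matrix with NumPy; it illustrates the metric definitions only and is not code from the thesis.

```python
import numpy as np

def seg_metrics(pred, gt, num_classes):
    """Compute mIoU, mAcc, aAcc from integer label maps of equal shape."""
    # Confusion matrix: rows = ground-truth class, cols = predicted class.
    cm = np.bincount(gt.ravel() * num_classes + pred.ravel(),
                     minlength=num_classes ** 2).reshape(num_classes,
                                                         num_classes)
    tp = np.diag(cm).astype(float)
    gt_count = cm.sum(axis=1)      # pixels per ground-truth class
    pred_count = cm.sum(axis=0)    # pixels per predicted class
    union = gt_count + pred_count - tp
    # NaN out classes absent from both prediction and ground truth.
    iou = np.where(union > 0, tp / np.maximum(union, 1), np.nan)
    acc = np.where(gt_count > 0, tp / np.maximum(gt_count, 1), np.nan)
    return {
        "mIoU": np.nanmean(iou),     # mean IoU over classes present
        "mAcc": np.nanmean(acc),     # mean per-class pixel accuracy
        "aAcc": tp.sum() / cm.sum(), # overall pixel accuracy
    }

gt = np.array([[0, 0, 1], [1, 2, 2]])    # toy 2x3 ground-truth mask
pred = np.array([[0, 1, 1], [1, 2, 2]])  # one mislabeled pixel
m = seg_metrics(pred, gt, num_classes=3)
print({k: round(float(v), 3) for k, v in m.items()})
```

Note that mAcc and aAcc diverge as soon as class sizes are unbalanced, which is why segmentation papers report all three.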

Based on the proposed Western dish and ingredient recognition models, we have designed and implemented a Western ingredient recognition system. The system, built on a B/S (Browser/Server) architecture, supports functionalities including user registration and login, Western dish recognition, ingredient recognition, model optimization, and AI-driven nutrition analysis.

CLC Number:

 TP391    

Open-Access Date:

 2025-06-24    
