Chinese title: |
基于YOLOv7-tiny的手语识别算法研究
|
Name: |
胡其胜
|
Student ID: |
21207223118
|
Confidentiality level: |
Public
|
Language: |
chi
|
Discipline code: |
085400
|
Discipline name: |
Engineering - Electronic Information
|
Student type: |
Master's
|
Degree level: |
Master of Engineering
|
Degree year: |
2024
|
Institution: |
Xi'an University of Science and Technology
|
School: |
College of Communication and Information Engineering
|
Major: |
Electronics and Communication Engineering
|
Research direction: |
Computer vision
|
First supervisor: |
韩晓冰
|
First supervisor's institution: |
Xi'an University of Science and Technology
|
Second supervisor: |
师文
|
Submission date: |
2024-06-12
|
Defense date: |
2024-06-01
|
English title: |
Research on sign language recognition algorithm based on YOLOv7-tiny
|
Chinese keywords: |
手语识别 ; 注意力机制 ; 关键帧提取 ; 目标跟踪 ; 无迹卡尔曼滤波
|
English keywords: |
Sign Language Recognition ; Attention Mechanism ; Keyframe Extraction ; Target Tracking ; Unscented Kalman Filtering
|
Chinese abstract: |
︿
As a special mode of communication, sign language is the most common means of exchange for many deaf and hearing-impaired people. Traditional sign language recognition methods suffer mainly from difficult data acquisition and processing, signer-dependent recognition, and strict requirements on the background environment. To address these problems, this thesis introduces and improves the YOLOv7-tiny detection algorithm and the DeepSORT tracking algorithm. The main contents and innovations are summarized as follows:
(1) Improvement of the YOLOv7-tiny object detection algorithm. To address the high detection difficulty and low accuracy of YOLOv7-tiny against complex backgrounds, this thesis introduces the ECA-Net module to improve the CBAM attention mechanism and integrates it into the Neck of YOLOv7-tiny, so that the model locates and recognizes key targets more precisely. Second, the SIoU loss function is adopted to accelerate bounding-box regression and improve model accuracy. In addition, ordinary convolutions are replaced with Ghost convolutions to reduce computation and speed up detection. Comparative experiments on the Roboflow ASL sign language dataset show that, compared with the original algorithm, the improved YOLOv7-tiny raises precision by 6.53%, recall by 2.73%, and mean average precision by 5.31%, while reducing the number of model parameters by 4%.
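The ECA-style channel attention can be sketched as follows. This is a minimal NumPy illustration of the idea (squeeze by global average pooling, a 1-D convolution across channels, then a sigmoid gate), not the thesis implementation; in the real module the 1-D kernel is learned, whereas here a fixed averaging kernel stands in for it.

```python
import numpy as np

def eca_attention(x, k=3):
    """Efficient Channel Attention (ECA) over a feature map.

    x: array of shape (C, H, W); k: odd 1-D kernel size.
    Returns the channel-reweighted feature map of the same shape.
    """
    c, h, w = x.shape
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    y = x.mean(axis=(1, 2))
    # 1-D convolution across channels; a fixed averaging kernel stands in
    # for the learned weights of the real ECA module.
    kernel = np.full(k, 1.0 / k)
    pad = k // 2
    y = np.convolve(np.pad(y, pad, mode="edge"), kernel, mode="valid")
    # Excite: sigmoid gate, one scalar weight per channel
    gate = 1.0 / (1.0 + np.exp(-y))
    return x * gate[:, None, None]
```

Because the gate lies in (0, 1), each channel is scaled down in proportion to how little attention it receives, which is what lets the Neck emphasize hand regions over background.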
(2) Improvement of the DeepSORT target tracking algorithm. First, this thesis proposes a feature-based keyframe matching algorithm for dynamic sign language, which turns dynamic sign language recognition into recognition of static sign language images: keyframes of each sign are extracted at equal intervals, the MediaPipe framework is used to obtain the coordinates of the 21 hand joints, and a hand-pose skeleton dataset is built to serve as the sign language feature library. Second, to address the slow detection speed of the original DeepSORT, the improved YOLOv7-tiny is used in the detector stage to speed up detection; the original feature-extraction network is replaced with MobileNetV2, which reduces network parameters and strengthens feature extraction; GIoU is introduced into the IoU matching stage to raise matching accuracy; and, to address the low accuracy of hand-motion prediction in complex environments, the unscented Kalman filter (UKF) is adopted in the target-prediction stage.
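A minimal sketch of the first two steps described here — equal-interval keyframe extraction and matching a 21-joint hand skeleton against a labelled feature library — assuming flattened (x, y) landmark vectors and cosine similarity as the matching score; the function names and data layout are illustrative, not the thesis code.

```python
import numpy as np

def extract_keyframes(frames, num_keys=5):
    """Equal-interval keyframe extraction: pick num_keys frames
    evenly spaced over the whole clip."""
    idx = np.linspace(0, len(frames) - 1, num_keys).astype(int)
    return [frames[i] for i in idx]

def match_sign(landmarks, feature_library):
    """Match a hand skeleton (21 joints flattened to 42 x/y values)
    against a labelled feature library by cosine similarity."""
    v = np.asarray(landmarks, dtype=float).ravel()
    best_label, best_sim = None, -1.0
    for label, ref in feature_library.items():
        r = np.asarray(ref, dtype=float).ravel()
        sim = v @ r / (np.linalg.norm(v) * np.linalg.norm(r) + 1e-9)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim
```

In practice the landmark vectors would come from MediaPipe Hands and each library entry would represent one sign's keyframe skeleton.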
Finally, the improved DeepSORT algorithm is tested on the CSL sign language dataset. The experimental results show that, compared with the original DeepSORT, the proposed algorithm improves tracking accuracy (MOTA) by 6.6% and tracking precision (MOTP) by 3.2% and reduces the total number of identity switches (IDSW) by 24%. Continuous sign language sentence recognition on self-recorded sign language videos also achieves good results, verifying the feasibility and effectiveness of the improved algorithm and providing a reference for wider application.
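The two tracking metrics reported here have standard definitions, sketched below; GT is the total number of ground-truth objects over all frames, and MOTP is the mean localization overlap (e.g. IoU) over matched pairs.

```python
def mota(num_fn, num_fp, num_idsw, num_gt):
    """Multiple Object Tracking Accuracy:
    MOTA = 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (num_fn + num_fp + num_idsw) / num_gt

def motp(total_overlap, num_matches):
    """Multiple Object Tracking Precision: average localization
    overlap over all matched detection-track pairs."""
    return total_overlap / num_matches
```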
﹀
|
English abstract: |
︿
As a special mode of communication, sign language is the most common means of exchange for many deaf and hearing-impaired people. Traditional sign language recognition methods suffer mainly from difficult data acquisition and processing, signer-dependent recognition, and strict requirements on the background environment. To solve these problems, this thesis introduces and improves the YOLOv7-tiny detection algorithm and the DeepSORT tracking algorithm. The main contents and innovations are summarized as follows:
(1) Improvement of the YOLOv7-tiny object detection algorithm. To address the high detection difficulty and low accuracy of YOLOv7-tiny against complex backgrounds, this thesis introduces the ECA-Net module to improve the CBAM attention mechanism and integrates it into the Neck of YOLOv7-tiny, so that the model locates and recognizes key targets more precisely. Second, the SIoU loss function is used to accelerate bounding-box regression and improve model accuracy. In addition, ordinary convolutions are replaced with Ghost convolutions to reduce computation and speed up detection. Comparative experiments on the Roboflow ASL sign language dataset show that, compared with the original algorithm, the improved YOLOv7-tiny raises precision by 6.53%, recall by 2.73%, and mean average precision by 5.31%, while reducing the number of model parameters by 4%.
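The parameter saving from replacing an ordinary convolution with a Ghost module can be checked with a quick count, following the GhostNet formulation (primary convolution producing a fraction of the output channels, then cheap depthwise operations generating the rest); the layer sizes in the test below are illustrative, not taken from the thesis.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k, s=2, d=3):
    """Ghost module: a primary conv producing c_out/s intrinsic maps,
    then (s - 1) cheap d x d depthwise ops generating the ghost maps."""
    m = c_out // s
    return c_in * m * k * k + (s - 1) * m * d * d
```

With the default ratio s = 2 the module needs roughly half the weights of the convolution it replaces, which is where the reduction in model size comes from.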
(2) Improvement of the DeepSORT target tracking algorithm. First, this thesis proposes a feature-based keyframe matching algorithm for dynamic sign language, which turns dynamic sign language recognition into recognition of static sign language images: keyframes of each sign are extracted at equal intervals, the MediaPipe framework is used to obtain the coordinates of the 21 hand joints, and a hand-pose skeleton dataset is built to serve as the sign language feature library. Second, to address the slow detection speed of the original DeepSORT, the improved YOLOv7-tiny is used in the detector stage to speed up detection; the original feature-extraction network is replaced with MobileNetV2, which reduces network parameters and strengthens feature extraction; GIoU is introduced into the IoU matching stage to raise matching accuracy; and, to address the low accuracy of hand-motion prediction in complex environments, the unscented Kalman filter (UKF) is adopted in the target-prediction stage.
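The unscented Kalman filter rests on the unscented transform: propagate a small set of sigma points through the nonlinear model and re-estimate the mean and covariance from weighted sums. A textbook sketch of that core step (not the thesis implementation; the scaling parameters are the common defaults):

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1.0, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f."""
    n = mean.size
    lam = alpha ** 2 * (n + kappa) - n
    # 2n+1 sigma points from the Cholesky factor of (n + lam) * cov
    S = np.linalg.cholesky((n + lam) * cov)
    sigmas = np.vstack([mean, mean + S.T, mean - S.T])
    # Weights for the mean (wm) and covariance (wc)
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1 - alpha ** 2 + beta)
    # Propagate each sigma point and recombine
    y = np.array([f(s) for s in sigmas])
    y_mean = wm @ y
    d = y - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov
```

Unlike the standard Kalman prediction, no Jacobian is needed, which is why the UKF copes better with the nonlinear hand-motion dynamics described above.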
Finally, the improved DeepSORT algorithm is tested on the CSL sign language dataset. The experimental results show that, compared with the original DeepSORT, the proposed algorithm improves tracking accuracy (MOTA) by 6.6% and tracking precision (MOTP) by 3.2% and reduces the total number of identity switches (IDSW) by 24%. Continuous sign language sentence recognition on self-recorded sign language videos also achieves good results, verifying the feasibility and effectiveness of the improved algorithm and providing a reference for wider application.
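GIoU, used in the matching stage above, extends IoU with a penalty based on the smallest enclosing box, so it remains informative even for non-overlapping boxes. A minimal sketch for axis-aligned boxes given as (x1, y1, x2, y2):

```python
def giou(box_a, box_b):
    """Generalized IoU: IoU - (enclosing area - union) / enclosing area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union of the two boxes
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    iou = inter / union
    # Smallest axis-aligned box enclosing both
    c_area = ((max(ax2, bx2) - min(ax1, bx1))
              * (max(ay2, by2) - min(ay1, by1)))
    return iou - (c_area - union) / c_area
```

GIoU ranges over (-1, 1]: it equals IoU when the boxes coincide with their enclosure and goes negative for disjoint boxes, giving the matcher a useful gradient of "how far apart" two candidates are.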
﹀
|
References: |
︿
[1] 余晓婷, 贺荟中. 国内手语研究综述[J]. 中国特殊教育, 2009(4): 36-41+2.
[2] 米娜瓦尔·阿不拉, 阿里甫·库尔班, 解启娜, 耿丽婷. 手语识别方法与技术综述[J]. 计算机工程与应用, 2021, 57(18): 1-12.
[3] 刘建军. 基于改进YOLOv3算法手语识别问题研究[D]. 昆明理工大学, 2019.
[4] Zimmerman T, Lanier J, Blanchard C, Bryson S, Harvill Y. A hand gesture interface device[C]. Proceedings of the SIGCHI/GI Conference on Human Factors in Computing Systems and Graphics Interface, Toronto, Ontario, Canada, 1987: 189-192.
[5] Rekimoto J. GestureWrist and GesturePad: unobtrusive wearable interaction devices[C]. Proceedings of the 5th International Symposium on Wearable Computers (ISWC 2001), Zurich, Switzerland: IEEE, 2001: 21-27.
[6] Oz C, Leu M C. American sign language word recognition with a sensory glove using artificial neural networks[J]. Engineering Applications of Artificial Intelligence, 2011, 24(7): 1204-1213.
[7] Wen F, Zhang Z, He T, Lee C. AI enabled sign language recognition and VR space bidirectional communication using triboelectric smart glove[J]. Nature Communications, 12(1), Article 1.
[8] Dias T S, Alves Mendes Junior J J, Pichorim S F. An instrumented glove for recognition of Brazilian Sign Language alphabet[J]. IEEE Sensors Journal, 22(3): 2518-2529.
[9] DelPreto J, Hughes J, D'Aria M, de Fazio M, Rus D. A wearable smart glove and its application of pose and gesture detection to sign language classification[J]. IEEE Robotics and Automation Letters, 7(4): 10589-10596.
[10] Alzubaidi M A, Otoom M, Abu Rwaq A M. A novel assistive glove to convert Arabic sign language into speech[J]. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(2): 1-16.
[11] Lu C, Amino S, Jing L. Data glove with bending sensor and inertial sensor based on weighted DTW fusion for sign language recognition[J]. Electronics, 2023, 12(3): 613.
[12] Li J, Zhong J, Wang N. A multimodal human-robot sign language interaction framework applied in social robots[J]. Frontiers in Neuroscience, 2023, 17: 1168888.
[13] Pacifici I, Sernani P, Falcionelli N, Tomassini S, Dragoni A F. A surface electromyography and inertial measurement unit dataset for the Italian Sign Language alphabet[J]. Data in Brief, 2020, 33: 106455.
[14] Mendes Junior J J A, Freitas M L B, Campos D P, Farinelli F A, Stevan S L, et al. Analysis of influence of segmentation, features, and classification in sEMG processing: a case study of recognition of Brazilian sign language alphabet[J]. Sensors, 2020, 20(16): 4359.
[15] Zhang L, Zhang Y, Zheng X. WiSign: ubiquitous American sign language recognition using commercial Wi-Fi devices[J]. ACM Transactions on Intelligent Systems and Technology, 2020, 11(3): 1-24.
[16] Jingqiu W, Ting Z. An ARM-based embedded gesture recognition system using a data glove[C]. The 26th Chinese Control and Decision Conference (2014 CCDC), Changsha, China, 2014: 1580-1584.
[17] Fang B, Sun F, Liu H, et al. 3D human gesture capturing and recognition by the IMMU-based data glove[J]. Neurocomputing, 2018, 277(2): 198-207.
[18] Huang X, Wang Q, Zang S, et al. Tracing the motion of finger joints for gesture recognition via sewing RGO-coated fibers onto a textile glove[J]. IEEE Sensors Journal, 2019, 19(20): 9504-9511.
[19] Dong Y, Liu J, Yan W. Dynamic hand gesture recognition based on signals from specialized data glove and deep learning algorithms[J]. IEEE Transactions on Instrumentation and Measurement, 2021, 70(1): 1-14.
[20] Minaee S, Boykov Y Y, Porikli F, et al. Image segmentation using deep learning: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(7): 3523-3542.
[21] Otter D W, Medina J R, Kalita J K, et al. A survey of the usages of deep learning for natural language processing[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(2): 604-624.
[22] 鹿姝. 基于模态融合的手语识别方法研究[D]. 中国矿业大学, 2021: 9-11.
[23] Ang M C, Taguibao K R C, Manlises C O. Hand gesture recognition for Filipino sign language under different backgrounds[C]. 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), Kota Kinabalu, Malaysia, 2022: 1-6.
[24] Das S, Imtiaz M S, Neom N H, Siddique N, Wang H. A hybrid approach for Bangla sign language recognition using deep transfer learning model with random forest classifier[J]. Expert Systems with Applications, 2023, 213(Part B): 118914.
[25] Zhang L, Tian Q, Ruan Q, Shi Z. A simple and effective static gesture recognition method based on attention mechanism[J]. Journal of Visual Communication and Image Representation, 2023, 92: 103783.
[26] Venugopalan A, Reghunadhan R. Applying deep neural networks for the automatic recognition of sign language words: a communication aid to deaf agriculturists[J]. Expert Systems with Applications, 2021, 185: 115601.
[27] Rastgoo R, Kiani K, Escalera S. Hand pose aware multimodal isolated sign language recognition[J]. Multimedia Tools and Applications, 2021, 80(1): 127-163.
[28] Das S, Biswas S K, Purkayastha B. A deep sign language recognition system for Indian sign language[J]. Neural Computing & Applications, 2023, 35(2): 1469-1481.
[29] Abdallah M S, Samaan G H, Wadie A R, Makhmudov F, Cho Y I. Lightweight deep learning techniques with advanced processing for real-time hand gesture recognition[J]. Sensors, 2023, 23(1): 2.
[30] Cui R, Liu H, Zhang C. A deep neural framework for continuous sign language recognition by iterative training[J]. IEEE Transactions on Multimedia, 2019, 21(7): 1880-1891.
[31] Papastratis I, Dimitropoulos K, Konstantinidis D, Daras P. Continuous sign language recognition through cross-modal alignment of video and text embeddings in a joint-latent space[J]. IEEE Access, 2020, 8: 91170-91180.
[32] Zhou H, Zhou W, Zhou Y, Li H. Spatial-temporal multi-cue network for sign language recognition and translation[J]. IEEE Transactions on Multimedia, 2022, 24: 768-779.
[33] Hu J, Liu Y, Lam K M, Lou P. STFE-Net: a spatial-temporal feature extraction network for continuous sign language translation[J]. IEEE Access, 2023, 11: 46204-46217.
[34] Zhou Z, Tam V W L, Lam E Y. SignBERT: a BERT-based deep learning framework for continuous sign language recognition[J]. IEEE Access, 2021, 9: 161669-161682.
[35] Zhou Z, Tam V W L, Lam E Y. A cross-attention BERT-based framework for continuous sign language recognition[J]. IEEE Signal Processing Letters, 2022, 29: 1818-1822.
[36] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[37] Liu W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector[C]. European Conference on Computer Vision, Amsterdam, The Netherlands, Oct 11-14, 2016. Heidelberg: Springer, 2016: 21-37.
[38] Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[39] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.
[40] Girshick R. Fast R-CNN[C]. Proceedings of the IEEE International Conference on Computer Vision, Washington, DC, United States: IEEE Computer Society, 2015: 1440-1448.
[41] Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks[J]. Advances in Neural Information Processing Systems, 2015, 28.
[42] He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN[C]. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy: IEEE, 2017: 2961-2969.
[43] Redmon J, Farhadi A. YOLO9000: better, faster, stronger[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii: IEEE Computer Society, 2017: 7263-7271.
[44] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, et al. SSD: single shot multibox detector[C]. European Conference on Computer Vision, Cham: Springer, 2016: 21-37.
[45] Lin T Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii: IEEE, 2017: 2117-2125.
[46] Cheng M, Bai J, Li L, Chen Q, Zhou X, Zhang H, et al. Tiny-RetinaNet: a one-stage detector for real-time object detection[C]. Eleventh International Conference on Graphics and Image Processing (ICGIP 2019), vol. 11373, Hangzhou, China: International Society for Optics and Photonics, 2020: 113730R.
[47] Redmon J, Farhadi A. YOLOv3: an incremental improvement[C]. Computer Vision and Pattern Recognition, 2018.
[48] Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: optimal speed and accuracy of object detection[C]. Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[49] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[EB/OL]. [2022-09-01].
[50] Song Q, Li S, Bai Q, et al. Object detection method for grasping robot based on improved YOLOv5[J]. Micromachines, 2021, 12(11): 1273.
[51] Cai J, Xu M, Li W, et al. MeMOT: multi-object tracking with memory[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans: IEEE, 2022: 8090-8100.
[52] Harris C, Stephens M. A combined corner and edge detector[C]. Alvey Vision Conference, 1988, 15(50): 10-5244.
[53] Welch G F. Kalman filter[J]. Computer Vision: A Reference Guide, 2020: 1-3.
[54] Wojke N, Bewley A, Paulus D. Simple online and realtime tracking with a deep association metric[J]. IEEE, 2017: 3645-3649.
[55] Wang Z, Zheng L, Liu Y, et al. Towards real-time multi-object tracking[C]. Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XI. Springer International Publishing, 2020: 107-122.
[56] Sun P, Cao J, Jiang Y, et al. TransTrack: multiple object tracking with transformer[J]. arXiv preprint arXiv:2012.15460, 2020.
[57] Woo S, Park J, Lee J Y, Kweon I S. CBAM: convolutional block attention module[C]. Proceedings of the 15th European Conference on Computer Vision, Munich, 2018: 3-19.
[58] Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q. ECA-Net: efficient channel attention for deep convolutional neural networks[C]. Proceedings of the 2020 Conference on Computer Vision and Pattern Recognition, Seattle, 2020.
[59] Han K, Wang Y, Tian Q, et al. GhostNet: more features from cheap operations[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1580-1589.
[60] Gevorgyan Z. SIoU loss: more powerful learning for bounding box regression[J]. arXiv preprint arXiv:2205.12740, 2022.
[61] 刘润楠. 手语语素对比提取法探究[J]. 中国特殊教育, 2012(7): 42-48.
[62] 中国聋人协会. 中国手语[M]. 华夏出版社, 1990.
[63] 王春立. 面向大词汇量的连续中国手语识别系统的研究与实现[D]. 大连理工大学, 2003.
[64] 黄子龙. 视频关键帧提取算法的比较[J]. 数字技术与应用, 2023, 41(08): 50-52.
[65] Zhang F, Bazarevsky V, Vakunov A, et al. MediaPipe Hands: on-device real-time hand tracking[J]. 2020.
[66] Howard A G, Zhu M, Chen B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[J]. 2017.
[67] Cui M, Liu H Z, Liu W. An adaptive unscented Kalman filter-based controller for simultaneous obstacle avoidance and tracking of wheeled mobile robots with unknown slipping parameters[J]. Journal of Intelligent & Robotic Systems, 2018, 92(3-4): 17-20.
[68] Suryana J, Candra D. Implementation of Link 16 based tactical data link system using software defined radio[C]. 2019 International Conference on Electrical Engineering and Informatics (ICEEI), 2019.
[69] Arulampalam M S, et al. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking[J]. IEEE Transactions on Signal Processing, 2002, 50(2): 174-174.
﹀
|
CLC number: |
TP391
|
Open access date: |
2024-06-12
|