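Thesis title (Chinese): | 强噪声下的陕北方言语音识别系统研究 |
Name: | |
Student ID: | 20208223071 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 085400 |
Discipline name: | Engineering - Electronic Information |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2023 |
Degree-granting institution: | Xi'an University of Science and Technology |
School/Department: | |
Major: | |
Research direction: | Speech recognition |
First supervisor name: | |
First supervisor affiliation: | |
Thesis submission date: | 2023-06-26 |
Thesis defense date: | 2023-06-26 |
Thesis title (English): | Research on speech recognition system of Northern Shaanxi dialect under strong noise |
Keywords (Chinese): | |
Keywords (English): | Dialect dataset ; Dialect speech recognition ; Strong noise ; Denoising autoencoder |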
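Abstract (Chinese): |
Speech recognition technology has developed rapidly and is used ever more widely, and it now achieves good results for widely spoken languages. In actual coal mine production in northern Shaanxi, however, the Northern Shaanxi dialect is used more often than Mandarin in meetings, dispatching, command, and other communication, so research on speech recognition for this dialect has practical significance. This thesis addresses three problems encountered in that research: interference of strong coal mine noise with the speech signal, the shortage of dialect datasets, and the low recognition rate for dialect speech.
To address the effect of strong coal mine noise on recognition accuracy, an improved stacked denoising autoencoder (SDAE) speech denoising algorithm is proposed that effectively removes the interference of strong noise with the speech signal. Spectral subtraction first removes the strong noise from the noisy speech signal, and the stacked denoising autoencoder then performs a second denoising pass. Stacking the autoencoders speeds up training and mitigates the vanishing-gradient problem during decoding, so the strong noise of the coal mine environment is removed a second time and relatively clean speech is obtained after waveform reconstruction. The SDAE also resolves the boundary definition, musical noise, and parameter tuning problems of spectral subtraction. The intelligibility of the denoised speech was evaluated with the NCM value at signal-to-noise ratios of -15 dB, -10 dB, and -5 dB under different coal mine environmental noises. The results show that the proposed DAE denoising algorithm combined with spectral subtraction improves on several current mainstream denoising algorithms.
To address the fact that the recognition rate for dialect speech is far below that for Mandarin, a speech recognition model with a CNN+TDNN-F neural network as its acoustic model is proposed. By combining a convolutional neural network with a factorized time-delay neural network, the model captures the spatial and temporal features of the speech signal more accurately and thereby improves recognition. The language model is built with the SRILM toolkit, and Kaldi is used as the speech recognition toolkit. Speed perturbation with factors 0.9 and 1.1 expands the original dataset to three times its size, and i-vector features are added to increase the robustness of the model. The Chain model is used for sequence-discriminative training, and the word error rate is obtained after decoding. Experiments show that the proposed CNN+TDNN-F acoustic model lowers the word error rate to 11.96%, a clear improvement over earlier speech recognition algorithms on dialect speech. In addition, denoised speech was evaluated on this model after waveform reconstruction; its word error rate was 12.11%, essentially on par with clean speech.
Finally, requirements analysis, functional analysis, design, and implementation were carried out for a Northern Shaanxi dialect speech recognition system for strong-noise coal mine environments, and the system was put into practical use at the Xiaobaodang coal mine of Shaanbei Mining (陕北矿业). |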
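The first denoising stage described in the abstract, spectral subtraction, can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the separate noise-only clip `noise_clip`, and the parameters `frame_len`, `hop`, `alpha`, and `floor` are all assumptions made for illustration, and the SDAE second stage is not shown.

```python
import numpy as np

def _frames(x, frame_len, hop, window):
    # Slice x into overlapping windowed frames (hypothetical helper).
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n)])

def spectral_subtraction(noisy, noise_clip, frame_len=512, hop=256,
                         alpha=1.0, floor=0.02):
    """Magnitude spectral subtraction.

    Assumes hop == frame_len // 2 so the periodic Hann window overlap-adds to 1,
    and that noise_clip is a noise-only recording of at least frame_len samples.
    """
    window = np.hanning(frame_len + 1)[:-1]  # periodic Hann window
    # Average noise magnitude spectrum, estimated from the noise-only clip.
    noise_mag = np.abs(
        np.fft.rfft(_frames(noise_clip, frame_len, hop, window), axis=1)
    ).mean(axis=0)
    # Zero-pad so every input sample is covered by two overlapping frames.
    pad = frame_len
    x = np.concatenate([np.zeros(pad), noisy, np.zeros(pad)])
    spec = np.fft.rfft(_frames(x, frame_len, hop, window), axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Subtract the noise estimate; the spectral floor limits musical noise.
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)
    # Rebuild the waveform with the noisy phase and overlap-add the frames.
    frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(x))
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out[pad:pad + len(noisy)]
```

In a two-stage pipeline of the kind the abstract describes, the output of such a function would be the input to the autoencoder, which then only has to learn the residual mapping from partially cleaned speech to clean speech.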
Abstract (English): |
With advances in science and technology, speech recognition has developed rapidly and its range of use keeps expanding; it now achieves good results for widely spoken languages. In actual coal mine production in northern Shaanxi, however, the Northern Shaanxi dialect is used more often than Mandarin in meetings, scheduling, command, and other communication, so research on speech recognition for this dialect has practical significance. To address the insufficient dialect data and the interference of strong coal mine noise encountered in this research, the following work was carried out.
Because strong coal mine noise greatly reduces the accuracy of speech recognition, the proposed algorithm first removes the strong noise with spectral subtraction and then applies a denoising autoencoder (DAE) to remove the noise a second time. Spectral subtraction reduces the learning time and parameter count of the DAE, reduces signal fluctuations, and makes it easier for the DAE to learn the feature mapping between noisy and clean speech. The DAE in turn resolves the boundary definition, musical noise, and parameter adjustment problems of spectral subtraction. The NCM value of the denoised speech signal was evaluated at signal-to-noise ratios of -15 dB, -10 dB, and -5 dB under different coal mine environmental noises. The results show that the proposed DAE denoising algorithm, which integrates spectral subtraction, improves on several current mainstream denoising algorithms.
Because the recognition rate for dialect speech is far lower than that for Mandarin speech, a new acoustic model (CNN+TDNN-F) is proposed. By combining a convolutional neural network with a factorized time-delay neural network, the model captures the spatial and temporal characteristics of the speech signal more accurately and thereby improves speech recognition. The language model is built with the SRILM toolkit. With Kaldi as the speech recognition toolkit, the original dataset was expanded by speed perturbation with factors of 0.9 and 1.1, yielding three times the speech data. i-vector features are also used to increase the robustness of the model. Finally, the Chain model is used for sequence-discriminative training. The experimental results show that the proposed CNN+TDNN-F acoustic model reduces the word error rate to 11.96%, a significant improvement in dialect speech recognition accuracy over previous speech recognition algorithms. In addition, the denoised speech was evaluated on this model after waveform reconstruction. Its word error rate was 12.11%, essentially the same as that of clean speech.
At the end of the thesis, requirements analysis, functional analysis, design, and implementation of a Northern Shaanxi dialect speech recognition system for strong-noise coal mine environments were carried out, and the system was applied at the Xiaobaodang coal mine of Shaanbei Mining. |
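The threefold data expansion mentioned in the abstract follows from keeping each original utterance and adding one copy at each of the two speed factors. The sketch below illustrates that arithmetic with plain NumPy linear-interpolation resampling; it is an assumption-laden stand-in, not the Kaldi tooling used in the thesis, and `speed_perturb` and `augment_3x` are hypothetical names.

```python
import numpy as np

def speed_perturb(wave, factor):
    """Resample wave so it plays `factor` times faster (tempo and pitch both change)."""
    n_out = int(round(len(wave) / factor))
    # Position in the original signal that each output sample is read from.
    src_pos = np.arange(n_out) * factor
    # Linear interpolation; positions past the last sample clamp to its value.
    return np.interp(src_pos, np.arange(len(wave)), wave)

def augment_3x(wave, factors=(0.9, 1.0, 1.1)):
    """Return the original utterance plus one perturbed copy per non-unit factor."""
    return {f: wave if f == 1.0 else speed_perturb(wave, f) for f in factors}
```

For a 1-second utterance at 16 kHz, the 0.9 copy is longer (about 1.11 s) and the 1.1 copy is shorter (about 0.91 s), so the three versions together give three training utterances per original recording.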
Chinese Library Classification (CLC) number: | TP391 |
Open-access date: | 2023-06-26 |