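Thesis title (Chinese): | 基于生成对抗网络的矿井下语音增强算法研究 |
Name: | |
Student ID: | 20208223038 |
Confidentiality level: | Confidential (open after 1 year) |
Thesis language: | chi |
Discipline code: | 085400 |
Discipline name: | Engineering - Electronic Information |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2023 |
Institution: | 西安科技大学 (Xi'an University of Science and Technology) |
School/department: | |
Major: | |
Research direction: | Speech signal processing |
First supervisor name: | |
First supervisor affiliation: | |
Second supervisor name: | |
Thesis submission date: | 2023-06-14 |
Thesis defense date: | 2023-06-05 |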
Thesis title (foreign language): | Research on Underground Speech Enhancement Algorithm Based on Generative Adversarial Network |
Keywords (Chinese): | |
Keywords (foreign language): | Speech enhancement ; Generative adversarial network ; Mine noise ; Channel attention mechanism |
Abstract (Chinese): |
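Achieving speech communication in environments with strong noise and low-frequency noise is a challenging task. As modern communication technology advances, the demands placed on voice assistants and communication systems keep rising, and speech enhancement for complex environments has drawn increasing attention from researchers. The main purpose of speech enhancement is to improve the quality and intelligibility of speech signals and thereby make information transfer between people more efficient. With the rapid development of artificial intelligence, speech enhancement algorithms have made a qualitative leap and have become an important driver of progress in modern communication technology. In underground mines, however, which contain multiple noise sources, non-stationary noise, and unknown noise, traditional speech enhancement algorithms often fail to remove noise effectively and enhancement quality is often poor. This makes information exchange and entry underground very difficult and can even endanger workers' lives. This thesis therefore studies speech enhancement algorithms for the underground mine environment. The work is as follows: (1) Generative adversarial network (GAN) training is unstable and prone to gradient explosion, and, like traditional methods, GAN-based approaches mostly take frequency-domain speech information as network input and thus ignore the phase information of the speech signal. To address this, the thesis proposes a time-domain GAN speech enhancement algorithm that combines a relativistic loss with a gradient penalty term. First, the idea of the relativistic GAN is introduced to optimize training and restructure the network training procedure. Second, the Huber loss function is introduced to improve the stability of gradient propagation during training. Experiments on different datasets show that, compared with the unmodified model, the proposed algorithm improves five speech evaluation metrics by an average of 0.048, 0.109, 0.124, 0.084, and 0.004, respectively. (2) To address the structural limitations of the GAN itself, which lead to insufficient feature extraction and in turn degrade the quality of the generated speech, the thesis proposes a speech enhancement algorithm optimized with a channel attention mechanism. A one-dimensional convolution module replaces the two fully connected layers of the original model to avoid loss of channel information, and the one-dimensional convolution module in the channel attention mechanism is in turn replaced with a dilated convolution block to enlarge the receptive field and capture more feature information. Experiments show that, on the five selected metrics, the GAN with channel attention improves on the unmodified model by an average of 0.132, 0.091, 0.171, 0.208, and 0.026, respectively. Finally, the thesis designs a speech enhancement system for underground mines to demonstrate the practicality and application value of the algorithm: the implemented algorithm is integrated into the system, and visualized results show its advantages. |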
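The two loss ingredients named in contribution (1) can be illustrated with a minimal pure-Python sketch. It assumes the pairwise relativistic formulation (each real score is compared with a paired fake score) and the standard Huber definition with a hypothetical threshold `delta`. The abstract does not state which relativistic variant is used, the value of `delta`, or how the terms are weighted, and the gradient penalty is left out because it requires automatic differentiation. All function names are illustrative and not taken from the thesis.

```python
import math


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss between two equal-length sample sequences."""
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        if r <= delta:
            total += 0.5 * r * r                # quadratic for small errors
        else:
            total += delta * (r - 0.5 * delta)  # linear for large errors
    return total / len(pred)


def _log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def relativistic_d_loss(real_scores, fake_scores):
    """Discriminator loss: low when real samples outscore their paired fakes."""
    pairs = list(zip(real_scores, fake_scores))
    return -sum(_log_sigmoid(r - f) for r, f in pairs) / len(pairs)


def relativistic_g_loss(real_scores, fake_scores):
    """Generator loss: low when fakes outscore their paired real samples."""
    pairs = list(zip(real_scores, fake_scores))
    return -sum(_log_sigmoid(f - r) for r, f in pairs) / len(pairs)
```

Because the Huber term grows only linearly for large residuals, its gradient magnitude is bounded by `delta`, which is one plausible reading of the abstract's claim about more stable gradient propagation.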
Abstract (foreign language): |
Achieving speech communication under strong noise and low-frequency noise is a challenging task. As modern communication technology develops, people place increasingly high demands on voice assistants and communication systems, and speech enhancement for complex environments has gradually attracted the attention of researchers. The main function of speech enhancement is to improve the quality and intelligibility of speech signals and thereby raise the efficiency of information transfer between people. With the rapid development of artificial intelligence, speech enhancement algorithms have made a qualitative leap and have become an important driver of progress in modern communication technology. However, in mines with multiple noise sources, non-stationary noise, and unknown noise, traditional speech enhancement algorithms often cannot remove noise effectively and the enhancement effect is often poor, which makes information exchange and entry underground very difficult and can even threaten workers' lives. This thesis therefore studies speech enhancement algorithms for the underground mine environment. The work is as follows: (1) The training process of generative adversarial networks is unstable and prone to gradient explosion. Like traditional methods, such networks mostly take frequency-domain speech information as training input and thus ignore the phase information of the speech signal. To solve this problem, this thesis proposes a time-domain generative adversarial network speech enhancement algorithm that combines a relativistic loss with a gradient penalty term. First, the idea of the relativistic generative adversarial network is introduced to optimize the training process and restructure the network training procedure. Second, the Huber loss function is introduced to improve the stability of gradient propagation during training. Experiments on different datasets show that, compared with the original model, the proposed algorithm improves five speech evaluation metrics by an average of 0.048, 0.109, 0.124, 0.084, and 0.004, respectively.
(2) The structural limitations of the generative adversarial network itself lead to insufficient feature extraction capability and degrade the quality of the generated speech. To address this, the thesis proposes a speech enhancement algorithm optimized with a channel attention mechanism. A one-dimensional convolution module replaces the two fully connected layers in the original model to avoid loss of channel information. In addition, the one-dimensional convolution module in the channel attention mechanism is replaced with a dilated convolution block to enlarge the receptive field of the convolution module and capture more feature information. The experimental results show that, on the five selected metrics, the generative adversarial network with channel attention improves on the original model by an average of 0.132, 0.091, 0.171, 0.208, and 0.026, respectively. Finally, the thesis designs a speech enhancement system for underground mines to demonstrate the practicality and application value of the algorithm. The implemented algorithm is integrated into the system, and visualized results show its advantages. |
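The channel attention idea in contribution (2) can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the thesis architecture: it assumes global average pooling over time, a single dilated one-dimensional convolution across the channel axis with zero padding, and a sigmoid gate. The kernel values, the kernel size, and the `dilation` setting are hypothetical, and both function names are invented for the example.

```python
import math


def dilated_conv1d_same(x, kernel, dilation=1):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the kernel has a well-defined centre tap.
    """
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - half) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(x):            # positions outside x count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def channel_attention(features, kernel, dilation=2):
    """Reweight channels of `features` (list of channels, each a list of samples)."""
    # Squeeze: one descriptor per channel via global average pooling over time.
    pooled = [sum(ch) / len(ch) for ch in features]
    # Cross-channel interaction through a dilated 1-D convolution, used here
    # in place of the two fully connected layers of a squeeze-and-excitation block.
    logits = dilated_conv1d_same(pooled, kernel, dilation)
    # Gate: a sigmoid maps each logit to a weight in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    scaled = [[w * v for v in ch] for w, ch in zip(weights, features)]
    return scaled, weights
```

With kernel size k and dilation d, each channel weight depends on channels spread over a span of (k - 1) * d + 1 positions, so a 3-tap kernel with d = 2 spans 5 channels while still using only 3 parameters. That is the sense in which dilation enlarges the receptive field.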
Chinese Library Classification number: | TN912 |
Open date: | 2024-06-21 |