Thesis Information


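Thesis title (Chinese):	基于麦克风阵列的语音增强方法研究
Name:	Wu Baotong (吴宝桐)
Student ID:	19307205010
Confidentiality level:	Public
Thesis language:	Chinese (chi)
Discipline code:	085208
Discipline name:	Engineering - Engineering - Electronics and Communication Engineering
Student type:	Master's
Degree level:	Master of Engineering
Degree year:	2022
Degree-granting institution:	Xi'an University of Science and Technology (西安科技大学)
College:	College of Communication and Information Engineering
Major:	Electronics and Communication Engineering
Research direction:	Speech enhancement
First supervisor:	He Shun (贺顺)
First supervisor's institution:	Xi'an University of Science and Technology (西安科技大学)
Thesis submission date:	2023-01-05
Thesis defense date:	2022-12-07
Thesis title (English):	Research on speech enhancement method based on microphone array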
Keywords:	Beamforming technology ; Microphone array ; Speech enhancement ; Generalized sidelobe canceller
Abstract:	In practice, speech communication is always affected by ambient noise, so speech enhancement methods that remove the noise mixed into the original speech are essential. A microphone array built from multiple microphones offers strong spatial selectivity: it can capture several sound sources at once, detect and localize a speaker automatically, track the speaker within a certain range, and markedly suppress environmental noise in the received signal. Real acoustic environments, however, are complex and changeable, which makes research on microphone-array speech enhancement for such environments of considerable practical value. Building on the theory of microphone-array speech signal processing, this thesis studies several conventional microphone-array speech enhancement algorithms and proposes two improved algorithms. The specific work is as follows:

(1) The generalized sidelobe canceller (GSC) algorithm suppresses coherent noise effectively but is weak against incoherent noise, and the blocking matrix in the GSC structure cannot completely block the target speech signal, which causes speech leakage. To address these problems, the thesis proposes a GSC speech enhancement algorithm based on improved spectral subtraction (ASS-GSC). The algorithm adaptively adjusts the direction parameters of the blocking matrix (BM) in the GSC structure to reduce speech leakage in the BM module, then feeds the GSC output into an improved spectral subtraction stage, which removes residual musical noise by tuning the over-subtraction factor and the gain compensation factor. Simulation results show that the proposed ASS-GSC algorithm estimates the noise spectrum fairly accurately, reduces the musical noise, and effectively suppresses incoherent noise. Its output speech quality is higher than that of the classical RSS-GSC algorithm and of the method that combines GSC with spectral subtraction.

(2) In algorithms that combine GSC with post-filtering, the delay-and-sum beamforming algorithm (DBF) in the upper branch requires the number of microphone array elements to exceed a certain scale, and it can remove incoherent noise only at relatively high signal-to-noise ratios. To address this problem, the thesis proposes a GSC algorithm with an improved post-filtering method. The algorithm introduces a multi-source selection (MSS) algorithm that selects the most powerful speech channel, which raises the signal-to-noise ratio, relaxes the FBF algorithm's requirement on the number of array elements, and suppresses both coherent and incoherent noise in the speech signal, thereby achieving speech enhancement. Simulation results show that, compared with the multichannel post-filtering method of the convolutive transfer function generalized sidelobe canceller (CTF-GSC) algorithm and with the transfer function ratio (TF-GSC) algorithm and its optimized variants, the proposed algorithm removes incoherent noise more effectively and is more robust.
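The two building blocks named in the abstract, a GSC and a spectral subtraction stage with an over-subtraction factor, can be illustrated with a minimal textbook-style sketch. This is not the thesis's ASS-GSC algorithm: the function names `gsc_aligned` and `spectral_subtraction`, the parameters `mu`, `taps`, `alpha`, and `beta`, the fixed adjacent-difference blocking matrix, and the NLMS canceller are all assumptions made for illustration. In particular, `beta` is a plain spectral floor, not the thesis's gain compensation factor, and the blocking matrix here is not adaptive.

```python
import numpy as np

def gsc_aligned(x, mu=0.05, taps=8, eps=1e-8):
    """Basic time-domain GSC (illustrative sketch, not the thesis's method).

    x: array of shape (M, N) with M microphone channels that are assumed
       to be already time-aligned toward the target speaker.
    Returns the enhanced signal of length N.
    """
    M, N = x.shape
    d = x.mean(axis=0)               # upper branch: delay-and-sum output
    u = x[:-1] - x[1:]               # blocking matrix: adjacent differences
                                     # cancel the aligned target component
    w = np.zeros((M - 1, taps))      # adaptive noise-canceller weights
    buf = np.zeros((M - 1, taps))    # tapped delay line of blocked signals
    y = np.zeros(N)
    for n in range(N):
        buf = np.roll(buf, 1, axis=1)
        buf[:, 0] = u[:, n]
        y[n] = d[n] - np.sum(w * buf)                      # subtract noise estimate
        w += mu * y[n] * buf / (np.sum(buf * buf) + eps)   # NLMS update
    return y

def spectral_subtraction(frame_spec, noise_psd, alpha=4.0, beta=0.01):
    """Power spectral subtraction on one STFT frame.

    alpha: over-subtraction factor (fixed here; an assumption).
    beta:  spectral floor that limits musical noise (an assumption).
    """
    power = np.abs(frame_spec) ** 2
    clean = power - alpha * noise_psd           # over-subtract the noise estimate
    clean = np.maximum(clean, beta * power)     # floor negative or tiny bins
    return np.sqrt(clean) * np.exp(1j * np.angle(frame_spec))  # keep noisy phase
```

In a pipeline of the kind the abstract describes, the output of `gsc_aligned` would be framed and transformed (for example with an STFT), each frame passed through `spectral_subtraction` with a noise estimate taken from speech-free frames, and the result resynthesized. How the thesis adapts the blocking matrix and the two factors is not stated in this record, so fixed values are used as placeholders.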
Chinese Library Classification number:	TN912.35
Open-access date:	2023-04-12
