
Title (Chinese): 基于深度学习方法的矿工多模态情绪状态评估

Name: 卢兆祥

Student ID: 20206043038

Confidentiality: Confidential (open after 1 year)

Thesis language: Chinese (chi)

Discipline code: 081101

Discipline: Engineering - Control Science and Engineering - Control Theory and Control Engineering

Student type: Master's candidate

Degree: Master of Engineering

Degree year: 2023

Institution: 西安科技大学 (Xi'an University of Science and Technology)

School: 电气与控制工程学院 (School of Electrical and Control Engineering)

Major: Control Science and Engineering

Research direction: Brain-computer interaction and artificial intelligence

First supervisor: 汪梅

First supervisor's institution: 西安科技大学

Submission date: 2023-06-12

Defense date: 2023-06-02

Title (English): Multimodal Emotional State Assessment of Miners Based on Deep Learning Methods

Keywords (Chinese): 多模态融合; 情绪识别; 自适应寻优; 情绪状态评估; 深度学习

Keywords (English): Multimodal fusion; Emotion recognition; Adaptive optimization; Emotional state assessment; Deep learning

Abstract (Chinese):

With the advance of intelligent and modernized coal mining, the safety of personnel engaged in underground operations has gradually become a matter of great importance across the coal industry. Among the relevant factors, the emotional state of miners, one of the important elements of safe coal production, is receiving growing attention. Because none of the three single modalities can recognize miners' emotions accurately on its own, this thesis studies multimodal-fusion assessment of miners' emotional state based on deep learning models, starting from three emotional modalities: the EEG signal in the physiological state and the face and voice signals in the non-physiological state. The main research contents are as follows:

(1) For recognition of miners' emotional state at the non-physiological level, emotion is judged from the two modalities of face and voice. On the face modality, to address the slow convergence of face recognition models and their inability to extract deep facial features, a multi-scale facial emotion recognition network with an improved backbone feature-extraction network is built to recognize miners' emotional states from their faces. On the voice modality, to address the large parameter count and redundant size of traditional speech recognition models, log-Mel spectrogram features are extracted and a lightweight depthwise separable convolutional residual neural network is built to raise the accuracy of speech emotion recognition. The results show that the multi-scale facial emotion recognition network with the improved backbone attains high accuracy on the surprised, happy, and sad emotions, reaching 90.16%, 85.87%, and 81.43%, respectively, improvements of 9.41%, 8.53%, and 6.36% over the compared deep learning models. The lightweight depthwise separable convolutional residual network reaches 88.65%, 91.24%, and 83.19% accuracy on the surprised, happy, and angry speech emotions, improvements of 10.28%, 2.05%, and 7.74% over the other deep learning models.
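
To make the speech branch concrete, here is a minimal PyTorch sketch of a depthwise separable convolutional residual block of the kind described above. The channel sizes, kernel size, and input shape are illustrative assumptions, not the thesis's actual configuration.

```python
# Sketch of a depthwise separable convolutional residual block for
# log-Mel spectrogram inputs. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv followed by a 1x1 pointwise conv; far fewer
    parameters than a standard convolution of the same shape."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class SeparableResidualBlock(nn.Module):
    """Residual block built from two depthwise separable convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            DepthwiseSeparableConv(channels, channels),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            DepthwiseSeparableConv(channels, channels),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + x)  # residual skip connection

# A batch of 8 log-Mel spectrograms: 1 channel, 64 mel bins x 128 frames.
x = torch.randn(8, 1, 64, 128)
stem = nn.Conv2d(1, 32, kernel_size=3, padding=1)
print(SeparableResidualBlock(32)(stem(x)).shape)  # torch.Size([8, 32, 64, 128])
```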

(2) For recognition of miners' emotional state at the physiological level, emotional states are discriminated from EEG modality information. To address the limited information in single-channel EEG features, global field power and differential entropy features are extracted in the EEG time-frequency domain; then, from the perspective of multi-channel EEG feature enhancement, an EEG emotion recognition network combining Transformer-based feature enhancement with attention-mechanism feature fusion is constructed to improve the accuracy of multi-channel EEG emotion recognition. The average recognition accuracies for positive, negative, and neutral emotions are 89.73%, 88.68%, and 87.43%, respectively, 6.37%, 7.04%, and 7.15% higher than those of other deep learning models.
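
Differential entropy (DE) is a standard EEG emotion feature: under a Gaussian assumption, the DE of a band-filtered segment reduces to 0.5·ln(2πeσ²). The sketch below extracts per-channel, per-band DE features; the band edges and sampling rate are assumptions, as the abstract does not specify them.

```python
# Sketch of differential entropy (DE) feature extraction from EEG.
# Band edges and the sampling rate are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200  # sampling rate in Hz (assumed)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 14),
         "beta": (14, 31), "gamma": (31, 50)}

def differential_entropy(x):
    """DE of a roughly Gaussian segment: 0.5 * ln(2*pi*e*var)."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))

def de_features(eeg, fs=FS):
    """eeg: (n_channels, n_samples) -> (n_channels, n_bands) DE matrix."""
    feats = np.empty((eeg.shape[0], len(BANDS)))
    for j, (lo, hi) in enumerate(BANDS.values()):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        filtered = filtfilt(b, a, eeg, axis=1)  # band-pass each channel
        feats[:, j] = [differential_entropy(ch) for ch in filtered]
    return feats

eeg = np.random.randn(62, FS * 4)  # 62 channels, 4 s of synthetic data
print(de_features(eeg).shape)      # (62, 5)
```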

(3) For the assessment of miners' emotional state, after running separate emotional state recognition experiments on the EEG, face, and voice modalities, a weight-optimization algorithm under multimodal adaptive fusion is proposed to fuse the decision-level weights of the face, EEG, and voice modalities and obtain the fused recognition result. The results show that multimodal adaptive fusion is more accurate than any of the three improved single modalities. After fusion, the recognition accuracies for the angry, neutral, happy, and surprised emotional states are 91.31%, 88.36%, 91.57%, and 90.75%, which are 2.64%, 1.02%, 1.98%, and 3.43% higher than under the EEG modality, 10.03%, 19.72%, 5.94%, and 1.29% higher than under the face modality, and 0.64%, 6.82%, 3.01%, and 7.39% higher than under the voice modality. A miner emotional state assessment algorithm then evaluates each subject from the subjective self-reported emotional state together with the objective recognition results of the physiological and non-physiological modalities, and a threshold test judges whether a miner's emotional state before and after underground work is suitable for continuing to work. The assessment algorithm provides effective support for the study of miners' emotional states.
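
The abstract names a weight-optimization algorithm for the decision-level fusion but does not give its form. As a stand-in that conveys the idea, the sketch below grid-searches a weight vector on the 3-simplex that maximizes fused accuracy on validation data; it is not the thesis's actual optimizer.

```python
# Sketch of decision-level weighted fusion with a simplex grid search
# over the modality weights. A stand-in for the thesis's optimizer.
import numpy as np

def fuse(probs, w):
    """probs: (n_modalities, n_samples, n_classes); w: (n_modalities,)."""
    return np.tensordot(w, probs, axes=1)  # weighted sum over modalities

def optimize_weights(probs, labels, step=0.05):
    """Search w1 + w2 + w3 = 1 for the best fused accuracy."""
    best_w, best_acc = None, -1.0
    for w1 in np.arange(0.0, 1.0 + step, step):
        for w2 in np.arange(0.0, 1.0 - w1 + step, step):
            w = np.array([w1, w2, max(0.0, 1.0 - w1 - w2)])
            acc = (fuse(probs, w).argmax(-1) == labels).mean()
            if acc > best_acc:
                best_w, best_acc = w, acc
    return best_w, best_acc

# Synthetic stand-ins for EEG / face / voice softmax outputs (4 classes).
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=200)
probs = rng.dirichlet(np.ones(4), size=(3, 200))  # (3, 200, 4)
w, acc = optimize_weights(probs, labels)
print("weights:", w.round(2), "fused accuracy:", acc)
```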

The weight-optimization algorithm under multimodal adaptive fusion proposed in this thesis can effectively fuse miners' EEG, face, and voice modalities and provides effective, accurate assessment of miners' emotional state, essentially completing the assessment task. It offers a useful reference for miners' underground work and safe coal production.

Abstract (English):

With the intelligent and modernized mining of coal, the safety of workers engaged in underground operations has gradually become a matter of considerable importance across the entire coal mining industry. Among the relevant factors, the emotional state of miners, one of the important factors in safe coal production, is receiving more and more attention. This project investigates multimodal-fusion emotional state assessment of miners based on deep learning models from three emotional modalities: the EEG signal in the physiological state and the face and voice signals in the non-physiological state. The main research contents are as follows:

(1) For the problem of recognizing miners' emotional states at the non-physiological level, emotional states are discriminated from both face and voice modal information. For the face modality, this project proposes a multi-scale facial emotion recognition network model under an improved backbone feature-extraction network to recognize miners' emotional states from their faces. For the voice modality, log-Mel spectrum features are first extracted, and a lightweight depthwise separable convolutional residual neural network model is proposed to improve the accuracy of speech emotion recognition. The results show that the multi-scale facial emotion recognition network with the improved backbone attains high recognition accuracy for the surprised, happy, and sad emotions, achieving 90.16%, 85.87%, and 81.43% accuracy, respectively, a 9.41%, 8.53%, and 6.36% improvement over the compared deep learning models. The lightweight depthwise separable convolutional residual network achieves 88.65%, 91.24%, and 83.19% accuracy on the surprised, happy, and angry speech emotions, respectively, 10.28%, 2.05%, and 7.74% higher than the other deep learning models.

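As a rough illustration of the multi-scale idea in the face branch above, the sketch below runs parallel convolutions with different receptive fields and concatenates their outputs. The kernel sizes, channel counts, and 48x48 input are assumptions, not the thesis's improved backbone.

```python
# Sketch of a multi-scale block: parallel convolutions at several
# receptive-field sizes, concatenated along the channel axis.
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    def __init__(self, in_ch, branch_ch=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2)
            for k in (1, 3, 5)  # three spatial scales
        ])

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

x = torch.randn(4, 3, 48, 48)      # e.g. 48x48 face crops
print(MultiScaleBlock(3)(x).shape)  # torch.Size([4, 48, 48, 48])
```
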
(2) For the problem of identifying miners' emotional states at the physiological level, EEG modal information is used to discriminate emotional states. Global field power and differential entropy features are extracted in the EEG time-frequency domain; then, from the perspective of the importance of the EEG channels, an EEG emotion recognition network based on Transformer feature enhancement and attention-mechanism feature fusion is constructed to strengthen the channels relevant to the emotion recognition task and thereby improve the accuracy of multi-channel EEG emotion recognition. The average recognition accuracies for positive, negative, and neutral emotions are 89.73%, 88.68%, and 87.43%, respectively, 6.37%, 7.04%, and 7.15% higher than the emotion recognition accuracies of other deep learning models.
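
One plausible reading of the Transformer-based channel enhancement is to treat each EEG channel's band-wise DE vector as a token and let self-attention re-weight the channels before classification, as sketched below; every dimension and the network depth are assumptions.

```python
# Sketch: EEG channels as tokens for a Transformer encoder, so that
# self-attention can emphasize task-relevant channels. Sizes assumed.
import torch
import torch.nn as nn

class EEGChannelTransformer(nn.Module):
    def __init__(self, n_bands=5, d_model=32, n_classes=3):
        super().__init__()
        self.embed = nn.Linear(n_bands, d_model)  # one token per channel
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                  # x: (batch, channels, bands)
        tokens = self.embed(x)             # (batch, channels, d_model)
        enc = self.encoder(tokens)         # channels attend to each other
        return self.head(enc.mean(dim=1))  # pool over channels -> logits

x = torch.randn(8, 62, 5)  # 8 samples, 62 channels, 5 band-wise DE values
print(EEGChannelTransformer()(x).shape)  # torch.Size([8, 3])
```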

(3) To address the problem of miners' emotional state evaluation, after conducting emotional state recognition experiments on the EEG, face, and voice modalities separately, this project proposes a weight-optimization algorithm under multimodal adaptive fusion that fuses the decision-level weights of the face, EEG, and voice modal information, and then derives the miners' emotional state recognition results under multimodal fusion. The results show that the recognition accuracy of multimodal adaptive fusion is superior to that of the three improved unimodal results. After multimodal information fusion, the recognition accuracies of the angry, neutral, happy, and surprised emotional states are 91.31%, 88.36%, 91.57%, and 90.75%, which are 2.64%, 1.02%, 1.98%, and 3.43% higher than under the EEG modality, 10.03%, 19.72%, 5.94%, and 1.29% higher than under the face modality, and 0.64%, 6.82%, 3.01%, and 7.39% higher than under the voice modality. The miners' emotional states are then assessed by the emotional state assessment algorithm from the subjective emotional state together with the objective recognition results of the physiological and non-physiological modalities, and a threshold is applied to judge whether each subject's emotional state is suitable for continuing work.
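
The threshold judgement could take a form like the sketch below, which mixes a subjective self-report score with the fused model output and compares the result against a cut-off. The scoring scheme, mixing weight, and threshold value are purely illustrative assumptions.

```python
# Sketch of a threshold-style fitness-for-work judgement. The weights
# and the 0.6 cut-off are illustrative assumptions only.
def assess_fitness(subjective_score, fused_probs, negative_classes,
                   alpha=0.5, threshold=0.6):
    """subjective_score: self-reported calmness in [0, 1];
    fused_probs: class -> fused probability from the multimodal model;
    negative_classes: labels treated as adverse, e.g. {"angry"}."""
    objective_score = 1.0 - sum(fused_probs.get(c, 0.0)
                                for c in negative_classes)
    combined = alpha * subjective_score + (1 - alpha) * objective_score
    return combined >= threshold

probs = {"happy": 0.55, "neutral": 0.25, "angry": 0.15, "surprised": 0.05}
print(assess_fitness(0.8, probs, {"angry"}))  # True
```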

The proposed weight-optimization algorithm under multimodal adaptive fusion for miners' emotional state assessment can suitably fuse the three modalities of miners' EEG, face, and voice, and provides effective, accurate results for miners' emotional state assessment, essentially completing the assessment task. It offers a useful reference for miners' underground work and safe coal production.

CLC number (Chinese Library Classification): TN911.7

Open date: 2024-06-14
