Thesis Information

Thesis Title (Chinese):

 基于多模态融合的体征状态识别方法研究 (Research on Body Sign Recognition Methods Based on Multimodal Fusion)

Author:

 Yang Xiaoling

Student ID:

 20206223076

Confidentiality Level:

 Classified (open after 1 year)

Language:

 Chinese (chi)

Discipline Code:

 085400

Discipline:

 Engineering - Electronic Information

Student Type:

 Master's

Degree:

 Master of Engineering

Degree Year:

 2023

Degree-Granting Institution:

 Xi'an University of Science and Technology

School:

 School of Electrical and Control Engineering

Major:

 Control Science and Engineering

Research Direction:

 Artificial Intelligence and Pattern Recognition

Primary Supervisor:

 Huang Xiangdong

Supervisor's Institution:

 Xi'an University of Science and Technology

Submission Date:

 2023-06-19

Defense Date:

 2023-06-02

Thesis Title (English):

 Research on Body Sign Recognition Based on Multimodal Fusion

Keywords (Chinese):

 Body sign recognition; Multimodal fusion; Wavelet packet decomposition; Gated recurrent unit network; Transformer+ model

Keywords (English):

 Body sign recognition; Multimodal fusion; Wavelet packet decomposition; Gated recurrent unit; Transformer+ model

Abstract (Chinese):

       With the development of body area network and communication technologies, practical applications of body sign recognition are no longer satisfied with monitoring basic vital signs; they call for recognizing more comprehensive and complex body signs. Against the background of building an intelligent body sign recognition system for scientific researchers, this thesis studies recognition methods for four body signs: basic vital signs, emotional state, fatigue state, and the comprehensive body sign. The main work is as follows:
       1. To address the lack of datasets for the relevant body signs, this thesis selected six modalities: ECG, galvanic skin response, body temperature, blood oxygen saturation, blood pressure, and facial video, which were collected and preprocessed to construct a body sign dataset.
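For concreteness, one possible layout for a single sample in such a dataset is sketched below; the field names, shapes, and units are illustrative assumptions, not the thesis's actual schema.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class BodySignSample:
    """One multimodal sample; fields and units are illustrative assumptions."""
    ecg: np.ndarray          # 1-D ECG waveform segment
    gsr: np.ndarray          # 1-D galvanic skin response (skin electrical) waveform
    temperature: float       # body temperature in degrees Celsius
    spo2: float              # blood oxygen saturation, percent
    blood_pressure: tuple[int, int]  # (systolic, diastolic) in mmHg
    face_frames: np.ndarray  # facial video clip, shape (frames, height, width, 3)
    label: int               # body sign class index
```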
       2. To handle the noise and redundant information in the physiological signals, this thesis proposes a Physiological Signal Feature Extraction (PSFE) model: wavelet packet decomposition first denoises the signals and extracts preliminary features, after which their time-domain and frequency-domain features are extracted. For the static and dynamic features of the facial video, this thesis proposes a facial video feature extraction model (Time and Space Feature Extraction, TSFE): an SE-ResNeXt-50 network first extracts static spatial features from the video, and a gated recurrent unit (GRU) network then extracts its dynamic temporal features.
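A minimal sketch of the PSFE idea, assuming PyWavelets for the wavelet packet decomposition; the choice of wavelet (`db4`), depth (3 levels), and the specific time- and frequency-domain statistics are illustrative, not necessarily those used in the thesis.

```python
import numpy as np
import pywt  # PyWavelets


def psfe_features(signal: np.ndarray, fs: float,
                  wavelet: str = "db4", level: int = 3) -> np.ndarray:
    """Wavelet-packet subband energies plus simple time/frequency statistics."""
    # Wavelet packet decomposition; normalized per-subband energies serve as
    # the preliminary features (coefficient thresholding for denoising omitted).
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    nodes = wp.get_level(level, order="freq")
    energies = np.array([np.sum(np.square(n.data)) for n in nodes])
    energies /= energies.sum() + 1e-12

    # Time-domain statistics: mean, standard deviation, peak, RMS.
    time_feats = np.array([signal.mean(), signal.std(), np.abs(signal).max(),
                           np.sqrt(np.mean(np.square(signal)))])

    # Frequency-domain statistics from the magnitude spectrum.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    centroid = np.sum(freqs * spectrum) / (spectrum.sum() + 1e-12)
    freq_feats = np.array([centroid, spectrum.max()])

    return np.concatenate([energies, time_feats, freq_feats])
```

The TSFE structure can be sketched in the same spirit: a per-frame CNN for static spatial features followed by a GRU over the frame embeddings for dynamic temporal features. Here a generic `backbone` module stands in for SE-ResNeXt-50, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn


class TSFE(nn.Module):
    """Per-frame CNN + GRU sketch; `backbone` stands in for SE-ResNeXt-50."""
    def __init__(self, backbone: nn.Module, feat_dim: int = 2048, hidden: int = 256):
        super().__init__()
        self.backbone = backbone  # maps (N, 3, H, W) -> (N, feat_dim)
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = clips.shape                            # (batch, frames, 3, H, W)
        static = self.backbone(clips.reshape(b * t, c, h, w))  # static spatial features
        _, last = self.gru(static.reshape(b, t, -1))           # dynamic temporal features
        return last[-1]                                        # (batch, hidden) embedding
```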
       3. To capture the complex association between the multimodal data and body signs, this thesis proposes a learning framework for Body Sign Recognition based on Multimodal Fusion (BSRM-MF): first, the PSFE and TSFE models extract feature information from each modality; second, a parallel model-level fusion method is selected as the feature fusion strategy; third, a Transformer+ model is proposed as the concrete fusion model to fuse the multimodal features effectively; finally, a fully connected layer with a Softmax activation identifies the corresponding body sign. The BSRM-MF framework is verified experimentally: ablation experiments over different input data confirm its reliability and effectiveness, with a recognition accuracy of up to 94.62%; comparative experiments on the fusion model show that the Transformer+ model outperforms a graph attention model; and validation on the public RAMAS dataset further confirms the framework's effectiveness and reliability.
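The fusion step can be pictured with the minimal sketch below, in which a standard Transformer encoder stands in for the thesis's Transformer+ variant: the per-modality feature vectors produced in parallel by PSFE and TSFE are projected to a common width, stacked as tokens, fused by self-attention, pooled, and classified. All dimensions and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn


class FusionClassifier(nn.Module):
    """Model-level fusion sketch: per-modality tokens fused by self-attention."""
    def __init__(self, modality_dims: list[int], d_model: int = 128, n_classes: int = 4):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in modality_dims])
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats[i]: (batch, modality_dims[i]), extracted in parallel per modality.
        tokens = torch.stack([p(f) for p, f in zip(self.proj, feats)], dim=1)
        fused = self.encoder(tokens).mean(dim=1)   # (batch, d_model)
        return self.head(fused)  # logits; Softmax is applied at recognition time
```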
       4. To meet the application requirements of body sign recognition, this thesis studies and designs a body sign recognition system, organized modularly and hierarchically along the data-processing pipeline: first, each data-processing step is encapsulated as a module, yielding a low-coupling, extensible, and maintainable system; second, the system's logical architecture is divided into a perception layer, a network layer, and an application layer, which handle data collection, transmission, and recognition with visualization, respectively, providing body sign recognition as a service.
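As a hypothetical illustration of the layered, low-coupling design, each layer can expose a single narrow interface, as in the stubs below; the class and method names are invented for illustration and are not the thesis's actual system code.

```python
class PerceptionLayer:
    """Collects raw multimodal data from sensors and cameras (stub)."""
    def collect(self) -> dict:
        return {"ecg": [], "gsr": [], "face_frames": []}


class NetworkLayer:
    """Transmits collected data to the processing back end (stub)."""
    def transmit(self, payload: dict) -> dict:
        return payload  # e.g. serialize and send over the body area network


class ApplicationLayer:
    """Runs recognition and visualizes the result (stub)."""
    def recognize(self, payload: dict) -> str:
        return "fatigue"  # e.g. run the BSRM-MF framework on the payload


# Layers communicate only through these interfaces, keeping coupling low.
result = ApplicationLayer().recognize(NetworkLayer().transmit(PerceptionLayer().collect()))
```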
       By constructing a dataset, a learning framework, and a system for body sign recognition based on multimodal fusion, this thesis achieves accurate recognition of researchers' body signs. The results can guide researchers' daily routines, helping to maintain their physical and mental health and further improve the quality of their research. As wearable devices mature, the work can also be applied to telemedicine and health monitoring. This research provides a foundation for the practical application of body sign recognition and has significant research value and practical relevance.

Abstract (English):

      With the development of body area network and communication technologies, the practical requirements of body sign recognition are no longer limited to monitoring basic body signs; more comprehensive and complex body signs are expected to be recognized. This research focuses on building an intelligent body sign recognition system for scientific researchers and studies recognition methods for four body signs: basic body signs, emotional state, fatigue state, and the comprehensive body sign. The main research work includes:

      1. To address the lack of datasets for the relevant body signs, this study selected six modalities: ECG, galvanic skin response, body temperature, blood oxygen saturation, blood pressure, and facial video, which were collected and preprocessed to construct a body sign dataset.

      2. To handle the noise and redundant information in physiological signals, this paper proposes a Physiological Signal Feature Extraction (PSFE) model: wavelet packet decomposition is first used to denoise the signals and extract preliminary features, after which their time-domain and frequency-domain features are extracted. For the static and dynamic features of facial video, this paper proposes a Time and Space Feature Extraction (TSFE) model: an SE-ResNeXt-50 network first extracts the static spatial features of the video, and a gated recurrent unit network then extracts its dynamic temporal features.

      3. To capture the complex association between multimodal data and body signs, this paper proposes a learning framework for Body Sign Recognition based on Multimodal Fusion (BSRM-MF): first, the PSFE and TSFE models extract feature information from each modality; second, a parallel model-level fusion method is selected as the feature fusion strategy; third, a Transformer+ model is proposed as the concrete fusion model to fuse the multimodal feature information effectively; finally, a fully connected layer and a Softmax activation are used to identify the corresponding body sign. The proposed framework is verified experimentally: ablation experiments over different input data confirm its reliability and effectiveness, with a recognition accuracy of up to 94.62%; comparative experiments show that the Transformer+ model outperforms a graph attention model; and validation experiments on the public RAMAS dataset further confirm the framework's effectiveness and reliability.

      4. In view of the application requirements of body sign recognition, this paper studies and designs a body sign recognition system with a modular, hierarchical architecture organized along the data-processing pipeline: first, each data-processing step is encapsulated as a module, yielding a low-coupling, extensible, and maintainable system; second, the logical architecture is divided into a perception layer, a network layer, and an application layer, which carry out data collection, transmission, and recognition with visual display, respectively, providing body sign recognition services.

      By constructing a dataset, a learning framework, and a system for body sign recognition based on multimodal fusion, this paper achieves accurate recognition of researchers' body signs. The results can guide researchers' daily lives, helping to maintain their physical and mental health and further improve the quality of their scientific research. In the context of increasingly mature wearable devices, the research can also be applied to telemedicine and health monitoring. This study provides a research basis for the practical application of body sign recognition and has important research value and practical significance.


CLC Number:

 TP391

Open Access Date:

 2024-06-20
