Thesis Information

Thesis title (Chinese):

 基于图像语义翻译的图文融合情感分析算法研究

Author:

 王颖

Student ID:

 20207040031    

Confidentiality level:

 Public

Thesis language:

 chi

Discipline code:

 081002    

Discipline:

 Engineering - Information and Communication Engineering - Signal and Information Processing

Student type:

 Master's

Degree level:

 Master of Engineering

Degree year:

 2023    

Degree-granting institution:

 Xi'an University of Science and Technology

School:

 College of Communication and Information Engineering

Major:

 Information and Communication Engineering

Research direction:

 Natural Language Processing and Computer Vision

First supervisor:

 黄健

First supervisor's institution:

 Xi'an University of Science and Technology

Thesis submission date:

 2023-06-15    

Thesis defense date:

 2023-06-02    

Thesis title (English):

 Research on Sentiment Analysis Algorithm of Image-Text Fusion Based on Image Semantic Translation    

Keywords (Chinese):

 图文融合 ; 多模态情感分析 ; 图像描述 ; 情感相关性

Keywords (English):

 Image-text fusion ; Multimodal sentiment analysis ; Image caption ; Emotional correlation    

Abstract (Chinese):

随着互联网信息化的快速发展,研究表明多数用户倾向于图文结合的方式在推特、微博等社交平台发布观点。因此,图文融合的情感分析逐渐成为热点研究方向。然而传统的图文融合情感分析方法又存在不足,例如计算机视觉存在语义鸿沟问题、多模态数据之间的异构性问题等。针对目前存在的问题,本文提出了改进的图像语义翻译模型以及基于图像语义翻译的图文融合情感分析算法,具体内容如下:

首先,针对计算机视觉的语义鸿沟问题,本文将分析图像模态的高级语义以及使图像的情感更加明确作为重点研究内容,提出了改进的图像语义翻译模型,在编解码器架构的图像描述模型基础上融入情感极性的词向量参与训练,突出图像的情感倾向。同时,为了进一步提升模型性能和效果,使用改进的残差网络ResNeXt提取图像特征并引入双层通道-空间注意力机制,融合图像中的语义信息,使模型生成更加自然、全面的句子。

其次,针对图文两个模态之间相互作用及多模态数据之间存在异构性的问题,本文提出了基于图像语义翻译的图文融合情感分析算法。首先,将图像送入本文提出的图像语义翻译模型生成多风格图像描述,充分理解图像的高级语义,同时使图像和文本特征处于同一语义空间,减少两个模态之间的差异性。其次,图像翻译为文本后,多模态转化为单模态,同一特征空间的数据即可进行相关性分析:采用余弦相似性计算图文相关性,并选择相关性最大的图像描述作为图像模态特征,充分挖掘了图文之间的语义相关性,且使两个模态的信息尽可能互补。最后,采用特征融合以及辅助语句方式进行情感分析;由于图像已用自然语言表达,这一翻译过程使特征融合及情感分析具有更好的可解释性。

最后,实验结果表明,本文提出的图像语义翻译模型在Personality-Captions和MS COCO数据集上的实验结果均优于其他算法。将图像语义翻译模型应用到本文提出的基于图像语义翻译的图文融合情感分析网络中也取得了更好的效果,在辅助语句融合的方式下能更好地理解图文的情感,在社交媒体情感数据集Twitter-15和Twitter-17上的Accuracy和Macro-F1均高于基准模型。

Abstract (English):

With the rapid development of Internet information technology, research has shown that most users prefer to express their opinions on social platforms such as Twitter and Weibo using a combination of text and images. Therefore, the integration of text and images in sentiment analysis has gradually become a hot research topic. However, traditional methods for sentiment analysis through text and image fusion have some limitations, such as the semantic gap in computer vision and the heterogeneity between multimodal data. To address these issues, this paper proposes an improved image semantic translation model and a text-image fusion sentiment analysis algorithm based on image semantic translation. The specific content is as follows:

Firstly, to address the semantic gap in computer vision, this paper focuses on analyzing the high-level semantics of the image modality and making the sentiment of the image more explicit. It proposes an improved image semantic translation model that, building on an encoder-decoder image captioning architecture, incorporates sentiment-polarity word vectors into training so as to highlight the emotional inclination of the image. To further improve performance, the improved residual network ResNeXt is used to extract image features, and a two-layer channel-spatial attention mechanism is introduced to fuse the semantic information in the image, yielding more natural and comprehensive sentences.
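The record gives no implementation details, so the following PyTorch sketch is only an illustration of the kind of pipeline this paragraph describes: a ResNeXt backbone for image features, a channel-then-spatial attention block, and an LSTM caption decoder whose initial state is conditioned on a sentiment-polarity embedding. All class names, layer sizes, and the CBAM-style attention design are assumptions, not the author's code.

```python
# Illustrative sketch only; not the thesis's published implementation.
import torch
import torch.nn as nn
import torchvision.models as models


class ChannelSpatialAttention(nn.Module):
    """Channel attention followed by spatial attention over a CNN feature map (CBAM-style assumption)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):                                      # x: [B, C, H, W]
        w_c = self.channel_mlp(x.mean(dim=(2, 3)))             # channel weights [B, C]
        x = x * w_c.unsqueeze(-1).unsqueeze(-1)                 # channel re-weighting
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)    # [B, 2, H, W]
        return x * self.spatial_conv(pooled)                    # spatial re-weighting


class SentimentCaptioner(nn.Module):
    """ResNeXt encoder + attention, LSTM decoder conditioned on a sentiment-polarity embedding."""

    def __init__(self, vocab_size: int, n_polarities: int = 3,
                 embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        backbone = models.resnext50_32x4d(weights=None)         # load pretrained weights in practice
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # feature map [B, 2048, 7, 7]
        self.attention = ChannelSpatialAttention(2048)
        self.img_proj = nn.Linear(2048, hidden_dim)
        self.polarity_embed = nn.Embedding(n_polarities, hidden_dim)   # negative / neutral / positive
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions, polarity):
        feats = self.attention(self.encoder(images))            # attended feature map
        img_vec = self.img_proj(feats.mean(dim=(2, 3)))         # pooled image vector [B, hidden]
        # The initial decoder state mixes image content with the desired sentiment polarity.
        h0 = (img_vec + self.polarity_embed(polarity)).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        hidden, _ = self.decoder(self.word_embed(captions), (h0, c0))
        return self.out(hidden)                                  # word logits [B, T, vocab]


if __name__ == "__main__":
    model = SentimentCaptioner(vocab_size=10000)
    logits = model(torch.randn(2, 3, 224, 224),
                   torch.randint(0, 10000, (2, 12)),
                   torch.tensor([0, 2]))
    print(logits.shape)  # torch.Size([2, 12, 10000])
```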

Secondly, to address the interaction between the image and text modalities and the heterogeneity of multimodal data, this paper proposes an image-text fusion sentiment analysis algorithm based on image semantic translation. First, the image is fed into the proposed image semantic translation model to generate multi-style image captions; this captures the high-level semantics of the image and, at the same time, places the image and text features in the same semantic space, reducing the gap between the two modalities. Second, once the image has been translated into text, the multimodal problem becomes a single-modal one, and data in the same feature space can be analyzed for correlation: cosine similarity is used to measure image-text relevance, and the caption with the highest similarity is selected as the image-modality feature, which fully exploits the semantic correlation between image and text and makes the information of the two modalities as complementary as possible. Finally, feature fusion and auxiliary sentences are used for sentiment analysis; because the image is now expressed in natural language, the translation makes both fusion and sentiment analysis more interpretable.
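As a rough illustration of the caption-selection and auxiliary-sentence steps described above, the sketch below scores each generated caption against the tweet text with cosine similarity, keeps the highest-scoring caption, and pairs it with the tweet for a downstream classifier. The bag-of-words vectors and the "[SEP]"-joined input are simplifying assumptions; the thesis presumably uses learned sentence representations and a BERT-style model.

```python
# Minimal sketch under the assumptions stated above; not the author's implementation.
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts using simple bag-of-words counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def select_caption(captions: list[str], tweet: str) -> str:
    """Pick the generated caption most correlated with the tweet text."""
    return max(captions, key=lambda c: cosine_similarity(c, tweet))


def build_auxiliary_input(caption: str, tweet: str) -> str:
    """Pair the selected caption with the tweet as an auxiliary sentence for a text classifier."""
    return f"{caption} [SEP] {tweet}"


if __name__ == "__main__":
    tweet = "what a great day at the beach with friends"
    captions = [  # toy stand-ins for multi-style captions produced by the image translation model
        "a sad empty beach under grey clouds",
        "friends having fun playing on a sunny beach",
        "a dog running across the sand",
    ]
    best = select_caption(captions, tweet)
    print(best)
    print(build_auxiliary_input(best, tweet))
```

In practice the similarity would be computed over the same embedding space used by the downstream classifier, so that the selected caption is the one most relevant in the representation the model actually sees.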

Finally, experimental results show that the proposed image semantic translation model outperforms the compared algorithms on the Personality-Captions and MS COCO datasets. Applying the model within the proposed image-text fusion sentiment analysis network is also effective: with the auxiliary-sentence fusion strategy it better captures the sentiment of image-text pairs, and its accuracy and macro-F1 scores on the Twitter-15 and Twitter-17 social media sentiment datasets exceed those of the baseline models.

References:

[1]Ji R, et al. Survey of visual sentiment prediction for social media analysis[J]. Frontiers of Computer Science, 2016, 10: 602-611.

[2]Zadeh A, Zellers R, Pincus E, et al. Multimodal sentiment intensity analysis in videos: Facial gestures and verbal messages[J]. IEEE Intelligent Systems, 2016, 31(6): 82-88.

[3]彭浩, 朱望鹏, 赵丹丹, 等. 面向多源社交网络舆情的情感分析算法研究[J]. 信息技术, 2019(02):43-48.DOI:10.13274/j.cnki.hdzj.2019.02.010.

[4]Al Ajrawi S, Agrawal A, Mangal H, et al. WITHDRAWN: Evaluating business Yelp’s star ratings using sentiment analysis[J]. 2021.

[5]Subasic P, Huettner A. Affect analysis of text using fuzzy semantic typing[J]. IEEE Transactions on Fuzzy systems, 2001, 9(4): 483-496.

[6]Pandey V, Iyer C. Sentiment analysis of microblogs[J]. CS 229: Machine learning final projects, 2009.

[7]Gu B, Sung Y. Enhanced reinforcement learning method combining one-hot encoding-based vectors for CNN-based alternative high-level decisions[J]. Applied Sciences, 2021, 11(3): 1291.

[8]Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv:1301.3781, 2013.

[9]Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality[C]//Advances in neural information processing systems. 2013: 3111-3119.

[10]Severyn A, Moschitti A. Twitter sentiment analysis with deep convolutional neural networks[C]//Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval. 2015: 959-962.

[11]Devlin J, Chang M W, Lee K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.

[12]Deng D, Jing L, Yu J, et al. Sparse self-attention LSTM for sentiment lexicon construction[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(11): 1777-1790.

[13]Souma W, Vodenska I, Aoyama H. Enhanced news sentiment analysis using deep learning methods[J]. Journal of Computational Social Science, 2019, 2(1): 33-46.

[14]黄泽民, 吴迎岗. 结合BERT和卷积双向简单循环网络的文本情感分析[J]. 计算机应用与软件, 2022, 39(12):213-218.

[15]令狐阳. 基于图文融合的情感分析研究与应用[D]. 电子科技大学, 2021. DOI:10.27005/d.cnki.gdzku.2021.003839.

[16]Ko E, Kim E Y. Recognizing the sentiments of web images using hand-designed features[C]//2015 IEEE 14th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC). IEEE, 2015: 156-161.

[17]Yuan J, Mcdonough S, You Q, et al. Sentribute: image sentiment analysis from a mid-level perspective[C]//Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining. 2013: 1-8.

[18]Cao D, Ji R, Lin D, et al. Visual sentiment topic model based microblog image sentiment analysis[J]. Multimedia Tools and Applications, 2016, 75(15): 8955-8968.

[19]Zhu X, Li L, Zhang W, et al. Dependency exploitation: A unified CNN-RNN approach for visual emotion recognition[C]//proceedings of the 26th international joint conference on artificial intelligence. 2017: 3595-3601.

[20]Rao T, Li X, Xu M. Learning multi-level deep representations for image emotion classification[J]. Neural Processing Letters, 2020, 51(3): 2043-2061.

[21]Zhang J, Liu X, Chen M, et al. Image sentiment classification via multi-level sentiment region correlation analysis[J]. Neurocomputing, 2022, 469: 221-233.

[22]张红斌, 石皞炜, 熊其鹏, 等. 基于主动样本精选与跨模态语义挖掘的图像情感分析[J]. 控制与决策, 2022, 37(11):2949-2958.DOI:10.13195/j.kzyjc.2021.0622.

[23]Gandhi A, Adhvaryu K, Poria S, et al. Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions[J]. Information Fusion, 2022.

[24]Chen X, Wang Y, Liu Q. Visual and textual sentiment analysis using deep fusion convolutional neural networks[C]//2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017: 1557-1561.

[25]刘星. 融合局部语义信息的多模态舆情分析模型[J]. 信息安全研究, 2019, 5(4): 340-345.

[26]Wu W, Wang Y, Xu S, et al. SFNN: Semantic Features Fusion Neural Network for Multimodal Sentiment Analysis[C]//2020 5th International Conference on Automation, Control and Robotics Engineering (CACRE). IEEE, 2020: 661-665.

[27]Kumar A, Srinivasan K, Cheng W H, et al. Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data[J]. Information Processing & Management, 2020, 57(1): 102141.

[28]Lopes V, Gaspar A, Alexandre L A, et al. An AutoML-based approach to multimodal image sentiment analysis[C]//2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021: 1-9.

[29]Xu J, Huang F, Zhang X, et al. Sentiment analysis of social images via hierarchical deep fusion of content and links[J]. Applied Soft Computing, 2019, 80: 387-399.

[30]Cao M, Zhu Y, Gao W, et al. Various syncretic co-attention network for multimodal sentiment analysis[J]. Concurrency and Computation: Practice and Experience, 2020, 32(24): e5954.

[31]Li Z, Xu B, Zhu C, et al. CLMLF: a contrastive learning and multi-layer fusion method for multimodal sentiment detection[J]. arXiv preprint arXiv:2204.05515, 2022.

[32]胡慧君, 冯梦媛, 曹梦丽, 等. 基于语义相关的多模态社交情感分析[J]. 北京航空航天大学学报, 2021, 47(03):469-477. DOI:10.13700/j.bh.1001-5965.2020.0451.

[33]Wang B, Li Y, Li S, et al. Sentiment Analysis Model Based on Adaptive Multi-modal Feature Fusion[C]//2022 7th International Conference on Intelligent Computing and Signal Processing (ICSP). IEEE, 2022: 761-766.

[34]Ye J, Zhou J, Tian J, et al. Sentiment-aware multimodal pre-training for multimodal sentiment analysis[J]. Knowledge-Based Systems, 2022, 258: 110021.

[35]Lu M, Zhao T, Mao C, et al. Target-level Sentiment Analysis Based on Image and Text Fusion[C]//2022 4th International Conference on Robotics and Computer Vision (ICRCV). IEEE, 2022: 305-309.

[36]Gu J, Cai J, Joty S R, et al. Look, imagine and match: Improving textual-visual cross-modal retrieval with generative models[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7181-7189.

[37]Vinyals O, Toshev A, Bengio S, et al. Show and tell: A neural image caption generator[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 3156-3164.

[38]Xu K, Ba J, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]//International conference on machine learning. PMLR, 2015: 2048-2057.

[39]Anderson P, He X, Buehler C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 6077-6086.

[40]Liu M, Li L, Hu H, et al. Image caption generation with dual attention mechanism[J]. Information Processing & Management, 2020, 57(2): 102178.

[41]Guo L, Liu J, Yao P, et al. Mscap: Multi-style image captioning with unpaired stylized text[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 4204-4213.

[42]Shuster K, Humeau S, Hu H, et al. Engaging image captioning via personality[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 12516-12526.

[43]Zhang Z, Zhang H, Wang J, et al. Generating news image captions with semantic discourse extraction and contrastive style-coherent learning[J]. Computers and Electrical Engineering, 2022, 104: 108429.

[44]Duan Y, Wang Z, Li Y, et al. Cross-domain multi-style merge for image captioning[J]. Computer Vision and Image Understanding, 2023, 228: 103617.

[45]Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in neural information processing systems, 2017, 30.

[46]Liu Y, Ott M, Goyal N, et al. Roberta: A robustly optimized bert pretraining approach[J]. arXiv preprint arXiv:1907.11692, 2019.

[47]Nguyen D Q, Vu T, Nguyen A T. BERTweet: A pre-trained language model for English Tweets[J]. arXiv preprint arXiv:2005.10200, 2020.

[48]Khan Z, Fu Y. Exploiting BERT for multimodal target sentiment classification through input space translation[C]//Proceedings of the 29th ACM International Conference on Multimedia. 2021: 3034-3042.

[49]Yu J, Chen K, Xia R. Hierarchical interactive multimodal transformer for aspect-based multimodal sentiment analysis[J]. IEEE Transactions on Affective Computing, 2022.

[50]Atliha V, Šešok D. Comparison of VGG and ResNet used as Encoders for Image Captioning[C]//2020 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream). IEEE, 2020: 1-4.

[51]Lu D, Neves L, Carvalho V, et al. Visual attention model for name tagging in multimodal social media[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 1990-1999.

[52]Zhang Q, Fu J, Liu X, et al. Adaptive co-attention network for named entity recognition in tweets[C]//Proceedings of the AAAI conference on artificial intelligence. 2018, 32(1).

[53]Yu J, Jiang J. Adapting BERT for target-oriented multimodal sentiment classification[C]. IJCAI, 2019.

[54]Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1492-1500.

[55]Chen L, Zhang H, Xiao J, et al. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 5659-5667.

[56]Lin T Y, Maire M, Belongie S, et al. Microsoft coco: Common objects in context[C]//Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer International Publishing, 2014: 740-755.

CLC number:

 TP391    

Open access date:

 2023-06-16    
