Thesis information

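Chinese title:

 中文短文本在线评论情感分析模型研究

Name:

 Bai Yu

Student ID:

 19208088024

Confidentiality level:

 Public

Thesis language:

 chi (Chinese)

Discipline code:

 083500

Discipline:

 Engineering - Software Engineering

Student type:

 Master's student

Degree level:

 Master of Engineering

Degree year:

 2022

Degree-granting institution:

 Xi'an University of Science and Technology

School:

 College of Computer Science and Technology

Major:

 Software Engineering

Research direction:

 Artificial Intelligence and Information Processing

First supervisor:

 Zhang Xiaoyan

First supervisor's institution:

 Xi'an University of Science and Technology

Submission date:

 2022-06-22

Defense date:

 2022-06-06

English title:

 Research on Sentiment Analysis Model of Chinese Short Text Online Comments

Chinese keywords:

 Sentiment analysis; online reviews; BERT; sentiment value; convolutional neural networks

English keywords:

 Sentiment analysis; Online reviews; BERT; Sentiment value; Convolutional neural networks

Chinese abstract: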

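   Text sentiment analysis is one of the important research directions in natural language processing and is widely applied in product analysis, public opinion monitoring, personalized services, and other fields. Driven by the growth of the Internet, the volume of user reviews on online platforms has risen sharply. The sentiment information in these reviews can help users understand products and prompt enterprises and platforms to improve product quality and services, so mining sentiment information from online media reviews is particularly important. However, the length of an online review affects how much semantic information it can express, which leads to insufficient semantic information and low classification accuracy in text sentiment analysis tasks. To address these problems, this thesis takes Chinese short-text online reviews as its research object and studies different types of review text separately.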

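   (1) Sentiment analysis of extremely short online reviews, whose text length is very small and whose features are extremely sparse. Because such texts are so short, too little feature information can be extracted from them, and the sentiment information carried by the text itself tends to be ignored. To address this, a model named SVW-BERT is proposed, which weights a fused character-and-word vector representation by the text's sentiment value. First, character-level and word-level vector representations of the extremely short text are obtained from BERT and its variant WoBERT and fused into a sentence vector, so as to capture as much semantic information as possible. Second, the BosonNLP sentiment dictionary is used, together with the influence of adverbs, negation words, exclamatory sentences, and interrogative sentences on the sentiment of extremely short texts, to compute each text's sentiment value through weighting. Finally, a Chinese extremely-short-text sentiment analysis model is built in which the sentiment value weights the fused character and word vectors. The feasibility and superiority of the model are verified on online review datasets from web platforms. The experimental results show that fusing character and word vectors extracts semantic features more effectively, and that weighting the sentence vector by the sentiment value accounts for the sentiment information contained in extremely short texts, thereby improving sentiment analysis performance.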
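The sentiment-value step of SVW-BERT can be pictured as a lexicon lookup with modifier rules. This is a minimal sketch under stated assumptions, not the thesis's exact formula: the text is assumed to be already word-segmented, the tiny `SENTIMENT` table stands in for BosonNLP scores (its values are hypothetical), and the degree-adverb multipliers, the sign flip for negation words, and the `EXCLAIM_WEIGHT` / `QUESTION_WEIGHT` factors are all illustrative assumptions.

```python
# Hypothetical toy lexicons: stand-ins for BosonNLP scores and modifier word lists.
SENTIMENT = {"好": 1.2, "喜欢": 1.5, "差": -1.4, "失望": -1.8}
DEGREE = {"非常": 2.0, "很": 1.5, "有点": 0.5}  # assumed degree-adverb multipliers
NEGATION = {"不", "没", "没有"}                  # negation words flip the sign
EXCLAIM_WEIGHT = 1.5   # assumed amplification for exclamatory sentences
QUESTION_WEIGHT = 0.5  # assumed damping for interrogative sentences


def sentiment_value(tokens: list[str]) -> float:
    """Sum modifier-weighted lexicon scores over a word-segmented text."""
    total = 0.0
    weight = 1.0  # product of modifiers seen since the last sentiment word
    for tok in tokens:
        if tok in NEGATION:
            weight *= -1.0
        elif tok in DEGREE:
            weight *= DEGREE[tok]
        elif tok in SENTIMENT:
            total += weight * SENTIMENT[tok]
            weight = 1.0  # modifiers apply only to the next sentiment word
    # Sentence-type adjustment based on the final punctuation token.
    if tokens and tokens[-1] in {"!", "！"}:
        total *= EXCLAIM_WEIGHT
    elif tokens and tokens[-1] in {"?", "？"}:
        total *= QUESTION_WEIGHT
    return total


print(sentiment_value(["非常", "喜欢", "！"]))  # 4.5
```

Here the adverb "非常" scales "喜欢" to 2.0 × 1.5 = 3.0, and the exclamation factor raises it to 4.5. Resetting `weight` after each sentiment word keeps a modifier from leaking onto later, unrelated words.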

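    (2) Compared with extremely short texts, longer short texts contain more complex semantic information: as semantic depth increases, the contextual logic becomes stronger, yet existing models are weak at extracting contextual semantic information in complex contexts, which lowers the accuracy of sentiment analysis. To address this, WBERT-CNN, a sentiment analysis model for short-text online reviews that fuses dynamic character and word vectors, is proposed. On top of the dynamic character-level meaning captured by BERT, the model uses WoBERT to capture the dynamic word-level meaning of the short text, fully combining the strengths of character meaning and word meaning in representing semantic information. Each word vector is expanded along the sequence dimension to align with its corresponding character vectors, and the vectors are fused interactively to represent the contextual semantic features of the short text; a convolutional neural network then extracts further features before sentiment classification. The model's effectiveness is verified through binary and three-class sentiment polarity classification of the short texts in an online review dataset. The experimental results show that the model classifies better than mainstream neural network models and previously proposed BERT-based networks.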
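The word-to-character expansion in WBERT-CNN can be illustrated with plain lists. In this hypothetical sketch, each word vector (as WoBERT might produce) is repeated once per character the word covers, so the word sequence lines up position by position with the character sequence (as BERT might produce). Element-wise addition is assumed as the fusion operation; the exact form of the thesis's interactive fusion is not given here, and special tokens such as [CLS] and [SEP] are ignored.

```python
def expand_word_vectors(
    words: list[str], word_vecs: list[list[float]]
) -> list[list[float]]:
    """Repeat each word's vector once for every character the word spans."""
    expanded: list[list[float]] = []
    for word, vec in zip(words, word_vecs):
        expanded.extend([vec] * len(word))
    return expanded


def fuse_char_word(
    char_vecs: list[list[float]],
    words: list[str],
    word_vecs: list[list[float]],
) -> list[list[float]]:
    """Align word vectors to character positions and add them element-wise (assumed fusion)."""
    expanded = expand_word_vectors(words, word_vecs)
    if len(expanded) != len(char_vecs):
        raise ValueError("word segmentation does not cover the character sequence")
    return [
        [c + w for c, w in zip(cvec, wvec)]
        for cvec, wvec in zip(char_vecs, expanded)
    ]


# "质量好" segmented as ["质量", "好"]: 3 characters, 2 words.
print(fuse_char_word([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                     ["质量", "好"], [[10.0, 20.0], [30.0, 40.0]]))
# [[11.0, 20.0], [10.0, 21.0], [31.0, 41.0]]
```

Both characters of "质量" receive the same word vector, which is how word-level meaning reaches every character position. Concatenation instead of addition would be an equally plausible fusion choice.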

English abstract:

     Text sentiment analysis is one of the important research directions of natural language processing and has been widely used in product analysis, public opinion monitoring, personalized services, and other fields. Driven by the growth of the Internet, user comment data on online platforms has increased dramatically. The sentiment information contained in these comments can help users understand products and prompt enterprises and platforms to improve product quality and services. Research on mining sentiment information from online media comments is therefore particularly important. However, the length of an online review affects the expression of its semantic information, resulting in insufficient semantic information and low classification accuracy in text sentiment analysis tasks. To address these problems, this thesis takes Chinese short-text online reviews as its research object and studies different types of review texts separately.

     (1) Sentiment analysis is performed on extremely short online reviews, whose length is small and whose features are extremely sparse. Because such texts are so short, too little feature information can be extracted from them, and the sentiment information contained in the text itself is ignored. To address this, a model named SVW-BERT is proposed, which weights a fused character- and word-vector representation by the text's sentiment value. First, BERT and its variant WoBERT are used to obtain character-level and word-level vector representations of the extremely short text, which are fused into a sentence vector so as to capture as much semantic information as possible. Second, the BosonNLP sentiment dictionary is used, taking into account the influence of adverbs, negation words, exclamatory sentences, and interrogative sentences on the sentiment of extremely short texts, and the sentiment value of each text is obtained by weight calculation. Finally, a sentiment analysis model for Chinese extremely short texts is constructed in which the sentiment value weights the fused character and word vectors. The feasibility and superiority of the model are verified on online review datasets from web platforms. The experimental results show that fusing character and word vectors extracts semantics more effectively, and that the sentiment-value-weighted sentence vector takes into account the sentiment information contained in extremely short texts, improving sentiment analysis performance.
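One possible reading of the sentiment-value-weighted sentence vector in SVW-BERT is sketched below. Everything about the combination rule is an assumption made for illustration: mean pooling over positions, concatenating the character-level and word-level sentence vectors, and scaling by `1 + alpha * sentiment` with a hypothetical `alpha`. The thesis's actual weighting scheme may differ.

```python
def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average a sequence of equal-length vectors position-wise."""
    if not vectors:
        raise ValueError("cannot pool an empty sequence")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def weighted_sentence_vector(
    char_vecs: list[list[float]],
    word_vecs: list[list[float]],
    sentiment: float,
    alpha: float = 0.1,  # hypothetical strength of the sentiment weighting
) -> list[float]:
    """Concatenate pooled character- and word-level vectors, then scale by sentiment."""
    fused = mean_pool(char_vecs) + mean_pool(word_vecs)  # list concatenation
    scale = 1.0 + alpha * sentiment  # assumed weighting rule
    return [scale * x for x in fused]


print(weighted_sentence_vector([[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5]], sentiment=5.0))
# [3.0, 4.5, 0.75, 0.75]
```

With `sentiment=5.0` the scale is 1.5, applied to the fused vector `[2.0, 3.0, 0.5, 0.5]`. A text with sentiment value 0 keeps its fused vector unchanged.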

      (2) Compared with extremely short texts, longer short texts contain more complex semantic information. As semantic depth increases, the contextual logic becomes stronger, but existing models are weak at extracting contextual semantic information in complex contexts, which leads to lower sentiment analysis accuracy. To this end, WBERT-CNN, a sentiment analysis model for short-text online reviews that fuses dynamic character and word vectors, is proposed. Building on the dynamic character-level meaning captured by BERT, the model uses WoBERT to capture the dynamic word-level meaning of short texts, fully integrating the advantages of character meaning and word meaning in representing semantic information. Each word vector is expanded along the sequence dimension to match its corresponding character vectors, and interactive fusion of the vectors characterizes the contextual semantic features of short texts; the resulting features are further extracted by a convolutional neural network for sentiment analysis. The effectiveness of the model is verified by binary and three-class sentiment polarity classification of short texts in an online review dataset. The experimental results show that the model classifies better than mainstream neural network models and previously proposed BERT-based networks.
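A common choice for the CNN stage of a model like WBERT-CNN is a TextCNN-style layer: several kernels slide over the fused sequence, a ReLU is applied, and max-over-time pooling keeps one value per kernel; a softmax over class logits then yields binary or three-class polarity. The sketch below assumes that design; kernel shapes and the bias are illustrative, and the thesis's exact layer configuration is not given here.

```python
import math


def conv1d_maxpool(
    seq: list[list[float]], kernel: list[list[float]], bias: float = 0.0
) -> float:
    """Slide a k x d kernel over an n x d sequence, apply ReLU, keep the maximum."""
    k = len(kernel)
    if len(seq) < k:
        raise ValueError("sequence is shorter than the kernel")
    activations = []
    for start in range(len(seq) - k + 1):
        s = bias
        for row, krow in zip(seq[start:start + k], kernel):
            s += sum(x * w for x, w in zip(row, krow))
        activations.append(max(0.0, s))  # ReLU
    return max(activations)  # max-over-time pooling


def cnn_features(
    seq: list[list[float]], kernels: list[list[list[float]]]
) -> list[float]:
    """One pooled feature per kernel (kernels may have different widths)."""
    return [conv1d_maxpool(seq, kernel) for kernel in kernels]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over class logits (2 or 3 polarity classes)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # fused n x d sequence
kernels = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, 0.0]]]  # widths 2 and 1
print(cnn_features(seq, kernels))  # [3.0, 0.0]
```

In a trained model the pooled features would pass through a fully connected layer to produce the logits given to `softmax`; that layer is omitted here. Using kernels of different widths lets the layer respond to character n-grams of different lengths.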


CLC classification number:

 TP391.1

Open access date:

 2022-06-22
