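Thesis title (Chinese): | 中文短文本在线评论情感分析模型研究 |
Name: | |
Student ID: | 19208088024 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 083500 |
Discipline name: | Engineering - Software Engineering |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2022 |
Institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Artificial Intelligence and Information Processing |
First advisor name: | |
First advisor affiliation: | |
Thesis submission date: | 2022-06-22 |
Thesis defense date: | 2022-06-06 |
Thesis title (English): | Research on Sentiment Analysis Model of Chinese Short Text Online Comments |
Keywords (Chinese): | |
Keywords (English): | Sentiment analysis; Online reviews; BERT; Sentiment value; Convolutional neural networks |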
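Abstract (Chinese): |
Text sentiment analysis is an important research direction in natural language processing and is widely applied in product analysis, public opinion monitoring, and personalized services. Driven by the growth of the Internet, the volume of user reviews on online platforms has increased sharply. The sentiment information in these reviews helps users understand products and prompts enterprises and platforms to adjust product quality and services, so mining sentiment information from online review data is particularly important. However, the length of a review affects how well its semantic information is expressed, which leaves sentiment analysis tasks with insufficient semantic information and low classification accuracy. To address these problems, this thesis takes Chinese short-text online reviews as its research object and studies different types of review text separately. (1) For extremely short review texts, which are small in length and have very sparse features, feature extraction yields too little information and the sentiment carried by the text itself tends to be ignored. A model named SVW-BERT is therefore proposed, which weights a fused character-and-word vector representation by the sentiment value of the text. First, BERT and its variant WOBERT are used to obtain character-level and word-level vector representations of the extremely short text, and these are fused into a sentence vector to capture as much semantic information as possible. Second, the BosonNLP sentiment dictionary is applied, taking into account the influence of adverbs, negation words, exclamatory sentences, and interrogative sentences, and a weighted calculation yields the sentiment value of the text. Finally, a sentiment analysis model for Chinese extremely short texts is built in which the sentiment value weights the fused character-word vector. The feasibility and advantages of the model are verified on online review datasets from network platforms. The experimental results show that the fused character-word features extract semantics more effectively, and that the sentiment-value-weighted sentence vector accounts for the sentiment contained in extremely short texts, which improves sentiment analysis performance. (2) Compared with extremely short texts, longer short texts contain more complex semantic information: as semantic depth increases, contextual logic becomes stronger, yet existing models are weak at extracting contextual semantics in complex contexts, so sentiment analysis accuracy is low. A sentiment analysis model for short-text online reviews, WBERT-CNN, is therefore proposed, which fuses dynamic character and word vectors. Building on the dynamic character-level semantics captured by BERT, the model uses WOBERT to capture dynamic word-level semantics of the short text and combines the strengths of both in representing semantic information. Each word vector is expanded along the sequence dimension to align with its corresponding character vectors, the vectors are fused interactively to represent the contextual semantic features of the short text, and a convolutional neural network extracts further features before sentiment classification. The effectiveness of the model is verified through binary and three-class sentiment polarity classification of short texts in online review datasets. The experimental results show that the model classifies better than mainstream neural network models and previously proposed BERT-based networks. |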
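The sentiment-value step of contribution (1) can be illustrated with a minimal sketch. The abstract does not give the actual weighting formula, so everything below is an assumption made for illustration: the toy lexicon stands in for the BosonNLP sentiment dictionary, the modifier weights and punctuation factors are made-up numbers, and `sentiment_value` is a hypothetical helper name rather than the thesis's implementation.

```python
# Toy tables; every entry and weight here is invented for illustration only.
LEXICON = {"好": 1.0, "喜欢": 1.5, "差": -1.0, "失望": -1.5}
DEGREE = {"很": 1.5, "非常": 2.0, "有点": 0.5}   # degree adverbs
NEGATION = {"不", "没", "没有"}                  # negation words


def sentiment_value(tokens, text):
    """Return a signed sentiment score for a pre-tokenized short text."""
    score = 0.0
    weight = 1.0  # accumulated degree-adverb multiplier
    sign = 1.0    # flipped by each negation word
    for tok in tokens:
        if tok in NEGATION:
            sign = -sign
        elif tok in DEGREE:
            weight *= DEGREE[tok]
        elif tok in LEXICON:
            score += sign * weight * LEXICON[tok]
            # Modifiers only affect the next sentiment word, so reset them.
            weight, sign = 1.0, 1.0
    stripped = text.rstrip()
    if stripped.endswith(("!", "!")):
        score *= 1.5  # assumed amplification for exclamatory sentences
    elif stripped.endswith(("?", "?")):
        score *= 0.5  # assumed damping for interrogative sentences
    return score
```

A score of this kind could then scale the fused sentence vector before classification; how SVW-BERT actually applies the weight is not stated in the abstract.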
Abstract (English): |
Text sentiment analysis is one of the important research directions in natural language processing and has been widely used in product analysis, public opinion monitoring, personalized services, and other fields. Driven by the Internet, user review data on online platforms has increased dramatically. The sentiment information contained in these reviews can help users understand products and encourage enterprises and platforms to adjust product quality and services. Research on mining sentiment information from online review data is therefore particularly important. However, the length of online review texts affects the expression of semantic information, resulting in insufficient semantic information and low classification accuracy in text sentiment analysis tasks. In response to these problems, this thesis takes Chinese short-text online reviews as the research object and studies different types of review texts. (1) Sentiment analysis is performed on extremely short texts in online reviews, which have small length and extremely sparse features. Because such texts yield insufficient feature information and the sentiment contained in the text itself is ignored, a model named SVW-BERT is proposed, based on weighting a fused character and word vector representation by the text sentiment value. First, BERT and its variant WOBERT are used to obtain character-level and word-level vector representations of extremely short texts, which are fused to represent the sentence vector and capture as much semantic information as possible. Second, the BosonNLP sentiment dictionary is used while considering the influence of adverbs, negation words, exclamatory sentences, and interrogative sentences on the sentiment of extremely short texts, and the sentiment value is obtained by weighted calculation. Finally, a sentiment analysis model for Chinese extremely short texts is constructed in which the sentiment value weights the fused character-word vector. The feasibility and superiority of the model are verified on online review datasets from network platforms. The experimental results show that fused character-word vector features extract semantics more effectively, and that the sentiment-value-weighted sentence vector takes into account the sentiment contained in extremely short texts, which improves sentiment analysis performance. (2) Compared with extremely short texts, longer short texts contain more complex semantic information. As semantic depth increases, contextual logic is strengthened, but existing models are weak at extracting contextual semantic information in complex contexts, which leads to lower sentiment analysis accuracy. To this end, a sentiment analysis model for short-text online reviews, WBERT-CNN, is proposed, which fuses dynamic character and word vectors. Building on the dynamic character-level semantic information captured by BERT, the model uses WOBERT to capture dynamic word-level semantic information of short texts and combines the advantages of both in representing semantic information. Each word vector is expanded along the sequence dimension to match its corresponding character vectors, the vectors are fused interactively to represent the contextual semantic features of short texts, and the features are further extracted by a convolutional neural network before sentiment classification. The effectiveness of the model is verified by binary and three-class sentiment polarity classification of short texts in online review datasets. The experimental results show that the model classifies better than mainstream neural network models and previously proposed BERT-based networks. |
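The expansion of word vectors to character positions described in contribution (2) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: small hand-written lists stand in for BERT and WOBERT outputs, special tokens are ignored, fusion is shown as plain concatenation (the abstract only says the vectors are fused interactively), and `expand_word_vectors` and `fuse` are hypothetical helper names.

```python
def expand_word_vectors(words, word_vecs):
    """Repeat each word vector once per character of its word,
    so the result has one vector per character position."""
    expanded = []
    for word, vec in zip(words, word_vecs):
        expanded.extend([list(vec) for _ in word])
    return expanded


def fuse(char_vecs, words, word_vecs):
    """Concatenate each character vector with the vector of the word
    that contains that character (concatenation is an assumption)."""
    expanded = expand_word_vectors(words, word_vecs)
    if len(expanded) != len(char_vecs):
        raise ValueError("character and word segmentations cover different lengths")
    return [list(c) + w for c, w in zip(char_vecs, expanded)]


# Toy example: "手机好" segmented into the words "手机" and "好".
words = ["手机", "好"]
word_vecs = [[1.0, 2.0], [3.0, 4.0]]   # stand-ins for word-level embeddings
char_vecs = [[0.1], [0.2], [0.3]]      # stand-ins for character-level embeddings
fused = fuse(char_vecs, words, word_vecs)
```

In this toy example `fused` has three rows, one per character, and the first two rows share the vector of "手机". A sequence aligned this way could then be passed to a convolutional layer for further feature extraction.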
Chinese Library Classification number: | TP391.1 |
Release date: | 2022-06-22 |