Thesis information

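Title (Chinese, translated):

 Research on Knowledge Graph Construction and Intelligent Question Answering Algorithms for Coal Mine Safety Equipment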

Name:

 Dong Yaxin (董亚欣)

Student ID:

 19308208013    

Confidentiality level:

 Public

Thesis language:

 chi (Chinese)

Discipline code:

 085212    

Discipline name:

 Engineering - Engineering (professional degree) - Software Engineering

Student type:

 Master's student

Degree level:

 Master of Engineering

Degree year:

 2023    

Degree-granting institution:

 Xi'an University of Science and Technology

School:

 College of Computer Science and Technology

Major:

 Software Engineering

Research area:

 Knowledge graphs

First supervisor:

 Fu Yan (付燕)

First supervisor's institution:

 Xi'an University of Science and Technology

Submission date:

 2023-06-20    

Defense date:

 2023-06-05    

Title (English):

 Research on Knowledge Graph Construction and Intelligent Question Answering Algorithm of Coal Mine Safety

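Keywords (Chinese, translated):

 Knowledge graph; coal mine safety equipment domain; intelligent question answering; information extraction; intent classification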

Keywords (English):

 Knowledge graph ; Coal mine safety equipment field ; Intelligent question answering ; Information extraction ; Intent classification    

Abstract (Chinese, translated):

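The development of coal mine safety equipment is the premise and guarantee of safe coal mine production, yet existing knowledge graph work in the coal mining domain lacks systematic research on safety equipment. Taking coal mine safety equipment domain data, which has a complex textual structure, as its research object, this thesis constructs a coal mine safety equipment knowledge graph. Because Cypher, the query language commonly used with knowledge graphs, is relatively complex, the thesis also studies knowledge-graph-based intelligent question answering algorithms and builds an intelligent question answering system, so that users can draw on the graph's large body of knowledge more conveniently and quickly: the user enters a natural language question directly, and the system returns the answer. The main research contents are as follows: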

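(1) Because few open-source datasets exist for the coal mine safety equipment domain, this thesis builds its own entity and relation dataset for the domain (CMSE-EntityRel). To handle complex forms in CMSE-EntityRel such as nested entities and overlapping entities/relations, it proposes TabREL, a joint entity and relation extraction method based on segment (span) classification decoding, for constructing the knowledge graph. The method first encodes the text with the BERT pre-trained language model; it then applies a biaffine transformation (Biaffine model) to the resulting encodings to form a scoring matrix; finally, it performs multi-label classification through segment classification decoding. In comparative experiments against existing methods on CMSE-EntityRel, TabREL reaches an F1-score of 76%, 2% higher than CasREL, an improvement to knowledge extraction that in turn supports better knowledge graph construction.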
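The record includes no code. The following is a minimal pure-Python sketch of the two TabREL steps named above, biaffine span scoring and threshold-based multi-label span decoding, assuming toy token vectors stand in for BERT outputs and hand-set weights stand in for trained parameters. The function names `biaffine_scores` and `decode_spans`, the label `EQUIP`, and the 0.5 threshold are hypothetical, and the sketch covers entity spans only; it does not reproduce TabREL's relation decoding.

```python
import math


def biaffine_scores(h, U, W, b):
    """Score every token pair (i, j) for every label c.

    h: n token vectors of length d (stand-ins for BERT outputs).
    U: one d x d bilinear matrix per label.
    W: one weight vector of length 2d per label, applied to [h_i; h_j].
    b: one bias per label.
    Returns scores[c][i][j] = h_i^T U_c h_j + W_c . [h_i; h_j] + b_c.
    """
    n, d = len(h), len(h[0])
    scores = []
    for U_c, W_c, b_c in zip(U, W, b):
        table = [[0.0] * n for _ in range(n)]
        for i in range(n):
            # Row vector h_i^T U_c, computed once per start token i.
            left = [sum(h[i][k] * U_c[k][m] for k in range(d)) for m in range(d)]
            for j in range(n):
                bilinear = sum(left[m] * h[j][m] for m in range(d))
                linear = sum(w * x for w, x in zip(W_c, h[i] + h[j]))
                table[i][j] = bilinear + linear + b_c
        scores.append(table)
    return scores


def decode_spans(scores, labels, threshold=0.5):
    """Multi-label span decoding over the upper triangle (start <= end).

    A span gets a label when sigmoid(score) > threshold, i.e. when the raw
    score exceeds the logit of the threshold. Cells are judged independently,
    so nested and overlapping spans can all be returned.
    """
    cutoff = math.log(threshold / (1.0 - threshold))
    spans = []
    for c, label in enumerate(labels):
        n = len(scores[c])
        for i in range(n):
            for j in range(i, n):
                if scores[c][i][j] > cutoff:
                    spans.append((i, j, label))
    return spans


# Toy example: 3 tokens, 2-dim vectors, one label, identity bilinear term.
h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
U = [[[1.0, 0.0], [0.0, 1.0]]]
W = [[0.0, 0.0, 0.0, 0.0]]
b = [-0.5]
scores = biaffine_scores(h, U, W, b)
print(decode_spans(scores, ["EQUIP"]))  # [(0, 0, 'EQUIP'), (0, 1, 'EQUIP'), (1, 1, 'EQUIP'), (2, 2, 'EQUIP')]
```

Because each (start, end) cell is classified on its own, the nested spans (0, 0) and (0, 1) are both kept; this independence is what lets span-table decoders represent entity nesting, which a single BIO tag sequence cannot.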

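(2) In knowledge-graph-based question answering, users' natural language questions are often non-standard, which makes it hard for a machine to identify the user's intent accurately. To address this, the thesis constructs a coal mine safety equipment domain question dataset (CMSE-Question) and proposes KBCNN, a multi-feature fusion intent classification method. The method first extracts features with BERT; second, it applies TextCNN three times, with convolution kernel sizes of 3, 4, and 5, and fuses the resulting feature information; third, it averages the features of the recognized entities and adds them to the whole-sentence features, so that the important entities in the question receive greater weight and the model becomes more domain-specific; finally, it concatenates all features and uses Softmax to classify the question's intent. In comparative experiments against existing methods on CMSE-Question, KBCNN reaches an F1-score of 70%, 4% higher than BERT+TextCNN, demonstrating the method's effectiveness.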
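As a rough illustration of the KBCNN fusion step, the sketch below concatenates a sentence vector, max-pooled TextCNN features for kernel sizes 3, 4, and 5, and the mean of the recognized entity's token vectors, then applies a linear layer and Softmax. It is pure Python with toy vectors in place of BERT outputs and a single all-ones filter per kernel size. Reading "adding the entity features to the sentence features" as concatenation, the function names, and all weights are assumptions, not the thesis's implementation.

```python
import math


def conv1d_maxpool(tokens, kernel):
    """One TextCNN filter: slide a k x d kernel over the tokens, apply ReLU, then max-pool over time."""
    k = len(kernel)
    outputs = []
    for start in range(len(tokens) - k + 1):
        window = tokens[start:start + k]
        value = sum(w * x for row_w, row_x in zip(kernel, window) for w, x in zip(row_w, row_x))
        outputs.append(max(0.0, value))  # ReLU
    return max(outputs) if outputs else 0.0  # sequence shorter than the kernel -> 0


def textcnn_features(tokens, kernels_by_size):
    """Max-pooled output of every filter, ordered by kernel size (e.g. 3, 4, 5), concatenated."""
    return [conv1d_maxpool(tokens, kernel)
            for size in sorted(kernels_by_size)
            for kernel in kernels_by_size[size]]


def entity_mean(tokens, spans):
    """Average the token vectors covered by the recognized entity spans (inclusive indices)."""
    covered = [tokens[t] for start, end in spans for t in range(start, end + 1)]
    if not covered:
        return [0.0] * len(tokens[0])
    return [sum(col) / len(covered) for col in zip(*covered)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_intent(tokens, sentence_vec, entity_spans, kernels_by_size, out_weights, out_bias):
    """Concatenate [sentence vector; TextCNN features; entity mean], then linear layer + Softmax."""
    fused = sentence_vec + textcnn_features(tokens, kernels_by_size) + entity_mean(tokens, entity_spans)
    logits = [sum(w * x for w, x in zip(row, fused)) + bias for row, bias in zip(out_weights, out_bias)]
    return softmax(logits)


# Toy example: 5 tokens with 2-dim vectors, one all-ones filter per kernel size.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
kernels = {k: [[[1.0, 1.0] for _ in range(k)]] for k in (3, 4, 5)}
print(textcnn_features(tokens, kernels))  # [4.0, 4.0, 5.0]
print(entity_mean(tokens, [(2, 3)]))      # [0.5, 0.5]
probs = classify_intent(tokens, [0.0, 0.0], [(2, 3)], kernels,
                        out_weights=[[0.0] * 7, [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]],
                        out_bias=[0.0, 0.0])
print(probs)
```

Using three kernel sizes lets the filters respond to 3-, 4-, and 5-token patterns in the question; the separate entity-mean slot is one plausible way to give recognized domain entities extra influence on the final classification.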

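(3) Building on the above, the thesis implements an intelligent question answering system for coal mines. Depending on the user's role, the system is divided into a front-end query subsystem and a back-end management subsystem. The user enters a question, and the system quickly returns the result.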
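One common way such a system turns a classified intent into an answer is to fill a parameterized Cypher template with the recognized entity; the record does not describe this step, so the sketch below is an assumption. The intent labels, the `Equipment` node label, the `HAS_FUNCTION`/`HAS_COMPONENT` relationship types, and `build_query` are all hypothetical. The returned query string and parameter map would then be passed to a Neo4j driver, which is not shown.

```python
# Hypothetical mapping from intent labels (as an intent classifier might output)
# to parameterized Cypher queries. Node labels and relationship types are illustrative.
CYPHER_TEMPLATES = {
    "equipment_function": (
        "MATCH (e:Equipment {name: $name})-[:HAS_FUNCTION]->(f) RETURN f.name AS answer"
    ),
    "equipment_component": (
        "MATCH (e:Equipment {name: $name})-[:HAS_COMPONENT]->(c) RETURN c.name AS answer"
    ),
}


def build_query(intent: str, entity_name: str) -> tuple[str, dict[str, str]]:
    """Return (cypher, parameters) for an intent and the entity recognized in the question.

    The entity is passed as a query parameter rather than spliced into the string,
    so user input cannot change the structure of the query.
    """
    template = CYPHER_TEMPLATES.get(intent)
    if template is None:
        raise ValueError(f"no Cypher template for intent {intent!r}")
    return template, {"name": entity_name}


query, params = build_query("equipment_function", "gas detector")
print(query)
print(params)
```

Keeping one template per intent means adding a new question type only requires a new classifier label and a new entry in the mapping, leaving users insulated from Cypher entirely.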

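The novel methods this thesis proposes for knowledge graph construction and intelligent question answering contribute to the intelligent development of coal mining and can also serve as a reference for related research in other fields.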

Abstract (English):

The development of coal mine safety equipment is the premise and guarantee of safe coal mine production. However, existing knowledge graph research in the coal mining domain lacks systematic study of safety equipment. Taking coal mine safety equipment domain data with a complex textual structure as its research object, this thesis constructs a coal mine safety equipment knowledge graph. Moreover, because Cypher, the query language commonly used with knowledge graphs, is relatively complex, the thesis studies intelligent question answering algorithms based on knowledge graphs and builds an intelligent question answering system, so that users can make convenient, fast use of the rich knowledge stored in the graph. The user enters a natural language question directly, and the system returns the answer. The main research contents are as follows:

(1) Because open-source datasets for the coal mine safety equipment domain are scarce, this thesis builds its own Coal Mine Safety Equipment Domain Entity and Relationship dataset (CMSE-EntityRel). To handle complex forms in CMSE-EntityRel such as nested entities and overlapping entities/relations, we propose TabREL, a joint entity and relation extraction method based on segment classification decoding, to construct the knowledge graph. First, the BERT pre-trained language model encodes the text data. Second, a Biaffine model applies a biaffine transformation to the resulting encodings to form a scoring matrix. Finally, segment classification decoding performs multi-label classification. In comparisons with existing methods on the CMSE-EntityRel dataset, TabREL reaches an F1-score of 76%, 2% higher than CasREL, improving knowledge extraction and thereby knowledge graph construction.

(2) In knowledge-graph-based question answering, the natural language questions users enter are often non-standard, which makes it difficult for a machine to identify the user's intent accurately. To address this, the thesis constructs a Coal Mine Safety Equipment Domain Question dataset (CMSE-Question) and proposes KBCNN, a multi-feature fusion intent classification method. First, a BERT model extracts features. Second, a TextCNN model extracts features three times, with convolution kernel sizes of 3, 4, and 5, and the resulting features are fused. Third, the features of the recognized entities are averaged and added to the whole-sentence features, giving the important entities in the question more weight and making the model more domain-specific. Finally, all features are concatenated and Softmax performs question intent classification. In comparisons with existing methods on the CMSE-Question dataset, KBCNN reaches an F1-score of 70%, 4% higher than BERT+TextCNN, which demonstrates the method's effectiveness.

(3) Building on the above, this thesis implements an intelligent question answering system for coal mines. Depending on the user's role, the system is divided into a front-end query subsystem and a back-end management subsystem. The user enters a question, and the system quickly returns the result.

The novel methods proposed in this thesis for knowledge graph construction and intelligent question answering contribute to the intelligent development of coal mining and can serve as a reference for related research in other fields.

Chinese Library Classification (CLC) number:

 TP391.1    

Open access date:

 2023-06-26    
