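Thesis title (Chinese): | 面向垂直领域的知识图谱设计与构建 |
Name: | |
Student ID: | 18207205076 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 085400 |
Discipline name: | Engineering - Electronic Information |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2021 |
Training institution: | Xi'an University of Science and Technology (西安科技大学) |
School/Department: | |
Major: | |
Research direction: | Information processing |
First supervisor name: | |
First supervisor affiliation: | |
Thesis submission date: | 2022-03-03 |
Thesis defense date: | 2021-12-04 |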
Thesis title (foreign language): | Design and construction of vertical-oriented knowledge graphs |
Keywords (Chinese): | |
Keywords (foreign language): | Artificial intelligence; knowledge graphs; knowledge extraction; named entity recognition; BERT |
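Abstract (translated from Chinese): |
Compared with traditional information management built on relational databases, a knowledge graph built on a graph database makes it easy and efficient to find the relevant associations among pieces of knowledge, which lays a foundation for the efficient use of knowledge in artificial intelligence applications. Relative to open-domain knowledge graphs, vertical-domain knowledge graphs hold less knowledge overall but richer entity content. When such graphs are built, however, traditional neural-network-based algorithms for Chinese named entity recognition and Chinese entity relation extraction rely on a single, fixed word-vector representation and so cannot cope with the polysemy of Chinese words during information extraction. To address this, the thesis starts from the BiLSTM-CRF and BiGRU-Attention models and uses the BERT pre-trained language model to improve the knowledge extraction techniques currently used in vertical-domain knowledge graph construction.
BERT is fused with the BiLSTM-CRF and BiGRU-Attention models to build a BERT-BiLSTM-CRF model for Chinese named entity recognition and a BERT-BiGRU-Attention model for Chinese relation extraction. BERT conditions jointly on context in every layer and uses a bidirectional Transformer as its encoder, so it can dynamically generate and enrich the semantic vector of each character. This addresses the low accuracy that results when traditional word-vector representations fail to capture sentence-level features, and it improves knowledge extraction, a key step in knowledge graph construction. The experimental results show that the Chinese named entity recognition model and the Chinese relation extraction model reach F1 scores of 95.12% and 82.79%, respectively, which are 7.56% and 8.19% higher than the other models compared.
The BERT-BiLSTM-CRF named entity recognition algorithm and the BERT-BiGRU-Attention entity relation extraction algorithm are then applied to college entrance examination (gaokao) application recommendation in the education domain. Following the overall construction workflow for vertical-domain knowledge graphs (knowledge extraction, knowledge fusion, knowledge storage, and knowledge visualization), a knowledge graph for this domain is designed and built. On top of this graph, a system is developed in Java with the Spring Framework and a hybrid MySQL + Neo4j database. It comprises an intelligent question-answering system and a recommendation system for college application choices, which together form a knowledge platform for gaokao application recommendation. The platform supports queries on university and admissions information, consultation on application questions, and application recommendation, and it offers a reference for building and applying knowledge graphs in other vertical domains. |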
Abstract (foreign language): |
Compared with traditional information management methods based on relational databases, knowledge graphs based on graph databases can easily and efficiently find the necessary associations between pieces of knowledge, and provide a basis for the efficient use of knowledge in artificial intelligence applications. Compared with open-domain knowledge graphs, vertical-domain knowledge graphs have a relatively small total amount of knowledge and relatively rich entity content. However, in the construction of vertical-domain knowledge graphs, the traditional neural-network-based algorithms for Chinese named entity recognition and Chinese entity relation extraction use a single word-vector representation and cannot adapt to the polysemy of Chinese words in Chinese information extraction. Building on the BiLSTM-CRF and BiGRU-Attention models, this thesis uses the BERT pre-trained language model to improve the existing knowledge extraction technology used in vertical-domain knowledge graph construction. The BERT pre-trained language model is fused with the BiLSTM-CRF model and the BiGRU-Attention model to construct the BERT-BiLSTM-CRF Chinese named entity recognition model and the BERT-BiGRU-Attention Chinese relation extraction model. The BERT model jointly conditions on the context in each layer and uses a bidirectional Transformer as the encoder, so it can dynamically generate and enrich the semantic vectors of characters. This solves the problem of low model accuracy caused by traditional word-vector representations being unable to express sentence features, and improves knowledge extraction, a key step in knowledge graph construction.
The final experimental results show that the constructed Chinese named entity recognition model and Chinese relation extraction model have F1 values of 95.12% and 82.79%, respectively, which are 7.56% and 8.19% higher than those of the other models. The Chinese named entity recognition algorithm based on BERT-BiLSTM-CRF and the Chinese entity relation extraction algorithm based on BERT-BiGRU-Attention are applied to college entrance examination application recommendation in the education field. Following the overall construction process for vertical-domain knowledge graphs, a knowledge graph for this field is designed and built through knowledge extraction, knowledge fusion, knowledge storage, and knowledge visualization. Based on the constructed knowledge graph, a system is developed in Java using the Spring Framework and a hybrid MySQL + Neo4j database. An intelligent question-answering system and a recommendation system for college entrance examination application recommendation are designed and implemented, completing the construction of a knowledge platform for college entrance examination application recommendation. The platform supports functions such as querying college information and enrollment information, consultation on application questions, and application recommendation, and has some reference value for the construction and application of knowledge graphs in other vertical fields. |
Chinese Library Classification (CLC) number: | TP391 |
Open access date: | 2022-03-03 |