Thesis information

Thesis title (Chinese):

 面向垂直领域的知识图谱设计与构建    

Author:

 Li Zhi (李直)

Student ID:

 18207205076    

Confidentiality level:

 Public

Thesis language:

 chi    

Discipline code:

 085400    

Discipline name:

 Engineering - Electronic Information

Student type:

 Master's

Degree level:

 Master of Engineering

Degree year:

 2021    

Institution:

 Xi'an University of Science and Technology

School/department:

 College of Communication and Information Engineering

Major:

 Electronics and Communication Engineering

Research direction:

 Information Processing

First supervisor:

 Sun Yi (孙弋)

First supervisor's institution:

 Xi'an University of Science and Technology

Thesis submission date:

 2022-03-03    

Thesis defense date:

 2021-12-04    

Thesis title (English):

 Design and construction of vertical-oriented knowledge graphs    

Keywords (translated from Chinese):

 Artificial intelligence ; knowledge graph ; knowledge extraction ; named entity recognition ; BERT

Keywords (English):

 Artificial intelligence ; knowledge graphs ; knowledge extraction ; named entity recognition ; BERT

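Abstract (translated from Chinese):

Compared with traditional information management based on relational databases, a knowledge graph built on a graph database makes it easy and efficient to find the relevant associations between pieces of knowledge, and so provides a foundation for the efficient use of knowledge in artificial intelligence applications. Relative to open-domain knowledge graphs, vertical-domain knowledge graphs contain less knowledge overall but have richer entity content. When constructing a vertical-domain knowledge graph, however, conventional neural-network-based algorithms for Chinese named entity recognition and Chinese entity relation extraction rely on a single word-vector representation and cannot accommodate the polysemy of Chinese words during information extraction. To address this, this thesis builds on the BiLSTM-CRF and BiGRU-Attention models and uses the BERT pre-trained language model to improve the knowledge extraction techniques used in vertical-domain knowledge graph construction.

The BERT pre-trained language model is fused with the BiLSTM-CRF and BiGRU-Attention models to build a BERT-BiLSTM-CRF model for Chinese named entity recognition and a BERT-BiGRU-Attention model for Chinese relation extraction. BERT jointly conditions on context in every layer and uses a bidirectional Transformer as its encoder, so it can dynamically generate and enrich the semantic vectors of characters. This addresses the low model accuracy caused by traditional word-vector representations being unable to capture sentence-level features, and thereby improves knowledge extraction, a key step in knowledge graph construction. The experimental results show that the Chinese named entity recognition model and the Chinese relation extraction model achieve F1 scores of 95.12% and 82.79%, improvements of 7.56% and 8.19%, respectively, over the other models compared.

The BERT-BiLSTM-CRF Chinese named entity recognition algorithm and the BERT-BiGRU-Attention Chinese entity relation extraction algorithm are then applied to recommending college application choices for the college entrance examination (gaokao) in the education domain. Following the overall construction workflow for vertical-domain knowledge graphs, a knowledge graph for this domain is designed and built through knowledge extraction, knowledge fusion, knowledge storage, and knowledge visualization. On top of this knowledge graph, a system is developed in Java with the Spring Framework and a hybrid MySQL + Neo4j database. An intelligent question-answering system and a recommendation system for gaokao application choices are designed and implemented, completing a knowledge platform for gaokao application recommendation. The platform supports querying university and admissions information, answering questions about filling in applications, and recommending application choices, and it offers a useful reference for building and applying knowledge graphs in other vertical domains.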

Abstract (English):

Compared with traditional information management methods based on relational databases, knowledge graphs based on graph databases can easily and efficiently find the necessary associations between pieces of knowledge, providing a basis for the efficient use of knowledge in artificial intelligence applications. Compared with open-domain knowledge graphs, vertical-domain knowledge graphs have a relatively small total amount of knowledge and relatively rich entity content. However, in constructing a vertical-domain knowledge graph, traditional neural-network-based algorithms for Chinese named entity recognition and Chinese entity relation extraction use a single word-vector representation and cannot adapt to the polysemy of Chinese words during Chinese information extraction. Building on the BiLSTM-CRF and BiGRU-Attention models, this thesis uses the BERT pre-trained language model to improve the existing knowledge extraction techniques used in vertical-domain knowledge graph construction.

The BERT pre-trained language model is fused with the BiLSTM-CRF and BiGRU-Attention models to construct a BERT-BiLSTM-CRF Chinese named entity recognition model and a BERT-BiGRU-Attention Chinese relation extraction model. The BERT model jointly conditions on the context in every layer and uses a bidirectional Transformer as its encoder, so it can dynamically generate and enrich the semantic vectors of characters. This solves the problem of low model accuracy caused by traditional word-vector representations being unable to express sentence features, and it improves knowledge extraction, a key step in knowledge graph construction. The final experimental results show that the constructed Chinese named entity recognition model and Chinese relation extraction model reach F1 scores of 95.12% and 82.79%, respectively, which are 7.56% and 8.19% higher than those of the other models compared.
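The abstract reports F1 scores but does not say how they were computed. A common convention for named entity recognition is entity-level F1 over exact span matches in BIO-tagged output, and the sketch below assumes that convention. The function names, the BIO scheme, and the ORG/LOC labels are illustrative assumptions and are not taken from the thesis.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence.

    An I- tag that does not continue an open span of the same type is ignored.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level F1: a predicted span counts only if type and boundaries match."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with the sentence index so spans from
        # different sentences never collide.
        gold_spans |= {(idx,) + s for s in extract_spans(g)}
        pred_spans |= {(idx,) + s for s in extract_spans(p)}
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the model finds the ORG span but misses the LOC span.
gold = [["B-ORG", "I-ORG", "O", "B-LOC"]]
pred = [["B-ORG", "I-ORG", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Under this convention a partially correct span earns no credit, which is why entity-level F1 is usually lower than token-level accuracy on the same predictions.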

The BERT-BiLSTM-CRF Chinese named entity recognition algorithm and the BERT-BiGRU-Attention Chinese entity relation extraction algorithm are applied to recommending college application choices for the college entrance examination in the education field. Following the overall construction process for vertical-domain knowledge graphs, a knowledge graph for this field is designed and constructed through knowledge extraction, knowledge fusion, knowledge storage, and knowledge visualization. Based on the constructed knowledge graph, a system is developed in Java with the Spring Framework and a hybrid MySQL + Neo4j database. An intelligent question-answering system and a recommendation system for college application choices are designed and implemented, completing a knowledge platform for college application recommendation. The platform supports querying college and enrollment information, answering questions about filling in applications, and recommending application choices, and it has some reference value for the construction and application of knowledge graphs in other vertical domains.
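As a rough illustration of the knowledge storage step, extracted (head, relation, tail) triples can be turned into Cypher MERGE statements for Neo4j. This is a minimal sketch in Python rather than the Java used in the thesis. The node labels, the relation name, the `name` property, and the helper `triple_to_cypher` are hypothetical and do not come from the thesis.

```python
import re
from typing import Dict, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Labels and relationship types are interpolated into the query text,
# so they are restricted to plain identifiers before use.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _checked(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe Cypher identifier: {name!r}")
    return name


def triple_to_cypher(
    triple: Triple, head_label: str, tail_label: str
) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized Cypher query that upserts one triple.

    Entity names are passed as query parameters; MERGE creates each node
    and the relationship only if they do not already exist.
    """
    head, relation, tail = triple
    query = (
        f"MERGE (h:{_checked(head_label)} {{name: $head}}) "
        f"MERGE (t:{_checked(tail_label)} {{name: $tail}}) "
        f"MERGE (h)-[:{_checked(relation)}]->(t)"
    )
    return query, {"head": head, "tail": tail}


# Hypothetical triple for the college application recommendation domain.
query, params = triple_to_cypher(
    ("Xi'an University of Science and Technology", "OFFERS_MAJOR", "Electronic Information"),
    head_label="University",
    tail_label="Major",
)
```

The returned query and parameter map could then be handed to a Neo4j driver session, for example `session.run(query, params)` in the official Python driver. Using MERGE rather than CREATE keeps repeated imports of the same triple from duplicating nodes or relationships.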

Chinese Library Classification (CLC) number:

 TP391    

Public release date:

 2022-03-03    
