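Thesis information

Thesis title (Chinese):

 基于文本挖掘的煤矿安全隐患趋势预测研究

Name:

 杨帆 (Yang Fan)

Student ID:

 20307223009

Confidentiality level:

 Public

Thesis language:

 chi

Discipline code:

 085400

Discipline name:

 Engineering - Electronic Information

Student type:

 Master's

Degree level:

 Master of Engineering

Degree year:

 2023

Degree-granting institution:

 Xi'an University of Science and Technology

School:

 College of Communication and Information Engineering

Major:

 Electronics and Communication Engineering

Research direction:

 Coal mine safety informatization

First supervisor name:

 马莉 (Ma Li)

First supervisor affiliation:

 Xi'an University of Science and Technology

Thesis submission date:

 2023-06-14

Thesis defense date:

 2023-05-31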

Thesis title (foreign language):

 Research on the trend prediction of coal mine safety hazards based on text mining    

Keywords (Chinese):

 煤矿安全隐患 ; BERT ; 命名实体识别 ; 文本聚类 ; 灰色模型 ; 隐患预测    

Keywords (foreign language):

 Coal Mine Safety Hazards ; BERT ; NER ; Text Clustering ; Gray Model ; Hidden Hazard Prediction    

Abstract (Chinese):

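       Coal mine safety hazard investigation and management systems hold a large volume of hazard records whose text describes the working face, the hazard location, and the hazard subject. Because this text is unstructured, it has not been fully mined or put to deeper use. Applying text mining to coal mine safety hazard text therefore helps predict hazard trends, can effectively avoid and reduce safety accidents, and is of considerable significance for further improving coal mine safety management.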
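       Extracting named entities from coal mine safety hazard text is the basis for hazard trend prediction. To address the strong domain specificity and low recognition accuracy of named entity recognition on such text, this thesis combines semantic feature extraction methods and proposes Bert-CBsC, a named entity recognition model for coal mine safety hazards. A pre-trained BERT model first produces character vectors for the hazard text, and these vectors carry rich semantic information. A CNN and a multi-layer BiGRU then mine the semantic features further, extracting local features and global deep features, and the model's performance is evaluated comparatively. Experiments show that Bert-CBsC reaches a precision of 91.74%, a recall of 93.20%, and an F1 score of 92.45%, outperforming mainstream named entity recognition models. To predict hazard trends, hazard types must also be identified automatically on top of the recognized entities. The thesis therefore builds an LDA-FastText text vector representation, which converts hazard text into a numerical form a computer can process, and combines it with the K-means algorithm to build a clustering model that identifies hazard types. Experiments show that this clustering model reaches a silhouette coefficient of 0.536 and a Davies-Bouldin index of 0.501, an improvement over the other text clustering models compared.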
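       Hazard trend prediction is one of the important methods in current coal mine hazard investigation and management. To predict trends accurately, the hazard entity information and hazard type data are converted to numerical form and used as the trend prediction dataset. A coal mine safety hazard prediction model is built on gray prediction theory and checked for accuracy. It predicts the number of hazards in the coming month, along with the counts broken down by working face, working location, and hazard subject type, and line charts show the hazard trend. The model's prediction accuracy exceeds 90%, indicating that it can support decision-making for safe coal mine production.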
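The abstract does not say which gray model variant the thesis uses. As a rough illustration of the gray prediction step, the following is a minimal sketch that assumes the basic GM(1,1) form: accumulate the series, estimate the development coefficient a and gray input b by least squares, then difference the fitted accumulated series to get a forecast. The function names, the monthly counts, and the use of "1 minus mean relative error" as the accuracy figure are all assumptions made for this example and are not taken from the thesis.

```python
import math


def gm11(x0, horizon=1):
    """Fit GM(1,1) to a positive series; return (fitted, forecast, a, b)."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) is usually applied to at least 4 points")
    # Step 1: accumulated generating operation (1-AGO).
    x1, running = [], 0.0
    for v in x0:
        running += v
        x1.append(running)
    # Step 2: background values, the mean of neighbouring accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = list(x0[1:])
    # Step 3: least squares for the gray equation x0(k) + a * z(k) = b.
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    if abs(a) < 1e-12:
        raise ValueError("development coefficient is ~0; the series is flat")

    # Step 4: time response of the accumulated series, then difference back.
    def x1_hat(k):
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    restored = [x0[0]] + [x1_hat(k) - x1_hat(k - 1)
                          for k in range(1, n + horizon)]
    return restored[:n], restored[n:], a, b


def mean_relative_error(actual, fitted):
    """Mean relative error, skipping the first point (fitted exactly)."""
    errs = [abs(act - fit) / abs(act)
            for act, fit in zip(actual[1:], fitted[1:])]
    return sum(errs) / len(errs)


# Hypothetical monthly hazard counts, not data from the thesis.
monthly_counts = [52, 57, 61, 68, 73, 80]
fitted, forecast, a, b = gm11(monthly_counts, horizon=1)
mre = mean_relative_error(monthly_counts, fitted)
print(f"a = {a:.4f}, b = {b:.2f}")
print(f"forecast for next month: {forecast[0]:.1f}")
print(f"accuracy (1 - mean relative error): {1 - mre:.2%}")
```

The same function could be applied separately to the per-working-face, per-location, and per-subject count series mentioned in the abstract, one series at a time.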

Abstract (foreign language):

       Coal mine safety hazard investigation and management systems contain a large number of hazard information texts covering the working face, hazard location, hazard subject, and hazard problem description. Because these text data are unstructured, they have not been fully mined or put to deeper use. Applying text mining technology to coal mine safety hazard text can therefore help predict the trend of coal mine safety hazards.
       Extracting named entities from coal mine safety hazard text is the basis for research on hazard trend prediction. To address the strong specialization and low recognition accuracy of named entity recognition on coal mine safety hazard text, this thesis combines semantic feature extraction methods and proposes a Bert-CBsC based named entity recognition model. A pre-trained BERT model produces character vector representations of the hazard text, and the generated vectors contain rich semantic information. A CNN and a multi-layer BiGRU are then combined to mine the semantic feature information fully, extracting both local features and global deep features. The model's performance is also compared experimentally: the Bert-CBsC model achieves a precision of 91.74%, a recall of 93.20%, and an F1 score of 92.45%, showing superior performance. To recognize hazard types, a text vector representation based on LDA-FastText is constructed to convert hazard text data into a form a computer can process. Combined with the K-means algorithm, a coal mine safety hazard clustering model is built and used to identify hazard types. The experimental results show that the clustering model achieves a silhouette coefficient of 0.536 and a Davies-Bouldin index of 0.501, an improvement over the other text clustering models compared.
       Hazard trend prediction is one of the important methods in current coal mine hazard investigation and management. To predict trends accurately, the hazard entity information and hazard type data are converted to numerical form and used as the trend prediction dataset. Using gray prediction theory, a coal mine safety hazard prediction model is established and checked for accuracy. It predicts the number of hazards occurring in the coming month, as well as the number of hazards by working face, working location, and hazard subject type, and line graphs show the hazard trend. The prediction results show that the model's accuracy exceeds 90%, indicating that it can provide an auxiliary decision-making basis for safe coal mine production.
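The two clustering scores quoted in the abstract can be hard to interpret without seeing how they are computed. The following is a minimal pure-Python sketch of K-means followed by the silhouette coefficient (closer to 1 is better) and the Davies-Bouldin index (lower is better). It is an illustration only: the toy 2-D vectors stand in for the LDA-FastText text vectors, which are not reproduced here, and the function names and the choice of initial centers are assumptions made for this example.

```python
import math


def dist(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]


def kmeans(points, init_centers, max_iter=100):
    """Plain Lloyd iteration starting from caller-supplied centers."""
    centers = [list(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(centroid(members) if members else old)
        if updated == centers:
            break
        centers = updated
    return assign(points, centers), centers


def group(points, labels):
    """Map each label to the list of points carrying it."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    clusters = group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # Mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(points, labels):
    """Davies-Bouldin index; needs at least two clusters."""
    clusters = group(points, labels)
    cents = {k: centroid(v) for k, v in clusters.items()}
    scatter = {k: sum(dist(p, cents[k]) for p in v) / len(v)
               for k, v in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)


# Made-up 2-D vectors forming two tight, well-separated groups.
vectors = [[0, 0], [0, 1], [1, 0], [1, 1],
           [10, 10], [10, 11], [11, 10], [11, 11]]
labels, centers = kmeans(vectors, init_centers=[vectors[0], vectors[-1]])
print("labels:", labels)
print("silhouette:", round(silhouette(vectors, labels), 3))
print("Davies-Bouldin:", round(davies_bouldin(vectors, labels), 3))
```

Passing the initial centers explicitly keeps the run deterministic. K-means with random initialization can settle in a poor local optimum even on clean data, which is one reason both scores are normally reported over several runs.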

Chinese Library Classification (CLC) number:

 TD79    

Public release date:

 2023-06-15    
