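Thesis title (Chinese): |
基于机器学习的滑坡地质灾害预测模型研究 (Research on a Landslide Geological Disaster Prediction Model Based on Machine Learning)
|
Name: |
Wang Chenyang (王晨阳)
|
Student ID: |
19208208050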
|
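Confidentiality level: |
Public
|
Thesis language: |
chi
|
Discipline code: |
085212
|
Discipline name: |
Engineering (工学) - Engineering (工程) - Software Engineering
|
Student type: |
Master's student
|
Degree level: |
Master of Engineering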
|
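Degree year: |
2022
|
Degree-granting institution: |
Xi'an University of Science and Technology
|
School: |
College of Computer Science and Technology
|
Major: |
Software Engineering
|
Research direction: |
Artificial intelligence and information processing
|
First advisor name: |
Luo Xiaoxia (罗晓霞)
|
First advisor affiliation: |
Xi'an University of Science and Technology
|
Thesis submission date: |
2022-06-22
|
Thesis defense date: |
2022-06-07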
|
Thesis title (foreign language): |
Research on landslide geological disaster prediction model based on machine learning
|
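Thesis keywords (Chinese): |
Landslide prediction ; Machine learning ; Bayesian optimization ; Stacking algorithm ; Model fusion
|
Thesis keywords (foreign language): |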
Landslide Prediction ; Machine Learning ; Bayesian Optimization ; Stacking
|
Thesis abstract (Chinese): |
︿
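China has a vast territory with complex and varied natural environments and climatic conditions. Geological disasters occur frequently and cause great harm to people's lives and property and to social and economic development. Survey data show that landslides account for more than 70% of all geological disasters and are highly hazardous, difficult to control, and widely distributed. Predicting landslide geological disasters effectively, and thereby providing a scientific basis for their prevention and control, is therefore of great significance. Taking Ningqiang County, Shaanxi Province, as the study area, this thesis investigates landslide geological disaster prediction models. The research content and results are as follows:
(1) First, the main influencing factors of landslide geological disasters were selected by analyzing the geological environment and landslide formation conditions of the study area. Then, Landsat8 remote sensing imagery, a digital elevation model, and other data for the study area were processed with GIS technology to extract 18 landslide influence factors, and random forest and the Min-Max method were used to handle the missing values and the differences in scale in the data. Finally, principal component analysis and the Pearson correlation coefficient method yielded 10 landslide influence factors with relatively large weights and weak linear correlation.
(2) Four landslide geological disaster prediction models were built with the support vector machine, logistic regression, random forest, and Adaboost machine learning algorithms, and the Bayesian optimization algorithm was used to tune the important hyperparameters of the models. The optimized models were evaluated on the landslide dataset. The results show that, apart from the multilayer perceptron neural network model, the other four models all reached Accuracy, F1 Score, and AUC values above 0.8, which lays the foundation for the subsequent model fusion.
(3) The traditional Stacking model fusion algorithm ignores both the relationship between the feature variables and the output values and the differences in how well the base learners learn. To address this, an improved Stacking algorithm is proposed that adds the original features to the secondary learning layer and applies accuracy-feedback weighting. With it, the support vector machine, logistic regression, random forest, and Adaboost models were fused into a fusion landslide geological disaster prediction model. The experimental results show that, compared with the other models, the Accuracy, F1 Score, and AUC of the fusion model all improved markedly, which verifies the effectiveness of the improved Stacking algorithm and its applicability to landslide geological disaster prediction in the study area.
(4) Based on the fusion landslide geological disaster prediction model, a landslide geological disaster prediction system was developed that integrates meteorological monitoring, environmental display, landslide prediction, and data management. The system improves the timeliness of meteorological monitoring, the intuitiveness of environmental display, the scientific soundness of landslide prediction, and the convenience of data management in the study area.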
﹀
|
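The preprocessing named in item (1) of the abstract (Min-Max scaling followed by a Pearson-correlation screen) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the greedy keep-first rule, and the 0.8 threshold are all assumptions made for illustration, and the principal component weighting step is not covered.

```python
from math import sqrt


def min_max_scale(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column; treat as 0
    return cov / sqrt(var_x * var_y)


def drop_correlated(factors, threshold=0.8):
    """Keep factors in insertion order, skipping any factor whose |r| with an
    already-kept factor exceeds the (assumed) threshold."""
    kept = []
    for name, values in factors.items():
        if all(abs(pearson(values, factors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data, not taken from the thesis.
factors = {
    "slope": [10, 20, 30, 40],
    "elevation": [500, 900, 650, 700],
    "slope_copy": [11, 21, 31, 41],  # perfectly correlated with "slope"
}
scaled = {name: min_max_scale(values) for name, values in factors.items()}
print(drop_correlated(scaled))
```

Min-Max scaling is a positive linear rescaling, so it leaves Pearson correlations unchanged; the screen gives the same result before or after scaling.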
Thesis abstract (foreign language): |
︿
China has a vast territory and complex natural environment and climate conditions, which lead to frequent geological disasters. These disasters cause great harm to people's lives and property and to social and economic development. According to survey data, landslides account for more than 70% of all geological disasters and are highly hazardous, difficult to control, and widely distributed. Effective prediction of landslide geological disasters is therefore of great significance, as it provides a scientific basis for their prevention and control. In this thesis, Ningqiang County of Shaanxi Province is taken as the study area for research on landslide geological disaster prediction models. The specific research content and results are as follows:
(1) Firstly, the main influencing factors of landslide geological disasters were selected by analyzing the geological environment and landslide formation conditions in the study area. Then, Landsat8 remote sensing images, the digital elevation model, and other data were analyzed with GIS technology, and 18 landslide influence factors were extracted. The missing values and dimensional differences in the data were handled with random forest and the Min-Max method. Finally, principal component analysis and the Pearson correlation coefficient method were used to obtain 10 landslide influence factors with large weights and weak linear correlation.
(2) Four landslide geological disaster prediction models were built using the support vector machine, logistic regression, random forest, and Adaboost machine learning algorithms, respectively, and the Bayesian optimization algorithm was used to optimize the important hyperparameters of the models. The optimized models were tested on the landslide dataset. The results show that, except for the multilayer perceptron neural network model, the Accuracy, F1 Score, and AUC of the other four models all exceeded 0.8, which lays a foundation for the next step of model fusion.
(3) The traditional Stacking model fusion algorithm ignores the correlation between the feature variables and the output values, as well as the differences in the learning performance of the base learners. To address this, the thesis proposes an improved Stacking algorithm that adds the original features to the secondary learning layer and applies accuracy feedback weighting. On this basis, the support vector machine, logistic regression, random forest, and Adaboost models were fused to build a fusion landslide geological disaster prediction model. The experimental results show that, compared with the other models, the Accuracy, F1 Score, and AUC of the fusion model were significantly improved, which verifies the effectiveness of the improved Stacking algorithm and its applicability to landslide geological disaster prediction in the study area.
(4) Based on the fusion landslide geological disaster prediction model, a landslide geological disaster prediction system integrating meteorological monitoring, environmental display, landslide prediction, and data management functions was developed. The system improves the timeliness of meteorological monitoring, the intuitiveness of environmental display, the scientific soundness of landslide prediction, and the convenience of data management in the study area.
﹀
|
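Item (3) of the abstract describes two changes to Stacking: the secondary learning layer also receives the original features, and base learner outputs are weighted by accuracy feedback. The abstract gives no formulas, so the sketch below is only one plausible reading. It assumes each base learner's predicted probability is multiplied by its normalized validation accuracy and then concatenated with the original feature vector. The function names and the normalization rule are hypothetical.

```python
def accuracy_weights(accuracies):
    """Normalize base-learner validation accuracies so the weights sum to 1
    (an assumed weighting rule, not taken from the thesis)."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


def build_meta_features(base_probs, accuracies, original_features):
    """Build the input rows for the secondary (meta) learner.

    base_probs[j][i]      probability from base learner j for sample i
    accuracies[j]         validation accuracy of base learner j
    original_features[i]  original feature vector of sample i

    Each row holds the accuracy-weighted base predictions followed by the
    sample's original features.
    """
    weights = accuracy_weights(accuracies)
    rows = []
    for i, features in enumerate(original_features):
        weighted = [w * probs[i] for w, probs in zip(weights, base_probs)]
        rows.append(weighted + list(features))
    return rows


# Hypothetical toy values: two base learners, two samples, one original feature.
meta_rows = build_meta_features(
    base_probs=[[1.0, 0.0], [0.5, 0.5]],
    accuracies=[0.8, 0.8],
    original_features=[[3.0], [4.0]],
)
print(meta_rows)
```

In a full pipeline the base probabilities would normally come from out-of-fold predictions, so that the secondary learner is not trained on outputs the base learners produced for their own training samples.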
Chinese Library Classification (CLC) number: |
TP391.41
|
Open access date: |
2022-06-22
|