Thesis information

Chinese title:

 基于机器学习的煤炭行业高质量发展博弈策略研究

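Name:

 Tian Zhengzheng (田铮铮)

Student ID:

 21301103003

Confidentiality level:

 Public

Thesis language:

 Chinese (chi)

Discipline code:

 070104

Discipline name:

 Science - Mathematics - Applied Mathematics

Student type:

 Master's

Degree level:

 Master of Science

Degree year:

 2025

Degree-granting institution:

 Xi'an University of Science and Technology (西安科技大学)

School:

 College of Science (理学院)

Major:

 Mathematics

Research direction:

 Game theory

First supervisor name:

 Feng Weibing (冯卫兵)

First supervisor affiliation:

 Xi'an University of Science and Technology (西安科技大学)

Second supervisor name:

 Yang Aili (杨爱丽)

Thesis submission date:

 2025-06-24

Thesis defense date:

 2025-06-08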

Foreign-language title:

 Research on game strategy of coal industry high quality development based on machine learning    

Chinese keywords:

 煤炭行业 (coal industry) ; 演化博弈论 (evolutionary game theory) ; 高质量发展 (high-quality development) ; 微分博弈 (differential game) ; ATT-MADDPG

Foreign-language keywords:

 Coal industry ; Evolutionary game theory ; High quality development ; Differential game ; ATT-MADDPG    

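Chinese abstract (English translation):

Against the backdrop of a deep transformation of the energy structure driven by the "dual carbon" goals, high-quality development of the coal industry has become a core issue for safeguarding national energy security and achieving sustainable development. This thesis combines machine learning with game theory to address the difficulty of optimizing decisions among the multiple actors in the coal industry, building a systematic research framework and proposing feasible solutions.
Approaching the problem from a game-theoretic perspective, the thesis first addresses the key difficulties of the tripartite game among government, coal enterprises, and the public, namely complex evolutionary dynamics and hard decision optimization. It constructs a decision-optimization framework for high-quality development of the coal industry that integrates differential games with machine learning, and gives an exact solution method based on numerical iteration together with a heuristic optimization strategy based on a multi-head attention mechanism. Second, it builds a three-party dynamic Stackelberg game model, analyzes the strategic interaction among government regulation, enterprise innovation, and public supervision, and reveals how the behavior of each actor evolves over time. Finally, it proposes a multi-agent reinforcement learning method based on a multi-head attention mechanism (ATT-MADDPG), which improves training performance and significantly reduces the computational complexity of decision optimization in the differential game.
Simulation of the continuous-time three-party dynamic Stackelberg game shows a significant synergy between the innovation effort of coal enterprises and public supervision: more effective public supervision stimulates enterprises' motivation to transform and reduces their dependence on policy subsidies, whereas when enterprise innovation investment is insufficient, the government needs to increase subsidies to break through the transformation bottleneck. The ATT-MADDPG-based strategy optimization scheme significantly improves the economic returns of coal enterprises and confirms the effectiveness of the game-strategy optimization algorithm for predicting industry development behavior and supporting decisions. The study provides a quantitative analysis path and a decision-support tool for multiple actors jointly advancing the green, low-carbon transformation of the coal industry.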

Foreign-language abstract:

Against the background of a deep transformation of the energy structure driven by the "dual carbon" goals, the high-quality development of the coal industry has become a core issue for ensuring national energy security and achieving sustainable development. This thesis applies machine learning and game theory together to the problem of multi-agent decision optimization in the coal industry, constructs a systematic research framework, and puts forward feasible solutions.

The research starts from the perspective of game theory. First, to address the complex evolutionary dynamics and difficult decision optimization in the tripartite game among government, coal enterprises, and the public, the thesis puts forward a decision-optimization framework for high-quality development of the coal industry that integrates differential games with machine learning, and presents an exact solution method based on numerical iteration and a heuristic optimization strategy based on a multi-head attention mechanism. Second, a three-party dynamic Stackelberg game model is constructed to analyze the strategic interaction among government regulation, enterprise innovation, and public supervision, and to reveal how the behavior of the different actors evolves. Finally, a multi-agent reinforcement learning method based on a multi-head attention mechanism (ATT-MADDPG) is proposed, which significantly reduces the computational complexity of decision optimization in differential games by improving the training performance of the model.

Simulation of the three-party dynamic Stackelberg game in continuous time shows that there is a significant synergy between the innovation effort of coal enterprises and public supervision: improving the effectiveness of public supervision stimulates enterprises' motivation to transform and reduces their dependence on policy subsidies, whereas when enterprise innovation investment is insufficient, the government needs to increase subsidies to break through the transformation bottleneck. The strategy optimization scheme based on ATT-MADDPG significantly improves the economic returns of coal enterprises and confirms the effectiveness of the game-strategy optimization algorithm for predicting industry development behavior and supporting decisions. This study provides a quantitative analysis path and a decision-support tool for multiple parties to jointly promote the green, low-carbon transformation of the coal industry.

Chinese Library Classification (CLC) number:

 O29    

Open-access date:

 2025-06-25    
