Thesis Information

Chinese Title:

 基于强化学习的煤矿井下无人车路径规划算法研究    

Name:

 卫健健    

Student ID:

 19206204106    

Confidentiality Level:

Public

Thesis Language:

Chinese

Discipline Code:

 085210    

Discipline:

Engineering - Engineering - Control Engineering

Student Type:

Master's

Degree Level:

Master of Engineering

Degree Year:

 2022    

Institution:

Xi'an University of Science and Technology

School:

College of Electrical and Control Engineering

Major:

Control Engineering

Research Direction:

Intelligent Control Engineering

First Supervisor:

 陈文燕    

First Supervisor's Institution:

Xi'an University of Science and Technology

Second Supervisor:

 周李兵    

Submission Date:

 2022-06-26    

Defense Date:

 2022-06-07    

English Title:

 Research on Path Planning Algorithm of Unmanned Vehicle in Coal Mine Based on Reinforcement Learning    

Chinese Keywords:

井下无人车; RRT算法; 强化学习; 路径规划; ROS

English Keywords:

Underground unmanned vehicle; RRT algorithm; Reinforcement learning; Path planning; ROS

Chinese Abstract:

Underground unmanned vehicles are an important component of the smart mine, and path planning is a critical link in any autonomous driving task. Because coal mine roadways are narrow and obstacle-dense, and work sites are scattered and change frequently, traditional path planning algorithms suffer from low planning efficiency, poor real-time performance, and poor planning quality. Reinforcement learning lets an agent learn its surrounding environment through trial and error so as to maximize cumulative reward. This thesis therefore applies reinforcement learning to the improvement of path planning algorithms, aiming to raise the real-time performance and adaptability of unmanned vehicle path planning while guaranteeing the quality of the generated paths. The main work covers the following aspects:
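
To make the trial-and-error idea concrete, the sketch below shows tabular Q-learning, the simplest value-based reinforcement learning scheme: the agent acts, observes a reward, and nudges its value estimate toward the observed return. This is background illustration only; the stand-in chain environment, state/action counts, and hyperparameters are assumptions, not taken from the thesis.

```python
import numpy as np

# Tabular Q-learning sketch. The environment below is a placeholder
# (a chain of 16 states); sizes and hyperparameters are assumptions.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration

def step(state, action):
    """Stand-in environment: returns (next_state, reward, done)."""
    next_state = (state + 1) % n_states           # placeholder dynamics
    reward = 1.0 if next_state == n_states - 1 else -0.01
    return next_state, reward, next_state == n_states - 1

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy: explore at random, otherwise exploit current Q
        if np.random.rand() < epsilon:
            a = np.random.randint(n_actions)
        else:
            a = int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Move Q(s,a) toward the bootstrapped target r + gamma * max Q(s',.)
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2
```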

(1) For global path planning, a Q-RRT algorithm that incorporates a reinforcement learning algorithm is proposed. To address the low node sampling efficiency of the RRT (Rapidly-exploring Random Trees) algorithm, a purpose-designed reward function guides node expansion, improving search efficiency; the generated path is then optimized by pruning and by cubic Bézier curves subject to constraints. Simulation results show that the algorithm improves path planning efficiency and yields smooth paths.
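
To illustrate the smoothing step, the sketch below rounds the corners of a waypoint path, such as one returned by an RRT-style planner, with cubic Bézier curves. It is not the thesis's Q-RRT implementation: smooth_waypoints is a hypothetical helper, and the particular constraint used (inner control points at the corner, endpoints on the original segments, which keeps tangent continuity with the collision-checked polyline) is an assumed, common choice.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=20):
    """Sample B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3
    at n parameter values t in [0, 1]."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def smooth_waypoints(waypoints, n=20):
    """Round each corner of a polyline path with a cubic Bezier whose
    endpoints are segment midpoints and whose inner control points sit
    on the corner itself, so the curve leaves and rejoins the original
    (collision-checked) segments tangentially."""
    pts = [np.asarray(p, dtype=float) for p in waypoints]
    out = [pts[0]]
    for a, b, c in zip(pts, pts[1:], pts[2:]):
        out.extend(cubic_bezier((a + b) / 2, b, b, (b + c) / 2, n))
    out.append(pts[-1])
    return np.vstack(out)

# e.g. a jagged RRT-style path becomes a smooth (N, 2) array of samples
path = smooth_waypoints([(0, 0), (2, 0), (2, 2), (4, 3)])
```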

(2) For local path planning, a GAIL-D3PG (Generative Adversarial Imitation Learning D3PG) algorithm based on a deterministic policy and generative adversarial imitation learning is proposed. First, a D3PG (Double experience replay DDPG) algorithm with an added expert experience replay pool is designed; it improves the experience replay pool used in DDPG (Deep Deterministic Policy Gradient) and accelerates the agent's exploration of the environment. Then, building on D3PG, the GAIL-D3PG algorithm combines it with generative adversarial imitation learning so that expert data is used to learn the expert policy directly, further raising learning efficiency. Experimental results show that GAIL-D3PG improves considerably on the learning efficiency and training performance of the compared algorithms.
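
A minimal sketch of the dual-replay idea follows, assuming the expert pool behaves roughly like a second buffer mixed into every minibatch. The DualReplayBuffer class, the fixed mixing ratio, and the transition format are illustrative assumptions, not the thesis's D3PG code.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Two pools: the agent's own transitions plus a fixed set of expert
    demonstrations; each minibatch mixes the two. The mixing ratio and
    (s, a, r, s2, done) transition format are illustrative assumptions."""

    def __init__(self, capacity=100_000, expert_fraction=0.25):
        self.agent = deque(maxlen=capacity)   # old transitions evicted
        self.expert = []                      # loaded once, never evicted
        self.expert_fraction = expert_fraction

    def add(self, transition):
        """Store one (s, a, r, s2, done) tuple from the agent."""
        self.agent.append(transition)

    def load_expert(self, demos):
        self.expert.extend(demos)

    def sample(self, batch_size):
        """Draw a mixed minibatch (assumes both pools hold enough data)."""
        n_exp = min(int(batch_size * self.expert_fraction), len(self.expert))
        batch = random.sample(self.expert, n_exp)
        batch += random.sample(list(self.agent), batch_size - n_exp)
        random.shuffle(batch)
        return batch
```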

(3) The proposed algorithms were validated in roadway simulations on the ROS (Robot Operating System) platform. A roadway simulation environment was built with the Gazebo simulator, and a Turtlebot3 mobile robot was chosen as the simulated robot for training; the algorithms were then ported to a physical Turtlebot3 robot, and the feasibility of the proposed methods was verified in both the simulated roadway and a real environment.
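
For context, a minimal ROS 1 (rospy) node of the kind such experiments are built on is sketched below; it publishes velocity commands on /cmd_vel, the standard command topic for a Turtlebot3 in Gazebo or on hardware. The node name and speed values are placeholders where a planner's output would go; this is not the thesis's control code.

```python
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import Twist

# Publish velocity commands to a Turtlebot3 (simulated or real).
# /cmd_vel and geometry_msgs/Twist are the standard Turtlebot3 interface;
# the speeds below are arbitrary stand-ins for a planner's output.
rospy.init_node('path_follower_demo')
pub = rospy.Publisher('/cmd_vel', Twist, queue_size=1)
rate = rospy.Rate(10)                 # 10 Hz control loop

cmd = Twist()
cmd.linear.x = 0.15                   # forward speed, m/s
cmd.angular.z = 0.3                   # yaw rate, rad/s

while not rospy.is_shutdown():
    pub.publish(cmd)
    rate.sleep()
```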

The Q-RRT and GAIL-D3PG algorithms proposed in this thesis not only improve the quality of path planning for underground unmanned vehicles, but can also be extended to path planning for underground rescue, inspection, and complex surface environments, giving them both theoretical and practical significance.

English Abstract:

Underground unmanned vehicles are an important part of smart mines, and path planning is an extremely important part of unmanned driving tasks. Due to the narrow tunnels, numerous obstacles, and scattered, easily changeable working sites in coal mines, traditional path planning algorithms suffer from low planning efficiency, poor real-time performance, and poor planning quality. Reinforcement learning allows an agent to learn the surrounding environment through trial and error so as to maximize its reward. This paper therefore applies reinforcement learning to the improvement of path planning algorithms, which increases the real-time performance and adaptability of unmanned vehicle path planning while ensuring the quality of the path. The main work includes the following aspects:

(1) For the global path planning algorithm, a Q-RRT algorithm combined with a reinforcement learning algorithm is proposed. Aiming at the low node sampling efficiency of the RRT (Rapidly-exploring Random Trees) algorithm, a designed reward function is used to guide node expansion, which improves the algorithm's search efficiency. At the same time, a pruning method and cubic Bézier curves with constraints are used to optimize the generated path. Simulation results show that the algorithm improves the efficiency of path planning and produces smooth paths.

(2) For the local path planning algorithm, a GAIL-D3PG (Generative Adversarial Imitation Learning D3PG) algorithm based on a deterministic policy and generative adversarial imitation learning is proposed. First, the D3PG (Double Experience Replay DDPG) algorithm is designed, which adds an expert experience replay pool; this improves the experience replay pool used in the DDPG (Deep Deterministic Policy Gradient) algorithm and accelerates the agent's exploration of the environment. Then, on the basis of the D3PG algorithm, the GAIL-D3PG algorithm is designed by combining generative adversarial imitation learning, which uses expert data to learn the expert policy directly and further improves the learning efficiency of the algorithm. The experiments show that the learning efficiency and training performance of the GAIL-D3PG algorithm are greatly improved compared with other algorithms.

(3) The algorithms in this paper are verified by roadway simulation on the ROS (Robot Operating System) platform. The roadway simulation environment was built with the Gazebo simulator, and the Turtlebot3 mobile robot was selected as the simulation robot for training. The algorithms were then transplanted to a physical Turtlebot3 robot, and the feasibility of the methods was verified in both the simulated roadway and the real environment.

The Q-RRT algorithm and GAIL-D3PG algorithm proposed in this paper not only improve the quality of underground unmanned vehicle path planning, but can also be widely applied to path planning for underground rescue, inspection, and complex surface environments, and thus have certain theoretical and practical significance.


CLC Number:

 TP242.3    

Open Access Date:

 2022-06-27    
