Title (Chinese): | 基于深度强化学习的多目标动态路径规划研究 |
Name: | |
Student ID: | 21207223067 |
Confidentiality level: | Public |
Thesis language: | chi (Chinese) |
Discipline code: | 085400 |
Discipline: | Engineering - Electronic Information |
Student type: | Master's |
Degree: | Master of Engineering |
Degree year: | 2024 |
Institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Path planning and navigation algorithms |
First supervisor: | |
First supervisor's institution: | |
Second supervisor: | |
Submission date: | 2024-06-12 |
Defense date: | 2024-05-28 |
Title (English): | Research on Multi-Objective Dynamic Path Planning Based on Deep Reinforcement Learning |
Keywords (Chinese): | |
Keywords (English): | Multi-objective Path Planning; Deep Reinforcement Learning; Monte Carlo Tree Search; Multi-layer Long Short-Term Memory Networks |
Abstract (Chinese, translated): |
With the continuous development and wide application of technology, the performance demanded of path planning systems keeps rising. To handle tasks in complex real-world environments, researchers have proposed path planning algorithms based on deep reinforcement learning. The Deep Q-Network (DQN), an important deep reinforcement learning algorithm, has already shown its potential in path planning, particularly for steering a single agent to a single target point. In complex scenarios with multiple target points, however, DQN still struggles to balance computational efficiency against decision quality. This thesis therefore carries out the following work.

To address the lack of flexibility in fixed-sequence methods, this thesis proposes an improved DQN path planning algorithm: MCTS-DQN, a multi-objective path planner that fuses Monte Carlo Tree Search (MCTS) with DQN. The algorithm combines the global search capability of MCTS with the local decision optimization of DQN. It first serializes the multiple target points and builds a decision search tree, effectively exploring potential paths from the start point to the targets; it then uses DQN to make local-level decisions based on the search results, optimizing the agent's short-term action selection. This combination not only markedly improves decision quality but also, by exploiting DQN's ability to generate decisions quickly, significantly raises overall computational efficiency.

To address DQN's weakness at processing time-series data in dynamic environments, this thesis replaces the multilayer perceptron (MLP) in DQN with a multi-layer long short-term memory (LSTM) network and adds a prioritized experience replay (PER) mechanism, producing a dynamic path planning strategy based on the Deep Recurrent Q-Network (DRQN). The LSTM's gating mechanism effectively mitigates vanishing and exploding gradients, while PER refines the learning of experiences so that the algorithm concentrates on key transitions. Using a simulation model of dynamic environments, the thesis verifies the algorithm's effectiveness under various conditions. Experimental results show that the MCTS-augmented DQN maintains high decision quality in complex decision spaces while improving learning efficiency by 10% to 25%; the DRQN-based dynamic planner, in turn, significantly improves memory and generalization, showing clear advantages in path planning accuracy and stability over conventional DQN planners. |
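The MCTS-DQN combination described above can be illustrated in miniature. The following is a hypothetical sketch, not the thesis implementation: a UCT-style tree search orders the target points, while `dqn_cost` stands in for the point-to-point cost a trained DQN policy would report (mocked here with Euclidean distance); all names, coordinates, and iteration counts are assumptions.

```python
import math
import random

def dqn_cost(a, b):
    # Placeholder for the travel cost a trained DQN policy would incur
    # between points a and b; mocked here with Euclidean distance.
    return math.dist(a, b)

class Node:
    def __init__(self, pos, remaining, parent=None):
        self.pos, self.remaining, self.parent = pos, remaining, parent
        self.children = {}                 # target point -> child Node
        self.visits, self.value = 0, 0.0   # value = running mean reward

def ucb(parent, child, c=1.4):
    # UCB1: exploit high mean reward, explore rarely visited children.
    return child.value + c * math.sqrt(math.log(parent.visits) / child.visits)

def search(start, targets, iters=3000):
    root = Node(start, frozenset(targets))
    for _ in range(iters):
        node, path = root, [root.pos]
        # 1. Selection: follow UCB1 through fully expanded nodes.
        while node.remaining and len(node.children) == len(node.remaining):
            node = max(node.children.values(), key=lambda ch: ucb(node, ch))
            path.append(node.pos)
        # 2. Expansion: add one as-yet-untried target as a child.
        if node.remaining:
            t = next(x for x in node.remaining if x not in node.children)
            node.children[t] = node = Node(t, node.remaining - {t}, node)
            path.append(node.pos)
        # 3. Simulation: finish the tour in random order, cost it with dqn_cost.
        tail = list(node.remaining)
        random.shuffle(tail)
        full = path + tail
        reward = -sum(dqn_cost(a, b) for a, b in zip(full, full[1:]))
        # 4. Backpropagation: update the running mean reward up to the root.
        while node:
            node.visits += 1
            node.value += (reward - node.value) / node.visits
            node = node.parent
    # Read off the most-visited branch as the recommended target order.
    order, node = [], root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
        order.append(node.pos)
    return order

print(search((0, 0), [(5, 1), (1, 4), (6, 6)]))
```

Each node's backed-up value estimates the negative total tour cost of orderings passing through it, so the most-visited branch approximates the best visiting sequence; in the full algorithm the DQN would then plan the concrete local moves along each leg of that sequence.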
Abstract (English): |
With the continuous advancement of technology and its widespread application, the performance requirements for path planning systems are increasingly stringent. To address tasks in complex real-world environments, researchers have proposed path planning algorithms based on deep reinforcement learning. The Deep Q-Network (DQN), as a pivotal deep reinforcement learning algorithm, has demonstrated its potential in the realm of path planning, particularly in solving path planning problems from a single agent to a single target point. However, the DQN algorithm still faces challenges in balancing computational efficiency and decision quality when dealing with complex scenarios involving multiple target points. This paper specifically investigates the following issues.

To tackle the insufficient flexibility of fixed-sequence methods, this paper proposes an enhanced DQN-based path planning algorithm that integrates Monte Carlo Tree Search (MCTS) with DQN, forming the MCTS-DQN multi-objective path planning algorithm. This algorithm combines the global search capability of MCTS with the local decision optimization of DQN. It starts by serializing multiple targets and constructing a decision search tree, effectively exploring potential paths from the start point to the multiple targets; it then uses DQN to make local-level decisions based on the search outcomes, optimizing the selection of agent actions in the short term. This strategy not only significantly improves the quality of decision-making but also leverages DQN's rapid decision-making capability to greatly enhance overall computational efficiency.

To address DQN's deficiency in processing temporal data within dynamic environments, this paper replaces the multilayer perceptron (MLP) structure within DQN with multi-layer long short-term memory (LSTM) networks and incorporates a prioritized experience replay (PER) mechanism, introducing a dynamic path planning strategy based on the Deep Recurrent Q-Network (DRQN). The gating mechanism of the LSTM networks effectively mitigates vanishing and exploding gradients, while PER optimizes the learning process so that the algorithm focuses on key experiences. By establishing a simulation model of dynamic environments, this paper verifies the effectiveness of the proposed algorithm under various environmental conditions. Experimental results indicate that the DQN algorithm, when integrated with MCTS, not only maintains high-quality decision-making in complex decision spaces but also enhances learning efficiency by 10% to 25%. Furthermore, the DRQN-based dynamic path planning method significantly improves the algorithm's memory and generalization capabilities, showing substantial advantages in path planning accuracy and stability compared to traditional DQN methods. |
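The DRQN modification can likewise be sketched. Below is a minimal illustration, with all module names, dimensions, and hyperparameters assumed rather than taken from the thesis: the DQN's MLP trunk is replaced by a stacked `nn.LSTM` that consumes a short observation history, and a simple proportional prioritized replay buffer weights transitions by TD error.

```python
import random
import torch
import torch.nn as nn

class DRQN(nn.Module):
    # Recurrent Q-network: a multi-layer LSTM trunk replaces the MLP.
    def __init__(self, obs_dim=8, n_actions=4, hidden=64, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hc=None):
        # obs_seq: (batch, time, obs_dim); Q-values read from the last step.
        out, hc = self.lstm(obs_seq, hc)
        return self.head(out[:, -1]), hc

class PERBuffer:
    """Proportional prioritized experience replay (O(n) sampling for brevity)."""
    def __init__(self, capacity=10_000, alpha=0.6):
        self.data, self.prios = [], []
        self.capacity, self.alpha = capacity, alpha

    def push(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:          # drop the oldest entry
            self.data.pop(0); self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + 1e-5) ** self.alpha)

    def sample(self, k):
        # Sample indices with probability proportional to priority.
        idx = random.choices(range(len(self.data)), weights=self.prios, k=k)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        # Refresh priorities after the TD errors are recomputed.
        for i, e in zip(idx, td_errors):
            self.prios[i] = (abs(e) + 1e-5) ** self.alpha

# Smoke test: Q-values for a batch of 3 random 5-step observation sequences.
q = DRQN()
qvals, _ = q(torch.randn(3, 5, 8))
print(qvals.shape)  # torch.Size([3, 4])
```

In a full training loop, sampled transitions would additionally carry importance-sampling weights to correct the prioritization bias, and `update` would be called after each TD step so that surprising transitions are replayed more often.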
References: | |
CLC number: | TP18 |
Release date: | 2024-06-13 |