论文信息

论文中文题名:

 基于深度强化学习的多目标动态路径规划研究

姓名:

 李鹏程    

学号:

 21207223067    

保密级别:

 公开    

论文语种:

 中文(chi)

学科代码:

 085400    

学科名称:

 工学 - 电子信息    

学生类型:

 硕士    

学位级别:

 工程硕士    

学位年度:

 2024    

培养单位:

 西安科技大学    

院系:

 通信与信息工程学院    

专业:

 电子与通信工程    

研究方向:

 路径规划与导航算法    

第一导师姓名:

 周远国    

第一导师单位:

 西安科技大学    

第二导师姓名:

 梁尚清    

论文提交日期:

 2024-06-12    

论文答辩日期:

 2024-05-28    

论文外文题名:

 Research on multi-objective dynamic path planning based on deep reinforcement learning    

论文中文关键词:

 多目标路径规划 ; 深度强化学习 ; 蒙特卡罗树搜索 ; 多层长短期记忆网络    

论文外文关键词:

 Multi-objective Path Planning ; Deep Reinforcement Learning ; Monte Carlo Tree Search ; Multi-layer Long Short-Term Memory Networks    

论文中文摘要:

随着科技的不断发展和应用的广泛普及,人们对路径规划系统性能的要求不断提高。为了完成现实中复杂环境下的任务,研究人员提出了基于深度强化学习的路径规划算法。深度Q网络(Deep Q Network,DQN)作为一种重要的深度强化学习算法,已在路径规划领域展示出潜力,尤其是在解决单个智能体到单一目标点的路径规划问题上。然而,在处理多目标点的复杂场景时,DQN算法仍面临计算效率与决策质量难以平衡的问题。针对以上问题,本文的具体研究内容如下:

为了解决固定序列方法缺乏足够灵活性的问题,本文提出了一种改进DQN的路径规划算法——融合蒙特卡罗树搜索(Monte Carlo Tree Search,MCTS)与DQN的MCTS-DQN多目标路径规划算法。该算法结合了MCTS的全局搜索能力与DQN的局部决策优化能力:首先对多个目标点进行序列化处理,建立决策搜索树,有效地探索从初始点出发依次到达多个目标点的潜在路径;然后利用DQN根据搜索结果做出局部层面的决策,优化智能体短期内的动作选择。这种结合策略不仅大幅提升了决策质量,还借助DQN快速生成决策的能力,显著增强了整体计算效率。

为了解决DQN处理动态环境中时序数据能力不足的问题,本文对DQN中的多层感知机(MLP)结构进行了改进,引入多层长短期记忆(LSTM)网络和优先级经验回放(PER)机制,提出了一种基于深度递归Q网络(DRQN)的动态路径规划策略。LSTM网络的门控机制有效地缓解了梯度消失和梯度爆炸问题,PER机制则优化了经验的学习过程,使算法更加关注关键学习经验。通过建立动态环境的仿真模型,本文验证了该算法在各种环境条件下的有效性。实验结果表明,结合MCTS的DQN算法在处理复杂决策空间时,不仅能够维持较高的决策质量,还能将学习效率提高10%至25%;基于DRQN的动态路径规划方法显著提高了算法的记忆能力和泛化能力,与传统DQN路径规划方法相比,在路径规划的精确度和稳定性方面具有显著优势。

论文外文摘要:

With the continuous advancement of technology and its widespread application, the performance requirements for path planning systems have become increasingly stringent. To handle tasks in complex real-world environments, researchers have proposed path planning algorithms based on deep reinforcement learning. The Deep Q Network (DQN), as a pivotal deep reinforcement learning algorithm, has demonstrated its potential in path planning, particularly in solving problems in which a single agent must reach a single target point. However, the DQN algorithm still struggles to balance computational efficiency and decision quality in complex scenarios involving multiple target points. This paper specifically investigates the following issues:

To address the lack of flexibility in fixed-sequence methods, this paper proposes an enhanced DQN-based path planning algorithm that integrates Monte Carlo Tree Search (MCTS) with DQN, forming the MCTS-DQN multi-objective path planning algorithm. The algorithm combines the global search capability of MCTS with the local decision optimization capability of DQN. It first sequences the multiple target points and constructs a decision search tree, effectively exploring potential paths from the start point through the targets; it then uses DQN to make local-level decisions based on the search outcomes, optimizing the agent's short-term action selection. This strategy not only significantly improves decision quality but also leverages DQN's rapid decision-making to greatly enhance overall computational efficiency.
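As a concrete illustration of one possible reading of this combination, the minimal sketch below (not the thesis implementation) runs UCT-based MCTS over the order in which several target points are visited, while a stand-in function `dqn_step_cost` represents the cost of the local path that a trained DQN policy would otherwise generate; the coordinates, exploration constant, and all function names are assumptions introduced for illustration.

```python
# Illustrative sketch: MCTS over the visit order of multiple targets,
# with a placeholder for the DQN-driven local planner. Assumed names/values.
import math, random

TARGETS = [(2, 7), (8, 1), (5, 9), (9, 6)]   # hypothetical target coordinates
START = (0, 0)

def dqn_step_cost(a, b):
    # Stand-in for the trained DQN's local planner: the length of the local
    # path between two points is approximated here by Manhattan distance.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

class Node:
    def __init__(self, pos, remaining, parent=None):
        self.pos, self.remaining, self.parent = pos, remaining, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def rollout(pos, remaining):
    # Complete the remaining tour in a random order; reward = negative cost.
    cost, rest = 0.0, list(remaining)
    random.shuffle(rest)
    for t in rest:
        cost += dqn_step_cost(pos, t)
        pos = t
    return -cost

def uct_select(node, c=1.4):
    # Upper Confidence Bound applied to trees (UCT).
    return max(node.children.values(),
               key=lambda n: n.value / n.visits
               + c * math.sqrt(math.log(node.visits) / n.visits))

def mcts_visit_order(start, targets, iterations=2000):
    root = Node(start, frozenset(targets))
    for _ in range(iterations):
        node, path_cost = root, 0.0
        # Selection: descend through fully expanded nodes.
        while node.remaining and len(node.children) == len(node.remaining):
            child = uct_select(node)
            path_cost += dqn_step_cost(node.pos, child.pos)
            node = child
        # Expansion: add one untried target as a child.
        if node.remaining:
            t = random.choice([t for t in node.remaining if t not in node.children])
            child = Node(t, node.remaining - {t}, node)
            node.children[t] = child
            path_cost += dqn_step_cost(node.pos, t)
            node = child
        # Simulation and backpropagation of the (negative-cost) reward.
        reward = -path_cost + rollout(node.pos, node.remaining)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Read off the most-visited branch as the suggested target sequence.
    order, node = [], root
    while node.children:
        node = max(node.children.values(), key=lambda n: n.visits)
        order.append(node.pos)
    return order

if __name__ == "__main__":
    print("suggested visit order:", mcts_visit_order(START, TARGETS))
```

In the setting described by the abstract, the edge and rollout costs would presumably come from the learned DQN policy rather than a distance heuristic; the tree structure and the UCT selection rule would remain the same.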

To address DQN's deficiency in processing temporal data in dynamic environments, this paper improves the multilayer perceptron (MLP) structure within DQN by incorporating multi-layer long short-term memory (LSTM) networks and a prioritized experience replay (PER) mechanism, introducing a dynamic path planning strategy based on a deep recurrent Q-network (DRQN). The gating mechanism of the LSTM networks effectively mitigates vanishing and exploding gradients, while PER optimizes the learning of experiences, allowing the algorithm to focus on the most informative transitions. A simulation model of dynamic environments is built to verify the effectiveness of the proposed algorithms under various conditions. Experimental results indicate that the MCTS-augmented DQN algorithm not only maintains high decision quality in complex decision spaces but also improves learning efficiency by 10% to 25%. Furthermore, the DRQN-based dynamic path planning method significantly improves the algorithm's memory and generalization capabilities, showing clear advantages in path planning accuracy and stability over the traditional DQN approach.
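The sketch below is a rough, self-contained PyTorch illustration of the two ingredients named above: a multi-layer LSTM Q-network standing in for the DRQN, and a simple proportional prioritized replay buffer. The state and action dimensions, hyperparameters, and random transitions are placeholders, not details taken from the thesis.

```python
# Illustrative sketch: LSTM-based Q-network ("DRQN") plus proportional PER.
# All sizes and hyperparameters are assumptions made for this example.
import random
import numpy as np
import torch
import torch.nn as nn

class DRQN(nn.Module):
    def __init__(self, state_dim=4, hidden_dim=64, n_actions=4, n_layers=2):
        super().__init__()
        # A multi-layer LSTM replaces the MLP trunk so the network can keep a
        # memory of recent observations in a dynamic environment.
        self.lstm = nn.LSTM(state_dim, hidden_dim, num_layers=n_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq, hidden=None):
        out, hidden = self.lstm(obs_seq, hidden)   # obs_seq: (batch, seq, state_dim)
        return self.head(out[:, -1]), hidden       # Q-values for the last step

class PrioritizedReplay:
    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def push(self, transition):
        # New transitions get the current max priority so they are replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size, beta=0.4):
        probs = np.array(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()                    # importance-sampling weights
        return idx, [self.data[i] for i in idx], torch.tensor(weights, dtype=torch.float32)

    def update(self, idx, td_errors, eps=1e-5):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(float(err)) + eps

# One illustrative TD update on fabricated data, just to show the shapes involved.
net, target = DRQN(), DRQN()
target.load_state_dict(net.state_dict())
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
buf = PrioritizedReplay()
for _ in range(64):   # fake transitions: (obs_seq, action, reward, next_obs_seq, done)
    buf.push((np.random.randn(8, 4).astype(np.float32), random.randrange(4),
              random.random(), np.random.randn(8, 4).astype(np.float32), False))

idx, batch, w = buf.sample(32)
s  = torch.tensor(np.stack([b[0] for b in batch]))
a  = torch.tensor([b[1] for b in batch])
r  = torch.tensor([b[2] for b in batch], dtype=torch.float32)
s2 = torch.tensor(np.stack([b[3] for b in batch]))

q, _ = net(s)
q = q.gather(1, a.unsqueeze(1)).squeeze(1)
with torch.no_grad():
    q_next, _ = target(s2)
    target_q = r + 0.99 * q_next.max(1).values
td = target_q - q
loss = (w * td.pow(2)).mean()                      # importance-weighted TD loss
opt.zero_grad()
loss.backward()
opt.step()
buf.update(idx, td.detach())                       # refresh priorities from TD errors
```

A fuller version would also mask the bootstrap target on terminal transitions and periodically copy the online weights into the target network; both are omitted here for brevity.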


中图分类号:

 TP18    

开放日期:

 2024-06-13    
