Chinese title: | Research on Path Planning Method for Mobile Robot Based on Reinforcement Learning |
Name: | |
Student ID: | 20208049017 |
Confidentiality level: | Confidential (to be opened after 1 year) |
Thesis language: | Chinese (chi) |
Discipline code: | 081203 |
Discipline name: | Engineering - Computer Science and Technology (degrees may be conferred in Engineering or Science) - Computer Application Technology |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2023 |
Degree-granting institution: | 西安科技大学 (Xi'an University of Science and Technology) |
Department: | |
Major: | |
Research direction: | Media Computing and Visualization |
First supervisor's name: | |
First supervisor's institution: | |
Thesis submission date: | 2023-06-13 |
Thesis defense date: | 2023-06-06 |
Foreign-language title: | Research on Path Planning Method for Mobile Robot Based on Reinforcement Learning |
Chinese keywords: | Reinforcement learning ; Mobile robot ; Path planning ; Q-learning ; Proximal policy optimization |
Foreign-language keywords: | Reinforcement learning ; Mobile robot ; Path planning ; Q-learning ; Proximal policy optimization |
Chinese abstract: |
In recent years, as intelligent equipment has become widespread across many industries, researchers have paid increasing attention to its autonomy during task execution. Mobile robots are among the most widely deployed forms of intelligent equipment, and their ability to plan paths and avoid obstacles autonomously is especially important when facing complex and changing environments. Most current research, however, relies on known environmental information, which limits a mobile robot's ability to adapt on the fly and prevents it from completing autonomous path planning tasks efficiently. This thesis therefore studies reinforcement-learning-based autonomous path planning methods for mobile robots in unknown two-dimensional and three-dimensional environments. The main research content is as follows:

To address the problem that existing algorithms cannot perform two-dimensional path planning effectively in complex environments and easily fall into local optima, a reinforcement learning method based on a continuous local search strategy is proposed, which decomposes a complex goal into several simple sub-goals. The method builds on and optimizes the Q-learning algorithm: prior environmental knowledge is added to the initialization of the Q-table to guide the mobile robot toward the sub-goal in the local environment. Within each local environment, a dynamically adjusted ε-greedy strategy optimizes the search capability of every step in real time; the sub-goal path planning tasks are then completed one by one across several consecutive local environments, ultimately accomplishing the autonomous path planning task for a complex goal. Experimental results show that the proposed method effectively solves the sparse-reward-feedback problem of the Q-learning algorithm and achieves better convergence speed and path quality than other Q-learning-based optimization methods.

To address the high complexity and uncertainty of three-dimensional unknown environments, a deep reinforcement learning method based on a limited-observation-space optimization strategy is proposed. The method is built on the proximal policy optimization framework, with a corresponding three-dimensional path planning task model. First, the mobile robot's visible range is restricted to a bounded region, and depth-map modeling simulates how the robot explores environmental information from a first-person perspective in an unknown environment. Then, a two-stage discrete action space guides the robot to move efficiently, maximizing its single-step search capability. Finally, a GRU in the designed network model fuses historical state information, making better use of past observations to improve decision efficiency. Experimental results show that the proposed method not only achieves a higher success rate on path planning tasks in unknown environments but also maintains good search efficiency. The method also applies to multi-objective path planning tasks and attains the desired environment search coverage.

In summary, this thesis studies how to perform mobile robot path planning tasks in diverse scenarios, conducting in-depth research on simple reinforcement learning and deep reinforcement learning in turn. This work has research significance and application value for mobile robots executing different types of tasks in different scenarios. |
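The first method above rests on two concrete mechanisms: seeding the Q-table with prior knowledge and shrinking ε over time. As a rough illustration only, the following minimal sketch shows one way a distance-based prior and a dynamically decaying ε-greedy policy could look on a toy grid world; the grid size, reward values, bias weight, and decay schedule are all hypothetical and are not taken from the thesis.

```python
# Minimal sketch (assumed parameters throughout): Q-learning on a toy grid,
# with the Q-table biased toward a sub-goal at initialization and an
# epsilon-greedy exploration rate that decays across episodes.
import numpy as np

rng = np.random.default_rng(0)

SIZE = 10                                      # 10x10 grid (assumed)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GOAL = (9, 9)                                  # sub-goal in the local window (assumed)

def init_q_with_prior(size, goal):
    """Bias each state-action value by how much the action shortens the
    Manhattan distance to the sub-goal -- a stand-in for 'prior environmental
    knowledge added to the Q-table initialization'."""
    q = np.zeros((size, size, len(ACTIONS)))
    for r in range(size):
        for c in range(size):
            for a, (dr, dc) in enumerate(ACTIONS):
                nr = min(max(r + dr, 0), size - 1)
                nc = min(max(c + dc, 0), size - 1)
                old_d = abs(goal[0] - r) + abs(goal[1] - c)
                new_d = abs(goal[0] - nr) + abs(goal[1] - nc)
                q[r, c, a] = 0.1 * (old_d - new_d)   # small bias toward the goal
    return q

def step(state, a):
    """One grid move with clipping at the borders; sparse goal reward."""
    dr, dc = ACTIONS[a]
    nxt = (min(max(state[0] + dr, 0), SIZE - 1),
           min(max(state[1] + dc, 0), SIZE - 1))
    return (nxt, 10.0, True) if nxt == GOAL else (nxt, -0.1, False)

q = init_q_with_prior(SIZE, GOAL)
alpha, gamma = 0.1, 0.95
for episode in range(300):
    eps = max(0.05, 0.9 * 0.98 ** episode)     # dynamically shrinking epsilon
    s = (0, 0)
    for _ in range(400):                       # cap episode length
        a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(np.argmax(q[s]))
        nxt, r, done = step(s, a)
        target = r + (0.0 if done else gamma * np.max(q[nxt]))
        q[s][a] += alpha * (target - q[s][a])
        s = nxt
        if done:
            break
```

In this sketch the distance-based prior only nudges early action choices toward the sub-goal and is gradually overwritten by learned values, while the decaying ε shifts the agent from exploration to exploitation as its estimates improve.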
Foreign-language abstract: |
In recent years, with the widespread use of intelligent equipment in various industries, researchers have become increasingly concerned with the autonomy of such equipment during task execution. As one of the most widely used kinds of intelligent equipment, mobile robots depend heavily on the ability to plan paths and avoid obstacles autonomously when facing complex and changing environments. At present, however, most research relies on known environmental information, which limits a mobile robot's ability to adapt to unexpected changes and to accomplish autonomous path planning tasks efficiently. This thesis therefore studies reinforcement-learning-based autonomous path planning methods for mobile robots in unknown two-dimensional and three-dimensional environments. The main research content is as follows:

To address the problem that existing algorithms cannot perform two-dimensional path planning effectively in complex environments and are prone to becoming trapped in local optima, a reinforcement learning method based on a continuous local search strategy is proposed, which decomposes a complex goal into several simple sub-goals. The method builds on and optimizes the Q-learning algorithm by adding prior environmental knowledge to the initialization of the Q-table, guiding the mobile robot toward the sub-goal in the local environment. Within each local environment, a dynamically adjusted ε-greedy strategy optimizes the search capability of each step in real time; the method then completes the sub-goal planning tasks one by one across several consecutive local environments, ultimately accomplishing the autonomous path planning task for a complex goal. Experimental results show that the proposed method effectively solves the sparse-reward-feedback problem of the Q-learning algorithm and achieves better convergence speed and path quality than other Q-learning-based optimization methods.

To overcome the high complexity and uncertainty of three-dimensional unknown environments, a deep reinforcement learning method based on a limited-observation-space optimization strategy is proposed. The method builds on the proximal policy optimization framework, and a corresponding three-dimensional path planning task model is designed. First, the mobile robot's visible range is limited to a bounded region, and depth-map modeling simulates the robot's first-person exploration of environmental information in an unknown environment. Then, a two-stage discrete action space guides the robot to move efficiently, maximizing its single-step search capability. Finally, a GRU in the designed network model combines historical state information, making better use of past observations to improve decision-making efficiency. Experimental results show that the proposed method not only achieves a higher success rate on search tasks under unknown environmental conditions but also maintains good search efficiency. The method can also be applied to multi-objective path planning tasks and achieves the desired environment search coverage.

In conclusion, this thesis studies how to execute mobile robot path planning tasks in diversified scenarios, carrying out in-depth research on simple reinforcement learning and deep reinforcement learning in turn, which has research significance and application value for mobile robots executing different types of tasks in different scenarios. |
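To make the second method's network design more concrete, the following is a minimal, hypothetical sketch of a policy network that encodes a depth-map observation, fuses history with a GRU, and selects an action through two discrete stages (here assumed, for illustration, to be a heading and a step length). PyTorch, all dimensions, the encoder, and the head semantics are assumptions not stated in the thesis; the PPO training loop itself (clipped surrogate objective, advantage estimation) is omitted.

```python
# Hypothetical policy-network sketch: depth-map encoder -> GRU over history
# -> two-stage discrete action heads, plus a value head for PPO's critic.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class TwoStageGRUPolicy(nn.Module):
    def __init__(self, depth_dim=64 * 64, hidden=128, n_headings=8, n_steps=3):
        super().__init__()
        self.encoder = nn.Sequential(            # flatten-and-project depth map
            nn.Flatten(), nn.Linear(depth_dim, hidden), nn.ReLU())
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.heading_head = nn.Linear(hidden, n_headings)          # stage 1: direction
        self.step_head = nn.Linear(hidden + n_headings, n_steps)   # stage 2: length
        self.value_head = nn.Linear(hidden, 1)   # critic for advantage estimation

    def forward(self, depth_seq, h0=None):
        # depth_seq: (batch, time, H*W) flattened depth images
        b, t, _ = depth_seq.shape
        feat = self.encoder(depth_seq.reshape(b * t, -1)).reshape(b, t, -1)
        out, hn = self.gru(feat, h0)              # GRU fuses past observations
        last = out[:, -1]                         # decide from the latest state
        heading_dist = Categorical(logits=self.heading_head(last))
        heading = heading_dist.sample()
        onehot = nn.functional.one_hot(heading, heading_dist.logits.shape[-1]).float()
        # Stage 2 conditions on the sampled stage-1 action.
        step_dist = Categorical(logits=self.step_head(torch.cat([last, onehot], -1)))
        step = step_dist.sample()
        logp = heading_dist.log_prob(heading) + step_dist.log_prob(step)
        return (heading, step), logp, self.value_head(last), hn

# Usage: one forward pass on a dummy batch of 4 frames of 64x64 depth maps.
policy = TwoStageGRUPolicy()
obs = torch.rand(2, 4, 64 * 64)
(action_h, action_s), logp, value, h = policy(obs)
```

Factoring the action into two conditioned discrete stages keeps each head small while still letting a single decision cover both where to turn and how far to move, which is one plausible reading of "maximizing the single-step search capability".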
CLC number: | TP391.4 |
Date open to public: | 2024-06-13 |