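Thesis title (Chinese): | 基于深度强化学习的无人机协同任务规划方法研究 |
Name: | |
Student ID: | 20208049014 |
Confidentiality level: | Confidential (open after 1 year) |
Thesis language: | chi |
Discipline code: | 0812 |
Discipline name: | Engineering - Computer Science and Technology (degrees in engineering or science may be awarded) |
Student type: | Master's |
Degree level: | Master of Engineering |
Degree year: | 2023 |
Training institution: | Xi'an University of Science and Technology |
School/Department: | |
Major: | |
Research direction: | Computer applications |
First supervisor name: | |
First supervisor affiliation: | |
Thesis submission date: | 2023-06-20 |
Thesis defense date: | 2023-06-06 |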
Thesis title (foreign language): | Deep reinforcement learning-based collaborative mission planning method for UAVs |
Keywords (Chinese): | |
Keywords (foreign language): | UAVs; Task assignment; Route planning; Reinforcement learning; Particle swarm optimization |
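Chinese abstract (translated): |
UAVs offer good maneuverability and high concealment and can replace humans in carrying out complex and dangerous tasks, and they are now widely used in border patrol, emergency rescue and disaster relief, agricultural irrigation, battlefield reconnaissance, and combat strikes. Efficient UAV mission planning methods strengthen the ability of UAVs to solve complex problems and improve their execution efficiency, which is of great significance for the development of intelligent UAV systems. Task assignment and route planning, the core of UAV mission planning, have received wide attention from researchers, but because tasks are diverse and available resources are limited, existing methods still have problems: (1) Traditional UAV task assignment methods represent complex problems poorly and often suffer from complicated modeling, difficult convergence, and poor stability, so they are hard to apply to multi-objective UAV task assignment. (2) Existing route planning algorithms often become trapped in local optima and converge slowly when solving UAV route planning problems. To address these problems, this thesis carries out the following research:
(1) For the UAV task assignment problem, this thesis takes into account task type, number, and location, UAV performance, and the association constraints between tasks, and builds a mathematical model of UAV task assignment. On the basis of this model, it designs the state space, action space, and reward strategy, and proposes a new deep reinforcement learning algorithm for UAV task assignment. The algorithm introduces GRU neural network units to help the agent extract, respectively, the dependencies within the task sequence and the spatiotemporal dependency information of each UAV; it also designs an experience-pool trajectory chain strategy that, to a certain extent, overcomes the delayed reward feedback in the reinforcement learning framework and better helps the agent discover beneficial state and action information during task assignment. Finally, simulation experiments comparing it with other deep reinforcement learning algorithms verify that the proposed algorithm has better stability and convergence ability on the UAV task assignment problem.
(2) For the UAV route planning problem, this thesis takes into account flight characteristics, terrain, and threat information, builds a mathematical model of UAV route planning, and proposes a new hybrid particle swarm algorithm for route planning. The algorithm combines the iterative update of the global optimal solution with the simulated annealing algorithm, which increases the diversity of the search space and improves the ability to find the global optimum; based on a dimensional learning strategy, each particle integrates the useful information of the global optimal solution dimension by dimension, which reduces the convergence oscillation caused by particle degradation during iteration and raises the convergence speed. Finally, simulation experiments verify that the proposed algorithm finds better global optimal solutions and converges faster on the UAV route planning problem. |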
Foreign-language abstract: |
Unmanned aerial vehicles (UAVs) offer excellent maneuverability, high concealment, and the ability to replace humans in complex and dangerous tasks, and they are widely used in border patrol, disaster relief, agricultural irrigation, reconnaissance, and combat. Efficient UAV mission planning methods enhance the ability of UAVs to solve complex problems and improve their operational efficiency, which is of great significance for the development of intelligent UAV systems. Task assignment and route planning, the core of UAV mission planning, have attracted widespread attention from researchers. However, because tasks are diverse and available resources are limited, existing methods still face challenges. First, traditional UAV task assignment methods represent complex problems poorly and often suffer from complicated modeling, difficult convergence, and poor stability, which makes them hard to apply to multi-objective UAV task assignment. Second, existing route planning algorithms often become trapped in local optima and converge slowly when solving UAV route planning problems. In response to these issues, this thesis conducts the following research:
(1) To address the UAV task assignment problem, this thesis considers task type, quantity, and location, UAV performance, and the correlation constraints between tasks, and establishes a mathematical model of UAV task assignment. Based on this model, the thesis designs the state space, action space, and reward strategy, and proposes a new deep reinforcement learning algorithm. The algorithm introduces gated recurrent unit (GRU) neural network units to help agents extract the dependency relationships within task sequences and the spatiotemporal dependency information of individual UAVs. This enhances the convergence of the algorithm and improves the decision-making ability of agents for task assignment. A trajectory chain strategy for the experience pool is designed to overcome, to a certain extent, the delayed reward feedback in the reinforcement learning framework. This helps agents discover useful state and action information during the task assignment process, accelerates neural network training, and improves the optimization efficiency of the algorithm. Finally, simulation experiments and comparative analysis with other deep reinforcement learning algorithms verify that the proposed algorithm has better stability and convergence ability on the UAV task assignment problem.
(2) To address the UAV route planning problem, this thesis considers flight characteristics, terrain, and threat information, and establishes a mathematical model of UAV route planning. A novel hybrid particle swarm optimization algorithm named SDPSO is proposed. The algorithm combines iterative updates of the global optimal solution with simulated annealing (SA) to enhance the diversity of the search space and improve SDPSO's ability to find the global optimal solution. Following a dimensional learning strategy, each particle integrates the beneficial information of the global optimal solution dimension by dimension, which reduces particle oscillation during the evolution process and increases the convergence speed of SDPSO. Finally, simulation experiments verify that the proposed algorithm finds better global optimal solutions and converges faster when solving UAV route planning problems. |
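Point (1) of the abstract mentions a trajectory chain strategy for the experience pool that eases delayed reward feedback, but it does not spell out the mechanism. The sketch below shows one possible reading, under clearly stated assumptions: the transitions of one episode are kept together as a chain, and when the episode ends the discounted return is propagated backward so that early assignment steps carry a signal from the final reward. The class name `TrajectoryChainBuffer`, its parameters, and the backward-return rule are hypothetical illustrations, not the thesis's implementation.

```python
import random
from collections import deque


class TrajectoryChainBuffer:
    """Hypothetical replay buffer that stores whole episodes as chains.

    Assumption: each stored transition is annotated with the discounted
    return of the rest of its episode, so a late reward reaches early steps.
    """

    def __init__(self, capacity=1000, gamma=0.9, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop first
        self.gamma = gamma
        self.rng = random.Random(seed)
        self._episode = []  # transitions of the episode in progress

    def add(self, state, action, reward, done):
        """Record one step; flush the chain when the episode ends."""
        self._episode.append((state, action, reward))
        if done:
            self._flush()

    def _flush(self):
        # Walk the chain backward, accumulating the discounted return.
        ret = 0.0
        chain = []
        for state, action, reward in reversed(self._episode):
            ret = reward + self.gamma * ret
            chain.append((state, action, reward, ret))
        # Store in original time order.
        self.buffer.extend(reversed(chain))
        self._episode = []

    def sample(self, batch_size):
        """Uniformly sample up to batch_size annotated transitions."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)


# Usage: a three-step episode whose only reward arrives at the end.
buf = TrajectoryChainBuffer(gamma=0.5)
buf.add("s0", "a0", 0.0, False)
buf.add("s1", "a1", 0.0, False)
buf.add("s2", "a2", 8.0, True)
returns = [t[3] for t in buf.buffer]
```

In this toy episode every step receives a nonzero return even though only the last step was rewarded, which is the kind of effect the abstract attributes to the trajectory chain. How the thesis actually links and samples trajectories may differ.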
Chinese Library Classification number: | TP391 |
Open-access date: | 2024-06-20 |
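Point (2) of the abstract describes a hybrid particle swarm algorithm that couples the global-best update with simulated annealing and lets each particle learn from the global best dimension by dimension. The record gives no formulas, so the following is only a minimal sketch under assumptions: a Metropolis acceptance rule decides whether a particle's personal best replaces the guiding global best, and dimension-wise learning copies one coordinate of the guide at a time, keeping the copy only if the cost improves. The function `sa_pso`, the toy cost `sphere`, and all parameter values are hypothetical and stand in for the thesis's route cost model and SDPSO settings.

```python
import math
import random


def sphere(x):
    """Toy cost function standing in for a route cost (assumption)."""
    return sum(v * v for v in x)


def sa_pso(cost, dim=4, n_particles=12, iters=60, seed=0,
           w=0.7, c1=1.5, c2=1.5, t0=1.0, cooling=0.95):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [cost(p) for p in pos]
    i0 = min(range(n_particles), key=pbest_val.__getitem__)
    guide, guide_val = pbest[i0][:], pbest_val[i0]  # SA-controlled global guide
    best, best_val = guide[:], guide_val            # best solution ever seen
    history = [best_val]
    temp = t0
    for _ in range(iters):
        for i in range(n_particles):
            # Standard PSO velocity and position update toward pbest and guide.
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (guide[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = cost(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
            # Dimension-wise learning: try the guide's value one coordinate
            # at a time and keep it only when the cost improves.
            for d in range(dim):
                trial = pbest[i][:]
                trial[d] = guide[d]
                trial_val = cost(trial)
                if trial_val < pbest_val[i]:
                    pbest[i], pbest_val[i] = trial, trial_val
            # Metropolis rule: the guide may accept a worse candidate.
            delta = pbest_val[i] - guide_val
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                guide, guide_val = pbest[i][:], pbest_val[i]
            if pbest_val[i] < best_val:
                best, best_val = pbest[i][:], pbest_val[i]
        temp *= cooling  # geometric cooling schedule
        history.append(best_val)
    return best, best_val, history


best, best_val, history = sa_pso(sphere)
```

The guide is allowed to move to worse solutions while the temperature is high, which is one way to add the search diversity the abstract mentions, while `best` separately tracks the best solution found so that the reported result never degrades.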