Systems Engineering and Electronics, 2023, Vol. 45, Issue (1): 193-201. doi: 10.12305/j.issn.1001-506X.2023.01.23

• Guidance, Navigation and Control •

Improved three-dimensional A* algorithm of real-time path planning based on reinforcement learning

Zhi REN1,2, Dong ZHANG1,2,*, Shuo TANG1,2

  1. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China
  2. Shaanxi Key Laboratory of Space Vehicle Design, Xi'an 710072, China
  • Received: 2021-08-12 Online: 2023-01-01 Published: 2023-01-03
  • Contact: Dong ZHANG
  • About the authors: Zhi REN (b. 1999), male, PhD candidate; research interests: intelligent planning and autonomous control of flight vehicle swarms
    Dong ZHANG (b. 1986), male, associate professor, PhD; research interests: intelligent planning and autonomous control of flight vehicle swarms
    Shuo TANG (b. 1963), male, professor, PhD; research interests: flight dynamics and guidance
  • Funding:
    Key Program of the National Natural Science Foundation of China (61933010); National Natural Science Foundation of China (61903301)

Abstract:

Online path planning for flight vehicles demands both real-time performance and optimal results. To meet both requirements, the three-dimensional A* algorithm is improved using reinforcement learning. First, a shrinkage factor is introduced into the heuristic weighting of the cost function to improve the algorithm's time performance. Second, a model is established to measure the changes in real-time performance and result optimality; combined with the deep deterministic policy gradient (DDPG) method, the action-state and reward functions are designed and the shrinkage factor is optimized through training. Finally, the improved three-dimensional A* algorithm is verified by simulation in multiple scenarios. The simulation results show that the improved algorithm effectively improves time performance while preserving the optimality of the planned path.

Key words: improved A* algorithm, shrinkage factor, reinforcement learning, deep deterministic policy gradient, real-time path planning
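
The abstract does not state the exact form of the shrinkage factor or how it enters the cost function. As a rough illustration of heuristic weighting in general, the sketch below runs A* on a 3D grid with f(n) = g(n) + w·h(n), where a fixed weight `weight` stands in for the paper's factor. The function name `weighted_astar`, the 26-connected grid, the Euclidean heuristic, and the obstacle layout are all assumptions made for this example, not details taken from the paper.

```python
import heapq
import math

# 26-connected neighbourhood on a unit 3D grid (an assumption; the abstract
# does not say which connectivity the paper uses).
MOVES = [(dx, dy, dz)
         for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
         if (dx, dy, dz) != (0, 0, 0)]


def weighted_astar(start, goal, size, obstacles=frozenset(), weight=1.0):
    """Search a size[0] x size[1] x size[2] grid.

    Nodes are ordered by f = g + weight * h, where h is the Euclidean
    distance to the goal. weight = 1.0 gives plain A*.
    Returns (path, cost, expansions); path is None if the goal is unreachable.
    """
    def h(p):
        return math.dist(p, goal)

    g = {start: 0.0}                      # best known cost-to-come
    parent = {start: None}
    open_heap = [(weight * h(start), 0.0, start)]
    expansions = 0

    while open_heap:
        _, g_cur, cur = heapq.heappop(open_heap)
        if g_cur > g[cur]:                # stale entry: a cheaper one exists
            continue
        if cur == goal:
            path = []
            while cur is not None:        # walk parents back to the start
                path.append(cur)
                cur = parent[cur]
            return path[::-1], g[goal], expansions
        expansions += 1
        for dx, dy, dz in MOVES:
            nxt = (cur[0] + dx, cur[1] + dy, cur[2] + dz)
            if any(c < 0 or c >= s for c, s in zip(nxt, size)):
                continue                  # outside the grid
            if nxt in obstacles:
                continue
            g_new = g_cur + math.sqrt(dx * dx + dy * dy + dz * dz)
            if g_new < g.get(nxt, math.inf):
                g[nxt] = g_new
                parent[nxt] = cur
                heapq.heappush(open_heap, (g_new + weight * h(nxt), g_new, nxt))
    return None, math.inf, expansions


if __name__ == "__main__":
    grid = (20, 20, 10)
    # Hypothetical obstacle: a wall at x = 10 that leaves a gap for y >= 15.
    wall = frozenset((10, y, z) for y in range(15) for z in range(10))
    for w in (1.0, 1.5, 3.0):
        _, cost, n = weighted_astar((0, 0, 0), (19, 0, 0), grid, wall, w)
        print(f"weight={w}: cost={cost:.2f}, expanded nodes={n}")
```

With an admissible heuristic, a weight above 1 typically reduces the number of expanded nodes, while the returned path costs at most `weight` times the optimum. This is the kind of speed-versus-optimality trade-off that the paper tunes through reinforcement learning rather than fixing by hand.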

CLC number: