Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (1): 268-279. doi: 10.12305/j.issn.1001-506X.2025.01.27

• Guidance, Navigation and Control •

Reentry guidance method based on LSTM-DDPG

Xunliang YAN1,*, Kuan WANG1, Zijian ZHANG2, Peichen WANG1

  1. Shaanxi Aerospace Flight Vehicle Design Key Laboratory, School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China
    2. Beijing Institute of Aerospace Systems Engineering, Beijing 100076, China
  • Received: 2024-03-05 Online: 2025-01-21 Published: 2025-01-25
  • Contact: Xunliang YAN
  • About the authors:
    Xunliang YAN (1984—), male, associate research fellow, Ph.D.; research interests: hypersonic vehicle trajectory design and guidance, offense-defense confrontation modeling, simulation and evaluation
    Kuan WANG (1998—), male, master's student; research interests: hypersonic vehicle trajectory design and guidance
    Zijian ZHANG (1987—), male, senior engineer, Ph.D.; research interests: flight vehicle design
    Peichen WANG (1997—), male, Ph.D. student; research interests: hypersonic vehicle trajectory design and guidance
  • Supported by:
    National Natural Science Foundation of China (11602296); Natural Science Basic Research Program of Shaanxi Province (2019JM-434); Open Fund of the Intelligent Control Laboratory (2023-ZKSYS-KF04-02)

Abstract:

A reentry guidance method based on long short-term memory-deep deterministic policy gradient (LSTM-DDPG) is proposed within the training framework of the DDPG algorithm, to address the poor computational accuracy and insufficient adaptability to strong disturbances of existing DDPG-based reentry guidance methods. The method adopts a decoupled design of longitudinal and lateral guidance. For longitudinal guidance, the state and action spaces required for reinforcement learning are first constructed for the reentry guidance problem. Second, the decision points and the command calculation strategy within each guidance cycle are determined, and a reward function accounting for comprehensive performance is designed. Then, an LSTM network is introduced into the reinforcement learning training network, and the multi-task applicability of the algorithm is improved through an online update strategy. Lateral guidance adopts a dynamic bank-reversal method based on crossrange error to determine the sign of the bank angle. Simulations of reentry gliding are carried out for the American common aero vehicle-hypersonic (CAV-H). The results show that, compared with the traditional numerical predictor-corrector method, the proposed method achieves comparable terminal accuracy with higher computational efficiency; compared with existing DDPG-based reentry guidance methods, it achieves comparable computational efficiency with higher terminal accuracy and robustness.
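The abstract gives no implementation details. Purely as a rough illustration of how an LSTM layer can sit inside a DDPG actor that maps a sequence of reentry states at past decision points to a guidance command, here is a minimal PyTorch sketch; all dimensions, layer sizes, and names (LSTMActor, state_dim=7, the bank-angle interpretation of the action) are illustrative assumptions, not the architecture used in the paper.

```python
# Minimal sketch of an LSTM-based DDPG actor (PyTorch).
# All dimensions and names are illustrative assumptions,
# not the network architecture from the paper.
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    def __init__(self, state_dim=7, action_dim=1, hidden_dim=128):
        super().__init__()
        # The LSTM encodes the temporal pattern of the reentry state sequence
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim), nn.Tanh(),  # normalized action in [-1, 1]
        )

    def forward(self, state_seq, hidden=None):
        # state_seq: (batch, seq_len, state_dim) -- states at past decision points
        out, hidden = self.lstm(state_seq, hidden)
        # Act on the encoding at the most recent decision point, e.g. to
        # produce a normalized bank-angle magnitude command
        action = self.head(out[:, -1, :])
        return action, hidden

if __name__ == "__main__":
    actor = LSTMActor()
    seq = torch.randn(4, 10, 7)   # batch of 4 sequences, 10 decision points each
    a, _ = actor(seq)
    print(a.shape)                # torch.Size([4, 1])
```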
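Similarly, the crossrange-error-based dynamic bank reversal described for lateral guidance amounts to a sign-selection rule. The sketch below assumes a sign convention and a dynamically supplied corridor threshold; the paper's actual corridor design and threshold schedule are not reproduced.

```python
def bank_sign(crossrange_err, prev_sign, threshold):
    """Dynamic bank-reversal logic (illustrative sketch, not the paper's exact rule).

    crossrange_err: signed crossrange error (assumed positive = left of the
                    target plane)
    prev_sign:      bank-angle sign used in the previous guidance cycle (+1 or -1)
    threshold:      corridor half-width; assumed to be computed dynamically
                    elsewhere, e.g. shrinking as range-to-go decreases
    """
    if abs(crossrange_err) > threshold:
        # Error has left the corridor: reverse to bank back toward the target plane
        return -1 if crossrange_err > 0 else 1
    # Inside the corridor: keep the current sign to avoid chattering reversals
    return prev_sign
```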

Key words: reentry gliding guidance, reinforcement learning, deep deterministic policy gradient (DDPG), long short-term memory (LSTM) network

CLC number: