Systems Engineering and Electronics, 2022, Vol. 44, Issue (10): 3190-3199. doi: 10.12305/j.issn.1001-506X.2022.10.23

• Guidance, Navigation and Control •

Autopilot parameter rapid tuning method based on deep reinforcement learning

Qitian WAN1, Baogang LU2, Yaxin ZHAO3, Qiuqiu WEN1,*

  1. School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China
  2. Beijing Institute of Space Long March Vehicle, Beijing 100076, China
  3. China Academy of Launch Vehicle Technology, Beijing 100076, China
  • Received: 2021-12-07 Online: 2022-09-20 Published: 2022-10-24
  • Contact: Qiuqiu WEN
  • About the authors: Qitian WAN (born 1998), male, master's student; research interest: flight vehicle guidance and control technology | Baogang LU (born 1985), male, senior engineer, doctoral student; research interest: flight mechanics | Yaxin ZHAO (born 1987), female, engineer, master's student; research interest: flight vehicle control technology | Qiuqiu WEN (born 1982), male, associate professor, Ph.D.; research interests: flight vehicle guidance and control technology, overall flight vehicle design
  • Funding: Aeronautical Science Foundation of China (202037012003)

Abstract:

Deep reinforcement learning trains autopilot control parameters slowly, and its reward function converges poorly. To address these problems, an intelligent training method is proposed that takes the three-loop autopilot pole placement algorithm as its core and converts the three-dimensional control parameters into a one-dimensional design parameter. An intelligent control architecture is constructed that combines offline deep reinforcement learning training with online real-time computation by a multi-layer perceptron neural network. The architecture improves the efficiency of the deep reinforcement learning algorithm and the convergence of its reward function, while ensuring rapid online self-tuning of the control parameters over a wide range of flight states. Taking a typical reentry vehicle as an example, the deep reinforcement learning training and the neural network deployment are completed. Simulation results show that training is more efficient once the reinforcement learning action space is simplified, and that the trained autopilot tracks control commands with an error within 1.2%.

Key words: reinforcement learning, autopilot, parameter tuning, intelligent control, normalization

CLC number: