Systems Engineering and Electronics, 2024, Vol. 46, Issue (3): 1038-1047. doi: 10.12305/j.issn.1001-506X.2024.03.30

• Guidance, Navigation and Control

Robust observer-based deep reinforcement learning for attitude stabilization of vertical takeoff and landing vehicle

Yanling LI¹, Feizhou LUO², Zhilei GE¹,*

  1. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China
  2. China Academy of Launch Vehicle Technology, Beijing 100076, China
  • Received: 2023-02-17; Online: 2024-02-29; Published: 2024-03-08
  • Corresponding author: Zhilei GE
  • About the authors: Yanling LI (born 1999), female, master's student; research interests: intelligent control, navigation and guidance, and flight vehicle attitude control.
    Feizhou LUO (born 1967), male, engineer, M.S.; research interests: overall system control, navigation and guidance.
    Zhilei GE (born 1979), male, associate professor, Ph.D.; research interests: intelligent control, navigation and guidance, flight vehicle control, and geomagnetic navigation.

Abstract:

This paper studies a robust observer-based proximal policy optimization (ROB-PPO) control method for stabilizing the attitude of vertical takeoff and landing (VTOL) vehicles subject to elastic vibration and model-uncertainty disturbances. The method combines a robust observer with proximal policy optimization (PPO), a deep reinforcement learning algorithm. The robust observer is designed to reconstruct vehicle attitude information corrupted by elastic vibration. The observer and the vehicle dynamics model together form the environment, and the reconstructed attitude produced by the observer serves as the state of the deep reinforcement learning algorithm. The agent interacts with this environment continuously and is thereby trained to stabilize the vehicle attitude. Simulation results show that ROB-PPO is more robust and converges faster than the adaptive fuzzy proportional-integral-derivative (PID) algorithm in common use. Finally, the effectiveness of the proposed algorithm is verified on a self-developed VTOL vehicle.

Key words: vertical takeoff and landing vehicle, attitude control, robust observer, deep reinforcement learning
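The abstract's key architectural point is that the observer sits inside the reinforcement learning environment. As a result, the agent is trained on reconstructed attitude rather than on raw sensor data.

The Python sketch below illustrates that wiring under stated assumptions, none of which come from the paper:
- a single pitch axis modeled as θ̈ = b·u + d;
- one sinusoidal bending mode added to the measured angle;
- a linear extended state observer (ESO) used as a stand-in for the authors' robust observer, whose design the abstract does not give;
- a hypothetical quadratic reward.

The class `ObserverAttitudeEnv`, its simplified `reset`/`step` interface (not the Gymnasium API) and all numbers are illustrative. `ppo_clip_objective` is the standard per-sample PPO-clip surrogate that a PPO agent maximizes. The actor/critic networks and the training loop are omitted.

```python
import math


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO-clip surrogate min(r*A, clip(r, 1-eps, 1+eps)*A).

    ratio = pi_new(a|s) / pi_old(a|s); PPO maximizes the batch mean of this value.
    """
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


class ObserverAttitudeEnv:
    """Hypothetical single-axis attitude environment: plant + observer.

    The agent sees only the observer's reconstructed attitude (angle, rate),
    never the true state or the raw, vibration-corrupted measurement.
    """

    def __init__(self, dt=0.01, b=2.0, disturbance=0.2, bend_amp=0.01,
                 bend_freq=60.0, obs_bw=10.0, t_final=5.0, max_angle=1.0):
        self.dt, self.b, self.d = dt, b, disturbance  # control gain, unknown torque
        self.bend_amp, self.bend_freq = bend_amp, bend_freq  # bending mode [rad], [rad/s]
        # Linear ESO gains placing all three observer poles at s = -obs_bw.
        # This ESO is a stand-in, not the paper's robust observer.
        self.l1, self.l2, self.l3 = 3 * obs_bw, 3 * obs_bw ** 2, obs_bw ** 3
        self.t_final, self.max_angle = t_final, max_angle

    def _measure(self):
        # Sensor output: rigid-body angle plus one elastic (bending) mode.
        return self.theta + self.bend_amp * math.sin(self.bend_freq * self.t)

    def reset(self, theta0=0.1):
        self.t, self.theta, self.omega = 0.0, theta0, 0.0
        # Observer states: angle, rate, and lumped-disturbance estimates.
        self.z1, self.z2, self.z3 = self._measure(), 0.0, 0.0
        return (self.z1, self.z2)

    def step(self, u):
        dt, acc = self.dt, self.b * u + self.d
        # ESO update (forward Euler) driven by the current measurement.
        e = self._measure() - self.z1
        dz1 = self.z2 + self.l1 * e
        dz2 = self.z3 + self.b * u + self.l2 * e
        dz3 = self.l3 * e
        self.z1, self.z2, self.z3 = (self.z1 + dt * dz1, self.z2 + dt * dz2,
                                     self.z3 + dt * dz3)
        # True plant theta'' = b*u + d, propagated exactly for a held input.
        self.theta += self.omega * dt + 0.5 * acc * dt ** 2
        self.omega += acc * dt
        self.t += dt
        state = (self.z1, self.z2)  # reconstructed attitude = RL state
        # Hypothetical quadratic reward on the estimates and control effort.
        reward = -(self.z1 ** 2 + 0.1 * self.z2 ** 2 + 0.01 * u ** 2)
        done = abs(self.theta) > self.max_angle or self.t >= self.t_final
        return state, reward, done


if __name__ == "__main__":
    env = ObserverAttitudeEnv()
    state, done = env.reset(), False
    while not done:
        # Placeholder PD law on the estimates; a trained PPO actor would go here.
        u = -(20.0 * state[0] + 5.0 * state[1])
        state, reward, done = env.step(u)
    print(f"t = {env.t:.2f} s, true angle = {env.theta:.4f} rad")
```

The PD law in the `__main__` demo only exercises the interface in place of a trained PPO actor. It is not the adaptive fuzzy PID baseline from the paper and says nothing about the reported comparison.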
