Systems Engineering and Electronics ›› 2026, Vol. 48 ›› Issue (2): 694-704. doi: 10.12305/j.issn.1001-506X.2026.02.29

• Guidance, Navigation and Control •

  • About the authors:
    Xu WANG (1998–), male, master's student; research interests: intelligent flight vehicle control, artificial intelligence
    Xiaoya YU (1990–), female; research interest: artificial intelligence
    Ziqi YE (2001–), female, master's student; research interests: intelligent flight vehicle control, fault-tolerant control
    Bin SHAN (1974–), male, associate professor, Ph.D.; research interests: aircraft guidance and control
  • Supported by:
    National Natural Science Foundation of China General Program (62473374); Young Scientists Fund (62403487); Ye Qisun Science Fund (U2441243)

Attitude control of hypersonic vehicle based on dual-dynamic PPO algorithm

Xu WANG, Guangbin CAI, Xiaoya YU, Ziqi YE, Bin SHAN   

  1. School of Missile Engineering, Rocket Force Engineering University, Xi'an 710025, China
  • Received:2025-01-15 Revised:2025-03-06 Online:2025-06-10 Published:2025-06-10
  • Contact: Guangbin CAI


Abstract:

To address the strong nonlinearities and significant uncertainties in hypersonic vehicle attitude control, as well as the limitations of traditional reinforcement learning algorithms in training convergence and control accuracy under multiple control requirements, a dual-dynamic adaptive proximal policy optimization (PPO) algorithm is proposed. The algorithm balances control precision and actuator protection through a soft dynamic clipping mechanism and a policy-driven entropy adjustment mechanism. On this basis, an integrated simulation environment incorporating aerodynamic characteristics and actuator dynamics is established. Drawing on proportional-integral-derivative control principles, the state observation space is redesigned. Simulation results demonstrate that, compared with the baseline PPO algorithm, the proposed method improves convergence speed by 22% while significantly enhancing both control accuracy and action smoothness. Under different flight conditions, the method exhibits excellent policy adaptability and robustness, effectively improving the attitude control performance of the hypersonic vehicle.
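The abstract names two mechanisms, soft dynamic clipping and policy-driven entropy adjustment, without giving their equations. The sketch below is a minimal illustrative reading, not the authors' implementation: it assumes the clipping range anneals over training, that "soft" clipping means a smooth saturation (here `tanh`) of the probability ratio in place of PPO's hard clamp, and that the entropy coefficient is driven by the current policy entropy relative to an assumed target. All function names, schedules, and constants are assumptions.

```python
import numpy as np

def soft_clip(ratio, eps):
    """Smoothly saturate the probability ratio toward [1-eps, 1+eps]
    using tanh instead of PPO's hard clamp (illustrative "soft" clipping)."""
    return 1.0 + eps * np.tanh((ratio - 1.0) / eps)

def ppo_loss(ratio, advantage, entropy, step, total_steps,
             eps0=0.2, eps_min=0.05, beta0=0.01, ent_target=1.0):
    # Dynamic clipping range: anneal linearly from eps0 to eps_min.
    frac = step / total_steps
    eps = eps0 + (eps_min - eps0) * frac
    # Pessimistic minimum over the unclipped and soft-clipped surrogates,
    # as in the standard PPO objective.
    surr = np.minimum(ratio * advantage, soft_clip(ratio, eps) * advantage)
    # Policy-driven entropy coefficient: add an exploration bonus only
    # while entropy sits below the target (illustrative rule).
    beta = beta0 * max(0.0, 1.0 - entropy / ent_target)
    return -(np.mean(surr) + beta * entropy)
```

At `ratio = 1` the soft clip is the identity, so the sketch reduces to the ordinary surrogate near the old policy and only deviates from hard clipping where the ratio saturates.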

Key words: hypersonic vehicle, intelligent control, deep reinforcement learning, proximal policy optimization (PPO), dynamic adaptive mechanism
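The abstract also states that the state observation space is redesigned around proportional-integral-derivative principles. One plausible reading, sketched below under stated assumptions (the class name, anti-windup bound, and feature layout are hypothetical, not taken from the paper), is to feed the agent the attitude-tracking error together with its running integral and finite-difference derivative, mirroring the P, I, and D terms as observation features.

```python
import numpy as np

class PIDObservation:
    """Augment the attitude-tracking error with its integral and
    derivative as observation features (illustrative sketch)."""

    def __init__(self, dt, int_limit=10.0):
        self.dt = dt
        self.int_limit = int_limit  # anti-windup bound on the integral
        self.integral = 0.0
        self.prev_error = 0.0

    def reset(self):
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        # Accumulate the integral term, clipped to avoid windup.
        self.integral = np.clip(self.integral + error * self.dt,
                                -self.int_limit, self.int_limit)
        # Finite-difference derivative of the error.
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return np.array([error, self.integral, derivative])
```

For example, with `dt = 0.1` a constant unit error yields the feature vector `[1.0, 0.1, 10.0]` on the first step; the derivative component then falls to zero while the integral keeps accumulating, which is exactly the information a PID controller would act on.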
