Systems Engineering and Electronics, 2026, Vol. 48, Issue (5): 1590-1598. doi: 10.12305/j.issn.1001-506X.2026.05.15

• Systems Engineering •

Spacecraft on-orbit observation maneuver decision-making method based on multi-policy learning

Zhenshuai JIA, Bing XIAO, Hanyu QIAN, Zheyu ZHANG

  1. School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi 710072, China
  • Received: 2024-03-15 Online: 2026-05-27 Published: 2026-05-27
  • Contact: Bing XIAO
  • About the authors: Zhenshuai JIA (born 2000), male, master's student; main research interest: intelligent maneuver decision-making for spacecraft
    Hanyu QIAN (born 1997), male, doctoral student; main research interest: swarm game planning and control
    Zheyu ZHANG (born 2000), male, master's student; main research interest: satellite formation control

Abstract:

To address the maneuver decision-making problem of a spacecraft approaching a space target to perform on-orbit observation service, a multi-stage policy maneuver decision-making method based on deep reinforcement learning is proposed. First, the on-orbit observation task is divided into three stages (target approach, observation preparation, and continuous observation), and a multi-stage task model and constraint set are established to improve task solvability. Second, a multi-stage policy learning algorithm is proposed: a multi-stage training environment and a task reward function are constructed, and predictive guidance and rule-coupled maneuver guidance mechanisms are integrated to improve the algorithm's exploration capability and convergence stability. Finally, simulations show that, compared with classical reinforcement learning algorithms, the proposed algorithm shortens convergence time by 30.9%, increases the average cumulative task reward by 9.28%, and reduces average impulse consumption by 13.91%. Compared with a traditional optimization method, it also improves the core task indicators, which validates the effectiveness of the method.

Key words: spacecraft maneuvering, on-orbit observation, intelligent decision-making, deep reinforcement learning

CLC number: