Systems Engineering and Electronics ›› 2026, Vol. 48 ›› Issue (4): 1404-1412. doi: 10.12305/j.issn.1001-506X.2026.04.29

• Guidance, Navigation and Control •

Intelligent decision-making methods for collaborative roundup by multi-spacecraft

Danhe CHEN1,*, Shuhang WANG1, Zhiyong LIU2, Chuangge WANG1

  1. Key Laboratory of Special Engine Technology, Ministry of Education, School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  2. Institute of Spacecraft System Engineering, Beijing 100094, China
  • Received: 2025-03-24 Revised: 2025-07-03 Online: 2025-11-06 Published: 2025-11-06
  • Corresponding author: Danhe CHEN
  • About the authors: Shuhang WANG (b. 2000), male, master's student; research interests: unmanned systems and reinforcement learning
    Zhiyong LIU (b. 1980), male, research fellow, master's degree; research interests: spacecraft system design, space debris monitoring and protection
    Chuangge WANG (b. 1996), male, doctoral student; research interests: spacecraft close-range maneuver control and spacecraft maneuvering games
  • Funding:
    Supported by the Stable Support Fund of the Key Laboratory of Space Intelligent Control Technology (HTKJ2023KL502009)

Abstract:

To address the complex space task in which multiple spacecraft cooperate intelligently to round up an escaping target, an intelligent collaborative roundup algorithm based on the multi-agent twin-delayed deep deterministic policy gradient (MATD3) is proposed. First, a multi-spacecraft collaborative roundup environment and a relative orbital dynamics model are established, and the roundup of the space target is formulated as a Markov decision process. Second, to cope with the high-dimensional state space and continuous action space of the roundup environment, and to address the unstable configuration of the multi-agent spacecraft formation, a guiding reward function that accounts for the consistency of the roundup posture is designed so that the pursuing satellites can quickly achieve a stable roundup of the escaping satellite. Finally, the swarm game strategy is trained and optimized in a multi-spacecraft collaborative roundup simulation environment built on the Gym framework, so that the behavior of each spacecraft is optimal at both the individual and the team level. Simulation results show that, under a 100 m terminal position constraint, the algorithm avoids collisions among the spacecraft and effectively achieves the collaborative roundup of the target, providing a reference for the intelligent autonomous operation of future spacecraft.

Key words: multi-agent twin-delayed deep deterministic policy gradient (MATD3), multi-spacecraft, collaborative roundup, roundup posture consistency, strategy optimization
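
For readers unfamiliar with the twin-delayed update that MATD3 builds on, the following is a minimal sketch of the standard TD3 critic target (clipped double-Q learning with target policy smoothing). It is not the authors' implementation: the function name `td3_target`, the stand-in critics `q1_next` and `q2_next`, and the hyperparameter values (`gamma`, `noise_std`, `noise_clip`, `act_limit`) are assumptions chosen for illustration. In a multi-agent setting the critics are typically centralized, so `next_action` here would be the joint action produced by all agents' target actors.

```python
import random


def clip(x, lo, hi):
    """Clamp a scalar to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, done, next_action, q1_next, q2_next,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_limit=1.0, rng=random):
    """Compute the TD3 critic target for one transition.

    next_action      : list of floats output by the target actor(s)
    q1_next, q2_next : callables mapping an action list to a scalar value;
                       hypothetical stand-ins for the two target critics
                       already conditioned on the next state
    """
    # Target policy smoothing: add clipped Gaussian noise, then re-clip
    # the result to the valid action range.
    smoothed = []
    for a in next_action:
        eps = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
        smoothed.append(clip(a + eps, -act_limit, act_limit))

    # Clipped double-Q: use the smaller of the two target critic values
    # to reduce overestimation bias.
    q_min = min(q1_next(smoothed), q2_next(smoothed))

    # Do not bootstrap past a terminal transition.
    return reward + gamma * (1.0 - float(done)) * q_min


if __name__ == "__main__":
    # Toy critics (assumed): constant values, independent of the action.
    y = td3_target(1.0, False, [0.3, -0.7],
                   q1_next=lambda a: 10.0, q2_next=lambda a: 8.0)
    print(y)
```

The "delayed" part of the method is not shown: in TD3 the actor and the target networks are updated less often than the critics, commonly once every two critic updates.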

CLC number: