Intelligent decision-making methods for collaborative roundup by multi-spacecraft

doi:10.12305/j.issn.1001-506X.2026.04.29

Systems Engineering and Electronics, 2026, Vol. 48, Issue (4): 1404-1412. doi: 10.12305/j.issn.1001-506X.2026.04.29

• Guidance, Navigation and Control •

Danhe CHEN¹,*, Shuhang WANG¹, Zhiyong LIU², Chuangge WANG¹

1. Key Laboratory of Special Engine Technology, Ministry of Education, School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
2. Institute of Spacecraft System Engineering, Beijing 100094, China

Received: 2025-03-24 Revised: 2025-07-03 Online: 2025-11-06 Published: 2025-11-06
Corresponding author: Danhe CHEN
About the authors: Shuhang WANG (born 2000), male, master's student; research interests: unmanned systems and reinforcement learning
Zhiyong LIU (born 1980), male, research fellow, M.S.; research interests: spacecraft system design, space debris monitoring and protection
Chuangge WANG (born 1996), male, Ph.D. candidate; research interests: close-range maneuver control of spacecraft and spacecraft maneuvering games
Funding:
Supported by the Stable Support Fund of the Key Laboratory of Space Intelligent Control Technology (HTKJ2023KL502009)

Abstract:

To address the complex space task in which multiple spacecraft cooperate intelligently to round up an evading target, an intelligent collaborative roundup algorithm based on the multi-agent twin-delayed deep deterministic policy gradient (MATD3) is proposed. First, a multi-spacecraft collaborative roundup environment and a relative orbital dynamics model are established, and the space-target roundup problem is formulated as a Markov decision process. Second, to cope with the high-dimensional state space and continuous action space of the roundup environment, and to resolve problems such as the unstable configuration of the multi-agent spacecraft, a guiding reward function that accounts for the consistency of the roundup posture is designed so that the pursuing satellites can quickly achieve a stable roundup of the evading satellite. Finally, the swarm game strategy is trained and optimized in a multi-spacecraft collaborative roundup simulation environment built on the Gym framework, so that the behavior of each spacecraft is optimal at both the individual and the team level. Simulation results show that, under a 100 m terminal position constraint, the algorithm avoids collisions among the spacecraft and effectively achieves the collaborative roundup of the target, providing a reference for the intelligent autonomous operation of spacecraft in future space missions.

Keywords: multi-agent twin-delayed deep deterministic policy gradient (MATD3), multi-spacecraft, collaborative roundup, roundup posture consistency, strategy optimization
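For orientation, the critic target that TD3-style methods use can be written in a centralized-critic, multi-agent form as sketched below. This is the generic twin-critic construction (clipped double-Q learning with target policy smoothing), not a formula quoted from the paper. The notation is assumed for illustration: $i$ indexes the agents, $\mathbf{o}$ is the joint observation, primes denote target networks, $d_i$ is a terminal flag, and $\tilde{\sigma}$ and $c$ are the smoothing-noise scale and clip bound, whose values the abstract does not give.

```latex
\begin{align*}
\tilde{a}_j' &= \operatorname{clip}\bigl(\mu_j'(o_j') + \epsilon,\; a_{\min},\; a_{\max}\bigr),
  \qquad \epsilon \sim \operatorname{clip}\bigl(\mathcal{N}(0, \tilde{\sigma}^2),\, -c,\, c\bigr), \\
y_i &= r_i + \gamma\,(1 - d_i)\,\min_{k \in \{1, 2\}}
  Q'_{i,k}\bigl(\mathbf{o}',\, \tilde{a}_1', \dots, \tilde{a}_N'\bigr), \\
L(\theta_{i,k}) &= \mathbb{E}\Bigl[\bigl(Q_{i,k}(\mathbf{o},\, a_1, \dots, a_N) - y_i\bigr)^2\Bigr],
  \qquad k = 1, 2 .
\end{align*}
```

Taking the minimum over the two target critics counteracts the overestimation bias of a single critic, which is the main reason to prefer a TD3-style update over a plain deterministic policy gradient in a continuous action space.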

CLC number: V 448.2

Danhe CHEN, Shuhang WANG, Zhiyong LIU, Chuangge WANG. Intelligent decision-making methods for collaborative roundup by multi-spacecraft[J]. Systems Engineering and Electronics, 2026, 48(4): 1404-1412.

Figures and tables (11): Figures 1 to 9, Tables 1 and 2


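Parameter	Pursuer satellites	Target satellite
Number	3	1
Initial position	Random within a circular region 1 to 1.5 km from the target satellite	Origin of the coordinate frame
Maximum thrust acceleration along the x and y axes/($\text{m} \cdot \text{s}^{-2}$)	0.05	0.03
Control period/s	1	1
Capture radius $d_{\text{capture}}$/m	100	N/A
Collision-avoidance safety threshold/m	$d_{\text{ppsafe}}$=20	$d_{\text{pesafe}}$=30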
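
The simulation parameters above can be exercised with a small planar model. The following is a minimal sketch, assuming the relative orbital dynamics are the planar Clohessy-Wiltshire equations and that capture means every pursuer lies within the capture radius; neither assumption is confirmed by the abstract. The reference mean motion `N_REF` and the function names `cw_step` and `roundup_status` are hypothetical; the numeric limits come from the table.

```python
import numpy as np

# Values taken from the simulation-parameter table.
DT = 1.0               # control period, s
A_MAX_PURSUER = 0.05   # max thrust acceleration per axis, m/s^2
A_MAX_TARGET = 0.03    # max thrust acceleration per axis, m/s^2
D_CAPTURE = 100.0      # capture radius, m
D_PP_SAFE = 20.0       # safety threshold, read here as pursuer-pursuer, m
D_PE_SAFE = 30.0       # safety threshold, read here as pursuer-target, m

# Assumption (not from the source): mean motion of a geostationary reference orbit.
N_REF = 7.2921e-5      # rad/s


def cw_step(state, accel, a_max, n=N_REF, dt=DT):
    """Advance a planar state [x, y, vx, vy] by one semi-implicit Euler step.

    x is the radial axis and y the along-track axis of the reference frame.
    The commanded acceleration is clipped per axis to +/- a_max.
    """
    x, y, vx, vy = state
    ax, ay = np.clip(accel, -a_max, a_max)
    # Planar Clohessy-Wiltshire equations with control acceleration.
    dvx = 3.0 * n**2 * x + 2.0 * n * vy + ax
    dvy = -2.0 * n * vx + ay
    # Update velocity first, then position with the new velocity.
    vx_new = vx + dvx * dt
    vy_new = vy + dvy * dt
    return np.array([x + vx_new * dt, y + vy_new * dt, vx_new, vy_new])


def roundup_status(pursuers, target):
    """Return (captured, pursuer_pursuer_violation, pursuer_target_violation).

    pursuers: array of shape (num_pursuers, 2); target: array of shape (2,).
    """
    d_pe = np.linalg.norm(pursuers - target, axis=1)
    captured = bool(np.all(d_pe <= D_CAPTURE))
    pe_violation = bool(np.any(d_pe < D_PE_SAFE))
    # Pairwise pursuer distances over the upper triangle (i < j).
    i, j = np.triu_indices(len(pursuers), k=1)
    d_pp = np.linalg.norm(pursuers[i] - pursuers[j], axis=1)
    pp_violation = bool(np.any(d_pp < D_PP_SAFE))
    return captured, pp_violation, pe_violation
```

In a Gym-style environment, `cw_step` would be called once per agent inside `step`, and `roundup_status` would feed the termination flag and the collision penalties of whatever reward function is used.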

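Parameter	Value
Maximum training steps	3×10⁶
Maximum steps per episode	800
Replay buffer size	5×10⁵
Batch size	500
Discount factor	0.985
Actor network learning rate	0.0001
Critic network learning rate	0.0001
Exploration noise	0.015
Soft update parameter	0.05
Delayed policy update frequency (MATD3)	2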
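
To show how these hyperparameters typically enter a TD3-style training loop, the sketch below collects them in a config object and implements two of the mechanisms they control: the soft (Polyak) target update and the delayed actor update. The class and function names are hypothetical. The sketch assumes the soft update parameter follows the common convention `target = tau * online + (1 - tau) * target` and that the actor is updated once every `policy_delay` critic updates; the source states only the numeric values.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class MATD3Config:
    """Training hyperparameters copied from the table above."""
    max_train_steps: int = 3_000_000
    max_episode_steps: int = 800
    buffer_size: int = 500_000
    batch_size: int = 500
    gamma: float = 0.985              # discount factor
    actor_lr: float = 1e-4
    critic_lr: float = 1e-4
    exploration_noise: float = 0.015  # noise scale; exact meaning is an assumption
    tau: float = 0.05                 # soft update parameter
    policy_delay: int = 2             # delayed policy update frequency


def soft_update(target_params, online_params, tau):
    """Polyak averaging of two parameter lists: tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]


def should_update_actor(critic_updates_done, policy_delay):
    """True on every policy_delay-th critic update (delayed policy update)."""
    return critic_updates_done % policy_delay == 0
```

With `tau = 0.05` the target networks move 5% of the way toward the online networks at each soft update, and with `policy_delay = 2` the actors and target networks are refreshed on every second critic update.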

