Spacecraft on-orbit observation maneuver decision-making method based on multi-policy learning

doi:10.12305/j.issn.1001-506X.2026.05.15

Abstract:

To address the maneuver decision problem of a spacecraft approaching a space target to perform on-orbit observation services, a multi-stage policy maneuver decision-making method based on deep reinforcement learning is proposed. First, the on-orbit observation task is divided into three stages (target approach, observation preparation, and continuous observation), and a multi-stage task model and constraint set are established to improve task solvability. Second, a multi-stage policy learning algorithm is proposed: a multi-stage training environment and task reward function are constructed, and predictive guidance and rule-coupled maneuver guidance mechanisms are integrated to improve the algorithm's exploration capability and convergence stability. Finally, simulations show that, compared with classical reinforcement learning algorithms, the proposed algorithm shortens convergence time by 30.9%, raises the average cumulative task reward by 9.28%, and lowers the average impulse consumption by 13.91%. Compared with a traditional optimization method, it also improves the core task indicators, which verifies the effectiveness of the method.

Key words: spacecraft maneuvering, on-orbit observation, intelligent decision-making, deep reinforcement learning
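
The three-stage decomposition described in the abstract can be pictured as a dispatcher that picks one policy per stage. The sketch below is only an illustration under stated assumptions: the abstract does not say how stages are switched, so the range-based rule, the threshold values, and the names `StageThresholds`, `select_stage`, and `decide_impulse` are all hypothetical, and the policies are placeholders rather than trained networks.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Tuple

Vector = Tuple[float, float, float]


class Stage(Enum):
    """The three task stages named in the abstract."""
    APPROACH = "target approach"
    PREPARATION = "observation preparation"
    OBSERVATION = "continuous observation"


@dataclass(frozen=True)
class StageThresholds:
    """Hypothetical switching ranges in km (assumed, not from the paper)."""
    preparation_range_km: float = 50.0
    observation_range_km: float = 10.0


def select_stage(rel_pos_km: Vector,
                 thresholds: StageThresholds = StageThresholds()) -> Stage:
    """Pick a stage from the chaser-to-target range (an assumed rule)."""
    distance = math.sqrt(sum(c * c for c in rel_pos_km))
    if distance > thresholds.preparation_range_km:
        return Stage.APPROACH
    if distance > thresholds.observation_range_km:
        return Stage.PREPARATION
    return Stage.OBSERVATION


# A policy maps (relative position, relative velocity) to an impulse vector.
Policy = Callable[[Vector, Vector], Vector]


def decide_impulse(rel_pos_km: Vector,
                   rel_vel_kms: Vector,
                   policies: Dict[Stage, Policy],
                   thresholds: StageThresholds = StageThresholds(),
                   ) -> Tuple[Stage, Vector]:
    """Dispatch the decision to the policy that owns the current stage."""
    stage = select_stage(rel_pos_km, thresholds)
    return stage, policies[stage](rel_pos_km, rel_vel_kms)


def coast(rel_pos_km: Vector, rel_vel_kms: Vector) -> Vector:
    """Placeholder policy: command no impulse."""
    return (0.0, 0.0, 0.0)


# One placeholder policy per stage; a learned policy would replace each entry.
placeholder_policies: Dict[Stage, Policy] = {stage: coast for stage in Stage}
```

Calling `decide_impulse((100.0, 0.0, 0.0), (0.0, 0.0, 0.0), placeholder_policies)` selects the approach stage under the assumed thresholds. Keeping one policy per stage means each can be trained against its own reward function, which matches the multi-stage training environment the abstract mentions, although the actual switching logic may differ.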

CLC number:

V 412.4

Zhenshuai JIA, Bing XIAO, Hanyu QIAN, Zheyu ZHANG. Spacecraft on-orbit observation maneuver decision-making method based on multi-policy learning[J]. Systems Engineering and Electronics, 2026, 48(5): 1590-1598.

Figures/Tables: 13
