Optimization method for orbit transfer of all-electric propulsion satellite based on reinforcement learning

doi:10.12305/j.issn.1001-506X.2022.05.27

Abstract

Autonomous orbit transfer using electric thrusters is one of the key technologies for all-electric propulsion satellites. For the orbit-raising problem of all-electric propulsion geostationary orbit (GEO) satellites, a reinforcement learning-based method for optimizing the time-optimal low-thrust orbit transfer strategy is proposed. The method combines the generalized advantage estimator (GAE) with proximal policy optimization (PPO) and accounts for multiple orbital perturbations and the Earth-shadow constraint. To address the training difficulty caused by an excessively large state space and sparse rewards, training acceleration methods such as action output mapping and hierarchical rewards are proposed, which improve training efficiency and speed up convergence. Numerical simulations and comparisons with the traditional direct method, indirect method, and feedback control method show that the proposed method is simpler, more flexible, and more efficient, while maintaining the optimality of the orbit transfer time.

Key words: all-electric propulsion satellite, low-thrust orbit transfer optimization, reinforcement learning, proximal policy optimization (PPO), training acceleration method

CLC number: V412.4

Mingren HAN, Yufeng WANG. Optimization method for orbit transfer of all-electric propulsion satellite based on reinforcement learning[J]. Systems Engineering and Electronics, 2022, 44(5): 1652-1661.

References (43)

1	ZHOU Z C, GAO J. Development approach to all-electric propulsion GEO satellite platform[J]. Spacecraft Engineering, 2015, 24(2): 1-6. doi: 10.3969/j.issn.1673-8748.2015.02.001
2	DUAN C H, REN L X, CHANG Y J, et al. All-electric propulsion satellite trajectory optimization by homotopic approach[J]. Chinese Space Science and Technology, 2020, 40(2): 42-48.
3	PETROPOULOS A E, SIMS J A. A review of some exact solutions to the planar equations of motion of a thrusting spacecraft[C]//Proc. of the 2nd International Symposium on Low-Thrust Trajectories, 2002.
4	MORANTE D, RIVO S M, SOLER M. A survey on low-thrust trajectory optimization approaches[J]. Aerospace, 2021, 8(3): 88. doi: 10.3390/aerospace8030088
5	EDELBAUM T N. Propulsion requirements for controllable satellites[J]. Journal of the American Rocket Society, 1961, 31(8): 1079-1089.
6	COLASURDO G, CASALINO L. Optimal low-thrust maneuvers in presence of earth shadow[C]//Proc. of the AIAA/AAS Astrodynamics Specialist Conference and Exhibit, 2004: 716-725.
7	RICCIARDI L A, VASILE M. Modhoc-multi objective direct hybrid optimal control[C]//Proc. of the 7th International Conference on Astrodynamics Tools and Techniques, 2018.
8	PRITCHETT R, HOWELL K, GREBOW D. Low-thrust transfer design based on collocation techniques: applications in the restricted three-body problem[C]//Proc. of the AAS/AIAA Astrodynamics Specialist Conference, 2017.
9	LOCOCHE S. OptElec: an optimisation software for low-thrust orbit transfer including satellite and operation constraints[C]//Proc. of the 7th International Conference on Astrodynamics Tools and Techniques, 2018.
10	MAZZINI L, CERRETO M. Theory and applications of optimal finite thrust orbital transfers[M]//FASANO G, PINTÉR J D, ed. Modeling and optimization in space engineering. Cham, Switzerland: Springer, 2019: 233-269.
11	BASTANTE J C, PENARROYA P. Electro: a SW tool for the electric propulsion trajectory optimisation[C]//Proc. of the 7th International Conference on Astrodynamics Tools and Techniques, 2018.
12	MORANTE D, RIVO S M, SOLER M, et al. Hybrid multi-objective orbit-raising optimization with operational constraints[J]. Acta Astronautica, 2020, 175(1): 447-461. doi: 10.1016/j.actaastro.2020.05.022
13	SHANNON J L, OZIMEK M, ATCHISON J A, et al. Q-law aided direct trajectory optimization for the high-fidelity, many-revolution, low-thrust orbit transfer problem[J]. Advances in the Astronautical Sciences, 2019, 168: 781-800.
14	LANTUKH D V, RANIERI C L, DIPRINZIO M D, et al. Enhanced Q-law Lyapunov control for low-thrust transfer and rendezvous design[C]//Proc. of the AAS/AIAA Astrodynamics Specialist Conference, 2017.
15	YAN A, CHEN Z, DONG C Y, et al. Attitude balance control of two-wheeled robot based on fuzzy reinforcement learning[J]. Systems Engineering and Electronics, 2021, 43(4): 1036-1043.
16	RECHT B. A tour of reinforcement learning: the view from continuous control[J]. Annual Review of Control, Robotics, and Autonomous Systems, 2019, 2(1): 253-279. doi: 10.1146/annurev-control-053018-023825
17	KOU P, LIANG D, WANG C, et al. Safe deep reinforcement learning-based constrained optimal control scheme for active distribution networks[J]. Applied Energy, 2020, 264: 114772. doi: 10.1016/j.apenergy.2020.114772
18	YOO H, KIM B, KIM J W, et al. Reinforcement learning based optimal control of batch processes using Monte Carlo deep deterministic policy gradient with phase segmentation[J]. Computers and Chemical Engineering, 2021, 144: 107133. doi: 10.1016/j.compchemeng.2020.107133
19	LI Y, QIU X H, LIU X D, et al. Deep reinforcement learning and its application in autonomous fitting optimization for attack areas of UCAVs[J]. Journal of Systems Engineering and Electronics, 2020, 31(4): 734-742. doi: 10.23919/JSEE.2020.000048
20	BERTSEKAS D P. Reinforcement learning and optimal control[M]. Belmont, MA: Athena Scientific, 2019.
21	YANAGIDA K, OZAKI N, FUNASE R. Exploration of long time-of-flight three-body transfers using deep reinforcement learning[C]//Proc. of the AIAA Scitech 2020 Forum, 2020: 0460.
22	MILLER D, LINARES R. Low-thrust optimal control via reinforcement learning[C]//Proc. of the 29th AAS/AIAA Space Flight Mechanics Meeting, 2019.
23	SULLIVAN C J, BOSANAC N. Using reinforcement learning to design a low-thrust approach into a periodic orbit in a multi-body system[C]//Proc. of the AIAA Scitech 2020 Forum, 2020: 1914.
24	MILLER D, ENGLANDER J A, LINARES R. Interplanetary low-thrust design using proximal policy optimization[C]//Proc. of the AAS/AIAA Astrodynamics Specialist Conference, 2019.
25	BATTIN R H. An introduction to the mathematics and methods of astrodynamics[M]. Washington DC: American Institute of Aeronautics and Astronautics, 1987.
26	SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. 2nd ed. Cambridge, MA: MIT Press, 2018: 46-54.
27	SUTTON R S, MCALLESTER D, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[J]. Advances in Neural Information Processing Systems, 2000, 12: 1057-1063.
28	ZHANG R, LI X J, HAN C, et al. Guidance strategy for all-electric propulsion GEO satellite based on piecewise constant thrust[J]. Flight Control and Detection, 2020, 3(3): 40-48.
29	WALKER M. A set of modified equinoctial orbit elements[J]. Celestial Mechanics, 1986, 38(4): 391-392. doi: 10.1007/BF01238929
30	MA J B, LIN L, XIN W. Problems concerning the perturbation due to the tesseral harmonic terms in the nonspherical gravitational potential of the earth[J]. Chinese Astronomy and Astrophysics, 2002, 26(2): 235-244. doi: 10.1016/S0275-1062(02)00062-0
31	ALLAN R R. Satellite orbit perturbations due to radiation pressure and luni-solar forces[J]. The Quarterly Journal of Mechanics and Applied Mathematics, 1962, 15(3): 283-301. doi: 10.1093/qjmam/15.3.283
32	PARKINSON R W, JONES H M, SHAPIRO I I. Effects of solar radiation pressure on earth satellite orbits[J]. Science, 1960, 131(3404): 920-921. doi: 10.1126/science.131.3404.920
33	COOK G E. The effect of aerodynamic lift on satellite orbits[J]. Planetary and Space Science, 1964, 12(11): 1009-1020. doi: 10.1016/0032-0633(64)90077-7
34	SRIVASTAVA V K, YADAV S M, KUMAR J, et al. Earth conical shadow modeling for LEO satellite using reference frame transformation technique: a comparative study with existing earth conical shadow models[J]. Astronomy and Computing, 2015, 9(9): 34-39.
35	TAN H L, HU G M, MA M, et al. Analyze the solar radiation pressure with two conical earth shadow models[C]//Proc. of the 5th China Satellite Navigation Conference-S3: Precise Orbit Determination and Positioning, 2014: 115-118.
36	WANG Y H, HE H, TAN X T. Truly proximal policy optimization[C]//Proceedings of Machine Learning Research, 2020: 113-122.
37	SCHULMAN J, MORITZ P, LEVINE S, et al. High-dimensional continuous control using generalized advantage estimation[C]//Proc. of the 4th International Conference on Learning Representations, 2016.
38	GEFFROY S, EPENOY R. Optimal low-thrust transfers with constraints generalization of averaging techniques[J]. Acta Astronautica, 1997, 41(3): 133-149. doi: 10.1016/S0094-5765(97)00208-7
39	KLUEVER C A, OLESON S R. Direct approach for computing near-optimal low-thrust earth-orbit transfers[J]. Journal of Spacecraft and Rockets, 1998, 35(4): 509-515. doi: 10.2514/2.3360
40	VERLET M, SLAMA B, REYNAUD S, et al. Coupled optimization of launcher and all-electric satellite trajectories[C]//Proc. of the 24th International Symposium on Space Flight Dynamics, 2014.
41	YANG D L, XU B, GAO Y T. Control method for earth satellite orbit transfer using electric propulsion[J]. Journal of Astronautics, 2015, 36(9): 1010-1017.
42	BETTS J T. Very low-thrust trajectory optimization using a direct SQP method[J]. Journal of Computational and Applied Mathematics, 2000, 120(1/2): 27-40.
43	AZIZ J D, PARKER J S, SCHEERES D J, et al. Low-thrust many-revolution trajectory optimization via differential dynamic programming and a Sundman transformation[J]. Journal of the Astronautical Sciences, 2018, 65(2): 205-228. doi: 10.1007/s40295-017-0122-8

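Orbital parameter	Initial value	Target value
Initial epoch	2021-01-01 12:00:00	Unconstrained
Semi-major axis/km	17169.8	42165
Eccentricity	0.6087	0
Inclination/(°)	28.5	0
Right ascension of ascending node/(°)	0	Unconstrained
Argument of perigee/(°)	0	Unconstrained
True anomaly/(°)	180	Unconstrained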
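
As a quick check on the initial and target orbits in the table above, the apsis radii and the Kepler period follow directly from the semi-major axis and eccentricity. The sketch below is illustrative only: the gravitational parameter `MU_EARTH` and the helper names are assumptions introduced here, not values or code from the paper, and all perturbations are ignored.

```python
import math

# Assumed standard Earth gravitational parameter in km^3/s^2 (not given in the table).
MU_EARTH = 398600.4418


def apsides_and_period(a_km, ecc):
    """Return (perigee radius km, apogee radius km, period s) of a two-body orbit."""
    r_perigee = a_km * (1.0 - ecc)
    r_apogee = a_km * (1.0 + ecc)
    period_s = 2.0 * math.pi * math.sqrt(a_km ** 3 / MU_EARTH)
    return r_perigee, r_apogee, period_s


def radius_at_true_anomaly(a_km, ecc, nu_deg):
    """Orbital radius from the conic equation r = p / (1 + e cos(nu))."""
    p = a_km * (1.0 - ecc ** 2)  # semi-latus rectum
    return p / (1.0 + ecc * math.cos(math.radians(nu_deg)))


# Values taken from the table: initial orbit and GEO target.
initial = apsides_and_period(17169.8, 0.6087)
target = apsides_and_period(42165.0, 0.0)
r_start = radius_at_true_anomaly(17169.8, 0.6087, 180.0)

print("initial perigee/apogee radius [km]:", initial[0], initial[1])
print("initial period [h]:", initial[2] / 3600.0)
print("target period [h]:", target[2] / 3600.0)
print("radius at initial epoch [km]:", r_start)
```

Under these assumptions, a true anomaly of 180° places the satellite at apogee at the initial epoch, so `r_start` equals the apogee radius a(1 + e).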

Hyperparameter	Value
Clipping ratio ε	0.25
Discount factor γ	0.9993
Exponential weighting coefficient λ	0.98
Time penalty ζ	0.01
Mission success reward η	550
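
The discount factor γ, weighting coefficient λ, and clipping ratio ε listed above are the standard parameters of GAE and of the PPO clipped surrogate objective. The following is a minimal pure-Python sketch of those two textbook formulas, not the authors' implementation. The function names and the toy episode are hypothetical, and the way the time penalty ζ and the success reward η enter the paper's reward function is an assumption made only for illustration.

```python
GAMMA = 0.9993   # discount factor from the table
LAMBDA = 0.98    # GAE exponential weighting coefficient from the table
EPSILON = 0.25   # PPO clipping ratio from the table


def gae_advantages(rewards, values, gamma=GAMMA, lam=LAMBDA):
    """Generalized advantage estimates for one episode.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state reached after the final reward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of residuals, weight (gamma * lam).
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clipped_term(ratio, advantage, eps=EPSILON):
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# Toy episode (assumption): a per-step penalty of 0.01 and a terminal bonus of 550,
# with a critic that predicts zero everywhere.
toy_rewards = [-0.01, -0.01, 550.0]
toy_values = [0.0, 0.0, 0.0, 0.0]
toy_advantages = gae_advantages(toy_rewards, toy_values)
print(toy_advantages)
print(ppo_clipped_term(1.5, 2.0))
```

With λ close to 1, the sparse terminal reward propagates almost undamped to earlier steps, which is one reason GAE is a natural fit for sparse-reward problems of this kind.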

No.	Thrust/N	Specific impulse/s	Mass/kg	Semi-major axis/km	Eccentricity	Inclination/(°)
1 [38]	0.35	2000	2000	24505.9	0.725	7
2 [39]	0.2	3300	450	24364.3	0.731	27
3 [40]	0.58	1800	2000	24396	0.728	5
4 [41]	0.32	3000	4000	24478	0.731	19.5
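
For each case in the table above, the initial thrust acceleration T/m and the propellant mass flow rate T/(Isp·g0) give a feel for how low the thrust level is. The sketch below assumes that the listed mass is the initial spacecraft mass and uses the standard gravity `G0`; the names are illustrative and do not come from the paper.

```python
G0 = 9.80665  # standard gravity in m/s^2 (assumed constant for the Isp conversion)

# Case number -> (thrust N, specific impulse s, mass kg), copied from the table.
CASES = {
    1: (0.35, 2000.0, 2000.0),
    2: (0.2, 3300.0, 450.0),
    3: (0.58, 1800.0, 2000.0),
    4: (0.32, 3000.0, 4000.0),
}


def initial_acceleration(thrust_n, mass_kg):
    """Thrust acceleration in m/s^2 at the start of the transfer."""
    return thrust_n / mass_kg


def mass_flow_rate(thrust_n, isp_s):
    """Propellant mass flow rate in kg/s while the thruster is firing."""
    return thrust_n / (isp_s * G0)


accelerations = {}
flow_rates = {}
for case, (thrust, isp, mass) in CASES.items():
    accelerations[case] = initial_acceleration(thrust, mass)
    flow_rates[case] = mass_flow_rate(thrust, isp)
    print(case, accelerations[case], flow_rates[case])
```

Case 2 has the largest initial acceleration and case 4 the smallest, which is consistent with the ordering of the transfer times reported for these cases, although the initial orbits differ and the comparison is only indicative.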

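Case	Method used in the reference	Transfer time in the reference/d	Transfer time in this paper/d
1	Indirect method	137.5	137.4
2	Direct method	66.7	67.8
3	Indirect method	82	84.2
4	Feedback control method	413.8	376.9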
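
The comparison above can be summarized as the relative difference between the transfer time obtained in this paper and the value from the cited reference. The snippet below is a small arithmetic sketch that uses only the numbers in the table; the dictionary and function names are illustrative.

```python
# Case number -> (method in the reference, reference time d, this paper's time d).
TRANSFER_TIMES_D = {
    1: ("indirect", 137.5, 137.4),
    2: ("direct", 66.7, 67.8),
    3: ("indirect", 82.0, 84.2),
    4: ("feedback control", 413.8, 376.9),
}


def relative_difference_percent(reference_d, proposed_d):
    """Signed difference of the proposed time relative to the reference, in percent."""
    return 100.0 * (proposed_d - reference_d) / reference_d


differences = {}
for case, (method, reference_d, proposed_d) in TRANSFER_TIMES_D.items():
    differences[case] = relative_difference_percent(reference_d, proposed_d)
    print(f"case {case} ({method}): {differences[case]:+.2f} %")
```

Negative values mean that the transfer found in this paper is shorter: about 8.9% shorter than the feedback control result for case 4, and within roughly 3% of the direct and indirect results for cases 1 to 3.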
