

Systems Engineering and Electronics ›› 2022, Vol. 44 ›› Issue (5): 1652-1661. doi: 10.12305/j.issn.1001-506X.2022.05.27
Mingren HAN1,2, Yufeng WANG1,2,*
Received: 2021-07-09
Online: 2022-05-01
Published: 2022-05-16
Contact: Yufeng WANG
About the authors: Mingren HAN (born 1996), male, master's student; main research interest: intelligent control of spacecraft. Yufeng WANG (born 1976), male, research fellow, Ph.D.; main research interests: spacecraft attitude and orbit control, and satellite control system design and integration testing.
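Abstract:
Autonomous orbit transfer using electric thrusters is one of the key technologies for all-electric propulsion satellites. To address the orbit-raising problem of geostationary orbit (GEO) all-electric propulsion satellites, the generalized advantage estimator (GAE) is combined with proximal policy optimization (PPO), and a reinforcement-learning-based method is proposed for optimizing time-optimal low-thrust orbit transfer strategies, taking into account multiple orbital perturbations and the Earth-shadow constraint. To overcome the key difficulty that an excessively large state space and sparse rewards make training hard, training-acceleration techniques including action output mapping and hierarchical rewards are proposed; these effectively improve training efficiency and speed up convergence. Numerical simulations and comparison of results show that the proposed method is simpler, more flexible, and more efficient, and that, compared with the traditional direct method, indirect method, and feedback-control method, it can ensure the optimality of the orbit transfer time.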
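As a rough illustration of the two ingredients named in the abstract, the sketch below computes GAE advantages for one trajectory and evaluates the PPO clipped surrogate on them. It is not the authors' implementation: the function names `gae_advantages` and `ppo_clip_objective`, the default discount `gamma`, the GAE parameter `lam`, and the clip range `eps` are all assumptions chosen for illustration, and the abstract does not state which values were used.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for a single trajectory.

    `values` must hold one more entry than `rewards`: the last entry is
    the bootstrap value of the state reached after the final step.
    gamma and lam are illustrative defaults, not values from the paper.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later residuals.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate (to be maximized).

    `ratios` are new-policy / old-policy probability ratios per sample.
    eps is an illustrative clip range, not a value from the paper.
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Taking the minimum removes the incentive to move the ratio
        # outside [1 - eps, 1 + eps] in the direction that helps.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    adv = gae_advantages([0.0, 0.0, 1.0], [0.1, 0.2, 0.5, 0.0])
    print(adv)
    print(ppo_clip_objective([1.1, 0.9, 1.3], adv))
```

With `lam=0` the estimate reduces to the one-step TD residual, and with `lam=1` it becomes the discounted sum of residuals along the whole trajectory; intermediate values trade bias against variance.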
Mingren HAN, Yufeng WANG. Optimization method for orbit transfer of all-electric propulsion satellite based on reinforcement learning[J]. Systems Engineering and Electronics, 2022, 44(5): 1652-1661.