Research on deep deterministic policy gradient guidance method for reentry vehicle

doi:10.12305/j.issn.1001-506X.2022.06.21

Abstract

Traditional trajectory guidance methods for reentry vehicles adapt poorly to strong disturbances and have difficulty meeting terminal constraints. To address this, a guidance method based on the deep deterministic policy gradient (DDPG) reinforcement learning framework is proposed. The networks are trained offline on flight trajectories subject to random strong disturbances to find the optimal actor network under different environmental conditions. Online, the trained network plans the guidance trajectory under disturbances by periodically predicting the angle-of-attack and pitch profiles of the reentry flight, so that the terminal altitude, range, and speed constraints are met. Simulation results show that, with the terminal altitude constraint satisfied, the maximum terminal residual-range deviation is less than 500 m and the maximum terminal speed deviation is less than 35 m/s. Compared with the traditional tracking guidance method, the proposed method achieves higher accuracy with less computation and has good prospects for engineering application.

Key words: reentry vehicle, reinforcement learning, deep deterministic policy gradient, guidance

CLC Number:

V448

Dongzi GUO, Rong HUANG, Hechuan XU, Liwei SUN, Naigang CUI. Research on deep deterministic policy gradient guidance method for reentry vehicle[J]. Systems Engineering and Electronics, 2022, 44(6): 1942-1949.

Figures/Tables 15: Fig.1 to Fig.11, Table 1 to Table 4

Parameter	Value
Mass/kg	1 200
Aerodynamic reference area/m^2	0.5

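Condition	Parameter	Value
Initial conditions	Altitude/km	80
	Speed/(m·s^-1)	5 200
	Flight path angle/(°)	-1
	Heading angle/(°)	90
	Design range/km	3 500
Terminal conditions	Altitude/km	27
	Speed/(m·s^-1)	1 500
	Residual range/km	50
Constraints	Maximum dynamic pressure/kPa	100
	Maximum normal overload/g	5
	Maximum stagnation-point heat flux/(kW·m^-2)	5 000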
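
The path constraints in the table above lend themselves to a simple per-point check along a simulated trajectory. The following is a minimal sketch, not the paper's implementation: it computes dynamic pressure from the standard relation q = 0.5·ρ·v² and compares three monitored quantities with the tabulated limits. The function names, the dictionary keys, and the sample density, overload, and heat-flux values are illustrative assumptions; the heat-flux limit is taken in whatever unit the table uses.

```python
# Limits copied from the constraint rows of the table above.
MAX_DYNAMIC_PRESSURE_KPA = 100.0
MAX_NORMAL_OVERLOAD_G = 5.0
MAX_STAGNATION_HEAT_FLUX = 5000.0  # same unit as the table entry


def dynamic_pressure_kpa(density_kg_m3, speed_m_s):
    """Dynamic pressure q = 0.5 * rho * v^2, converted from Pa to kPa."""
    return 0.5 * density_kg_m3 * speed_m_s ** 2 / 1000.0


def violated_constraints(q_kpa, overload_g, heat_flux):
    """Return the names of the path constraints exceeded at one trajectory point."""
    checks = {
        "dynamic_pressure": q_kpa > MAX_DYNAMIC_PRESSURE_KPA,
        "normal_overload": overload_g > MAX_NORMAL_OVERLOAD_G,
        "stagnation_heat_flux": heat_flux > MAX_STAGNATION_HEAT_FLUX,
    }
    return [name for name, exceeded in checks.items() if exceeded]


# Hypothetical trajectory point: density, overload, and heat flux are made-up inputs.
q = dynamic_pressure_kpa(density_kg_m3=0.03, speed_m_s=1500.0)
violations = violated_constraints(q, overload_g=5.5, heat_flux=1200.0)
```

In a training setup, a non-empty result from `violated_constraints` could feed a penalty term or end the episode early; how the paper handles violations is not stated on this page.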

Hyperparameter	Value
s_online	100
n_train	32
κ_o	0.15
η_o	0.15
σ_o	0.15
α_a	0.000 01
α_c	0.001
ζ	0.99
s_target	5
τ	0.01
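
To show where two of these hyperparameters would enter a generic DDPG update, here is a minimal sketch under stated assumptions. The page does not define the symbols, so reading τ as the soft target-update coefficient and ζ as the discount factor follows common DDPG usage and is an assumption, as are the function names. Plain lists stand in for network weights.

```python
TAU = 0.01   # tau from the table, read here as the soft-update coefficient (assumption)
ZETA = 0.99  # zeta from the table, read here as the discount factor (assumption)


def soft_update(target_weights, online_weights, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [
        tau * online + (1.0 - tau) * target
        for target, online in zip(target_weights, online_weights)
    ]


def td_target(reward, next_q, done, zeta=ZETA):
    """One-step critic target y = r + zeta * Q'(s', mu'(s')); no bootstrap at episode end."""
    return reward + (0.0 if done else zeta * next_q)


# Toy values standing in for real network parameters and critic outputs.
target = soft_update([0.0, 1.0], [1.0, 1.0])
y_running = td_target(reward=1.0, next_q=10.0, done=False)
y_terminal = td_target(reward=1.0, next_q=10.0, done=True)
```

With τ this small, the target weights move only 1% of the way toward the online weights per update, which is the usual reason for choosing a soft update: the critic's regression target changes slowly.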

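Parameter	Value
Initial altitude deviation Δh₀/m	-100~100
Initial speed deviation Δv₀/(m·s^-1)	-50~50
Initial flight path angle deviation Δγ₀/(°)	-1~1
Drag coefficient deviation/%	-10~10
Lift coefficient deviation/%	-10~10
Atmospheric density deviation/%	-10~10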
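
The deviation ranges above can be turned into randomized disturbance cases for offline training or Monte Carlo evaluation. The sketch below is a hedged illustration: it assumes each deviation is drawn independently from a uniform distribution over its range, which the page does not state, and it applies the draws to the nominal initial altitude, speed, and flight path angle listed in the initial-condition table. All names are hypothetical.

```python
import random

# (low, high) bounds copied from the table above; the key names are illustrative.
DISPERSION_BOUNDS = {
    "delta_h0_m": (-100.0, 100.0),
    "delta_v0_m_s": (-50.0, 50.0),
    "delta_gamma0_deg": (-1.0, 1.0),
    "drag_coeff_pct": (-10.0, 10.0),
    "lift_coeff_pct": (-10.0, 10.0),
    "air_density_pct": (-10.0, 10.0),
}

# Nominal initial conditions from the initial-condition table.
NOMINAL_H0_M = 80_000.0
NOMINAL_V0_M_S = 5200.0
NOMINAL_GAMMA0_DEG = -1.0


def sample_dispersion(rng):
    """Draw one disturbance case; the uniform distribution is an assumption."""
    return {name: rng.uniform(low, high) for name, (low, high) in DISPERSION_BOUNDS.items()}


def perturbed_initial_state(case):
    """Apply the sampled offsets to the nominal initial state."""
    return {
        "h_m": NOMINAL_H0_M + case["delta_h0_m"],
        "v_m_s": NOMINAL_V0_M_S + case["delta_v0_m_s"],
        "gamma_deg": NOMINAL_GAMMA0_DEG + case["delta_gamma0_deg"],
        # Percentage deviations become multiplicative scale factors.
        "drag_scale": 1.0 + case["drag_coeff_pct"] / 100.0,
        "lift_scale": 1.0 + case["lift_coeff_pct"] / 100.0,
        "density_scale": 1.0 + case["air_density_pct"] / 100.0,
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
cases = [sample_dispersion(rng) for _ in range(5)]
states = [perturbed_initial_state(case) for case in cases]
```

Each entry of `states` could seed one training episode or one evaluation run; passing a dedicated `random.Random` instance keeps the sampling reproducible and independent of any other random draws in the program.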
