Systems Engineering and Electronics ›› 2020, Vol. 42 ›› Issue (2): 414-419. doi: 10.3969/j.issn.1001-506X.2020.02.21
Qinhao ZHANG1, Baiqiang AO1, Qinxue ZHANG2
Received: 2019-07-26
Online: 2020-02-01
Published: 2020-01-23
Qinhao ZHANG, Baiqiang AO, Qinxue ZHANG. Reinforcement learning guidance law of Q-learning[J]. Systems Engineering and Electronics, 2020, 42(2): 414-419.
[1] NIE Y F, ZHOU Q J, ZHANG T. Research status and prospect of guidance law[J]. Flight Dynamics, 2001, 19(3): 7-11. doi: 10.3969/j.issn.1002-0853.2001.03.002
[2] GUO P F. Research on precise terminal guidance law based on fuzzy logic[D]. Xi'an: Northwestern Polytechnical University, 2003.
[3] LI H X. Research on fuzzy guidance law for intercepting large maneuvering targets[D]. Shenyang: Northeastern University, 2013.
[4] WEI H. Research on UAV aerial combat algorithm based on reinforcement learning[D]. Harbin: Harbin Institute of Technology, 2015.
[5] CHITHAPURAM C, CHERUKURI A K, JEPPU Y. Aerial vehicle guidance based on passive machine learning technique[J]. International Journal of Intelligent Computing and Cybernetics, 2016, 9(3): 255-273. doi: 10.1108/IJICC-12-2015-0042
[6] CHITHAPURAM C, JEPPU Y, CHERUKURI A K. Artificial intelligence learning based on proportional navigation guidance[C]//Proc. of the International Conference on Advances in Computing, Communications and Informatics, 2013. doi: 10.1109/ICACCI.2013.6637338
[7] CHEN Z L, XU Y P, GU L B. AGV path planning based on fuzzy Q-learning algorithm[J]. Manufacturing Automation, 2012, 34(11): 4-6, 16. doi: 10.3969/j.issn.1009-0134.2012.6(s).02
[8] GE Y, BU P S, LIU Q. Application of fuzzy reinforcement learning in robot navigation[J]. Information Technology, 2009, 33(10): 127-130. doi: 10.3969/j.issn.1009-2552.2009.10.038
[9] NIE C Y, ZHU M, ZHENG Z W, et al. Airship control based on Q-learning algorithm and neural network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2017, 43(12): 2431-2438.
[10] TAN L, GONG Q H, WANG H X. Pursuit-evasion game algorithm based on deep reinforcement learning[J]. Aerospace Control, 2018, 36(6): 3-8, 19.
[11] PRASHANT B, FARUK K, NAVDEEP S. Reinforcement learning based obstacle avoidance for autonomous underwater vehicle[J]. Journal of Marine Science and Application, 2019, 18(2): 228-238. doi: 10.1007/s11804-019-00089-3
[12] PANOV A I, YAKOVLEV K S, SUVOROV R. Grid path planning with deep reinforcement learning: preliminary results[J]. Procedia Computer Science, 2018, 123(1): 347-353.
[13] YANG J, YOU X H, WU G X, et al. Application of reinforcement learning in UAV cluster task scheduling[J]. Future Generation Computer Systems, 2019, 95(11): 140-148.
[14] ZHANG J J, ZHOU D Y, ZHANG K. A UAV target search algorithm based on reinforcement learning[J]. Application Research of Computers, 2011, 28(10): 3659-3661. doi: 10.3969/j.issn.1001-3695.2011.10.014
[15] LAMPTON A, VALASEK J, KUMAR M. Multiresolution state-space discretization for Q-learning with pseudorandomized discretization[J]. Journal of Control Theory and Applications, 2011, 9(3): 431-439. doi: 10.1007/s11768-011-1012-4
[16] TARN T J. Hybrid MDP based integrated hierarchical Q-learning[J]. Science China Information Sciences, 2011, 54(11): 2279-2294. doi: 10.1007/s11432-011-4332-6
[17] ZHANG W Z, LYU T S. Reactive fuzzy controller design by Q-learning for mobile robot navigation[J]. Journal of Harbin Institute of Technology, 2005, 12(3): 319-324.
[18] YANG B H. A novel experience-based exploration method for Q-learning[C]//Proc. of the 4th International Conference of Pioneering Computer Scientists, Engineers and Educators, 2018: 39.
[19] WANG J W. Kicking motion design of humanoid robots using gradual accumulation learning method based on Q-learning[C]//Proc. of the 28th Chinese Control and Decision Conference, 2016: 328-333.
[20] TANG R K. An error-sensitive Q-learning approach for robot navigation[C]//Proc. of the 34th Chinese Control Conference, 2015: 785-790.
[21] PARK K H, KIM Y J, KIM J H. Modular Q-learning based multi-agent cooperation for robot soccer[J]. Robotics and Autonomous Systems, 2001, 35(2): 109-122. doi: 10.1016/S0921-8890(01)00114-2
[22] BONARINI A, LAZARIC A, MONTRONE F, et al. Reinforcement distribution in fuzzy Q-learning[J]. Fuzzy Sets and Systems, 2008, 160(10): 1420-1443.
[23] LIN L X, XIE H B, ZHANG D B, et al. Supervised neural Q-learning based motion control for bionic underwater robots[C]//Proc. of the 3rd International Conference of Bionic Engineering, 2010: 178.
[24] SHI Z G. The improved Q-learning algorithm based on pheromone mechanism for swarm robot system[C]//Proc. of the 32nd Chinese Control Conference, 2013: 1131-1136.
[25] WANG H. The application of proportional navigation in the process of UAV air combat guidance and optimization of proportional parameter[C]//Proc. of the 33rd Chinese Control Conference, 2014: 1232-1236.
[26] WU S J. Illegal radio station localization with UAV-based Q-learning[J]. China Communications, 2018, 15(12): 122-131.
[27] SHI H B, XU M. An intelligent tracking method for rotor UAVs based on reinforcement learning[J]. Journal of University of Electronic Science and Technology of China, 2019, 48(4): 553-559. doi: 10.3969/j.issn.1001-0548.2019.04.012
[28] ZHANG T Z. Hybrid path planning of a quadrotor UAV based on Q-learning algorithm[C]//Proc. of the 37th Chinese Control Conference, 2018: 301-305.
[29] ZHAO Y J. Q-learning algorithm based UAV path learning and obstacle avoidance approach[C]//Proc. of the 36th Chinese Control Conference, 2017: 95-100.
[30] XU X Y, LI A J, ZHANG C C, et al. Design of variant UAV control system based on Q-learning[J]. Journal of Northwestern Polytechnical University, 2012, 30(3): 340-344. doi: 10.3969/j.issn.1000-2758.2012.03.006
Related articles in Systems Engineering and Electronics:
[1] Mengping ZHOU, Xiuyun MENG, Junhui LIU. Design of optimal sliding mode guidance law for head-on interception of maneuvering targets with large angle of fall[J]. Systems Engineering and Electronics, 2022, 44(9): 2886-2893.
[2] Zilin HOU, Ting CHENG, Han PENG. GMPHD based on measurement conversion sequential filtering for maneuvering target tracking[J]. Systems Engineering and Electronics, 2022, 44(8): 2474-2482.
[3] Bakun ZHU, Weigang ZHU, Wei LI, Ying YANG, Tianhao GAO. Research on decision-making modeling of cognitive jamming for multi-functional radar based on Markov[J]. Systems Engineering and Electronics, 2022, 44(8): 2488-2497.
[4] Guan WANG, Haizhong RU, Dali ZHANG, Guangcheng MA, Hongwei XIA. Design of intelligent control system for flexible hypersonic vehicle[J]. Systems Engineering and Electronics, 2022, 44(7): 2276-2285.
[5] Lingyu MENG, Bingli GUO, Wen YANG, Xinwei ZHANG, Zuoqing ZHAO, Shanguo HUANG. Network routing optimization approach based on deep reinforcement learning[J]. Systems Engineering and Electronics, 2022, 44(7): 2311-2318.
[6] Dongzi GUO, Rong HUANG, Hechuan XU, Liwei SUN, Naigang CUI. Research on deep deterministic policy gradient guidance method for reentry vehicle[J]. Systems Engineering and Electronics, 2022, 44(6): 1942-1949.
[7] Guang ZHAI, Yanxin WANG, Yiyong SUN. Cooperative tracking filtering technology of multi-target based on low orbit satellite constellation[J]. Systems Engineering and Electronics, 2022, 44(6): 1957-1967.
[8] Mingren HAN, Yufeng WANG. Optimization method for orbit transfer of all-electric propulsion satellite based on reinforcement learning[J]. Systems Engineering and Electronics, 2022, 44(5): 1652-1661.
[9] Shihan TAN, Fenglin JIN, Congying DUN. Task assignment strategy for space-air-ground integrated vehicular networks oriented to user demand[J]. Systems Engineering and Electronics, 2022, 44(5): 1717-1727.
[10] Li HE, Liang SHEN, Hui LI, Zhuang WANG, Wenquan TANG. Survey on policy reuse in reinforcement learning[J]. Systems Engineering and Electronics, 2022, 44(3): 884-899.
[11] Jinlin ZHANG, Jiong LI, Humin LEI, Wanli LI, Xiao TANG. Capture region of 3D realistic true proportional navigation with finite overload[J]. Systems Engineering and Electronics, 2022, 44(3): 986-997.
[12] Xiao TANG, Jikun YE, Xu LI. Design of 3D nonlinear prescribed performance guidance law[J]. Systems Engineering and Electronics, 2022, 44(2): 619-627.
[13] Bakun ZHU, Weigang ZHU, Wei LI, Ying YANG, Tianhao GAO. Multi-function radar intelligent jamming decision method based on prior knowledge[J]. Systems Engineering and Electronics, 2022, 44(12): 3685-3695.
[14] Qingqing YANG, Yingying GAO, Yu GUO, Boyuan XIA, Kewei YANG. Target search path planning for naval battle field based on deep reinforcement learning[J]. Systems Engineering and Electronics, 2022, 44(11): 3486-3495.
[15] Bin ZENG, Hongqiang ZHANG, Houpu LI. Research on anti-submarine strategy for unmanned undersea vehicles[J]. Systems Engineering and Electronics, 2022, 44(10): 3174-3181.