UAV intelligent attack strategy generation model based on multi-agent game reinforcement learning

doi:10.12305/j.issn.1001-506X.2023.10.21

Abstract


How to use new combat forces, represented by offensive unmanned aerial vehicles (UAVs), to enhance combat effectiveness is one of the focal points of research on intelligent and unmanned warfare. Building on the key technologies of multi-agent game reinforcement learning for intelligent UAV attack and on the basic concepts of Markov stochastic games, this article establishes a model for generating intelligent UAV attack strategies based on multi-agent game reinforcement learning, and proposes an optimization method that improves the strategy model using the "trembling hand perfect" idea from game theory. Simulation experiments show that the optimized algorithm outperforms the original algorithm, and that the trained model can generate a variety of attack tactics in real time, which is of significant practical value for intelligent command and control.
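The abstract invokes the "trembling hand perfect" refinement from game theory but this page does not show how the authors build it into training. As a minimal illustration of the idea only (not the paper's model), the sketch below uses a hypothetical 2×2 matrix game with two Nash equilibria, (U, L) and (D, R). An equilibrium is trembling-hand perfect if each player's strategy remains a best response when the opponent makes small mistakes with positive probability. The helper names `tremble` and `survives_trembles` are hypothetical, and the check is a simplification: it tests only uniform trembles at a few ε values rather than the full formal definition (a limit of ε-perfect equilibria).

```python
import numpy as np

# Hypothetical 2x2 game: rows = player 1 actions (U, D), columns = player 2 actions (L, R).
A = np.array([[1.0, 0.0], [0.0, 0.0]])  # player 1 payoffs
B = np.array([[1.0, 0.0], [0.0, 0.0]])  # player 2 payoffs

def tremble(policy, eps):
    """Mix a policy with the uniform distribution so every action has probability >= eps / n."""
    policy = np.asarray(policy, dtype=float)
    n = len(policy)
    return (1.0 - eps) * policy + eps / n

def survives_trembles(p1, p2, eps_values=(1e-1, 1e-2, 1e-3)):
    """Simplified check: does each player's pure action stay a best response
    when the opponent's strategy is perturbed by uniform trembles?"""
    a1, a2 = int(np.argmax(p1)), int(np.argmax(p2))
    for eps in eps_values:
        q1, q2 = tremble(p1, eps), tremble(p2, eps)
        payoff1 = A @ q2  # expected payoff of each row action against the perturbed column player
        payoff2 = q1 @ B  # expected payoff of each column action against the perturbed row player
        if payoff1[a1] < payoff1.max() - 1e-12 or payoff2[a2] < payoff2.max() - 1e-12:
            return False
    return True

print(survives_trembles([1, 0], [1, 0]))  # True: (U, L) survives small mistakes
print(survives_trembles([0, 1], [0, 1]))  # False: D is weakly dominated by U
```

Both (U, L) and (D, R) are Nash equilibria, but once player 2 plays L with any small positive probability, U strictly beats D, so only (U, L) is robust to trembles. In a reinforcement learning setting, the same kind of perturbation resembles keeping a small amount of exploration noise in opponents' policies; whether the authors do this is not stated on this page.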

Key words: multi-agent game reinforcement learning, Markov stochastic game, unmanned aerial vehicle (UAV), tactical strategy

CLC Number: E917

Zhiruo ZHAO, Lei CAO, Xiliang CHEN, Jun LAI, Legui ZHANG. UAV intelligent attack strategy generation model based on multi-agent game reinforcement learning[J]. Systems Engineering and Electronics, 2023, 45(10): 3165-3171.

Figures/Tables (9): Fig. 1, Fig. 2, Table 1, Table 2, Fig. 3, Fig. 4, Table 3, Table 4, Fig. 5

References 30

1	SUN Y, LI Q W, XU Z X, et al. Game confrontation strategy training model for air combat based on multi-agent deep reinforcement learning[J]. Command Information System and Technology, 2021, 12(2): 16-20. (in Chinese)
2	CHEN X L, CAO L, SHEN C. Research on action sequence planning based on deep inverse reinforcement learning[J]. National Defense Science & Technology, 2019, 40(4): 55-61. (in Chinese)
3	CAO L, SUN Y, CHEN X L, et al. Key technology and application of intelligent mission planning in joint operations[J]. National Defense Science & Technology, 2020, 41(3): 49-56. (in Chinese)
4	CAO L, CHEN X L, TANG W. Intelligent army construction[J]. National Defense Science & Technology, 2019, 40(4): 14-19. (in Chinese)
5	CHEN X L, LI Q W, SUN Y. Key technologies for air combat intelligent decision based on game confrontation[J]. Command Information System and Technology, 2021, 12(2): 6. (in Chinese)
6	SUNEHAG P, LEVER G, GRUSLYS A, et al. Value-decomposition networks for cooperative multi-agent learning[C]// Proc. of the 17th International Conference on Autonomous Agents and Multiagent Systems, 2018: 10-15.
7	RASHID T, SAMVELYAN M, WITT C D, et al. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning[C]//Proc. of the 35th International Conference on Machine Learning, 2018: 4295-4304.
8	YANG Y, RUI L, LI M, et al. Mean field multi-agent reinforcement learning[C]//Proc. of the 35th International Conference on Machine Learning, 2018: 5571-5580.
9	FOERSTER J N, CHEN R Y, AL-SHEDIVAT M, et al. Learning with opponent-learning awareness[C]//Proc. of the 17th International Conference on Autonomous Agents and Multi Agent Systems, 2017: 122-130.
10	PENG P, WEN Y, YANG Y, et al. Multiagent bidirectionally-coordinated nets: emergence of human-level coordination in learning to play StarCraft combat games[EB/OL]. [2021-10-10]. https://arxiv.org/pdf/1703.10069.pdf.
11	HU D P, JIANG X S, WEI X M, et al. State representation learning for minimax deep deterministic policy gradient[C]//Proc. of the 12th International Conference on Knowledge Science, Engineering and Management, 2019: 481-487.
12	YANG Y D, RUI L, LI M N, et al. Mean field multi-agent reinforcement learning[C]//Proc. of the 35th International Conference on Machine Learning, 2018.
13	FOERSTER J N, CHEN R Y, AL-SHEDIVAT M, et al. Learning with opponent-learning awareness[C]//Proc. of the 17th International Conference on Autonomous Agents and Multi Agent Systems, 2017: 122-130.
14	LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[EB/OL]. [2021-10-10]. https://arxiv.org/abs/1706.02275.
15	HERNANDEZ-LEAL P, KAISERS M, BAARSLAG T, et al. A survey of learning in multiagent environments: dealing with non-stationarity[EB/OL]. [2021-10-10]. https://arxiv.org/abs/1707.09183.
16	WATSON J. Strategy: an introduction to game theory[M]. New York: W. W. Norton & Company, 2013.
17	SUTTON R, BARTO A. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 1998.
18	LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
19	LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]//Proc. of the 4th International Conference on Learning Representations, 2016.
20	SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. [2021-10-10]. https://arxiv.org/abs/1707.06347.
21	RASMUSEN E. Games and information: an introduction to game theory[J]. International Journal of Industrial Organization, 1991, 9(3): 474-476.
22	LEE K, RENGARAJAN D, KALATHIL D, et al. Learning trembling hand perfect mean field equilibrium for dynamic mean field games[EB/OL]. [2021-10-10]. https://arxiv.org/abs/2006.11683.
23	SHAPLEY L S. Stochastic games[J]. Proceedings of the National Academy of Sciences, 1953, 39(10): 1095-1100.
24	BAŞAR T. Dynamic noncooperative game theory[J]. Society for Industrial and Applied Mathematics, 1982, 19(2): 139-152.
25	HAO F, ZHANG D, TANG S, et al. A rapid route planning method of loitering munitions based on improved RRT algorithm[J]. Flight Mechanics, 2019, 37(3): 58-63. (in Chinese)
26	YU C, VELU A, VINITSKY E, et al. The surprising effectiveness of MAPPO in cooperative, multi-agent games[EB/OL]. [2021-10-10]. https://arxiv.org/abs/2103.01955.
27	BOOTH J. PPO dash: improving generalization in deep reinforcement learning[EB/OL]. [2021-10-10]. https://arxiv.org/abs/1907.06704.
28	ENGSTROM L, ILYAS A, SANTURKAR S, et al. Implementation matters in deep policy gradients: a case study on PPO and TRPO[EB/OL]. [2021-10-10]. https://arxiv.org/abs/2005.12729.
29	PEDREGOSA F, VAROQUAUX G, GRAMFORT A, et al. Scikit-learn: machine learning in Python[J]. The Journal of Machine Learning Research, 2011, 12(4): 2825-2830.
30	ABADI M, BARHAM P, CHEN J, et al. TensorFlow: a system for large-scale machine learning[C]//Proc. of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016: 265-283.

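Mounted weapon	Unit (red or blue side)	Initial position
AGM-114K Hellfire II anti-tank missile	UAV 1	Heading: 230°, speed: 129 km/h, altitude: 777 m, longitude: 43°56′37″ E, latitude: 33°52′11″ N
	UAV 2	Heading: 91°, speed: 129 km/h, altitude: 610 m, longitude: 43°58′06″ E, latitude: 32°54′23″ N
	UAV 3	Heading: 230°, speed: 248 km/h, altitude: 777 m, longitude: 45°09′16″ E, latitude: 33°48′30″ N
-	Tank platoon 1	Longitude: 44°09′09″ E, latitude: 33°45′09″ N
	Tank platoon 2	Longitude: 44°07′44″ E, latitude: 33°16′38″ N
	Tank platoon 3	Longitude: 44°19′35″ E, latitude: 33°24′24″ N
	Tank platoon 4	Longitude: 44°35′10″ E, latitude: 33°38′28″ N
	Tank platoon 5	Longitude: 44°35′04″ E, latitude: 33°24′24″ N
	Tank platoon 6	Longitude: 44°35′20″ E, latitude: 33°11′37″ N
	Tank platoon 7	Longitude: 44°49′43″ E, latitude: 33°24′35″ N
	Tank platoon 8	Longitude: 45°01′42″ E, latitude: 33°31′36″ N
	Tank platoon 9	Longitude: 44°59′43″ E, latitude: 33°04′46″ N
SA-22 "Greyhound" surface-to-air missile	SAM platoon 1	Longitude: 44°20′07″ E, latitude: 33°36′55″ N
	SAM platoon 2	Longitude: 44°20′45″ E, latitude: 33°12′18″ N
	SAM platoon 3	Longitude: 44°49′51″ E, latitude: 33°36′28″ N
	SAM platoon 4	Longitude: 44°50′46″ E, latitude: 33°12′28″ N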
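To give a sense of the scenario's scale, the sketch below converts the degree-minute-second coordinates in the table above to decimal degrees and estimates the straight-line distance from UAV 1 to tank platoon 1. It assumes a spherical Earth with a mean radius of 6371 km (haversine formula) and straight, constant-speed flight at the listed 129 km/h; neither assumption comes from the paper, and the helper names are hypothetical.

```python
import math

def dms_to_deg(deg, minutes, seconds):
    """Convert degrees/minutes/seconds (as written in the table) to decimal degrees."""
    return deg + minutes / 60.0 + seconds / 3600.0

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance on a sphere; a 6371 km mean Earth radius is assumed."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Positions copied from the table as (latitude N, longitude E).
uav1 = (dms_to_deg(33, 52, 11), dms_to_deg(43, 56, 37))
tank1 = (dms_to_deg(33, 45, 9), dms_to_deg(44, 9, 9))

dist = haversine_km(*uav1, *tank1)
minutes_to_reach = dist / 129.0 * 60.0  # UAV 1 speed 129 km/h; straight-line flight assumed
print(f"UAV 1 -> tank platoon 1: {dist:.1f} km, about {minutes_to_reach:.0f} min")
```

Under these assumptions the two units start a little over 20 km apart, roughly ten minutes of flight for UAV 1; real engagement geometry would also depend on headings, altitude, and weapon range, which this sketch ignores.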

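Parameter	Value
Learning rate	0.0005
Discount factor	0.99
Experience replay buffer size	100,000
Activation function	ReLU
Number of proximal policy optimization (PPO) epochs	15
Clip	0.2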
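The hyperparameters above point to proximal policy optimization (PPO, ref. 20). The page does not give the authors' loss function, so the following is only a minimal sketch of the standard single-agent PPO clipped surrogate, L^CLIP = E[min(r_t A_t, clip(r_t, 1 - ε, 1 + ε) A_t)] with probability ratio r_t, together with discounted returns, plugging in the listed discount factor 0.99 and Clip 0.2. Advantage estimation, the value loss, and any multi-agent or trembling-hand modifications are omitted, and the function names are hypothetical.

```python
import numpy as np

GAMMA = 0.99     # discount factor from the table
CLIP_EPS = 0.2   # "Clip" value from the table

def discounted_returns(rewards, gamma=GAMMA):
    """Return G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=CLIP_EPS):
    """Clipped surrogate objective of PPO (to be maximized), averaged over samples."""
    ratio = np.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))

print(discounted_returns([0.0, 0.0, 1.0]))
print(ppo_clip_objective(np.log([1.5]), np.zeros(1), np.ones(1)))  # ratio 1.5 is clipped to 1.2
```

In standard PPO, the listed 15 epochs would mean each batch of collected experience is reused for 15 passes of gradient ascent on this objective, while the clip keeps each update close to the policy that collected the data.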

Training iterations	UAV 1	UAV 2	UAV 3
0~500	-15	-18	-10
500~1000	108	117	90
1000~2000	117	125	100
2000~3000	113	120	99

Training iterations	UAV 1	UAV 2	UAV 3
0~500	-17	-20	-11
500~1000	121	131	100
1000~2000	162	178	130
2000~3000	167	185	131
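The page does not caption the two per-UAV tables above or say which algorithm each reports. Assuming they measure the same quantity under the same scenario, the short sketch below computes the relative change between their final rows (iterations 2000~3000); the variable names are hypothetical and the arithmetic uses only the values listed.

```python
# Final-row values (iterations 2000~3000) copied from the two tables above.
first_table = {"UAV 1": 113, "UAV 2": 120, "UAV 3": 99}
second_table = {"UAV 1": 167, "UAV 2": 185, "UAV 3": 131}

def relative_change(old, new):
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0

changes = {uav: relative_change(first_table[uav], second_table[uav]) for uav in first_table}
for uav, pct in changes.items():
    print(f"{uav}: {pct:+.1f}%")
# UAV 1: +47.8%
# UAV 2: +54.2%
# UAV 3: +32.3%
```

Every UAV's value is higher in the second table, by roughly a third to a half.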
