Cooperative targets assignment based on multi-agent reinforcement learning

doi:10.12305/j.issn.1001-506X.2023.09.18

Abstract:

Traditional methods are difficult to apply to large-scale cooperative targets assignment in dynamic, uncertain environments. To address this problem, a cooperative targets assignment model and a training method based on multi-agent reinforcement learning are proposed. By describing the related concepts and mathematical models, cooperative targets assignment is transformed into a multi-agent cooperation problem. Focusing on learning the top-level assignment strategy, a scoring model and a reasoning model of the strategy are constructed, and the Advantage Actor-Critic algorithm is used for strategy optimization. Simulation results show that the proposed method can accurately describe the evolution of the cooperative relationships between operational units and can effectively generate large-scale cooperative targets assignment schemes dynamically.

Key words: cooperative targets assignment, multi-agent cooperation, reinforcement learning, neural network, Advantage Actor-Critic
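
The abstract names the Advantage Actor-Critic algorithm as the optimizer for the assignment strategy. As a rough illustration only, the sketch below computes the standard one-step Advantage Actor-Critic quantities for a single assignment decision. The function names, the per-target `scores` input, and the discount value are hypothetical and are not taken from the paper; the paper's actual scoring and reasoning models are not reproduced here.

```python
import math


def softmax(scores):
    """Turn raw per-target scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_losses(scores, action, reward, value, next_value, gamma=0.99, done=False):
    """One-step Advantage Actor-Critic losses for a single decision.

    scores     -- hypothetical per-target scores produced by the actor
    action     -- index of the target that was actually chosen
    value      -- critic estimate V(s) for the current state
    next_value -- critic estimate V(s') for the next state
    """
    probs = softmax(scores)
    # TD target: reward plus discounted bootstrap value (none at episode end)
    td_target = reward + (0.0 if done else gamma * next_value)
    # Advantage estimate: how much better the outcome was than expected
    advantage = td_target - value
    # Actor loss: negative log-probability weighted by the advantage
    # (the advantage is treated as a constant when differentiating)
    actor_loss = -math.log(probs[action]) * advantage
    # Critic loss: squared TD error
    critic_loss = advantage ** 2
    return advantage, actor_loss, critic_loss


if __name__ == "__main__":
    adv, a_loss, c_loss = a2c_losses(
        scores=[0.2, 1.5, -0.3], action=1,
        reward=1.0, value=0.4, next_value=0.6,
    )
    print(f"advantage={adv:.3f} actor_loss={a_loss:.3f} critic_loss={c_loss:.3f}")
```

A positive advantage raises the probability of the chosen target and a negative one lowers it; in a full training loop these losses would be averaged over a batch and minimized by gradient descent.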

CLC Number: TP301.6

Yue MA, Lin WU, Xiao XU. Cooperative targets assignment based on multi-agent reinforcement learning[J]. Systems Engineering and Electronics, 2023, 45(9): 2793-2801.

Figures/Tables (8): Fig. 1, Fig. 2, Fig. 3, Table 1, Table 2, Fig. 4, Fig. 5, Fig. 6


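Table 1
Type	Platform value coefficient	Payload type	Quantity (small scenario)	Quantity (large scenario)	Munition quantity	Munition value coefficient
Conventional missile	0.6	W₁	2	8	-	-
Fighter-bomber 1	0.4	W₂	3	4	4	0.05
Fighter-bomber 2	0.5	W₂	3	4	4	0.05
Bomber	0.6	W₃	2	3	8	0.05
Radar	0.5	-	2	4	-	-
Missile launcher	0.8	-	4	14	-	-
Airfield runway	0.6	-	1	2	-	-
Command post	0.9	-	1	2	-	-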

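Table 2
Weapon \ Target	Conventional missile	Fighter-bomber 1	Fighter-bomber 2	Bomber	Radar	Air defense position	Airfield runway	Command post
Conventional missile	-	-	-	-	0.6	0.7	0	0.8
Fighter-bomber 1	-	-	-	-	0.8	0.7	0	0.6
Fighter-bomber 2	-	-	-	-	0.6	0.6	0.5	0.7
Bomber	-	-	-	-	0.7	0	0.7	0.6
Air and missile defense system	0.4	0.4	0.4	0.5	-	-	-	-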
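
The weapon-target table above can be used for a small worked calculation. The sketch below is illustrative only: it uses English renderings of the unit names, and it assumes that each entry is the success probability of a single engagement and that engagements are independent. The page does not state either assumption. Under them, the combined effect of several weapons on one target is 1 minus the product of the individual failure probabilities. The names `EFFECT` and `combined_effect` are hypothetical.

```python
# Offensive rows of the weapon-target table (weapon -> target -> entry).
# ASSUMPTION: entries are read as independent single-engagement success
# probabilities; the source page does not say what they measure.
EFFECT = {
    "conventional missile": {
        "radar": 0.6, "air defense position": 0.7,
        "airfield runway": 0.0, "command post": 0.8,
    },
    "fighter-bomber 1": {
        "radar": 0.8, "air defense position": 0.7,
        "airfield runway": 0.0, "command post": 0.6,
    },
    "fighter-bomber 2": {
        "radar": 0.6, "air defense position": 0.6,
        "airfield runway": 0.5, "command post": 0.7,
    },
    "bomber": {
        "radar": 0.7, "air defense position": 0.0,
        "airfield runway": 0.7, "command post": 0.6,
    },
}


def combined_effect(weapons, target):
    """Probability that at least one of the given weapons succeeds."""
    survive = 1.0  # probability that every engagement fails
    for weapon in weapons:
        survive *= 1.0 - EFFECT[weapon][target]
    return 1.0 - survive


if __name__ == "__main__":
    pair = ["conventional missile", "fighter-bomber 1"]
    print(round(combined_effect(pair, "radar"), 2))
```

For example, a conventional missile (0.6) and fighter-bomber 1 (0.8) assigned together to the radar give 1 - 0.4 × 0.2, or about 0.92, which shows the diminishing return from stacking weapons on a single target.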

