Cooperative targets assignment based on multi-agent reinforcement learning

doi:10.12305/j.issn.1001-506X.2023.09.18

Abstract:

Traditional methods are difficult to apply to large-scale cooperative targets assignment in dynamic, uncertain environments. To address this problem, a cooperative targets assignment model and training method based on multi-agent reinforcement learning are proposed. By describing the relevant concepts and mathematical models, cooperative targets assignment is transformed into a multi-agent cooperation problem. Focusing on learning the top-level assignment strategy, a strategy scoring model and a strategy reasoning model are constructed, and the Advantage Actor-Critic algorithm is used for strategy optimization. Simulation results show that the proposed method accurately characterizes the internal causes of cooperative evolution among operational units and effectively realizes the dynamic generation of large-scale cooperative targets assignment schemes.

Key words: cooperative targets assignment, multi-agent cooperation, reinforcement learning, neural network, Advantage Actor-Critic
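The abstract names Advantage Actor-Critic as the optimization algorithm but this page does not reproduce its details. As a rough illustration only, the sketch below shows the generic advantage computation and the two loss terms used in a textbook Advantage Actor-Critic update. It is not the authors' implementation: the function names, the discount factor `gamma`, and the array inputs are assumptions made for this example, and mapping the "scoring model" to the critic and the "reasoning model" to the actor is likewise an assumption.

```python
import numpy as np


def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout, bootstrapped from the critic's
    value estimate of the state that follows the last step."""
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    # Walk the rollout backwards: R_t = r_t + gamma * R_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def a2c_losses(log_probs, values, rewards, bootstrap_value, gamma=0.99):
    """Return (policy_loss, value_loss, advantages) for one rollout.

    log_probs: log pi(a_t | s_t) of the actions taken (actor output, assumed).
    values:    V(s_t) predicted by the critic (assumed).
    """
    log_probs = np.asarray(log_probs, dtype=float)
    values = np.asarray(values, dtype=float)
    returns = discounted_returns(rewards, bootstrap_value, gamma)
    # Advantage: how much better the rollout was than the critic expected.
    advantages = returns - values
    # Actor term: raise the log-probability of actions with positive advantage.
    policy_loss = -np.mean(log_probs * advantages)
    # Critic term: mean squared error between returns and value estimates.
    value_loss = np.mean(advantages ** 2)
    return policy_loss, value_loss, advantages
```

In an automatic-differentiation framework the advantages would be treated as constants in the policy term, so that the actor's gradient does not flow into the critic.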

CLC number: TP301.6

Yue MA, Lin WU, Xiao XU. Cooperative targets assignment based on multi-agent reinforcement learning[J]. Systems Engineering and Electronics, 2023, 45(9): 2793-2801.



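Type	Platform value coefficient	Mounted weapon type	Quantity (small scenario)	Quantity (large scenario)	Munition quantity	Munition value coefficient
Conventional missile	0.6	W₁	2	8	-	-
Fighter-bomber 1	0.4	W₂	3	4	4	0.05
Fighter-bomber 2	0.5	W₂	3	4	4	0.05
Bomber	0.6	W₃	2	3	8	0.05
Radar	0.5	-	2	4	-	-
Missile launcher	0.8	-	4	14	-	-
Airfield runway	0.6	-	1	2	-	-
Command post	0.9	-	1	2	-	-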

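Weapon (rows) / Target (columns)	Conventional missile	Fighter-bomber 1	Fighter-bomber 2	Bomber	Radar	Air defense position	Airfield runway	Command post
Conventional missile	-	-	-	-	0.6	0.7	0	0.8
Fighter-bomber 1	-	-	-	-	0.8	0.7	0	0.6
Fighter-bomber 2	-	-	-	-	0.6	0.6	0.5	0.7
Bomber	-	-	-	-	0.7	0	0.7	0.6
Air and missile defense system	0.4	0.4	0.4	0.5	-	-	-	-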
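The page gives no caption for the two tables above, so the meaning of the weapon-target entries is not stated here. Purely as an assumption, the sketch below reads them as independent single-shot damage probabilities and combines them with the platform value coefficients in the classical weapon-target assignment objective, the expected destroyed value. The dictionary names and the function `expected_destroyed_value` are hypothetical. Only the three targets whose names match in both tables are included.

```python
# Platform value coefficients copied from the first table.
TARGET_VALUE = {
    "radar": 0.5,
    "airfield_runway": 0.6,
    "command_post": 0.9,
}

# Entries copied from the weapon-target table. Treating them as
# single-shot damage probabilities is an ASSUMPTION, not stated in the source.
DAMAGE_PROB = {
    "conventional_missile": {"radar": 0.6, "airfield_runway": 0.0, "command_post": 0.8},
    "fighter_bomber_1": {"radar": 0.8, "airfield_runway": 0.0, "command_post": 0.6},
    "fighter_bomber_2": {"radar": 0.6, "airfield_runway": 0.5, "command_post": 0.7},
    "bomber": {"radar": 0.7, "airfield_runway": 0.7, "command_post": 0.6},
}


def expected_destroyed_value(assignment):
    """Expected destroyed target value for a list of (weapon, target) pairs,
    assuming every shot succeeds or fails independently of the others."""
    survive = {target: 1.0 for target in TARGET_VALUE}
    for weapon, target in assignment:
        # A target survives only if every shot assigned to it fails.
        survive[target] *= 1.0 - DAMAGE_PROB[weapon][target]
    return sum(TARGET_VALUE[t] * (1.0 - survive[t]) for t in TARGET_VALUE)


plan = [
    ("conventional_missile", "command_post"),
    ("fighter_bomber_2", "command_post"),
    ("bomber", "airfield_runway"),
]
score = expected_destroyed_value(plan)
```

A score of this kind is one plausible reward signal for an assignment policy. Whether the paper defines its reward this way cannot be determined from this page.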
