Systems Engineering and Electronics ›› 2023, Vol. 45 ›› Issue (9): 2793-2801. doi: 10.12305/j.issn.1001-506X.2023.09.18

• Systems Engineering •

Cooperative targets assignment based on multi-agent reinforcement learning

Yue MA1,2,*, Lin WU3, Xiao XU3

  1. Graduate School, National Defense University, Beijing 100091, China
  2. Unit 31002 of the PLA, Beijing 100091, China
  3. Academy of Joint Operation, National Defense University, Beijing 100091, China
  • Received: 2021-12-31 Online: 2023-08-30 Published: 2023-09-05
  • Contact: Yue MA
  • About the authors: Yue MA (b. 1990), male, engineer, PhD candidate; research interests: military operations research and intelligent decision-making
    Lin WU (b. 1974), male, professor, PhD; research interests: military operations research
    Xiao XU (b. 1989), male, engineer, PhD; research interests: military operations research

Abstract:

Traditional methods are difficult to apply to large-scale cooperative target assignment in dynamic, uncertain environments. To address this, a cooperative target assignment model and a training method based on multi-agent reinforcement learning are proposed. By describing the related concepts and mathematical models, cooperative target assignment is reformulated as a multi-agent cooperation problem. Focusing on learning the top-level assignment strategy, a strategy scoring model and a strategy inference model are constructed, and the Advantage Actor-Critic algorithm is used to optimize the strategy. Simulation results show that the proposed method accurately characterizes the underlying causes of how cooperation among operational units evolves, and that it effectively generates large-scale cooperative target assignment schemes dynamically.

Key words: cooperative targets assignment, multi-agent cooperation, reinforcement learning, neural network, Advantage Actor-Critic
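
For readers unfamiliar with the Advantage Actor-Critic algorithm named in the abstract, the following is a minimal sketch of its standard one-step loss for a single transition. It is not the authors' implementation. The function names, the two-target example, and the discount factor are hypothetical, and treating the strategy inference model as the actor and the strategy scoring model as the critic is an assumption based only on the abstract.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_losses(logits, action, reward, value, next_value, gamma=0.99, done=False):
    """One-step advantage actor-critic losses for a single transition.

    logits     : actor scores over candidate actions (assumed here: candidate targets)
    action     : index of the action that was taken
    reward     : scalar reward received after the action
    value      : critic estimate V(s) for the current state
    next_value : critic estimate V(s') for the next state
    """
    probs = softmax(logits)
    # Bootstrapped one-step return; no bootstrap when the episode has ended.
    target = reward + (0.0 if done else gamma * next_value)
    # Advantage: how much better the outcome was than the critic expected.
    advantage = target - value
    # Policy-gradient loss; the advantage is treated as a constant weight.
    actor_loss = -math.log(probs[action]) * advantage
    # Critic regresses V(s) toward the bootstrapped target.
    critic_loss = 0.5 * advantage ** 2
    return advantage, actor_loss, critic_loss


if __name__ == "__main__":
    # Hypothetical example: two candidate targets with equal scores.
    adv, actor, critic = a2c_losses([0.0, 0.0], action=0, reward=1.0,
                                    value=0.5, next_value=0.0, done=True)
    print(adv, actor, critic)
```

A positive advantage increases the probability of the chosen action, and a negative one decreases it. In a full training loop both losses would be minimized by gradient descent on the respective networks.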

CLC number: