Systems Engineering and Electronics, 2021, Vol. 43, Issue (3): 755-762. doi: 10.12305/j.issn.1001-506X.2021.03.20

• Systems Engineering •

Multi-agent decision-making method based on Actor-Critic framework and its application in wargame

Chen LI1, Yanyan HUANG1,*, Yongliang ZHANG2, Tiande CHEN1

  1. School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China
  2. Command and Control Engineering College, Army Engineering University, Nanjing 210007, China
  • Received: 2020-05-06 Online: 2021-03-01 Published: 2021-03-16
  • Contact: Yanyan HUANG E-mail: 1120544671@qq.com; huangyy@njust.edu.cn; zhangylnj@qq.com; 369253482@qq.com
  • About the authors: Chen LI (1995-), male, master's student; main research interest: system modeling and simulation. E-mail: 1120544671@qq.com | Yongliang ZHANG (1982-), male, associate professor, Ph.D.; main research interests: command theory and simulation, intelligent planning of combat tasks. E-mail: zhangylnj@qq.com | Tiande CHEN (1994-), male, doctoral student; main research interests: simulation modeling and command decision-making. E-mail: 369253482@qq.com
  • Supported by:
    National Natural Science Foundation of China (61374186); 2018 Equipment Pre-research Field Fund (61403120205)

Abstract:

Intelligent tactical wargaming, which applies artificial intelligence to wargame deduction, has developed steadily in recent years. A decision-making method based on the Actor-Critic framework can realize dynamic decision-making for tactical actions in an intelligent tactical wargame. However, if the Critic network evaluates only a single operator, the networks of the different operators do not cooperate, and the action decisions that friendly operators make independently are not intelligent enough. To address this shortcoming and raise the intelligence level of wargame deduction, a multi-agent decision-making method that combines reinforcement learning with rules is proposed. The method focuses on using reinforcement learning to analyze the action decisions of multiple operators, and uses production rules to plan tactical decisions. A multi-operator action decision model with distributed execution and centralized training is constructed on the Actor-Critic framework. Compared with a closed action decision learning method in which the operators do not communicate with each other, the proposed distributed execution and centralized training method is more advantageous and effective.

Key words: intelligent tactics, wargame, multi-agent reinforcement learning, Actor-Critic framework, distributed execution and centralized training
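
The idea of distributed execution with centralized training described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes linear function approximators instead of neural networks, a single shared team reward, and a one-step TD error used as the advantage signal. All names and sizes (`Actor`, `CentralCritic`, `N_AGENTS`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and hyperparameters, not taken from the paper.
N_AGENTS, OBS_DIM, N_ACTIONS = 3, 4, 5
GAMMA, LR_ACTOR, LR_CRITIC = 0.95, 0.01, 0.05


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


class Actor:
    """Per-operator policy that sees only its own local observation."""

    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(N_ACTIONS, OBS_DIM))

    def probs(self, obs):
        return softmax(self.W @ obs)

    def act(self, obs):
        return int(rng.choice(N_ACTIONS, p=self.probs(obs)))

    def update(self, obs, action, td_error):
        # Gradient of log pi(action | obs) w.r.t. the logits is (onehot - probs).
        grad_logits = -self.probs(obs)
        grad_logits[action] += 1.0
        self.W += LR_ACTOR * td_error * np.outer(grad_logits, obs)


class CentralCritic:
    """Shared value function over the concatenated observations of all operators."""

    def __init__(self):
        self.w = np.zeros(N_AGENTS * OBS_DIM)

    def value(self, joint_obs):
        return float(self.w @ joint_obs)

    def update(self, joint_obs, reward, next_joint_obs, done):
        bootstrap = 0.0 if done else GAMMA * self.value(next_joint_obs)
        td_error = reward + bootstrap - self.value(joint_obs)
        self.w += LR_CRITIC * td_error * joint_obs
        return td_error


def select_actions(actors, obs):
    """Distributed execution: each actor decides from its local observation only."""
    return [actor.act(o) for actor, o in zip(actors, obs)]


def centralized_update(actors, critic, obs, actions, reward, next_obs, done):
    """Centralized training: one joint TD error drives every actor's update."""
    td_error = critic.update(
        np.concatenate(obs), reward, np.concatenate(next_obs), done
    )
    for actor, o, a in zip(actors, obs, actions):
        actor.update(o, a, td_error)
    return td_error
```

In the closed, non-communicating baseline mentioned in the abstract, each operator would instead own a critic that sees only its local observation. In this sketch the only structural difference is that `CentralCritic` scores the joint observation, so every operator's policy update reflects the situation of the whole team.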

CLC Number: