Systems Engineering and Electronics, 2024, Vol. 46, Issue (5): 1628-1655. doi: 10.12305/j.issn.1001-506X.2024.05.17

• Systems Engineering •

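Research progress of multi-agent learning in games

Junren LUO, Wanpeng ZHANG, Jiongming SU, Weilin YUAN, Jing CHEN

  1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2022-01-29 Online: 2024-04-30 Published: 2024-04-30
  • Contact: Jing CHEN
  • About the authors: Junren LUO (b. 1989), male, doctoral candidate; research interests: multi-agent learning and intelligent games
    Wanpeng ZHANG (b. 1981), male, researcher, Ph.D.; research interests: big data intelligence and intelligent evolution
    Jiongming SU (b. 1984), male, associate researcher, Ph.D.; research interests: explainable artificial intelligence and intelligent games
    Weilin YUAN (b. 1994), male, doctoral candidate; research interests: intelligent games and multi-agent reinforcement learning
    Jing CHEN (b. 1972), male, professor, Ph.D.; research interests: cognitive decision-making games and distributed intelligence
  • Funding:
    National Natural Science Foundation of China (61806212); Hunan Provincial Postgraduate Research and Innovation Project (CX20210011)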

Abstract:

The new wave of artificial intelligence driven by deep learning and reinforcement learning provides an "end-to-end" solution that takes agents from perceptual input to action-decision output. Multi-agent learning is a frontier topic in the study of intelligent adversarial games, and it faces challenges such as adversarial environments, non-stationary opponents, incomplete information, and uncertain actions. Taking a game-theoretic perspective, this paper first describes the components of a multi-agent learning system, gives an overview of multi-agent learning, and briefly introduces the main categories of multi-agent learning research methods. Second, within the framework of multi-agent learning in games, it introduces the basic multi-agent game models and meta-game models, equilibrium solution concepts and game dynamics, and challenges such as diverse learning objectives, non-stationary environments (opponents), and equilibria that are hard to compute and prone to shift. Third, it comprehensively reviews multi-agent game strategy learning methods, offline game strategy learning methods, and online game strategy learning methods. Finally, it discusses frontier research directions for multi-agent learning from three aspects: agent cognitive behavior modeling and collaboration, general game strategy learning methods, and distributed game strategy learning frameworks.

Key words: learning in games, multi-agent learning, meta-game, online no-regret learning

CLC number: