Systems Engineering and Electronics, 2021, Vol. 43, Issue (11): 3338-3351. doi: 10.12305/j.issn.1001-506X.2021.11.35

• Communications and Networks •

Review of multi-agent reinforcement learning based dynamic spectrum allocation method

Bo SONG1,2,*, Wei YE1, Xianghui MENG2

  1. Department of Electronic and Optical Engineering, Space Engineering University, Beijing 101416, China
  2. Unit 95801 of the PLA, Beijing 100076, China
  • Received: 2021-01-13 Online: 2021-11-01 Published: 2021-11-12
  • Contact: Bo SONG
  • About the authors: Bo SONG (born 1992), male, master's student; research interests: cognitive radio and reinforcement learning | Wei YE (born 1969), male, professor, Ph.D.; research interests: radar countermeasures and intelligent electromagnetic spectrum operations | Xianghui MENG (born 1986), male, engineer, bachelor's degree; research interests: electromagnetic spectrum management

Abstract:

Cognitive radio and dynamic spectrum allocation are effective means of addressing the shortage of spectrum resources. With the rapid development in recent years of machine learning techniques such as deep learning and reinforcement learning, swarm intelligence techniques represented by multi-agent reinforcement learning have achieved a series of breakthroughs, making distributed, intelligent dynamic spectrum allocation possible. This paper reviews in detail the key research results in reinforcement learning and multi-agent reinforcement learning, as well as research on modeling methods and algorithms for the dynamic spectrum allocation process based on multi-agent reinforcement learning. Existing algorithms are grouped into four types: independent Q-learning, cooperative Q-learning, joint Q-learning, and multi-agent actor-critic. The advantages and shortcomings of these four types are analyzed, and the key problems of multi-agent reinforcement learning based dynamic spectrum allocation, together with possible solutions, are summarized.

Key words: spectrum management, cognitive radio, dynamic spectrum allocation, machine learning, reinforcement learning, multi-agent reinforcement learning

CLC number: