Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (4): 1108-1114. doi: 10.12305/j.issn.1001-506X.2025.04.07

• Sensors and Signal Processing •

Optimization of radar collaborative anti-jamming strategies based on hierarchical multi-agent reinforcement learning

Ziyi WANG, Xiongjun FU, Jian DONG, Cheng FENG

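  1. School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2024-01-19 Online: 2025-04-25 Published: 2025-05-28
  • Corresponding author: Xiongjun FU
  • About the authors: Ziyi WANG (b. 1999), female, master's student; main research interest: optimization of radar anti-jamming strategies
    Xiongjun FU (b. 1978), male, professor, Ph.D.; main research interests: radar signal processing, low-probability-of-intercept radar, inverse synthetic aperture radar, electromagnetic spectrum warfare
    Jian DONG (b. 1982), male, associate professor, Ph.D.; main research interests: radar signal processing, SAR/ISAR imaging, radar networking
    Cheng FENG (b. 1986), male, Ph.D. candidate; main research interests: radar signal processing, radar systems, game-theoretic radar strategies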

Optimization of radar collaborative anti-jamming strategies based on hierarchical multi-agent reinforcement learning

Ziyi WANG, Xiongjun FU, Jian DONG, Cheng FENG   

  1. School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2024-01-19 Online: 2025-04-25 Published: 2025-05-28
  • Contact: Xiongjun FU

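Abstract:

Rewards in the radar collaborative anti-jamming decision process are sparse, which makes reinforcement learning algorithms hard to converge and collaborative training difficult. To solve this problem, a hierarchical multi-agent deep deterministic policy gradient (H-MADDPG) algorithm is proposed. It improves the convergence of training by accumulating sparse rewards, and it borrows the Harvard structure idea to store each agent's training experience separately, which eliminates confusion in experience replay. In simulations of networks of two and four radars, under a certain strong jamming condition, the radar detection success rate is 15% and 30% higher, respectively, than with the multi-agent deep deterministic policy gradient (MADDPG) algorithm.

Keywords: radar anti-jamming strategy, hierarchical reinforcement learning, multi-agent system, deep deterministic policy gradient, sparse reward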

Abstract:

The sparsity of rewards in the decision-making process of radar collaborative anti-jamming makes it difficult for reinforcement learning algorithms to converge and for agents to be trained collaboratively. To address this issue, a hierarchical multi-agent deep deterministic policy gradient (H-MADDPG) algorithm is proposed. Accumulating sparse rewards improves the convergence of the training process, and the Harvard structure idea is introduced to store the training experiences of the individual agents separately, which eliminates confusion in experience replay. In simulations of two-radar and four-radar networks under certain strong jamming conditions, the radar detection success rate increases by 15% and 30%, respectively, compared with the multi-agent deep deterministic policy gradient (MADDPG) algorithm.

Key words: radar anti-jamming, hierarchical reinforcement learning, multi-agent system, deep deterministic policy gradient (DDPG), sparse reward
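The two ideas named in the abstract, separate experience storage per agent and accumulation of sparse rewards, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `PerAgentReplay`, the function `accumulate_sparse_rewards`, the fixed-window summation, and all parameter names are hypothetical and chosen only for illustration.

```python
import random
from collections import deque


class PerAgentReplay:
    """One independent replay buffer per agent (hypothetical helper).

    Assumption: "storing experiences separately" is modeled here as one
    bounded FIFO buffer per agent, so one agent's transitions are never
    sampled when another agent is being updated.
    """

    def __init__(self, n_agents, capacity=10_000, seed=0):
        self.buffers = [deque(maxlen=capacity) for _ in range(n_agents)]
        self.rng = random.Random(seed)

    def store(self, agent_id, obs, action, reward, next_obs, done):
        # Append a transition to the buffer owned by this agent only.
        self.buffers[agent_id].append((obs, action, reward, next_obs, done))

    def sample(self, agent_id, batch_size):
        # Uniformly sample without replacement from this agent's buffer.
        buf = list(self.buffers[agent_id])
        return self.rng.sample(buf, min(batch_size, len(buf)))


def accumulate_sparse_rewards(rewards, horizon):
    """Sum low-level rewards over consecutive windows of `horizon` steps.

    Assumption: a higher-level policy acts once per window and receives
    the summed reward; the paper's actual accumulation rule may differ.
    """
    return [sum(rewards[i:i + horizon]) for i in range(0, len(rewards), horizon)]


if __name__ == "__main__":
    replay = PerAgentReplay(n_agents=2)
    replay.store(0, obs=[0.1], action=[0.5], reward=0.0, next_obs=[0.2], done=False)
    replay.store(1, obs=[0.3], action=[0.1], reward=1.0, next_obs=[0.4], done=True)
    print(len(replay.sample(0, batch_size=4)))
    print(accumulate_sparse_rewards([0, 0, 1, 0, 0, 0, 0, 1], horizon=4))
```

Keeping the buffers apart means a batch drawn for one radar agent contains only that agent's transitions, which is one plausible reading of how separate storage avoids mixed-up replay.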

CLC number: