Optimization of radar collaborative anti-jamming strategies based on hierarchical multi-agent reinforcement learning

Systems Engineering and Electronics, 2025, Vol. 47, Issue 4: 1108-1114. doi: 10.12305/j.issn.1001-506X.2025.04.07

Ziyi WANG, Xiongjun FU, Jian DONG, Cheng FENG

School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China

Received: 2024-01-19 Online: 2025-04-25 Published: 2025-05-28
Contact: Xiongjun FU
About the authors: Ziyi WANG (b. 1999), female, master's student; main research interest: optimization of radar anti-jamming strategies
Xiongjun FU (b. 1978), male, professor, Ph.D.; main research interests: radar signal processing, low probability of intercept radar, inverse synthetic aperture radar, electromagnetic spectrum warfare
Jian DONG (b. 1982), male, associate professor, Ph.D.; main research interests: radar signal processing, SAR/ISAR imaging, radar networking
Cheng FENG (b. 1986), male, doctoral student; main research interests: radar signal processing, radar systems, game-theoretic radar strategies

Abstract

Rewards are sparse in the decision-making process of radar collaborative anti-jamming, which makes it difficult for reinforcement learning algorithms to converge and makes collaborative training difficult. To address this problem, a hierarchical multi-agent deep deterministic policy gradient (H-MADDPG) algorithm is proposed. Accumulating the sparse rewards improves the convergence of the training process, and the Harvard structure idea is introduced to store the training experiences of the individual agents separately, which eliminates confusion in experience replay. In networked simulations with two and four radars under a certain strong jamming condition, the radar detection success rate is increased by 15% and 30%, respectively, compared with the multi-agent deep deterministic policy gradient (MADDPG) algorithm.

Key words: radar anti-jamming, hierarchical reinforcement learning, multi-agent system, deep deterministic policy gradient (DDPG), sparse reward
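
The two ideas named in the abstract, separate storage of each agent's experiences and accumulation of sparse rewards, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `PerAgentReplay`, the function `accumulate_rewards`, the tuple layout, and the fixed window length `horizon` are all hypothetical names and assumptions chosen for illustration, and how H-MADDPG aligns samples across agents for training is not described in this abstract.

```python
import random
from collections import deque


class PerAgentReplay:
    """One independent replay buffer per agent (hypothetical helper, not from the paper)."""

    def __init__(self, n_agents, capacity):
        # Assumption: a bounded FIFO buffer per agent; the oldest entries are dropped first.
        self.buffers = [deque(maxlen=capacity) for _ in range(n_agents)]

    def store(self, agent_id, obs, action, reward, next_obs):
        # Each agent writes only to its own buffer, so experiences never mix.
        self.buffers[agent_id].append((obs, action, reward, next_obs))

    def sample(self, agent_id, batch_size, rng=random):
        # Draw a batch without replacement from the chosen agent's buffer only.
        buf = self.buffers[agent_id]
        return rng.sample(list(buf), min(batch_size, len(buf)))


def accumulate_rewards(step_rewards, horizon):
    """Sum sparse per-step rewards over consecutive windows of `horizon` steps.

    Assumption: the higher level sees one accumulated reward per window.
    """
    return [sum(step_rewards[i:i + horizon])
            for i in range(0, len(step_rewards), horizon)]
```

For example, `accumulate_rewards([0, 0, 1, 0, 0, 0, 0, 1], 4)` turns eight mostly zero step rewards into two window-level rewards, so a learner operating on windows receives a nonzero signal in every window of this sequence.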

CLC number: TN974

Ziyi WANG, Xiongjun FU, Jian DONG, Cheng FENG. Optimization of radar collaborative anti-jamming strategies based on hierarchical multi-agent reinforcement learning[J]. Systems Engineering and Electronics, 2025, 47(4): 1108-1114.

Figures and tables (8): Figures 1-7 and Table 1.


Operating mode	Threat level
Search	1
Tracking	5
Fire control	10
