Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (4): 1108-1114. doi: 10.12305/j.issn.1001-506X.2025.04.07

• Sensors and Signal Processing •

Optimization of radar collaborative anti-jamming strategies based on hierarchical multi-agent reinforcement learning

Ziyi WANG, Xiongjun FU, Jian DONG, Cheng FENG   

  1. School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2024-01-19 Online: 2025-04-25 Published: 2025-05-28
  • Contact: Xiongjun FU

Abstract:

The sparsity of rewards in the radar collaborative anti-jamming decision-making process makes it difficult for reinforcement learning algorithms to converge and hinders collaborative training. To address this issue, a hierarchical multi-agent deep deterministic policy gradient (H-MADDPG) algorithm is proposed. Sparse rewards are accumulated to improve the convergence of the training process, and the Harvard-architecture idea is introduced to store the training experiences of the multiple agents separately, eliminating confusion in experience replay. In simulations of two-radar and four-radar networks under strong jamming conditions, the radar detection success rate is increased by 15% and 30%, respectively, compared with the multi-agent deep deterministic policy gradient (MADDPG) algorithm.
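The abstract names two mechanisms: accumulating sparse rewards before they reach the learner, and keeping each agent's replay experience in its own store. The following is a minimal sketch of those two ideas only, under assumed interfaces; it is not the authors' H-MADDPG implementation, and all class and variable names are hypothetical.

```python
# Hypothetical sketch: per-agent experience buffers (Harvard-style separation of
# stores) and a sparse-reward accumulator that releases a denser training signal.
import random
from collections import deque


class AgentReplayBuffer:
    """One buffer per agent, so replayed transitions from different radars never mix."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


class SparseRewardAccumulator:
    """Accumulates sparse per-step rewards over a window and releases the sum,
    giving the learner a less sparse signal (assumed accumulation scheme)."""

    def __init__(self, window=10):
        self.window = window
        self.acc = 0.0
        self.steps = 0

    def step(self, reward, done):
        self.acc += reward
        self.steps += 1
        if done or self.steps >= self.window:
            released, self.acc, self.steps = self.acc, 0.0, 0
            return released   # accumulated reward handed to the learner
        return 0.0            # nothing released yet


# Usage: each radar agent owns its own buffer and accumulator.
buffers = {f"radar_{i}": AgentReplayBuffer() for i in range(2)}
accumulators = {name: SparseRewardAccumulator() for name in buffers}
```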

Key words: radar anti-jamming, hierarchical reinforcement learning, multi-agent system, deep deterministic policy gradient (DDPG), sparse reward

CLC Number: 
