[1] HAYKIN S. Cognitive radar: a way of the future[J]. IEEE Signal Processing Magazine, 2006, 23(1): 30-40. doi: 10.1109/MSP.2006.1593335
[2] BACHMANN D J, EVANS R J, MORAN B. Game theoretic analysis of adaptive radar jamming[J]. IEEE Transactions on Aerospace and Electronic Systems, 2011, 47(2): 1081-1100. doi: 10.1109/TAES.2011.5751244
[3] WANG B, WANG J K, SONG X, et al. Research on model and algorithm of waveform selection in cognitive radar[J]. Journal of Networks, 2010, 5(9): 1041-1046.
[4] LI Y J, ZHU Y P, GAO M G. Design of cognitive radar jamming based on Q-learning algorithm[J]. Transactions of Beijing Institute of Technology, 2015, 35(11): 1194-1199. (in Chinese)
[5] XING Q, JIA X, ZHU W G. Intelligent radar countermeasure based on Q-learning[J]. Systems Engineering and Electronics, 2018, 40(5): 1031-1035. (in Chinese)
[6] ZHANG B K, ZHU W G. DQN based decision-making method of cognitive jamming against multifunctional radar[J]. Systems Engineering and Electronics, 2020, 42(4): 819-825. (in Chinese)
[7] ZHOU M C. Research on radar jamming decision technology based on game theory[D]. Xi'an: Xidian University, 2014. (in Chinese)
[8] SUN H W, TONG N N, SUN F J. Jamming design selection based on D-S theory[J]. Journal of Projectiles, Rockets, Missiles and Guidance, 2003(S2): 218-220. (in Chinese)
[9] ZHANG S Q. Research on interference decision based on partially observable Markov decision process[D]. Xi'an: Xidian University, 2019. (in Chinese)
[10] NGO A V, LEE S G, CHUNG T C. Bayes-adaptive hierarchical MDPs[J]. Applied Intelligence, 2016, 45(1): 112-126. doi: 10.1007/s10489-015-0742-2
[11] SMALLWOOD R D, SONDIK E J. The optimal control of partially observable Markov processes over a finite horizon[J]. Operations Research, 1973, 21(5): 1071-1088. doi: 10.1287/opre.21.5.1071
[12] GHANDALI R, ABOOIE M H, FALLAHNEZHAD M S. A POMDP framework to find optimal policy in sustainable maintenance[J]. Scientia Iranica, 2020, 27(3): 1544-1561.
[13] MENG L, WU Z L, WANG Y Q. Research on multi-robot environment exploration using POMDP[J]. Mechanical Science and Technology for Aerospace Engineering, 2022, 41(2): 178-185. (in Chinese)
[14] XIANG X C, FOO S. Recent advances in deep reinforcement learning applications for solving partially observable Markov decision processes (POMDP) problems[J]. Machine Learning and Knowledge Extraction, 2021, 3(3): 554-581.
[15] WANG T X, TAGHVAEI A, MEHTA P G. Q-learning for POMDP: an application to learning locomotion gaits[C]//Proc. of the IEEE 58th Conference on Decision and Control, 2019: 2758-2763.
[16] FU Y H, LIANG X X, HUANG M K, et al. Coordinating multi-agent deep reinforcement learning in wargame[C]//Proc. of the 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 2020: 38-42.
[17] HWANGBO S, SIN G. Design of control framework based on deep reinforcement learning and Monte-Carlo sampling in downstream separation[J]. Computers & Chemical Engineering, 2020, 140: 106910.
[18] PARK H, SIM M K, CHOI D G. An intelligent financial portfolio trading strategy using deep Q-learning[J]. Expert Systems with Applications, 2020, 158: 113573.