Intelligent radar jamming decision-making method based on POMDP model

doi:10.12305/j.issn.1001-506X.2023.09.13

Abstract:

To improve the efficiency and accuracy of jamming against a non-cooperative intelligent radar whose working modes are unknown in a complex electromagnetic environment, a jamming decision-making method based on the partially observable Markov decision process (POMDP) is proposed. First, a POMDP model of the intelligent radar countermeasure system is constructed according to the working characteristics of intelligent radar; a nonparametric, sample-based belief distribution represents the agent's knowledge of the environment, and a Bayesian filter updates that belief. Then, with information entropy as the evaluation criterion, the jammer repeatedly tries the jamming style with the largest information entropy. Finally, simulation experiments compare the jamming decision-making performance of the proposed method with that of the traditional Q-learning method and the empirical decision-making method to verify its superiority. The results show that the proposed method dynamically selects the optimal jamming style as the unknown radar state changes, and reaches a jamming decision against the intelligent radar faster.

Key words: intelligent radar, reinforcement learning, partially observable Markov decision process (POMDP) model, Bayesian filtering
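The belief update and the entropy criterion summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the sample-based belief for an explicit probability vector over a small discrete state set, and it assumes that "largest information entropy" refers to the entropy of the observation distribution predicted for each jamming style. The function names (`bayes_update`, `entropy`, `select_jamming`) and the model tables are hypothetical.

```python
import math


def bayes_update(belief, transition, likelihood):
    """One Bayes-filter step over a discrete state set (assumed model).

    belief[s]         -- current probability of radar state s
    transition[s][s2] -- assumed probability of moving from s to s2
    likelihood[s2]    -- assumed probability of the received observation in s2
    """
    n = len(belief)
    # Prediction: push the belief through the state-transition model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction: weight each state by the observation likelihood.
    unnormalized = [likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        # Observation impossible under the model: fall back to the prediction.
        return predicted
    return [u / total for u in unnormalized]


def entropy(dist):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)


def select_jamming(belief, obs_model):
    """Return (style, entropy) for the style with the largest predicted entropy.

    obs_model[style][s][o] is the assumed probability of observing o when
    the radar is in state s and the given jamming style is applied.
    """
    best_style, best_h = None, -1.0
    for style, table in obs_model.items():
        n_obs = len(table[0])
        predicted_obs = [sum(belief[s] * table[s][o] for s in range(len(belief)))
                         for o in range(n_obs)]
        h = entropy(predicted_obs)
        if h > best_h:
            best_style, best_h = style, h
    return best_style, best_h


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]          # radar state assumed static
    belief = bayes_update([0.5, 0.5], identity, [0.8, 0.2])
    styles = {
        "noise": [[0.9, 0.1], [0.9, 0.1]],       # response does not depend on state
        "deception": [[0.9, 0.1], [0.1, 0.9]],   # response depends on state
    }
    print(belief, select_jamming([0.5, 0.5], styles))
```

Under this interpretation, a jamming style whose outcome is hard to predict from the current belief is the one that is tried next, because observing its outcome is the most informative about the hidden radar state.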

CLC number: TN973

Luwei FENG, Songtao LIU, Huazhi XU. Intelligent radar jamming decision-making method based on POMDP model[J]. Systems Engineering and Electronics, 2023, 45(9): 2755-2760.

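Table 1

No.	Radar state	Threat level	Parameter features
$S_{T+1}^{1}$	Missile guidance	1	[i₁.i_c]
$S_{T+1}^{2}$	Non-cooperative target recognition	1	[i₂.i_c]
$S_{T+1}^{3}$	Stable tracking	2	[i₃.i_c]
$S_{T+1}^{4}$	Track-while-search	2	[i₄.i_c]
$S_{T+1}^{5}$	Passive tracking	2	[i₁.i_b]
$S_{T+1}^{6}$	Target imaging	3	[i₂.i_b]
$S_{T+1}^{7}$	Feature interception	3	[i₃.i_b]
$S_{T+1}^{8}$	Artillery ranging	4	[i₄.i_b]
$S_{T+1}^{9}$	Target detection	4	[i₁.i_a]
$S_{T+1}^{10}$	Surveillance-while-search	5	[i₂.i_a]
$S_{T+1}^{11}$	Volume search	5	[i₃.i_a]
$S_{T+1}^{12}$	Coarse search	6	[i₄.i_a]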
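
For readers who want to work with the radar-state table programmatically, the rows can be held in a small lookup structure. The sketch below is only an illustration: the names `RADAR_STATES`, `threat_level`, and `states_at_level` are hypothetical, the English state names are translations of the table entries, and the parameter-feature column is left out.

```python
# Rows transcribed from the radar-state table above.
# Key: state index k of S_{T+1}^{k}; value: (state name, threat level).
# The English names are translations; the structure itself is an assumption.
RADAR_STATES = {
    1: ("missile guidance", 1),
    2: ("non-cooperative target recognition", 1),
    3: ("stable tracking", 2),
    4: ("track-while-search", 2),
    5: ("passive tracking", 2),
    6: ("target imaging", 3),
    7: ("feature interception", 3),
    8: ("artillery ranging", 4),
    9: ("target detection", 4),
    10: ("surveillance-while-search", 5),
    11: ("volume search", 5),
    12: ("coarse search", 6),
}


def threat_level(state_index):
    """Return the threat level listed for the given state index."""
    return RADAR_STATES[state_index][1]


def states_at_level(level):
    """Return the sorted state indices that share the given threat level."""
    return sorted(k for k, (_, lvl) in RADAR_STATES.items() if lvl == level)


if __name__ == "__main__":
    print(threat_level(1), states_at_level(2))
```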

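Table 2

Decision step	Decision time
Initial state input	1.7
First update of environment belief	1.0
Second update of environment belief	0.6
Third update of environment belief	0.3
Generation of countermeasure strategy	0.1
Total	3.7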
