Multi-function radar intelligent jamming decision method based on prior knowledge

Systems Engineering and Electronics, 2022, Vol. 44, Issue (12): 3685-3695. doi: 10.12305/j.issn.1001-506X.2022.12.12

Bakun ZHU¹,²,*, Weigang ZHU¹, Wei LI³, Ying YANG³, Tianhao GAO³

1. Department of Electronic and Optical Engineering, Space Engineering University, Beijing 101416, China
2. State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, Luoyang 471032, China
3. Company of Postgraduate Management, Space Engineering University, Beijing 101416, China

Received: 2021-07-16 Online: 2022-11-14 Published: 2022-11-24
Contact: Bakun ZHU
About the authors: Bakun ZHU (b. 1997), male, master's student; research interests: cognitive electronic warfare, radar countermeasures | Weigang ZHU (b. 1973), female, professor, Ph.D.; research interests: modern signal processing, space information countermeasures, cognitive electronic warfare | Wei LI (b. 1994), male, master's student; research interest: radar emitter recognition | Ying YANG (b. 1997), female, master's student; research interest: radar signal processing | Tianhao GAO (b. 1997), male, master's student; research interest: radar emitter recognition
Supported by:
Project of the State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System (CEMEE2020Z0203B)

Abstract:

Reinforcement-learning-based jamming decision methods for multi-function radar suffer from long training periods and slow convergence. To address this, the paper proposes an intelligent multi-function radar jamming decision algorithm based on prior knowledge. The algorithm applies potential-based reward shaping theory, using prior knowledge to set the reward function, and converges faster than the traditional algorithm. Using prior knowledge to accelerate convergence is of practical significance for applying reinforcement learning to multi-function radar jamming decisions, and it also offers a useful reference for applying reinforcement learning in other fields.

Key words: radar countermeasures, Markov decision process (MDP), reinforcement learning, reward shaping, prior knowledge
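
The abstract names potential-based reward shaping as the way prior knowledge enters the reward function. As a minimal sketch, and not the authors' implementation, the shaping term of Ng, Harada and Russell (1999) adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, where Φ is a potential over states. The radar state names, the potential values, and the helper names `shaping_term` and `shaped_reward` below are hypothetical; in the paper's setting Φ would presumably be derived from prior knowledge about the radar's states.

```python
from typing import Dict

GAMMA = 0.95  # discount factor; same value as in the paper's parameter table

# Hypothetical potential over radar states (assumption for illustration only):
# larger values mark states assumed to be closer to the jammer's goal.
POTENTIAL: Dict[str, float] = {"track": 0.0, "acquire": 1.0, "search": 2.0}


def shaping_term(state: str, next_state: str, gamma: float = GAMMA) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s).

    States missing from POTENTIAL get potential 0, i.e. no prior knowledge.
    """
    return gamma * POTENTIAL.get(next_state, 0.0) - POTENTIAL.get(state, 0.0)


def shaped_reward(env_reward: float, state: str, next_state: str) -> float:
    """Environment reward plus the potential-based shaping term."""
    return env_reward + shaping_term(state, next_state)
```

Because the added term is a difference of potentials, it leaves the optimal policy unchanged under the conditions of that 1999 result, so an informative Φ can speed up learning without redefining the task.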

CLC number: TN974

Cite this article: Bakun ZHU, Weigang ZHU, Wei LI, Ying YANG, Tianhao GAO. Multi-function radar intelligent jamming decision method based on prior knowledge[J]. Systems Engineering and Electronics, 2022, 44(12): 3685-3695.

Figures/Tables (10): Fig. 1, Fig. 2, Table 1, Table 2, Table 3, Fig. 3, Table 4, Fig. 4, Table 5, Fig. 5


Parameter	Value	Name
α	0.01	Learning rate
γ	0.95	Discount factor
ε	0.1	Exploration rate
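
To show how these three parameters are typically used, here is a minimal sketch of one tabular Q-learning step with ε-greedy exploration. It illustrates the standard Q-learning update (Watkins and Dayan, 1992) and is not the authors' code; the function names `choose_action` and `q_update`, the `(state, action)` dictionary layout, and the string labels for states and actions are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hyperparameter values taken from the parameter table.
ALPHA, GAMMA, EPSILON = 0.01, 0.95, 0.1


def choose_action(q, state, actions, rng=random):
    """Epsilon-greedy: explore with probability EPSILON, otherwise act greedily."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions):
    """Apply one tabular Q-learning update and return the new Q-value."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (td_target - q[(state, action)])
    return q[(state, action)]


# Q-table keyed by (state, action); unseen entries default to 0.0.
q_table = defaultdict(float)
```

With a learning rate as small as 0.01, each update moves the estimate only 1% of the way toward its target, which is one reason plain Q-learning can need many steps to converge.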

ω_s \ ω_p	1	2	4	8	16	32	64
0	1	1	1	1	1	0.91	0.99
1	1	1	1	1	1	0.96	0.98
2	1	1	1	1	1	0.98	0.98
4	1	1	0.99	1	1	1	0.97
8	0.99	1	0.99	1	1	0.98	0.98
16	1	1	1	1	1	0.98	0.97
32	1	1	1	1	0.99	1	0.96
64	1	1	1	0.99	1	0.99	0.98

(ω_s, ω_p)	(8, 1)	(4, 4)	(8, 4)	(64, 8)	(32, 16)	(0, 32)₁	(0, 32)₂	(0, 32)₃	(0, 32)₄	(0, 32)₅	(0, 32)₆
mean-step	7.55	7.57	7.54	7.44	8.45	8.53	8.54	8.45	8.37	8.47	8.56

(ω_s, ω_p)	(0, 32)₇	(0, 32)₈	(0, 32)₉	(1, 32)₁	(1, 32)₂	(1, 32)₃	(1, 32)₄	(2, 32)₁	(2, 32)₂	(8, 32)₁	(8, 32)₂
mean-step	8.41	8.58	8.55	8.64	8.54	8.75	8.56	8.58	8.64	8.32	8.60

(ω_s, ω_p)	(16, 32)₁	(16, 32)₂	(64, 32)	(0, 64)	(1, 64)₁	(1, 64)₂	(2, 64)₁	(2, 64)₂	(4, 64)₁	(4, 64)₂	(4, 64)₃
mean-step	8.45	8.62	8.50	9.74	8.41	8.59	8.50	8.51	8.51	8.67	8.44

(ω_s, ω_p)	(8, 64)₁	(8, 64)₂	(16, 64)₁	(16, 64)₂	(16, 64)₃	(32, 64)₁	(32, 64)₂	(32, 64)₃	(32, 64)₄	(64, 64)₁	(64, 64)₂
mean-step	8.57	8.56	8.53	8.38	8.49	8.61	8.47	8.45	8.47	8.56	8.42

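Algorithm comparison (columns 1 to 6: prior-knowledge-based algorithm, by number of prior radar states)
Metric	Q-Learning	1	2	3	4	5	6
Convergence rate	1	1	1	1	1	1	1
Optimal convergence rate	1	1	1	0.97	0.98	0.99	1
Mean total steps to convergence	12046.80	7900.23	6673.09	5088.83	4738.72	4687.20	4677.76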
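
The step counts in the table above can be turned into relative savings with simple arithmetic. The sketch below copies the mean total steps to convergence from the table and computes the percentage reduction against plain Q-Learning; the names `reduction_percent` and `reductions` are illustrative, not from the paper.

```python
# Mean total steps to convergence, copied from the table above.
Q_LEARNING_STEPS = 12046.80
PRIOR_STEPS = {1: 7900.23, 2: 6673.09, 3: 5088.83,
               4: 4738.72, 5: 4687.20, 6: 4677.76}


def reduction_percent(baseline: float, value: float) -> float:
    """Relative reduction of `value` against `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


# Reduction for each number of prior radar states, rounded to one decimal.
reductions = {n: round(reduction_percent(Q_LEARNING_STEPS, steps), 1)
              for n, steps in PRIOR_STEPS.items()}

for n, pct in reductions.items():
    print(f"{n} prior state(s): {pct}% fewer steps than Q-Learning")
```

On these figures the saving grows from about 34% with one prior radar state to about 61% with six, with most of the gain already reached by three or four states.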

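Algorithm comparison (columns 0 to 100: prior-knowledge-based algorithm, by prior-knowledge error rate in %)
Metric	Q-Learning	0	33	66	100
Convergence rate	1	1	1	1	1
Optimal convergence rate	1	0.97	0.89	0.76	0.59
Mean total steps to convergence	12099.24	5101.05	9013.92	11517.08	12298.90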
