Systems Engineering and Electronics ›› 2020, Vol. 42 ›› Issue (8): 1726-1733.doi: 10.3969/j.issn.1001-506X.2020.08.12
Shi YAN1,2, Jing HE1,2, Yuedong WANG1,2, Ziqiang SUN3, Yan LIANG1,2
Received: 2020-01-13
Online: 2020-07-25
Published: 2020-07-27
Shi YAN, Jing HE, Yuedong WANG, Ziqiang SUN, Yan LIANG. Multi-airborne cooperative sensor management based on reinforcement learning[J]. Systems Engineering and Electronics, 2020, 42(8): 1726-1733.
Table 1 Simulation parameters
| Side | Motion | Initial state |
| Red | 2 reconnaissance aircraft penetrate along preset trajectories | |
| Blue | 5 targets fly in a clockwise spiral | |
1 YAN T, HAN C Z, ZHANG G H. An overview of sensor management approaches for aerial target[J]. Acta Aeronautica et Astronautica Sinica, 2018, 39(10): 26-36.
2 HERO A O, COCHRAN D. Sensor management: past, present, and future[J]. IEEE Sensors Journal, 2011, 11(12): 3064-3075. doi: 10.1109/JSEN.2011.2167964
3 KATSILIERIS F, DRIESSEN H, YAROVOY A. Threat-based sensor management for target tracking[J]. IEEE Trans.on Aerospace and Electronic Systems, 2015, 51(4): 2772-2785. doi: 10.1109/TAES.2015.140052
4 LIU X X, SHEN S L, PAN Q, et al. An algorithm of sensor management based on information entropy[J]. Acta Electronica Sinica, 2000, 28(9): 39-41. doi: 10.3321/j.issn:0372-2112.2000.09.011
5 HE Y, GUAN X, WANG G H. Survey on the progress and prospect of multisensor information fusion[J]. Journal of Astronautics, 2005, 26(4): 524-530. doi: 10.3321/j.issn:1000-1328.2005.04.028
6 LIGGINS M E, CHONG C Y. Distributed multi-platform fusion for enhanced radar management[C]//Proc.of the IEEE National Radar Conference, 1997: 115-119.
7 RISTIC B, VO B N, CLARK D. A note on the reward function for PHD filters with sensor control[J]. IEEE Trans.on Aerospace and Electronic Systems, 2011, 47(2): 1521-1529. doi: 10.1109/TAES.2011.5751278
8 GOSTAR A K, HOSEINNEZHAD R, BAB-HADIASHAR A. Multi-Bernoulli sensor control via minimization of expected estimation errors[J]. IEEE Trans.on Aerospace and Electronic Systems, 2015, 51(3): 1762-1773. doi: 10.1109/TAES.2015.140211
9 WANG X, HOSEINNEZHAD R, GOSTAR A K, et al. Multi-sensor control for multi-object Bayes filters[J]. Signal Processing, 2018, 142: 260-270. doi: 10.1016/j.sigpro.2017.07.031
10 JOSHI S, BOYD S. Sensor selection via convex optimization[J]. IEEE Trans.on Signal Processing, 2009, 57(2): 451-462. doi: 10.1109/TSP.2008.2007095
11 XU R Y, FENG X X. Sensor management algorithm based on matrix genetic algorithm[J]. Modern Radar, 2016, 38(1): 42-46.
12 VINYALS M, RODRIGUEZ-AGUILAR J A, CERQUIDES J. A survey on sensor networks from a multiagent perspective[J]. The Computer Journal, 2011, 54(3): 455-470.
13 SHAH K, DI FRANCESCO M, KUMAR M. Distributed resource management in wireless sensor networks using reinforcement learning[J]. Wireless Networks, 2013, 19(5): 705-724. doi: 10.1007/s11276-012-0496-2
14 CAI J, HUANG C Q, GUO H F. Multi-sensor cooperative tracking using distributed Nash Q-learning[J]. Advanced Materials Research, 2012, 591-593: 1475-1478. doi: 10.4028/www.scientific.net/AMR.591-593.1475
15 SUTTON R, BARTO A. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 1998.
16 LI Y. Deep reinforcement learning: an overview[EB/OL]. [2020-04-03]. http://arxiv.org/abs/1701.07274.
17 WATKINS C J C H, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8(3/4): 279-292. doi: 10.1023/A:1022676722315
18 MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. doi: 10.1038/nature14236
19 YAN C C, HAO Y S. Threat assessment of aerial target based on AHP[J]. Computing Technology and Automation, 2011, 30(2): 118-121. doi: 10.3969/j.issn.1003-6199.2011.02.029
20 LIU S L, CHEN Y S, CHEN L, et al. Model for aerial targets threat evaluation based on agent[J]. Journal of Projectiles, Rockets, Missiles and Guidance, 2010, 30(6): 212-215. doi: 10.3969/j.issn.1673-9728.2010.06.065
21 LI J, HAO C M, LIU X W. Application of improved PSO arithmetic in radar jamming task assignment[J]. Computer Simulation, 2008(12): 38-41. doi: 10.3969/j.issn.1006-9348.2008.12.011