Multi-sensor cross cueing technique based on improved Actor-Critic algorithm

doi:10.12305/j.issn.1001-506X.2023.06.05

Abstract:

To raise the probability of target detection and keep targets under sustained tracking while reducing wasted battlefield resources and balancing battlefield cost-effectiveness, this paper proposes a multi-sensor cross-cueing technique for target detection based on an improved Actor-Critic algorithm. First, a dynamic sensor management evaluation model based on "cross-cueing" is built by combining sensor detection performance, energy consumption, timeliness, and other factors. Second, the paper analyzes the sensor management decision rules under the Actor-Critic cross-cueing algorithm and improves the Actor-Critic algorithm by forming a central evaluation network according to the needs of the task itself, which increases the interaction between the sensors and the external environment. Simulation results show that the improved algorithm accelerates the growth of network returns, achieves continuous detection of the target, strengthens cross-cueing between sensors, and raises the level of intelligence in scheduling, giving it considerable application value.

Key words: multi-sensor cross-cueing, Actor-Critic algorithm, reinforcement learning, target detection, sensor resource scheduling
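The abstract describes sensor management decisions driven by an Actor-Critic learner that trades detection against energy consumption. As a rough illustration only, the following sketch shows a generic one-step Actor-Critic update on a toy sensor-selection task. It is not the paper's improved algorithm and has no central evaluation network. The detection probabilities, energy costs, reward shape, two-state environment, and all function names (`softmax`, `actor_critic_step`, `train`) are assumptions made for this example.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, value, s, a, r, s_next,
                      gamma=0.9, lr_actor=0.1, lr_critic=0.1):
    """Apply one one-step Actor-Critic update in place and return the TD error."""
    # Critic: temporal-difference error of the observed transition.
    td_error = r + gamma * value[s_next] - value[s]
    probs = softmax(theta[s])  # policy before the update
    value[s] += lr_critic * td_error
    # Actor: softmax policy-gradient step, scaled by the TD error.
    for b in range(len(theta[s])):
        grad = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += lr_actor * td_error * grad
    return td_error


def train(detect_prob, energy_cost, steps=5000, cost_weight=0.5, seed=0):
    """Train on a hypothetical task: pick one sensor to cue at every step.

    State 0 means the target is currently lost, state 1 means it is held.
    Reward is 1 for a detection minus a weighted energy cost (assumed form).
    """
    rng = random.Random(seed)
    n = len(detect_prob)
    theta = [[0.0] * n for _ in range(2)]  # actor preferences per state
    value = [0.0, 0.0]                     # critic state values
    s = 0
    for _ in range(steps):
        probs = softmax(theta[s])
        a = rng.choices(range(n), weights=probs)[0]
        detected = rng.random() < detect_prob[a]
        r = (1.0 if detected else 0.0) - cost_weight * energy_cost[a]
        s_next = 1 if detected else 0
        actor_critic_step(theta, value, s, a, r, s_next)
        s = s_next
    return theta, value


if __name__ == "__main__":
    # Hypothetical detection probabilities and energy costs for three sensors.
    theta, value = train([0.9, 0.6, 0.3], [0.31, 0.64, 0.52])
    print([round(p, 3) for p in softmax(theta[1])])
```

With these made-up numbers the first sensor has the highest expected reward, so the learned policy should tend to concentrate on it. The exact probabilities depend on the seed and learning rates.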

CLC number:

TP18

Daozhi WEI, Zhaoyu ZHANG, Jiahao XIE, Ning LI. Multi-sensor cross cueing technique based on improved Actor-Critic algorithm[J]. Systems Engineering and Electronics, 2023, 45(6): 1624-1632.

Figures/Tables: 12 (Figures 1 to 11 and Table 1)


Sensor	Deployment layer	Coordinates/km	Relative energy consumption	Altitude/km	Detection radius/km
1	Low	(189, 763)	0.31	0.00	200.00
2	High	(593, 672)	0.64	1067.00	650.00
3	Middle	(729, 837)	0.52	23.00	350.00
4	Middle	(903, 127)	0.28	0.00	200.00
5	High	(672, 356)	0.48	960.00	600.00
6	Low	(305, 985)	0.55	0.00	500.00
7	Low	(370, 234)	0.43	0.00	450.00
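To show how the sensor parameters in the table above might feed a cueing decision, here is a minimal sketch that finds which sensors cover a target position and picks the one with the lowest relative energy consumption. It assumes a purely planar distance test that ignores altitude, and a "cheapest covering sensor" rule chosen only for illustration. Neither is stated in the source, and the names `SENSORS`, `sensors_in_range`, and `cheapest_cue` are hypothetical.

```python
import math

# (id, (x_km, y_km), relative_energy, detection_radius_km), copied from the table.
SENSORS = [
    (1, (189, 763), 0.31, 200.0),
    (2, (593, 672), 0.64, 650.0),
    (3, (729, 837), 0.52, 350.0),
    (4, (903, 127), 0.28, 200.0),
    (5, (672, 356), 0.48, 600.0),
    (6, (305, 985), 0.55, 500.0),
    (7, (370, 234), 0.43, 450.0),
]


def sensors_in_range(target_xy):
    """Return (id, planar_distance_km, relative_energy) for covering sensors.

    Assumption: coverage is a 2-D circle around the sensor; altitude is ignored.
    """
    hits = []
    for sid, (x, y), energy, radius in SENSORS:
        dist = math.hypot(target_xy[0] - x, target_xy[1] - y)
        if dist <= radius:
            hits.append((sid, dist, energy))
    return hits


def cheapest_cue(target_xy):
    """Pick the covering sensor with the lowest relative energy, or None."""
    hits = sensors_in_range(target_xy)
    if not hits:
        return None
    return min(hits, key=lambda h: h[2])[0]


if __name__ == "__main__":
    target = (200, 760)  # hypothetical target position in km
    print([sid for sid, _, _ in sensors_in_range(target)])
    print(cheapest_cue(target))
```

A learned policy such as the one the abstract describes would replace the fixed cheapest-sensor rule. The coverage test only narrows the candidate set.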
