Systems Engineering and Electronics, 2025, Vol. 47, Issue (9): 2960-2970. doi: 10.12305/j.issn.1001-506X.2025.09.17
• Systems Engineering •
Peng YAO, Meiyu HAN, Dechuan WANG, Zhicheng GAO

Received: 2024-07-24
Online: 2025-09-25
Published: 2025-09-16
Contact: Peng YAO
E-mail: yaopenghappy@163.com; 441213731@qq.com; 798268927@qq.com; gzc309727@163.com
Peng YAO, Meiyu HAN, Dechuan WANG, Zhicheng GAO. Multiple unmanned surface vehicles pursuit method based on adversarial evolutionary reinforcement learning[J]. Systems Engineering and Electronics, 2025, 47(9): 2960-2970.
References

[1] ZHANG W D, LIU X C, HAN P. Progress and challenges of overwater unmanned systems[J]. Acta Automatica Sinica, 2020, 46(5): 847-857. (in Chinese)
[2] LIU Z X, ZHANG Y M, YU X, et al. Unmanned surface vehicles: an overview of developments and challenges[J]. Annual Reviews in Control, 2016, 41: 71-93. doi: 10.1016/j.arcontrol.2016.04.018
[3] MU Z X, PAN J, ZHOU Z Y, et al. A survey of the pursuit-evasion problem in swarm intelligence[J]. Frontiers of Information Technology & Electronic Engineering, 2023, 24(8): 1093-1116.
[4] LI R Z, YANG H Z, XIAO C S. Cooperative hunting strategy for multi-mobile robot systems based on dynamic hunting points[J]. Control Engineering of China, 2019, 26(3): 510-514. (in Chinese)
[5] XIE Y L, LIANG X, LOU L X, et al. Self-organization method of USV swarm target strike task based on ant colony algorithm[C]//Proc. of the 3rd International Symposium on Autonomous Systems, 2019: 388-393.
[6] WANG H C, LUO H, MA Y Y, et al. A target assignment method based on Nash equilibrium game for multi-UAV ground attack[J]. Control and Decision, 2024, 39(4): 1364-1369. (in Chinese)
[7] WANG Y H, LIU Y, XIE K C. Dynamic hunting method for multi-USVs based on improved game theory model[C]//Proc. of the 33rd Chinese Control and Decision Conference, 2021: 3212-3217.
[8] LIU F, DONG X W, YU J L, et al. Distributed Nash equilibrium seeking of N-coalition noncooperative games with application to UAV swarms[J]. IEEE Trans. on Network Science and Engineering, 2022, 9(4): 2392-2405. doi: 10.1109/TNSE.2022.3163447
[9] FRANCIS A, FAUST A, CHIANG H T L, et al. Long-range indoor navigation with PRM-RL[J]. IEEE Trans. on Robotics, 2020, 36(4): 1115-1134. doi: 10.1109/TRO.2020.2975428
[10] XUE W Q, KOLARIC P, FAN J L, et al. Inverse reinforcement learning in tracking control based on inverse optimal control[J]. IEEE Trans. on Cybernetics, 2022, 52(10): 10570-10581. doi: 10.1109/TCYB.2021.3062856
[11] FANG F, LIANG W Y, WU Y, et al. Self-supervised reinforcement learning for active object detection[J]. IEEE Robotics and Automation Letters, 2022, 7(4): 10224-10231. doi: 10.1109/LRA.2022.3193019
[12] FAN Z L, YANG H Y, LIU F, et al. Reinforcement learning method for target hunting control of multi-robot systems with obstacles[J]. International Journal of Intelligent Systems, 2022, 37(12): 11275-11298. doi: 10.1002/int.23042
[13] XIA J W, ZHU X F, ZHANG J Q, et al. Research on cooperative hunting method of unmanned surface vehicles based on multi-agent reinforcement learning[J]. Control and Decision, 2023, 38(5): 1438-1447. (in Chinese)
[14] HARATI A, AHMADABADI M N, ARAABI B N. Knowledge-based multi-agent credit assignment: a study on task type and critic information[J]. IEEE Systems Journal, 2007, 1(1): 55-67. doi: 10.1109/JSYST.2007.901641
[15] LI Q, PENG H, LI J X, et al. A survey on text classification: from traditional to deep learning[J]. ACM Transactions on Intelligent Systems and Technology, 2022, 13(2).
[16] GREFF K, SRIVASTAVA R K, KOUTNIK J, et al. LSTM: a search space odyssey[J]. IEEE Trans. on Neural Networks and Learning Systems, 2017, 28(10): 2222-2232. doi: 10.1109/TNNLS.2016.2582924
[17] XIE G, SHANGGUAN A Q, FEI R, et al. Motion trajectory prediction based on a CNN-LSTM sequential model[J]. Science China Information Sciences, 2020, 63: 212207. doi: 10.1007/s11432-019-2761-y
[18] MASMITJA I, MARTIN M, OREILLY T, et al. Dynamic robotic tracking of underwater targets using reinforcement learning[J]. Science Robotics, 2023, 8(80): eade7811.
[19] COHEN A, TENG E, BERGES V P, et al. On the use and misuse of absorbing states in multi-agent reinforcement learning[EB/OL]. [2024-06-23]. https://arxiv.org/abs/2111.05992.
[20] FOSSEN T I. Handbook of marine craft hydrodynamics and motion control[M]. Chichester, England: John Wiley & Sons, 2011.
[21] CHEN L N, JIN Y C, YIN Y. Ocean wave rendering with whitecap in the visual system of a maritime simulator[J]. Journal of Computing and Information Technology, 2017, 25(1): 63-76. doi: 10.20532/cit.2017.1003327
[22] FRECHOT J. Realistic simulation of ocean surface using wave spectra[C]//Proc. of the International Conference on Computer Graphics Theory and Applications, 2006: 76-83.
[23] SILVER D, HUBERT T, SCHRITTWIESER J, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play[J]. Science, 2018, 362(6419): 1140-1144. doi: 10.1126/science.aar6404
[24] FOERSTER J, FARQUHAR G, AFOURAS T, et al. Counterfactual multi-agent policy gradients[C]//Proc. of the 32nd AAAI Conference on Artificial Intelligence, 2018.
[25] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proc. of the 31st Conference on Neural Information Processing Systems, 2017: 6000-6010.
[26] LI X S, YE P J, JIN J C, et al. Data augmented deep behavioral cloning for urban traffic control operations under a parallel learning framework[J]. IEEE Trans. on Intelligent Transportation Systems, 2022, 23(6): 5128-5137. doi: 10.1109/TITS.2020.3048151
[27] YANG B, MA C F, XIA X F. An interrelated imitation learning method for heterogeneous drone swarm coordination[J]. IEEE Trans. on Emerging Topics in Computing, 2022, 10(4): 1704-1716. doi: 10.1109/TETC.2022.3202297
[28] ABLETT T, CHAN B, KELLY J. Learning from guided play: improving exploration for adversarial imitation learning with simple auxiliary tasks[J]. IEEE Robotics and Automation Letters, 2023, 8(3): 1263-1270. doi: 10.1109/LRA.2023.3236882
[29] SHI H B, SHI L, XU M, et al. End-to-end navigation strategy with deep reinforcement learning for mobile robots[J]. IEEE Trans. on Industrial Informatics, 2020, 16(4): 2393-2402. doi: 10.1109/TII.2019.2936167
[30] XIAO D M, WANG B, SUN Z Q, et al. Behavioral cloning based model generation method for reinforcement learning[C]//Proc. of the China Automation Congress, 2023: 6776-6781.