Multiple unmanned surface vehicles pursuit method based on adversarial evolutionary reinforcement learning

doi:10.12305/j.issn.1001-506X.2025.09.17

Abstract

To address blue-side target intrusion during unmanned surface vehicle (USV) response to maritime emergencies, a pursuit-evasion framework based on an adversarial evolutionary reinforcement learning algorithm is proposed. To improve pursuit performance and generalization, both the red-side USVs and the blue-side escaping target are trained with reinforcement learning, which increases the diversity of their strategies, and the performance of the pursuit team improves through iterative adversarial evolution of the two sides. For the pursuit team, because individual vehicles may be destroyed or run out of fuel during a mission, the multi-agent posthumous credit assignment algorithm is adopted, and a residual-connected hidden long short-term memory network is introduced to improve the policy network. Obstacles such as islands and reefs are also exploited to help the USVs encircle and capture the target more efficiently. Simulation results show that the adversarial evolution iterative training framework enables the pursuers and the evader to improve together, and that the improved reinforcement learning algorithm is comparatively stable and converges comparatively well. When applied to the multi-USV pursuit problem, the proposed method shows greater intelligence and flexibility, and the capture performance is significantly improved.

Key words: unmanned surface vehicle (USV), pursuit-escape, adversarial evolution, reinforcement learning

CLC number:

TP 242

Peng YAO, Meiyu HAN, Dechuan WANG, Zhicheng GAO. Multiple unmanned surface vehicles pursuit method based on adversarial evolutionary reinforcement learning[J]. Systems Engineering and Electronics, 2025, 47(9): 2960-2970.

Figures and tables (16): Figs. 1-12 and Tables 1-4.


Curriculum scenario	Initial environment size/m²	Sea state level	Evader speed/(m/s)
1	15×15	0	2
2	25×25	1	6
3	35×35	2	8
4	50×50	2	12

Actor structure	Mean reward	Reward standard deviation
Fully connected network	23.04	29.84
LSTM-FC	66.20	20.67
H-LSTM-FC	84.76	21.56
R-H-LSTM-FC	92.48	15.60

Curriculum scenario	Initial environment size/m²	Sea state level	Pursuer speed/(m/s)
1	50×50	0	4
2	35×35	1	8
3	25×25	2	12

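Pursuit team strategy	Escape target strategy	Pursuit success rate/%	Trend	Escape success rate/%	Trend
1	a	94	-	6	-
1	b	12	Decrease	88	Increase
2	b	49	Increase	51	Decrease
2	c	26	Decrease	74	Increase
3	c	58	Increase	42	Decrease
3	d	40	Decrease	60	Increase
4	d	56	Increase	44	Decrease
4	e	55	Decrease	45	Increase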
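
The alternating pattern in the table above can be checked with a few lines of Python. This is a minimal sketch, not the authors' code: the row tuples are copied from the table, while the names `ROWS`, `trend`, `pursuit_trends`, and `updated_side` are hypothetical helpers introduced only for illustration. The sketch derives the pursuit-rate trend from consecutive rows and reports which side changed its strategy at each step.

```python
# Rows copied from the table above:
# (pursuit team strategy, escape target strategy,
#  pursuit success rate in %, escape success rate in %).
ROWS = [
    (1, "a", 94, 6),
    (1, "b", 12, 88),
    (2, "b", 49, 51),
    (2, "c", 26, 74),
    (3, "c", 58, 42),
    (3, "d", 40, 60),
    (4, "d", 56, 44),
    (4, "e", 55, 45),
]


def trend(current, previous):
    """Label the change between two consecutive success rates."""
    if current > previous:
        return "up"
    if current < previous:
        return "down"
    return "flat"


def pursuit_trends(rows):
    """Trend of the pursuit success rate for every row after the first."""
    return [trend(cur[2], prev[2]) for prev, cur in zip(rows, rows[1:])]


def updated_side(prev, cur):
    """Which side changed its strategy between two consecutive rows."""
    if cur[0] != prev[0] and cur[1] == prev[1]:
        return "pursuer"
    if cur[1] != prev[1] and cur[0] == prev[0]:
        return "evader"
    return "both or neither"


if __name__ == "__main__":
    sides = [updated_side(p, c) for p, c in zip(ROWS, ROWS[1:])]
    for side, change in zip(sides, pursuit_trends(ROWS)):
        print(f"{side} updated -> pursuit success rate goes {change}")
```

With the tabulated values, every strategy change by the evader lowers the pursuit success rate and every change by the pursuit team raises it, which matches the trend columns of the table.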
