Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (9): 2960-2970. doi: 10.12305/j.issn.1001-506X.2025.09.17

• Systems Engineering •

Multiple unmanned surface vehicles pursuit method based on adversarial evolutionary reinforcement learning

Peng YAO, Meiyu HAN, Dechuan WANG, Zhicheng GAO

  1. College of Engineering, Ocean University of China, Qingdao 266404, China
  • Received: 2024-07-24 Online: 2025-09-25 Published: 2025-09-16
  • Contact: Peng YAO E-mail: yaopenghappy@163.com; 441213731@qq.com; 798268927@qq.com; gzc309727@163.com
  • About the authors: Meiyu HAN (born 1999), female, master's student; research interests: deep reinforcement learning, target search and encirclement
    Dechuan WANG (born 1999), male, master's student; research interest: guidance law design
    Zhicheng GAO (born 2000), male, master's student; research interest: motion planning for unmanned systems
  • Funding:
    Supported by the Shandong Provincial Natural Science Foundation (ZR2023ME009) and the National Natural Science Foundation of China (51909252)

Abstract:

To address intrusion by blue-side targets when unmanned surface vehicles (USVs) respond to maritime emergencies, a pursuit-evasion framework based on an adversarial evolutionary reinforcement learning algorithm is proposed. To improve pursuit performance and generalization, both the red USVs and the blue evading target use reinforcement learning to increase the diversity of their strategies, and the performance of the pursuit team improves through iterative adversarial evolution of the two sides. For the pursuit team, because individual vehicles may be destroyed or run out of fuel during a mission, the multi-agent posthumous credit assignment algorithm is adopted, and a long short-term memory network with embedded residual connections is introduced to improve the policy network; obstacles such as islands and reefs are also exploited to make encirclement more efficient. Simulation results show that the adversarial evolutionary iterative training framework enables pursuers and evaders to improve together, and that the improved reinforcement learning algorithm has comparatively good stability and convergence. In multi-USV pursuit problems, the proposed method shows greater intelligence and flexibility and markedly improves the encirclement result.

Key words: unmanned surface vehicle (USV), pursuit-escape, adversarial evolution, reinforcement learning
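
The iterative adversarial evolution described in the abstract can be pictured as an alternating schedule: one side is held fixed while the other improves against it, and the roles then swap. The following is only a minimal sketch of that schedule and not the authors' implementation. It assumes a toy zero-sum game in which each "policy" is a single number in [0, 1], and it swaps in random hill climbing for the reinforcement learning update; the names `pursuer_payoff`, `hill_climb_step`, and `adversarial_evolution` are hypothetical.

```python
import random
from typing import List, Tuple

Policy = float  # assumption: a scalar stands in for a policy network's parameters


def pursuer_payoff(p: Policy, e: Policy) -> float:
    """Toy zero-sum payoff: the pursuer wants to end up close to the evader."""
    return -(p - e) ** 2


def hill_climb_step(learner: Policy, opponent: Policy, role: str,
                    rng: random.Random, step: float = 0.1) -> Policy:
    """One toy update (a stand-in for an RL step): keep a random
    perturbation of the learner only if it improves the learner's payoff."""
    candidate = min(1.0, max(0.0, learner + rng.uniform(-step, step)))
    if role == "pursuer":
        old = pursuer_payoff(learner, opponent)
        new = pursuer_payoff(candidate, opponent)
    else:  # the evader maximises the negated pursuer payoff
        old = -pursuer_payoff(opponent, learner)
        new = -pursuer_payoff(opponent, candidate)
    return candidate if new > old else learner


def adversarial_evolution(pursuer: Policy, evader: Policy, n_rounds: int,
                          steps_per_round: int, seed: int = 0
                          ) -> List[Tuple[Policy, Policy]]:
    """Alternate training phases and record both policies after each round."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_rounds):
        # Phase 1: the evader is frozen while the pursuer improves against it.
        for _ in range(steps_per_round):
            pursuer = hill_climb_step(pursuer, evader, "pursuer", rng)
        # Phase 2: the pursuer is frozen while the evader improves against it.
        for _ in range(steps_per_round):
            evader = hill_climb_step(evader, pursuer, "evader", rng)
        history.append((pursuer, evader))
    return history


if __name__ == "__main__":
    for p, e in adversarial_evolution(0.2, 0.8, n_rounds=5, steps_per_round=20):
        print(f"pursuer={p:.3f} evader={e:.3f}")
```

In a real setting each scalar would be replaced by a trained policy and each hill-climbing step by a reinforcement learning update; the point of the sketch is only the freeze-and-swap structure, under which each side keeps facing a stronger opponent.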

CLC Number: