Systems Engineering and Electronics ›› 2022, Vol. 44 ›› Issue (11): 3486-3495. doi: 10.12305/j.issn.1001-506X.2022.11.24

• Guidance, Navigation and Control •

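Target search path planning for naval battle field based on deep reinforcement learning

Qingqing YANG, Yingying GAO*, Yu GUO, Boyuan XIA, Kewei YANG

  1. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
  • Received: 2021-09-01 Online: 2022-10-26 Published: 2022-10-29
  • Contact: Yingying GAO
  • About the authors:
    Qingqing YANG (born 1982), female, associate professor, master's supervisor, Ph.D.; main research interests: intelligent decision-making for emergency management, resource optimization, and mission planning methods
    Yingying GAO (born 1996), female, Ph.D. candidate; main research interests: maritime search and rescue mission planning and intelligent decision-making
    Yu GUO (born 1993), female, Ph.D. candidate; main research interests: maritime search and rescue mission planning and intelligent decision-making
    Boyuan XIA (born 1994), male, Ph.D. candidate; main research interests: modeling, evaluation, and optimization of combat systems of systems, and mechanisms of complex systems
    Kewei YANG (born 1977), male, professor, doctoral supervisor, Ph.D.; main research interests: big data and system-of-systems engineering, and mechanisms of complex systems
  • Supported by:
    National Natural Science Foundation of China (72071206); National Natural Science Foundation of China (71690233); Science and Technology Innovation Program of Hunan Province (2020RC4046); China Postdoctoral Science Foundation (2019M653923)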

Abstract:

The naval battlefield is one of the main arenas of future great-power conflict. A strong target search capability is the last line of defense for maritime training and combat operations. Because of the complex, changeable environment and the strategic importance of the naval battlefield, it is also the most difficult and most central part of joint battlefield search and rescue. Targets at sea survive only a short time and the search must be planned in real time, so a target search planning method based on deep reinforcement learning is proposed. First, a mathematical programming model of the naval battlefield target search scenario is constructed and mapped to a reinforcement learning model. Then, based on the Rainbow deep reinforcement learning algorithm, the state vector, the neural network structure, and the algorithm framework and workflow for target search planning are designed. Finally, a case study verifies the feasibility and effectiveness of the proposed method, which greatly improves the search success rate compared with the conventionally used parallel search pattern.

Key words: naval battlefield, target search, path planning, dynamic planning, deep reinforcement learning
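
The abstract gives no details of how the search scenario is mapped to a reinforcement learning model, so the following is only an illustrative sketch under our own assumptions, not the authors' formulation. It assumes a small grid whose cells hold a prior probability that the target is present, a searcher that moves one cell per step, perfect detection in the visited cell, and a reward equal to the probability mass searched. The class `SearchGridEnv`, the action encoding in `MOVES`, and the helpers `parallel_sweep_actions` and `rollout` are all hypothetical names. The parallel (back-and-forth) sweep stands in for the conventional baseline the abstract mentions; a learned policy such as one trained with Rainbow would replace the fixed action list.

```python
# Hypothetical action encoding (not from the paper): 0=up, 1=down, 2=left, 3=right.
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


class SearchGridEnv:
    """Toy search MDP over a grid of prior target probabilities (assumed model)."""

    def __init__(self, poc, start=(0, 0), max_steps=20):
        self.poc = poc              # 2-D list: prior probability of the target per cell
        self.start = start
        self.max_steps = max_steps  # assumed time budget, e.g. limited target survival
        self.reset()

    def reset(self):
        self.pos = self.start
        self.steps = 0
        self.remaining = [row[:] for row in self.poc]
        self.collected = 0.0
        self._search_current_cell()  # the start cell is searched immediately
        return self.state()

    def _search_current_cell(self):
        # Assumes perfect detection: all probability mass in the cell is searched.
        r, c = self.pos
        found = self.remaining[r][c]
        self.remaining[r][c] = 0.0
        self.collected += found
        return found

    def state(self):
        # State vector: searcher position followed by the flattened remaining map.
        return [*self.pos, *(p for row in self.remaining for p in row)]

    def step(self, action):
        dr, dc = MOVES[action]
        rows, cols = len(self.poc), len(self.poc[0])
        # Moves that would leave the grid are clipped to the boundary.
        r = min(max(self.pos[0] + dr, 0), rows - 1)
        c = min(max(self.pos[1] + dc, 0), cols - 1)
        self.pos = (r, c)
        self.steps += 1
        reward = self._search_current_cell()
        done = self.steps >= self.max_steps
        return self.state(), reward, done


def parallel_sweep_actions(rows, cols):
    """Actions for a back-and-forth sweep starting at the top-left cell."""
    actions = []
    for r in range(rows):
        lateral = 3 if r % 2 == 0 else 2   # right on even rows, left on odd rows
        actions += [lateral] * (cols - 1)
        if r < rows - 1:
            actions.append(1)              # step down to the next row
    return actions


def rollout(env, actions):
    """Run a fixed action sequence and return the probability mass searched."""
    env.reset()
    for a in actions:
        _, _, done = env.step(a)
        if done:
            break
    return env.collected


if __name__ == "__main__":
    poc = [[0.1, 0.1, 0.1], [0.1, 0.2, 0.4]]
    env = SearchGridEnv(poc, max_steps=3)
    print(rollout(env, parallel_sweep_actions(2, 3)))
```

With a tight step budget, the sweep covers only part of the grid, which is the kind of situation where a policy that heads for high-probability cells first could do better than a fixed pattern.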

CLC Number: