Systems Engineering and Electronics, 2025, Vol. 47, Issue (9): 2993-3003. doi: 10.12305/j.issn.1001-506X.2025.09.20

• Systems Engineering •

Hierarchical decision-making algorithm for UAV air combat maneuvering based on deep reinforcement learning

Xiaolong WEI¹, Yarong WU¹, Dengkai YAO²,*, Guhao ZHAO¹

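  1. Air Traffic Control and Navigation School, Air Force Engineering University, Xi’an, Shaanxi 710051, China
    2. Modern Aviation College, Guangzhou Institute of Science and Technology, Guangzhou, Guangdong 510540, China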
  • Received: 2024-07-23  Publication date: 2025-09-25  Release date: 2025-09-16
  • Corresponding author: Dengkai YAO  E-mail: xiaolong3494@163.com; chumiaoying2023@163.com; yao13321185369@163.com; zghlupin@163.com
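  • About the authors: Xiaolong WEI (born 1989), male, lecturer, M.S.; research interest: UAV operation and control.
    Yarong WU (born 1977), female, professor, Ph.D.; research interest: air traffic control.
    Guhao ZHAO (born 1986), male, associate professor, Ph.D.; research interests: UAV operation and control, air traffic control.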
  • Funding:
    Supported by the National Natural Science Foundation of China (52074309)

Hierarchical decision-making algorithm for UAV air combat maneuvering based on deep reinforcement learning

Xiaolong WEI¹, Yarong WU¹, Dengkai YAO²,*, Guhao ZHAO¹

  1. Air Traffic Control and Navigation School, Air Force Engineering University, Xi’an 710051, China
    2. Modern Aviation College, Guangzhou Institute of Science and Technology, Guangzhou 510540, China
  • Received: 2024-07-23  Online: 2025-09-25  Published: 2025-09-16
  • Contact: Dengkai YAO  E-mail: xiaolong3494@163.com; chumiaoying2023@163.com; yao13321185369@163.com; zghlupin@163.com

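Abstract:

To address the high complexity and strict timeliness requirements of beyond-visual-range air combat maneuvering decisions for unmanned aerial vehicles (UAVs), a hierarchical decision-making algorithm based on deep reinforcement learning is proposed. First, following the tactical characteristics of beyond-visual-range air combat, the UAV's situation assessment, state transition, and win/loss determination processes are modeled, and an air combat simulation environment is built. Second, a deep reinforcement learning network model is constructed with a hierarchical decision-making mechanism, and an ant colony algorithm serves as the heuristic factor in the target network's Q-value estimation. Simulation results show that the proposed algorithm lets the UAV adopt maneuvering strategies promptly as the situation changes, with relatively stable strategy and maneuver-command outputs and high decision-making efficiency. While broadening the UAV's range of tactics, the algorithm also reduces the network's learning difficulty and improves decision quality.

Keywords: unmanned aerial vehicle, beyond visual range, air combat confrontation, deep reinforcement learning, hierarchical decision-making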

Abstract:

To address the high decision-making complexity and strict time constraints of beyond-visual-range (BVR) air combat maneuvering for unmanned aerial vehicles (UAVs), a hierarchical decision-making algorithm based on deep reinforcement learning is proposed. First, based on the tactical characteristics of BVR air combat, the UAV's situation assessment, state transition, and win/loss determination are modeled, and an air combat simulation environment is established. Second, a deep reinforcement learning network model with a hierarchical decision-making mechanism is constructed, and the ant colony algorithm is used as a heuristic factor in the target network's Q-value estimation. Simulation results show that the proposed algorithm enables the UAV to adopt maneuvering strategies promptly as the situation changes, that its strategy and maneuver-command outputs are relatively stable, and that its decision-making efficiency is high. The proposed algorithm broadens the UAV's range of tactics while reducing the network's learning difficulty and improving decision quality.
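The abstract does not give the update rules, so the following is only a minimal Python sketch of one possible reading of its two ideas, not the authors' method. Hierarchical decision-making is modeled as a greedy choice of a strategy followed by a maneuver within that strategy. The ant colony heuristic is modeled as a pheromone bonus that biases which action is bootstrapped in the target Q-value. All names are hypothetical and labeled as assumptions in the code: `STRATEGIES`, `MANEUVERS`, `PheromoneTable`, `hierarchical_act`, `heuristic_target`, and the parameters `rho`, `init`, `gamma`, and `lam`. Plain dictionaries stand in for the Q networks.

```python
from dataclasses import dataclass, field

# Hypothetical two-level action space; the paper's actual strategy and
# maneuver libraries are not given in the abstract.
STRATEGIES = ["attack", "evade"]
MANEUVERS = {
    "attack": ["pursue", "climb", "fire"],
    "evade": ["break_turn", "dive", "notch"],
}


@dataclass
class PheromoneTable:
    """Assumed ACO-style pheromone trail over (state, action) pairs."""
    rho: float = 0.1   # evaporation rate (assumption)
    init: float = 1.0  # pheromone level of unvisited pairs (assumption)
    tau: dict = field(default_factory=dict)

    def get(self, state, action):
        return self.tau.get((state, action), self.init)

    def update(self, state, action, reward):
        # Evaporate every stored trail, then deposit on the visited pair
        # only when the transition was rewarding.
        for key in self.tau:
            self.tau[key] *= 1.0 - self.rho
        if reward > 0:
            self.tau[(state, action)] = self.get(state, action) + reward


def hierarchical_act(q_high, q_low, state):
    """Greedy two-level choice: pick a strategy, then a maneuver within it."""
    strategy = max(STRATEGIES, key=lambda g: q_high.get((state, g), 0.0))
    maneuver = max(MANEUVERS[strategy],
                   key=lambda m: q_low.get(((state, strategy), m), 0.0))
    return strategy, maneuver


def heuristic_target(q_target, pher, reward, next_state, actions,
                     gamma=0.9, lam=0.5, done=False):
    """TD target whose bootstrap action is selected by Q plus a pheromone bonus.

    q_target is a dict (state, action) -> value standing in for the target
    network; gamma and lam (heuristic weight) are assumed values.
    """
    if done:
        return reward
    best = max(actions, key=lambda a: q_target.get((next_state, a), 0.0)
               + lam * pher.get(next_state, a))
    # Evaluate the selected action with the plain target value, so the
    # pheromone only biases selection and does not inflate the target.
    return reward + gamma * q_target.get((next_state, best), 0.0)


if __name__ == "__main__":
    pher = PheromoneTable()
    pher.update("s1", "climb", reward=2.0)  # climb trail: 1.0 + 2.0 = 3.0
    q_t = {("s1", "pursue"): 1.0, ("s1", "climb"): 0.9}
    y = heuristic_target(q_t, pher, 0.0, "s1", ["pursue", "climb"])
    print("target:", y)
    print(hierarchical_act({("s0", "evade"): 2.0},
                           {(("s0", "evade"), "dive"): 1.0}, "s0"))
```

In this sketch, the pheromone term changes which next action is bootstrapped ("climb" instead of "pursue" once its trail is reinforced), but the action's value is still read from the target table. This decoupling resembles Double DQN. The abstract does not say whether the authors evaluate the target this way.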

Key words: unmanned aerial vehicle (UAV), beyond visual range, air combat, deep reinforcement learning, hierarchical decision-making
