Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (4): 1285-1299. doi: 10.12305/j.issn.1001-506X.2025.04.25

• Guidance, Navigation and Control •

Research on intelligent decision-making methods for coordinated attack by manned aerial vehicles and unmanned aerial vehicles

Wei XIONG1,2, Dong ZHANG1,2,*, Zhi REN1,2, Shuheng YANG1,2   

  1. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China
  2. Shaanxi Key Laboratory of Space Vehicle Design, Xi'an 710072, China
  • Received: 2024-04-26  Online: 2025-04-25  Published: 2025-05-28
  • Contact: Dong ZHANG
  • About the authors:
    Wei XIONG (b. 2000), male, Ph.D. candidate; research interests: intelligent planning and autonomous control of air vehicle swarms
    Dong ZHANG (b. 1986), male, associate professor, Ph.D.; research interests: intelligent planning and autonomous control of air vehicle swarms
    Zhi REN (b. 1999), male, Ph.D. candidate; research interests: intelligent planning and autonomous control of air vehicle swarms
    Shuheng YANG (b. 2001), male, Ph.D. candidate; research interests: intelligent planning and autonomous control of air vehicle swarms
  • Funding:
    National Natural Science Foundation of China (52472417); Open Fund of the Swarm Cooperation and Autonomy Laboratory (QXZ23013402)

Abstract:

Coordination between manned aircraft and unmanned aerial vehicles (UAVs) is the current trend in UAV air combat, and intelligent decision-making is the key to achieving coordinated manned/unmanned attack. The highly dynamic battlefield environment, asymmetric combat tasks, and heterogeneous multi-source coordination architecture leave UAVs with limited autonomy and real-time performance and make strategy training difficult; these are the central challenges in coordinated manned/unmanned attack research. Based on the loyal-wingman scheme for manned/unmanned coordination, a typical coordinated-attack pattern is designed, and a reinforcement learning method based on an improved multi-agent twin delayed deep deterministic (MATD3) policy gradient algorithm is proposed. First, a cooperative maneuver decision-making training framework based on the MATD3 policy gradient algorithm and curriculum learning (CL), together with a pre-training (PT) strategy based on transfer learning, is designed to overcome the difficulty of training coordinated manned/unmanned attack strategies. Second, a multi-aircraft cooperative reward function and state space are established for manned/unmanned cooperative maneuvering. Finally, a digital simulation and wargaming platform carrying a six-degree-of-freedom aircraft model verifies that the trained attack strategy delivers efficient attack and survival capability and can guide the practical application of future coordinated manned/unmanned attack operations.
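The MATD3 algorithm named in the abstract extends TD3's two core tricks, clipped double-Q learning and target-policy smoothing, to the multi-agent setting with centralized critics. As a minimal illustration only (not the authors' implementation; all function names, shapes, and hyperparameters below are assumptions, and episode-termination masking is omitted for brevity), the Bellman targets for the centralized twin target critics can be sketched as:

```python
import numpy as np

def matd3_targets(rewards, next_obs, target_actors, target_q1, target_q2,
                  gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0,
                  rng=None):
    """Clipped double-Q Bellman targets with target-policy smoothing,
    computed per agent from centralized twin target critics.

    rewards: list of per-agent scalar rewards
    next_obs: list of per-agent next-observation vectors
    target_actors: list of per-agent target policies, obs -> action
    target_q1/q2: centralized twin target critics, joint vector -> scalar
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    # Each target actor proposes a next action; add clipped Gaussian noise
    # (target-policy smoothing), then clip to the action bounds.
    next_actions = []
    for actor, obs in zip(target_actors, next_obs):
        action = actor(obs)
        noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(action)),
                        -noise_clip, noise_clip)
        next_actions.append(np.clip(action + noise, -act_limit, act_limit))
    # Centralized critics see the joint next observation and joint action.
    joint = np.concatenate([np.concatenate(next_obs),
                            np.concatenate(next_actions)])
    # Take the minimum of the twin critics to curb Q-value overestimation.
    q_min = min(target_q1(joint), target_q2(joint))
    return [r + gamma * q_min for r in rewards]
```

Taking the minimum over the twin critics is what counters the overestimation bias of single-critic methods such as MADDPG, and the clipped noise on the target action smooths the value estimate along nearby actions; both matter in the high-dynamics air-combat setting the abstract describes.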

Key words: manned/unmanned aerial vehicle coordination, air combat maneuver decision-making, deep reinforcement learning, loyal wingman
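The curriculum-learning (CL) component of the training framework trains the policy on progressively harder engagements instead of the full-difficulty task from the start. A minimal staged-promotion sketch, assuming an invented difficulty ladder (the paper's actual curriculum stages and thresholds are not specified here):

```python
# Hypothetical difficulty ladder: each stage tightens the engagement.
STAGES = [
    {"n_threats": 1, "threat_speed": 0.6},   # warm-up: single slow threat
    {"n_threats": 2, "threat_speed": 0.8},   # intermediate
    {"n_threats": 3, "threat_speed": 1.0},   # full coordinated-attack task
]

def curriculum_step(stage, success_rate, promote_at=0.8):
    """Promote to the next stage once the policy's success rate on the
    current stage crosses the threshold; hold at the final stage."""
    if success_rate >= promote_at and stage < len(STAGES) - 1:
        return stage + 1
    return stage
```

A pre-trained (PT) policy transferred from an earlier stage would seed each new stage's actor and critics, which is the transfer-learning role the abstract assigns to PT.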

CLC number: