Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (6): 1867-1879. doi: 10.12305/j.issn.1001-506X.2025.06.15

• Systems Engineering •

Confront strategy of multi-unmanned aerial vehicle based on ASDDPG algorithm

Xiaowei FU, Xinyi WANG, Zhe QIAO

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
  • Received: 2024-03-05 Online: 2025-06-25 Published: 2025-07-09
  • Contact: Xiaowei FU
  • About the authors: Xiaowei FU (born 1976), male, professor, Ph.D.; research interests: cooperative control and decision-making for unmanned systems, and UAV swarm intelligence
    Xinyi WANG (born 2001), female, master's student; research interest: deep reinforcement learning
    Zhe QIAO (born 1999), male, master's degree; research interest: deep reinforcement learning

Abstract:

In multi-unmanned aerial vehicle (UAV) confrontation, the number of friendly UAVs within a UAV's communication range varies, so the amount of information the UAV obtains changes over time. The input dimension of a neural network in deep reinforcement learning, however, is fixed. Many algorithms therefore consider interaction information from only a fixed number of nearby friendly UAVs, which loses information and does not reflect actual battlefield conditions. To address this, the attention state-deep deterministic policy gradient (ASDDPG) algorithm is proposed, based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm combined with an attention mechanism. ASDDPG transforms the varying information into fixed-length feature vectors, which resolves the mismatch between the amount of information and the input dimension, and it extracts state features through an encoder-decoder structure to enhance the decision-making ability of the UAVs. Comparative simulation experiments verify that UAVs controlled by the proposed algorithm achieve a higher win rate and generalize well, showing the algorithm's advantages in improving both adversarial decision-making ability and generalization.

Key words: multi-unmanned aerial vehicle (UAV), reinforcement learning, policy gradient, maneuver decision-making, attention mechanism
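
The abstract describes using attention to turn a varying amount of friendly-UAV information into a fixed-length feature vector. The sketch below illustrates that general idea only; it is not the authors' network. It assumes scaled dot-product attention with the raw state vectors acting as query, keys, and values, whereas a trained model would apply learned projections first. The function name `attention_pool`, the zero-vector fallback for the no-neighbor case, and the sample vectors are all hypothetical.

```python
import math


def attention_pool(own_state, neighbor_states):
    """Pool a variable number of neighbor vectors into one fixed-length vector.

    Hypothetical simplification: raw states serve as query, keys, and values;
    a trained network would apply learned projections first.
    """
    dim = len(own_state)
    if not neighbor_states:
        # Assumed fallback: no friendly UAV in range gives a zero vector.
        return [0.0] * dim

    # Scaled dot-product score between the UAV's own state and each neighbor.
    scale = math.sqrt(dim)
    scores = [
        sum(q * k for q, k in zip(own_state, neighbor)) / scale
        for neighbor in neighbor_states
    ]

    # Numerically stable softmax turns the scores into weights that sum to 1.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of neighbor vectors; its length is dim for any neighbor count.
    return [
        sum(w * neighbor[i] for w, neighbor in zip(weights, neighbor_states))
        for i in range(dim)
    ]


# Illustrative inputs (made-up numbers): the same UAV with 2, 4, and 0 neighbors.
own = [1.0, 0.0, 0.5]
two_neighbors = [[0.2, 0.1, 0.0], [0.9, 0.3, 0.4]]
four_neighbors = two_neighbors + [[0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]

pooled_two = attention_pool(own, two_neighbors)
pooled_four = attention_pool(own, four_neighbors)
pooled_none = attention_pool(own, [])
```

In all three cases the pooled vector has the same length as a single state vector, so it could feed a fixed-size network input without truncating the neighbor list to a preset count.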

CLC number: