Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (7): 2205-2215. doi: 10.12305/j.issn.1001-506X.2025.07.14

• Systems Engineering •

Attack-defense confrontation strategy of multi-UAV based on APIQ algorithm

Xiaowei FU, Xinyi WANG, Zhe QIAO

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
  • Received: 2024-03-05 Online: 2025-07-16 Published: 2025-07-22
  • Corresponding author: Xiaowei FU
  • About the authors: Xiaowei FU (b. 1976), male, professor, Ph.D.; research interests: cooperative control and decision-making of unmanned systems, UAV swarm intelligence
    Xinyi WANG (b. 2001), female, master's student; research interest: deep reinforcement learning
    Zhe QIAO (b. 1999), male, master's student; research interests: deep reinforcement learning, UAV swarm intelligence

Abstract:

In multi-UAV (unmanned aerial vehicle) confrontation scenarios, the large number of UAVs means that conventional deep reinforcement learning methods may suffer from an explosion in the dimensionality of the value function and from poor convergence of the policy network. To address this, a swarm confrontation algorithm called attention policy interaction Q-learning (APIQ), built on value decomposition and the attention mechanism, is proposed. Value decomposition is introduced to alleviate the dimensionality explosion of the value function, and an attention mechanism assigns a weight to each value in the decomposition, which promotes convergence of the policy network. To verify the feasibility of APIQ for the multi-UAV confrontation problem, a relatively realistic environment model is established and the algorithm is validated by simulation. Comparisons with other algorithms show that UAVs controlled by APIQ achieve a higher win rate in confrontation.

Key words: multi-unmanned aerial vehicle (UAV), reinforcement learning, value-decomposition network (VDN), attention mechanism, maneuver decision-making
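The two mechanisms named in the abstract can be illustrated with a small sketch: why a centralized joint Q-function grows exponentially with team size while a decomposed one grows linearly, and how attention weights might combine per-agent Q-values into a team value. This is not the authors' APIQ implementation. It assumes discrete per-agent action sets and a hypothetical single-head scaled dot-product attention whose query and keys are derived from state features only; the names `query`, `keys`, `agent_qs` and the 10-UAV, 5-maneuver team are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def joint_action_count(n_agents: int, n_actions: int) -> int:
    """Entries a centralized Q-function needs per state: |A|^n joint actions."""
    return n_actions ** n_agents


def decomposed_output_count(n_agents: int, n_actions: int) -> int:
    """Entries needed when each agent keeps its own Q-head: n * |A|."""
    return n_agents * n_actions


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax; the weights are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weighted_q_total(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    agent_qs: Sequence[float],
) -> Tuple[float, List[float]]:
    """Mix per-agent Q-values as Q_tot = sum_i w_i * Q_i.

    Hypothetical single-head scaled dot-product attention:
    w = softmax(q . k_i / sqrt(d)). The query and keys are assumed to come
    from state/observation features only, so the weights do not depend on
    the actions the agents choose.
    """
    d = len(query)
    scores = [
        sum(qc * kc for qc, kc in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    weights = softmax(scores)
    q_total = sum(w * q for w, q in zip(weights, agent_qs))
    return q_total, weights


if __name__ == "__main__":
    # Hypothetical team: 10 UAVs, 5 discrete maneuvers each.
    print(joint_action_count(10, 5))       # 9765625 joint actions
    print(decomposed_output_count(10, 5))  # 50 per-agent outputs
    q_tot, w = attention_weighted_q_total(
        [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [2.0, 4.0]
    )
    print(q_tot, w)
```

Because the weights in this sketch are positive and computed without reference to actions, each agent maximizing its own Q_i also maximizes Q_tot, which is what allows decentralized action selection; the weighted-sum form reduces to VDN's plain sum when every w_i = 1.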
