Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (10): 3288-3299. doi: 10.12305/j.issn.1001-506X.2025.10.16

• Systems Engineering •

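UAV many-to-one pursuit-evasion game based on ME-DDPG algorithm

Yaozhong ZHANG1,*, Zhuoran WU1, Jiandong ZHANG1, Qiming YANG1, Guoqing SHI1, Zixiang XU2

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi 710072, China
  2. Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
  • Received: 2023-12-11 Online: 2025-10-25 Published: 2025-10-23
  • Contact: Yaozhong ZHANG
  • About the authors: Zhuoran WU (born 1999), male, master's student; main research interest: multi-agent reinforcement learning
    Jiandong ZHANG (born 1974), male, associate professor, Ph.D.; main research interests: manned/unmanned aircraft cooperative control, intelligent unmanned systems and mission planning, complex system modeling and performance evaluation
    Qiming YANG (born 1988), male, assistant research fellow, Ph.D.; main research interests: intelligent unmanned systems and mission planning
    Guoqing SHI (born 1974), male, associate professor, Ph.D.; main research interests: complex system modeling, simulation and performance evaluation; airborne embedded system design, development and testing
    Zixiang XU (born 1969), male, associate research fellow, Ph.D.; main research interests: artificial intelligence, game theory, air combat
  • Supported by:
    Key Research and Development Program of Shaanxi Province (2022GY-089); Natural Science Basic Research Program of Shaanxi Province (2022JQ-593)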

Abstract:

To address the many-to-one pursuit-evasion game for unmanned aerial vehicles (UAVs), a mixed experienced DDPG (ME-DDPG) algorithm is proposed. It builds on the deep deterministic policy gradient (DDPG) algorithm from reinforcement learning and incorporates numerical solutions of the differential game formulation of the pursuit-evasion problem. The numerical game solutions are added to the strategy set used for exploratory learning and are used to compute directional strategies, which raises the training efficiency of the UAV pursuit strategy. This mitigates the slow convergence and the tendency toward local convergence that arise in UAV pursuit-evasion games from overly long episodes, sparse rewards, and insufficient exploration by the reinforcement learning algorithm, and thereby improves learning efficiency. Simulation results show that ME-DDPG converges quickly on the UAV many-to-one pursuit-evasion task and reaches a task success rate of 83%. Comparative experiments verify the advantages of the proposed algorithm over DDPG in convergence, stability, and task success rate.

Key words: game theory, deep reinforcement learning, pursuit-evasion game, unmanned aerial vehicle (UAV), multi-agent
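The abstract describes mixing differential-game numerical solutions into the exploration strategy set of DDPG. A minimal sketch of that idea follows. It is not the authors' implementation. Everything in it is an assumption made for illustration: the pure-pursuit heading used as a stand-in for the numerical game solution, the mixing probability `mix_prob`, the single pursuer with a stationary evader, the placeholder actor, and all function and class names.

```python
import math
import random
from collections import deque


def guidance_action(pursuer_xy, evader_xy):
    """Hypothetical stand-in for the differential-game numerical solution:
    the heading (radians) that points straight at the evader."""
    dx = evader_xy[0] - pursuer_xy[0]
    dy = evader_xy[1] - pursuer_xy[1]
    return math.atan2(dy, dx)


def select_action(actor, state, mix_prob, noise_std, rng):
    """With probability mix_prob take the guidance action; otherwise take
    the actor's output plus Gaussian exploration noise."""
    pursuer_xy, evader_xy = state
    if rng.random() < mix_prob:
        return guidance_action(pursuer_xy, evader_xy), "guidance"
    heading = actor(state) + rng.gauss(0.0, noise_std)
    # Wrap the noisy heading into [-pi, pi).
    heading = (heading + math.pi) % (2.0 * math.pi) - math.pi
    return heading, "actor"


class ReplayBuffer:
    """Fixed-capacity buffer holding both kinds of experience together."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng):
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


def collect_episode(actor, buffer, steps=50, mix_prob=0.3,
                    noise_std=0.2, speed=1.0, seed=0):
    """Roll out one episode against a stationary evader (an assumption)
    and store the mixed experience in the buffer."""
    rng = random.Random(seed)
    pursuer, evader = (0.0, 0.0), (10.0, 5.0)
    for _ in range(steps):
        state = (pursuer, evader)
        heading, source = select_action(actor, state, mix_prob, noise_std, rng)
        pursuer = (pursuer[0] + speed * math.cos(heading),
                   pursuer[1] + speed * math.sin(heading))
        done = math.dist(pursuer, evader) < 1.0
        reward = 1.0 if done else 0.0  # sparse reward: only on capture
        buffer.add((state, heading, reward, (pursuer, evader), done, source))
        if done:
            break
    return buffer


if __name__ == "__main__":
    buf = collect_episode(lambda s: 0.0, ReplayBuffer(capacity=1000))
    n_guided = sum(1 for t in buf.data if t[5] == "guidance")
    print(f"stored {len(buf.data)} transitions, {n_guided} from guidance")
```

In a full DDPG training loop the critic and actor updates would draw minibatches from this buffer as usual. The guidance transitions give the learner some rewarded trajectories early on, which is one plausible reading of how the mixed experience could ease the sparse-reward problem the abstract mentions.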

CLC number: