UAV many-to-one pursuit-evasion game based on ME-DDPG algorithm

doi:10.12305/j.issn.1001-506X.2025.10.16

Systems Engineering and Electronics, 2025, Vol. 47, Issue (10): 3288-3299.

• Systems Engineering •


Yaozhong ZHANG¹,*, Zhuoran WU¹, Jiandong ZHANG¹, Qiming YANG¹, Guoqing SHI¹, Zixiang XU²

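1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
2. Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China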

Received: 2023-12-11; Publication date: 2025-10-25; Released online: 2025-10-23
Corresponding author: Yaozhong ZHANG
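About the authors:
Zhuoran WU (born 1999), male, master's student; research interest: multi-agent reinforcement learning.
Jiandong ZHANG (born 1974), male, associate professor, Ph.D.; research interests: manned/unmanned aerial vehicle cooperative control, intelligent unmanned systems and mission planning, and complex-system modeling and performance evaluation.
Qiming YANG (born 1988), male, assistant research fellow, Ph.D.; research interests: intelligent unmanned systems and mission planning.
Guoqing SHI (born 1974), male, associate professor, Ph.D.; research interests: complex-system modeling, simulation, and performance evaluation; design, development, and testing of airborne embedded systems.
Zixiang XU (born 1969), male, associate research fellow, Ph.D.; research interests: artificial intelligence, game theory, and air combat.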
Funding: Key Research and Development Program of Shaanxi Province (2022GY-089); Natural Science Basic Research Program of Shaanxi Province (2022JQ-593).


Abstract

To address the many-to-one pursuit-evasion game of unmanned aerial vehicles (UAVs), this paper proposes a mixed experienced deep deterministic policy gradient (ME-DDPG) algorithm. It builds on the DDPG reinforcement learning algorithm and incorporates numerical solutions of the differential pursuit-evasion game. Adding these game-theoretic numerical solutions to the set of policies used during exploratory learning yields directed strategies that speed up the training of UAV pursuit policies. This mitigates a common failure mode of reinforcement learning in UAV pursuit-evasion games, where long episodes, sparse rewards, and insufficient exploration lead to slow convergence and a tendency to settle in local optima, and so improves learning efficiency. Simulation results show that ME-DDPG converges quickly on the many-to-one UAV pursuit-evasion task and reaches a task success rate of 83%. Comparative experiments show that the proposed algorithm outperforms DDPG in convergence, stability, and task success rate.

Key words: game theory, deep reinforcement learning, pursuit-evasion game, unmanned aerial vehicle (UAV), multi-agent
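The abstract describes the mixing step only at a high level. As a rough illustration, the Python sketch below assumes a 2-D pursuer whose action is a saturated turn rate, and assumes that with some probability `p` the executed action comes from a game-derived guidance law instead of the actor's noisy output, with both kinds of transition stored in one replay buffer that a DDPG learner would sample from. The pure-pursuit law in `guidance_turn_rate` is a hypothetical stand-in for the paper's differential-game numerical solution, and every name here (`guidance_turn_rate`, `mixed_action`, `ReplayBuffer`, `p`, `noise_std`, `max_rate`) is illustrative rather than taken from the paper.

```python
import random
from collections import deque

import numpy as np


def guidance_turn_rate(pursuer_xy, pursuer_heading, evader_xy, max_rate=0.5):
    """Hypothetical stand-in for the differential-game numerical solution.

    Pure pursuit: turn toward the line of sight to the evader, saturated at max_rate.
    """
    dx, dy = np.asarray(evader_xy, dtype=float) - np.asarray(pursuer_xy, dtype=float)
    los = np.arctan2(dy, dx)
    # Heading error wrapped to [-pi, pi)
    err = (los - pursuer_heading + np.pi) % (2.0 * np.pi) - np.pi
    return float(np.clip(err, -max_rate, max_rate))


def mixed_action(actor_action, guidance_action, p, noise_std, rng, max_rate=0.5):
    """With probability p execute the guided action; otherwise actor output plus noise.

    Gaussian exploration noise is an assumption; the paper's noise process is not stated here.
    """
    if rng.random() < p:
        return guidance_action, "guidance"
    noisy = actor_action + rng.normal(0.0, noise_std)
    return float(np.clip(noisy, -max_rate, max_rate)), "actor"


class ReplayBuffer:
    """One buffer holding transitions from both action sources (the "mixed experience")."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done, source):
        self.data.append((state, action, reward, next_state, done, source))

    def sample(self, batch_size):
        # Uniform minibatch, as a DDPG critic/actor update would draw it
        return random.sample(list(self.data), batch_size)

    def __len__(self):
        return len(self.data)


# Demo: fill the shared buffer from one hypothetical pursuer with p = 0.10.
rng = np.random.default_rng(0)
buffer = ReplayBuffer(capacity=10_000)
state = np.array([0.0, 0.0, 0.0])  # x, y, heading (hypothetical state layout)
evader_xy = np.array([100.0, 50.0])
for _ in range(2000):
    g = guidance_turn_rate(state[:2], state[2], evader_xy)
    a, src = mixed_action(actor_action=0.0, guidance_action=g, p=0.10, noise_std=0.1, rng=rng)
    buffer.add(state, a, 0.0, state, False, src)

guided_share = sum(t[5] == "guidance" for t in buffer.data) / len(buffer)
```

A single shared buffer is the simplest way to let guided transitions shape the critic early in training; a decaying `p`, or separate buffers sampled at a fixed ratio, would be equally plausible variants under the same assumptions.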

CLC number: V 279

Cite this article: Yaozhong ZHANG, Zhuoran WU, Jiandong ZHANG, Qiming YANG, Guoqing SHI, Zixiang XU. UAV many-to-one pursuit-evasion game based on ME-DDPG algorithm[J]. Systems Engineering and Electronics, 2025, 47(10): 3288-3299.



$p$	Pursuit success rate	Average reward
0	0.714	51.694
0.01	0.749	57.653
0.05	0.784	61.583
0.10	0.822	69.159
0.20	0.801	68.984
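The table can be summarized with a few lines of arithmetic. This sketch copies the listed values and compares each setting with $p = 0$; reading $p = 0$ as plain DDPG and $p$ as the ME-DDPG mixing probability is an assumption, since the page gives only the column header. With the values as listed, both metrics peak at $p = 0.10$ and the success rate falls again at $p = 0.20$.

```python
# Values copied from the table: p -> (pursuit success rate, average reward)
results = {
    0.00: (0.714, 51.694),
    0.01: (0.749, 57.653),
    0.05: (0.784, 61.583),
    0.10: (0.822, 69.159),
    0.20: (0.801, 68.984),
}

# Setting with the highest pursuit success rate
best_p = max(results, key=lambda p: results[p][0])

# Relative gains over p = 0 (assumed here to be the plain-DDPG baseline)
base_success, base_reward = results[0.00]
best_success, best_reward = results[best_p]
success_gain = (best_success - base_success) / base_success
reward_gain = (best_reward - base_reward) / base_reward

print(f"best p = {best_p:.2f}: success +{success_gain:.1%}, reward +{reward_gain:.1%}")
# prints: best p = 0.10: success +15.1%, reward +33.8%
```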
