Attack-defense confrontation strategy of multi-UAV based on APIQ algorithm

doi:10.12305/j.issn.1001-506X.2025.07.14

Abstract

In multi-unmanned aerial vehicle (UAV) confrontation environments, the large number of UAVs means that conventional deep reinforcement learning methods may suffer from dimension explosion of the value function and poor convergence of the policy network. To address this, an attention policy interaction Q-learning (APIQ) swarm confrontation algorithm based on value decomposition and an attention mechanism is proposed. Value decomposition is introduced to alleviate the dimension explosion of the value function, and the attention mechanism assigns a weight to each value in the decomposition, which promotes convergence of the policy network. To verify the feasibility of the APIQ algorithm for multi-UAV confrontation, a relatively realistic environment model is established, and the feasibility of the algorithm is verified by simulation. Comparison with other algorithms shows that UAVs controlled by the APIQ algorithm achieve a higher win rate in confrontation.

Key words: multi-unmanned aerial vehicle (UAV), reinforcement learning, value-decomposition network (VDN), attention mechanism, maneuver decision-making
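
The abstract does not give the exact form of the attention-weighted value decomposition, so the following Python sketch is only an illustration under stated assumptions: per-agent Q-values are combined into a joint value by a weighted sum, and the weights come from scaled dot-product attention between a query vector and one key vector per UAV. The names `attention_weights` and `mix_q_values`, the query and key vectors, and all numbers are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query against per-agent keys.

    Assumption: one key vector per UAV and a single shared query vector.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    return softmax(scores)


def mix_q_values(agent_qs, query, keys):
    """Joint value as an attention-weighted sum of per-agent Q-values."""
    weights = attention_weights(query, keys)
    q_tot = sum(w * q for w, q in zip(weights, agent_qs))
    return q_tot, weights


# Hypothetical values for three UAVs (illustrative only).
agent_qs = [1.0, 2.0, 3.0]
query = [0.5, -0.2]
keys = [[0.1, 0.3], [0.4, -0.1], [0.2, 0.2]]
q_tot, weights = mix_q_values(agent_qs, query, keys)
```

Because softmax weights are positive and sum to one, the joint value in this sketch always lies between the smallest and largest per-agent value, and raising any single agent's Q-value can never lower the joint value. A plain unweighted sum, as in a value-decomposition network, corresponds to dropping the attention weights.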

CLC number: TP181

Xiaowei FU, Xinyi WANG, Zhe QIAO. Attack-defense confrontation strategy of multi-UAV based on APIQ algorithm[J]. Systems Engineering and Electronics, 2025, 47(7): 2205-2215.

Figures and tables (14): Figures 1-12, Tables 1-2.


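Environment and UAV attribute	Value
Battlefield extent x_min, x_max, y_min, y_max/km	0, 100, 0, 80
Number of obstacles	4
Obstacle region radii/km	4, 4, 5, 6
Random generation region of obstacles/km²	[15, 85]×[15, 65]
Number of red and blue UAVs	5 to 7
Red UAV radar detection range (fire strike range)	6 km×120°
Blue UAV radar detection range (fire strike range)	7.5 km×120°
Red UAV maximum speed/(m/s)	300
Blue UAV maximum speed/(m/s)	340
Red UAV maximum angular velocity/(rad/s)	π/22.6
Blue UAV maximum angular velocity/(rad/s)	π/15.7
Random generation region of red UAV positions/km²	[90, 100]×[0, 80]
Random generation region of blue UAV positions/km²	[0, 10]×[0, 80]
Red UAV initial heading angle/rad	π
Blue UAV initial heading angle/rad	0
Red base center position and radius/km	[98, 40], 2
Blue base center position and radius/km	[2, 40], 2
Red UAV communication radius/km	20
Blue UAV communication radius/km	20
UAV collision crash distance/km	0.5
Target region radius/km	2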
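
As a rough illustration of how the parameters in this table could be used, the Python sketch below samples an initial position inside each side's spawn region and tests whether a target lies inside a UAV's radar sector. It assumes that the 120° figure is the full sector angle centered on the UAV's heading and that positions are drawn uniformly; the page does not state either point. The dictionary layout and the function names `spawn` and `in_sector` are hypothetical.

```python
import math
import random

# Values copied from the table; the dictionary layout itself is an assumption.
RED = {"radius_km": 6.0, "fov_deg": 120.0,
       "spawn": ((90.0, 100.0), (0.0, 80.0)), "heading": math.pi}
BLUE = {"radius_km": 7.5, "fov_deg": 120.0,
        "spawn": ((0.0, 10.0), (0.0, 80.0)), "heading": 0.0}


def spawn(side, rng):
    """Draw an initial (x, y) position in km, uniformly over the spawn region."""
    (x_lo, x_hi), (y_lo, y_hi) = side["spawn"]
    return (rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi))


def in_sector(own_pos, heading, target_pos, radius_km, fov_deg):
    """Return True if the target is within range and inside the radar sector.

    Assumption: the sector is symmetric about the heading direction.
    """
    dx = target_pos[0] - own_pos[0]
    dy = target_pos[1] - own_pos[1]
    if math.hypot(dx, dy) > radius_km:
        return False
    bearing = math.atan2(dy, dx)
    # Wrap the bearing offset into [-pi, pi) before comparing to the half-angle.
    offset = (bearing - heading + math.pi) % (2.0 * math.pi) - math.pi
    return abs(offset) <= math.radians(fov_deg) / 2.0


rng = random.Random(0)
red_start = spawn(RED, rng)
blue_start = spawn(BLUE, rng)
```

For example, `in_sector((0.0, 0.0), BLUE["heading"], (5.0, 0.0), BLUE["radius_km"], BLUE["fov_deg"])` checks a target 5 km directly ahead of a blue UAV, which lies inside the 7.5 km sector under these assumptions.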

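Hyperparameter	Value
Maximum number of episodes	1000
Maximum steps per episode	1000
Learning rate	0.0001
Initial exploration rate	0.1
Soft update rate	0.01
Discount factor	0.95
Replay buffer size	50000
Batch size	64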
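
The page lists only the hyperparameter values, not how they enter training. The Python sketch below shows the usual reading of three of them in Q-learning, as an assumption rather than the authors' implementation: the discount factor in a one-step temporal-difference target, the soft update rate in Polyak averaging of target-network parameters, and the exploration rate in epsilon-greedy action selection. The function names and the flat parameter lists are hypothetical.

```python
import random
from collections import deque

# Values copied from the table; the key names are illustrative.
HYPERPARAMS = {
    "max_episodes": 1000,
    "max_steps_per_episode": 1000,
    "learning_rate": 1e-4,
    "initial_epsilon": 0.1,
    "soft_update_rate": 0.01,
    "discount": 0.95,
    "replay_capacity": 50000,
    "batch_size": 64,
}


def td_target(reward, next_q_max, done, gamma=HYPERPARAMS["discount"]):
    """One-step TD target: r + gamma * max_a' Q(s', a'), with no bootstrap at episode end."""
    return reward + (0.0 if done else gamma * next_q_max)


def soft_update(target_params, online_params, tau=HYPERPARAMS["soft_update_rate"]):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


# Bounded replay buffer: the oldest transitions are discarded once it is full.
replay = deque(maxlen=HYPERPARAMS["replay_capacity"])
```

With a soft update rate of 0.01, each update moves the target parameters 1% of the way toward the online parameters, which keeps the bootstrapped targets changing slowly.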
