Attack-defense confrontation strategy of multi-UAV based on APIQ algorithm

doi:10.12305/j.issn.1001-506X.2025.07.14

Abstract

In a multi-unmanned aerial vehicle (UAV) confrontation environment, the large number of UAVs means that conventional deep reinforcement learning methods can suffer from dimension explosion of the value function and poor convergence of the policy network. To address these problems, a swarm confrontation algorithm named attention policy interaction Q-learning (APIQ), based on value decomposition and an attention mechanism, is proposed. Value decomposition is introduced to alleviate the dimension explosion of the value function, and the weight of each individual value in the decomposition is assigned by an attention mechanism, which promotes convergence of the policy network. To verify the feasibility of APIQ for the multi-UAV confrontation problem, a realistic environment model is established and the algorithm is validated by simulation. Comparisons with other algorithms show that UAVs controlled by APIQ achieve a higher win rate in confrontation.
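The abstract does not give APIQ's exact mixing formula, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It first shows why a centralized value function explodes (the joint action space grows as the number of actions raised to the number of UAVs; the count of 5 discrete actions per UAV is hypothetical), then contrasts a VDN-style sum of per-agent values with an attention-weighted sum whose weights come from a softmax over scaled dot-product scores. The function names and the single shared query vector are hypothetical illustrations, not the paper's network.

```python
import math
from typing import List, Sequence


def joint_action_count(n_agents: int, n_actions: int) -> int:
    """Size of the joint action space a centralized Q-function must cover."""
    return n_actions ** n_agents


def vdn_mix(local_q: Sequence[float]) -> float:
    """VDN-style decomposition: the joint value is the plain sum of local values."""
    return sum(local_q)


def attention_weights(query: Sequence[float], keys: Sequence[Sequence[float]]) -> List[float]:
    """Softmax over scaled dot-product scores, one weight per agent.

    Hypothetical scoring: every agent's key is compared with one shared query.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mix(local_q: Sequence[float], query: Sequence[float],
                  keys: Sequence[Sequence[float]]) -> float:
    """Attention-weighted decomposition: Q_tot = sum_i w_i * Q_i, w_i > 0, sum w_i = 1."""
    w = attention_weights(query, keys)
    return sum(wi * qi for wi, qi in zip(w, local_q))


if __name__ == "__main__":
    # joint table size vs. total size of seven per-agent tables (5 actions assumed)
    print(joint_action_count(7, 5), 7 * 5)
    q = [1.0, 2.0, 3.0]
    keys = [[0.0, 1.0], [1.0, 0.0], [2.0, 0.0]]
    print(vdn_mix(q), attention_mix(q, [1.0, 0.0], keys))
```

Because the softmax weights are strictly positive and, in this sketch, depend only on the query and keys rather than on the chosen actions, maximizing Q_tot over joint actions still reduces to each agent maximizing its own Q_i. That property is what lets value-decomposition methods train centrally yet act in a decentralized way.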

Key words: multi-unmanned aerial vehicle (UAV), reinforcement learning, value-decomposition network (VDN), attention mechanism, maneuver decision-making

CLC Number: TP181

Xiaowei FU, Xinyi WANG, Zhe QIAO. Attack-defense confrontation strategy of multi-UAV based on APIQ algorithm[J]. Systems Engineering and Electronics, 2025, 47(7): 2205-2215.

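Environment / UAV attribute	Value
Battlefield bounds x_min, x_max, y_min, y_max/km	0, 100, 0, 80
Number of obstacles	4
Obstacle radii/km	4, 4, 5, 6
Obstacle random generation region (x × y)/km	[15, 85]×[15, 65]
Number of UAVs per side (red and blue)	5 to 7
Red UAV radar detection range (also fire-strike range)	6 km×120°
Blue UAV radar detection range (also fire-strike range)	7.5 km×120°
Red UAV maximum speed/(m/s)	300
Blue UAV maximum speed/(m/s)	340
Red UAV maximum angular velocity/(rad/s)	π/22.6
Blue UAV maximum angular velocity/(rad/s)	π/15.7
Red UAV initial position region (x × y)/km	[90, 100]×[0, 80]
Blue UAV initial position region (x × y)/km	[0, 10]×[0, 80]
Red UAV initial heading/rad	π
Blue UAV initial heading/rad	0
Red base center and radius/km	[98, 40], 2
Blue base center and radius/km	[2, 40], 2
Red UAV communication radius/km	20
Blue UAV communication radius/km	20
UAV collision (crash) distance/km	0.5
Target area radius/km	2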
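The environment parameters above can be read as a simple planar simulation. The sketch below is one possible interpretation, not the paper's environment model: it assumes a unicycle model with a fixed step length `dt` (hypothetical; the table gives none), treats "6 km×120°" as a detection range plus a total sector width centered on the heading, clips speed and turn rate to the listed maxima, keeps UAVs inside the battlefield rectangle, and flags a crash below the 0.5 km collision distance. The names `UAV`, `step`, `in_sector`, and `collided` are hypothetical.

```python
import math
from dataclasses import dataclass

# Battlefield bounds from the table, converted from km to m (minimums are 0).
X_MAX, Y_MAX = 100_000.0, 80_000.0
CRASH_DIST = 500.0  # collision (crash) distance, 0.5 km


@dataclass
class UAV:
    x: float          # position, m
    y: float          # position, m
    heading: float    # rad, measured counter-clockwise from the +x axis
    v_max: float      # maximum speed, m/s
    w_max: float      # maximum angular velocity, rad/s
    det_range: float  # radar detection (fire-strike) range, m
    fov: float        # total sector width, rad (assumed centered on the heading)


def step(u: UAV, speed: float, turn_rate: float, dt: float = 1.0) -> None:
    """Advance one step of a unicycle model, clipping commands to the UAV limits.

    dt is a hypothetical step length; the parameter table does not give one.
    """
    speed = max(0.0, min(speed, u.v_max))
    turn_rate = max(-u.w_max, min(turn_rate, u.w_max))
    u.heading = (u.heading + turn_rate * dt) % (2 * math.pi)
    # keep the UAV inside the battlefield rectangle
    u.x = min(max(u.x + speed * math.cos(u.heading) * dt, 0.0), X_MAX)
    u.y = min(max(u.y + speed * math.sin(u.heading) * dt, 0.0), Y_MAX)


def in_sector(u: UAV, tx: float, ty: float) -> bool:
    """True if point (tx, ty) lies within det_range and within +/- fov/2 of the heading."""
    dx, dy = tx - u.x, ty - u.y
    if math.hypot(dx, dy) > u.det_range:
        return False
    bearing = math.atan2(dy, dx)
    # wrap the bearing error into [-pi, pi)
    err = (bearing - u.heading + math.pi) % (2 * math.pi) - math.pi
    return abs(err) <= u.fov / 2


def collided(a: UAV, b: UAV) -> bool:
    """True if two UAVs are closer than the crash distance."""
    return math.hypot(a.x - b.x, a.y - b.y) < CRASH_DIST


# Red and blue templates using the table's limits (start positions are examples).
red = UAV(95_000.0, 40_000.0, math.pi, 300.0, math.pi / 22.6, 6_000.0, math.radians(120))
blue = UAV(5_000.0, 40_000.0, 0.0, 340.0, math.pi / 15.7, 7_500.0, math.radians(120))
```

Under this reading the blue side's longer range (7.5 km vs. 6 km), higher speed, and larger turn rate give it a kinematic edge, which is consistent with the table describing an asymmetric confrontation.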

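Hyperparameter	Value
Maximum number of episodes	1000
Maximum steps per episode	1000
Learning rate	0.0001
Initial exploration rate	0.1
Soft update rate	0.01
Discount factor	0.95
Replay buffer size	50000
Batch size	64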
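The hyperparameters listed above are the usual knobs of off-policy deep Q-learning. As a minimal standard-library sketch, assuming a DQN-style training loop (the paper's exact update is not given here), they enter as follows: a FIFO replay buffer of 50000 transitions sampled in batches of 64, epsilon-greedy exploration starting at 0.1, a one-step TD target discounted by 0.95, and Polyak soft updates of the target parameters with rate 0.01. Representing parameters as flat lists of floats and all names below are hypothetical simplifications.

```python
import random
from collections import deque
from typing import List, Sequence

GAMMA = 0.95          # discount factor
TAU = 0.01            # soft update rate
BUFFER_SIZE = 50_000  # replay buffer size
BATCH_SIZE = 64       # batch size
EPS_INIT = 0.1        # initial exploration rate


class ReplayBuffer:
    """FIFO experience pool: the oldest transitions are dropped once full."""

    def __init__(self, capacity: int = BUFFER_SIZE) -> None:
        self.data = deque(maxlen=capacity)

    def push(self, transition) -> None:
        self.data.append(transition)

    def sample(self, batch_size: int = BATCH_SIZE) -> list:
        # uniform sampling without replacement
        return random.sample(list(self.data), batch_size)

    def __len__(self) -> int:
        return len(self.data)


def epsilon_greedy(q_values: Sequence[float], eps: float = EPS_INIT) -> int:
    """Random action with probability eps, otherwise the greedy action."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward: float, next_q_max: float, done: bool, gamma: float = GAMMA) -> float:
    """One-step target y = r + gamma * max_a' Q_target(s', a'), cut at terminal states."""
    return reward + (0.0 if done else gamma * next_q_max)


def soft_update(target: List[float], online: Sequence[float], tau: float = TAU) -> None:
    """Polyak averaging in place: theta_target <- tau * theta + (1 - tau) * theta_target."""
    for i, w in enumerate(online):
        target[i] = tau * w + (1.0 - tau) * target[i]
```

With a soft update rate of 0.01, the target parameters move only 1% of the way toward the online parameters per update, which smooths the TD targets; this is a common stabilizing choice rather than something specific to APIQ.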
