UAV formation path planning approach incorporating dynamic reward strategy

doi:10.12305/j.issn.1001-506X.2024.10.27

Abstract

To address unmanned aerial vehicle (UAV) formation path planning in unknown dynamic environments, an intelligent decision scheme for UAV formations is proposed, based on the multi-agent twin delayed deep deterministic policy gradient algorithm incorporating a dynamic formation reward function (MATD3-IDFRF). First, the sparse reward function is extended for the obstacle-free environment. Then the dynamic formation problem, the central concern in UAV formation path planning, is analyzed in depth. It is described as a UAV formation that flies in a stable formation structure while fine-tuning that structure in a timely manner according to the surrounding environment. In essence, the spacing between every pair of UAVs remains relatively stable but is fine-tuned in response to the external environment. A reward function based on the optimal distance and the current distance between each pair of UAVs is therefore designed, yielding the dynamic formation reward function, which is then combined with the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm to form the MATD3-IDFRF algorithm. Finally, comparison experiments show that, in the complex obstacle environment, the proposed dynamic formation reward function improves the algorithm's success rate by 6.8%, raises the average converged reward by 2.3%, and reduces the formation deformation rate by 97%.
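The pairwise-spacing idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula, so the function name `formation_reward`, the tolerance `tol`, the weight `scale`, and the linear penalty shape are all assumptions made for illustration only.

```python
import math
from itertools import combinations


def formation_reward(positions, optimal_dist, tol=0.1, scale=1.0):
    """Hypothetical pairwise-spacing reward (NOT the paper's formula).

    positions:    list of (x, y) tuples, one per UAV.
    optimal_dist: dict mapping (i, j), i < j, to the desired spacing in metres.
    tol:          assumed relative deviation allowed without penalty, which
                  leaves room for fine-tuning the formation near obstacles.
    scale:        assumed weight of the penalty term.
    """
    reward = 0.0
    for i, j in combinations(range(len(positions)), 2):
        d_cur = math.dist(positions[i], positions[j])  # current spacing
        d_opt = optimal_dist[(i, j)]                   # optimal spacing
        rel_err = abs(d_cur - d_opt) / d_opt
        # Only deviation beyond the tolerance band is penalized.
        reward -= scale * max(0.0, rel_err - tol)
    return reward
```

With this shape, a formation that holds every pair within the tolerance band receives zero penalty, while larger deformations are penalized in proportion to their relative size. Other shapes (quadratic, exponential) would serve equally well as an illustration.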

Key words: reinforcement learning (RL), reward function, unmanned aerial vehicle (UAV), dynamic formation, path planning

CLC Number: TP181

Heng TANG, Wei SUN, Lei LYU, Ruofei HE, Jianjun WU, Changhao SUN, Tianye SUN. UAV formation path planning approach incorporating dynamic reward strategy[J]. Systems Engineering and Electronics, 2024, 46(10): 3506-3518.

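Table 1

No.	Parameter	Value
1	Number of UAVs	5
2	Angle formed by UAVs 1, 2, and 3/(°)	60
3	Spacing between adjacent UAVs among UAVs 1, 2, and 4/m	200
4	Spacing between adjacent UAVs among UAVs 1, 3, and 5/m	200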
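One way to read these formation parameters is as a V-shaped layout with UAV 1 at the apex and the chains 1-2-4 and 1-3-5 as the two arms. The table does not state this layout explicitly, so the sketch below is an assumption; the function name `v_formation` and the coordinate convention (apex at the origin, arms trailing in the negative y direction) are hypothetical.

```python
import math


def v_formation(spacing=200.0, apex_angle_deg=60.0):
    """Assumed V-formation: UAV 1 at the apex, UAVs 2 and 4 on one arm,
    UAVs 3 and 5 on the other, with `spacing` metres between neighbours."""
    half = math.radians(apex_angle_deg / 2.0)
    pos = {1: (0.0, 0.0)}
    # k-th UAV along each arm sits k * spacing metres from the apex.
    for k, (left, right) in enumerate([(2, 3), (4, 5)], start=1):
        r = k * spacing
        pos[left] = (-r * math.sin(half), -r * math.cos(half))
        pos[right] = (r * math.sin(half), -r * math.cos(half))
    return pos


def pairwise_distances(pos):
    """Spacing for every UAV pair (i, j) with i < j."""
    ids = sorted(pos)
    return {
        (i, j): math.dist(pos[i], pos[j])
        for a, i in enumerate(ids)
        for j in ids[a + 1:]
    }
```

Under this assumed layout, `pairwise_distances(v_formation())` gives one candidate set of target spacings for each UAV pair, which is the kind of quantity a pairwise-distance reward would compare against.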

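Table 2

Parameter type	Hyperparameter	Symbol	Value
Parameters shared by all three algorithms	Discount factor	γ	0.990
	Soft update coefficient	τ	0.010
	Replay buffer size	M	1 000 000
	Batch size	m	1 024
	Actor network learning rate	α_A	0.008
	Critic network learning rate	α_C	0.010
	Action noise standard deviation	σ	0.200
	Number of episodes	MaxEpisode	5 000, 10 000
	Maximum time steps per episode	MaxStep	100
Parameters specific to MATD3 and MATD3-IDFRF	Delayed update frequency	C	10
Parameters specific to MATD3 and MATD3-IDFRF	Standard deviation of Gaussian noise on the critic target network	$\widetilde{\sigma}$	0.200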
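For readers unfamiliar with where these hyperparameters enter a TD3-style update, the following minimal sketch shows the soft target update, target policy smoothing, and delayed updates using the tabulated values. It follows the generic TD3 recipe, not necessarily the paper's implementation. The noise clip of 0.5, the action bounds of [-1, 1], and the reading of C as the number of critic updates per actor update are assumptions, and all function names are hypothetical.

```python
import random

TAU = 0.010           # soft update coefficient (from the table)
SIGMA_TARGET = 0.200  # std of Gaussian noise on the target action (from the table)
POLICY_DELAY = 10     # delayed update frequency C (from the table)


def soft_update(target, online, tau=TAU):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def smoothed_target_action(action, sigma=SIGMA_TARGET, clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to the target
    action, then clamp to the action bounds (clip/low/high are assumed)."""
    noisy = []
    for a in action:
        eps = max(-clip, min(clip, rng.gauss(0.0, sigma)))
        noisy.append(max(low, min(high, a + eps)))
    return noisy


def should_update_actor(step, delay=POLICY_DELAY):
    """Delayed update: refresh the actor and targets every `delay` steps."""
    return step % delay == 0
```

With τ = 0.010 the target networks move only 1% of the way toward the online networks at each update, which, together with the delay, is what keeps the critic's bootstrap targets slowly varying.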

Table 3

Algorithm	Average path length/m	Average formation deformation rate/%	Reward convergence interval	Average reward
MADDPG	10 763	55.50	[139, 223]	211
MATD3	10 438	22.60	[187, 228]	220
MATD3-IDFRF	10 536	0.68	[567, 586]	575
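The deformation-rate column can be checked against the roughly 97% reduction quoted in the abstract with a short calculation, assuming MATD3 is the baseline for that figure (the abstract does not name the baseline explicitly). The helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, new):
    """Percentage reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline * 100.0


# Average formation deformation rates (%) taken from the table above.
deform = {"MADDPG": 55.50, "MATD3": 22.60, "MATD3-IDFRF": 0.68}

vs_matd3 = relative_reduction(deform["MATD3"], deform["MATD3-IDFRF"])
vs_maddpg = relative_reduction(deform["MADDPG"], deform["MATD3-IDFRF"])
print(f"vs MATD3:  {vs_matd3:.1f}% lower")
print(f"vs MADDPG: {vs_maddpg:.1f}% lower")
```

Relative to MATD3 the reduction works out to about 97%, which matches the abstract; relative to MADDPG it is slightly larger.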
