UAV formation path planning approach incorporating dynamic reward strategy

doi: 10.12305/j.issn.1001-506X.2024.10.27

Systems Engineering and Electronics, 2024, Vol. 46, Issue (10): 3506-3518

• Guidance, Navigation and Control •

Heng TANG¹, Wei SUN¹,*, Lei LYU¹, Ruofei HE², Jianjun WU³, Changhao SUN⁴, Tianye SUN¹

1. School of Aerospace Science and Technology, Xidian University, Xi'an 710118, China
2. The 365th Research Institute, Northwestern Polytechnical University, Xi'an 710072, China
3. Xi'an ASN UAV Technology Co., Ltd., Xi'an 710065, China
4. Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094, China

Received: 2023-11-01 Online: 2024-09-25 Published: 2024-10-22
Corresponding author: Wei SUN
About the authors: Heng TANG (born 1998), male, master's student; research interests: reinforcement learning, UAV formation path planning
Wei SUN (born 1980), male, professor, Ph.D.; research interests: machine understanding of perception and behavior under uncertainty in open environments, complex task planning and reasoning
Lei LYU (born 1995), male, Ph.D. candidate; research interests: multi-UAV cooperative control, trajectory planning
Ruofei HE (born 1982), male, associate research fellow, Ph.D.; research interests: UAV systems engineering and overall design, intelligent UAV cooperative control
Jianjun WU (born 1972), male, associate research fellow, Ph.D.; research interests: UAV flight control and overall system design
Changhao SUN (born 1987), male, senior engineer, Ph.D.; research interests: game-theoretic learning, theory and applications of distributed cooperative decision-making
Tianye SUN (born 1995), male, Ph.D. candidate; research interests: multi-UAV systems, UAV path planning
Funding:
Innovation Fund for Industry-University-Research in Chinese Universities (2021ZYA08004); Xi'an Science and Technology Plan (2022JH-RGZN-0039); Key Industrial Innovation Chain Project of the Shaanxi Province Key Research and Development Program (2022ZDLGY03-01); National Natural Science Foundation of China (62173330)

Abstract:

For the unmanned aerial vehicle (UAV) formation path planning problem in unknown dynamic environments, an intelligent UAV formation decision-making scheme is proposed, based on the multi-agent twin delayed deep deterministic policy gradient algorithm incorporating a dynamic formation reward function (MATD3-IDFRF). First, the sparse reward function is extended for the obstacle-free environment. Then, the dynamic formation problem, a central concern in UAV formation path planning, is analyzed in depth: the UAV formation should fly in a stable structure while fine-tuning its shape in time according to the surrounding environment. In essence, the spacing between every pair of UAVs remains relatively stable but is also fine-tuned in response to the external environment. Accordingly, a reward function based on the optimal spacing and the current spacing between each pair of UAVs is designed, on which a dynamic formation reward function is built; combining it with the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm yields the MATD3-IDFRF algorithm. Finally, comparative experiments are designed. In the composite obstacle environment, the proposed dynamic formation reward function raises the algorithm's success rate by 6.8%, increases the average reward after convergence by 2.3%, and reduces the formation deformation rate by 97%.

Key words: reinforcement learning (RL), reward function, unmanned aerial vehicle (UAV), dynamic formation, path planning
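The abstract describes the dynamic formation reward only in words: it is built from the optimal and the current spacing of every UAV pair, keeping each spacing near its optimum while leaving room for small adjustments, and it sits alongside an extended sparse goal reward. The Python sketch below shows one possible way to write such terms. The tolerance band, the linear penalty, the progress-based densification of the goal reward, and the names `goal_reward`, `formation_reward` and `deformation_rate` are all illustrative assumptions, not the authors' published formulas.

```python
import math
from itertools import combinations

Point = tuple[float, float]
Pair = tuple[int, int]


def pairwise_distances(positions: list[Point]) -> dict[Pair, float]:
    """Euclidean distance for every unordered UAV pair (i, j), i < j."""
    return {
        (i, j): math.dist(positions[i], positions[j])
        for i, j in combinations(range(len(positions)), 2)
    }


def goal_reward(prev_dist: float, curr_dist: float, reached: bool,
                goal_bonus: float = 100.0, progress_weight: float = 1.0) -> float:
    """Assumed extension of a sparse goal reward: the sparse bonus on arrival
    is kept, and progress toward the goal adds a dense term every step."""
    reward = progress_weight * (prev_dist - curr_dist)
    if reached:
        reward += goal_bonus
    return reward


def formation_reward(positions: list[Point], optimal: dict[Pair, float],
                     tolerance: float = 0.1, weight: float = 1.0) -> float:
    """Assumed pairwise formation term: a pair costs nothing while its relative
    spacing error stays within `tolerance` (room for fine-tuning) and is
    penalised linearly beyond that band."""
    current = pairwise_distances(positions)
    penalty = 0.0
    for pair, d_opt in optimal.items():
        rel_err = abs(current[pair] - d_opt) / d_opt
        penalty += max(0.0, rel_err - tolerance)
    return -weight * penalty


def deformation_rate(positions: list[Point], optimal: dict[Pair, float]) -> float:
    """Assumed metric: mean relative pairwise spacing error, in percent."""
    current = pairwise_distances(positions)
    errors = [abs(current[p] - d) / d for p, d in optimal.items()]
    return 100.0 * sum(errors) / len(errors)
```

Under these assumptions a per-step reward for one UAV could be the sum of `goal_reward` and `formation_reward`; setting `tolerance=0.0` would turn the band into a strict spacing-keeping penalty with no room for fine-tuning.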

CLC number: TP181

Heng TANG, Wei SUN, Lei LYU, Ruofei HE, Jianjun WU, Changhao SUN, Tianye SUN. UAV formation path planning approach incorporating dynamic reward strategy[J]. Systems Engineering and Electronics, 2024, 46(10): 3506-3518.

Figures/Tables (20): Figures 1-17, Tables 1-3


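Table 1
No.	Parameter	Value
1	Number of UAVs	5
2	Angle formed by UAVs 1, 2 and 3/(°)	60
3	Spacing between adjacent UAVs among UAVs 1, 2 and 4/m	200
4	Spacing between adjacent UAVs among UAVs 1, 3 and 5/m	200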
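The formation parameters above fix only three numbers: 5 UAVs, a 60° angle formed by UAVs 1, 2 and 3, and 200 m between adjacent UAVs along the 1-2-4 and 1-3-5 lines. Read as a V formation with UAV 1 at the apex (an interpretation; the table does not state the layout explicitly), these values determine the optimal pairwise spacings that a pair-based formation reward needs. The sketch below derives them; the slot layout and the names `v_formation_slots` and `optimal_spacings` are assumptions for illustration.

```python
import math
from itertools import combinations


def v_formation_slots(spacing: float = 200.0,
                      apex_angle_deg: float = 60.0) -> dict[int, tuple[float, float]]:
    """Slot (x, y) in metres for each UAV, relative to UAV 1 at the apex.

    Assumed layout: UAVs 2 and 4 trail along one wing, UAVs 3 and 5 along the
    other, both wings symmetric about the -x axis with `apex_angle_deg`
    between them.
    """
    half = math.radians(apex_angle_deg) / 2.0
    slots = {1: (0.0, 0.0)}
    for rank, (left, right) in enumerate([(2, 3), (4, 5)], start=1):
        r = rank * spacing  # distance from the apex along each wing
        slots[left] = (-r * math.cos(half), r * math.sin(half))
        slots[right] = (-r * math.cos(half), -r * math.sin(half))
    return slots


def optimal_spacings(slots: dict[int, tuple[float, float]]) -> dict[tuple[int, int], float]:
    """Optimal pairwise distance d*_ij for every UAV pair i < j."""
    ids = sorted(slots)
    return {(i, j): math.dist(slots[i], slots[j]) for i, j in combinations(ids, 2)}


if __name__ == "__main__":
    for pair, d in optimal_spacings(v_formation_slots()).items():
        print(pair, f"{d:.1f} m")
```

Under this reading, adjacent pairs and the pair (2, 3) are 200 m apart, (1, 4), (1, 5) and (4, 5) are 400 m apart, and the cross pairs (2, 5) and (3, 4) are about 346.4 m apart.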

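Table 2
Parameter type	Hyperparameter	Symbol	Value
Shared by all three algorithms	Discount factor	γ	0.990
	Soft update coefficient	τ	0.010
	Replay buffer size	M	1 000 000
	Batch size	m	1 024
	Actor network learning rate	α_A	0.008
	Critic network learning rate	α_C	0.010
	Action noise standard deviation	σ	0.200
	Number of episodes	MaxEpisode	5 000, 10 000
	Maximum time steps per episode	MaxStep	100
Specific to MATD3 and MATD3-IDFRF	Delayed update frequency	C	10
	Standard deviation of Gaussian noise for the Critic target network	σ̃	0.200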
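The two MATD3-specific entries in the hyperparameter table above, the delay C and the target noise σ̃, are the TD3 ingredients that MATD3 adds on top of a centralized multi-agent critic, together with clipped double-Q learning; τ drives the soft target update. The framework-free Python sketch below shows where each of these hyperparameters enters. It assumes scalar actions, plain callables standing in for the networks, and a noise clip of 0.5 and an action bound of 1.0, neither of which appears in the table (0.5 is a common TD3 default). It is not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence

Actor = Callable[[float], float]
Critic = Callable[[Sequence[float], Sequence[float]], float]


@dataclass
class MATD3Config:
    gamma: float = 0.99        # discount factor γ (from the table)
    tau: float = 0.01          # soft update coefficient τ (from the table)
    policy_delay: int = 10     # delayed update frequency C (from the table)
    target_noise: float = 0.2  # σ̃, target-policy smoothing noise (from the table)
    noise_clip: float = 0.5    # assumption: not listed in the table
    max_action: float = 1.0    # assumption: actions normalised to [-1, 1]


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def critic_target(reward: float, done: bool, next_obs: Sequence[float],
                  target_actors: Sequence[Actor], target_q1: Critic,
                  target_q2: Critic, cfg: MATD3Config, rng: random.Random) -> float:
    """TD target for one agent's twin centralized critics."""
    # Target policy smoothing: each agent's target action gets clipped noise.
    joint_action = []
    for actor, obs in zip(target_actors, next_obs):
        eps = clip(rng.gauss(0.0, cfg.target_noise), -cfg.noise_clip, cfg.noise_clip)
        joint_action.append(clip(actor(obs) + eps, -cfg.max_action, cfg.max_action))
    # Clipped double-Q: use the smaller of the two target critics.
    q_next = min(target_q1(next_obs, joint_action), target_q2(next_obs, joint_action))
    return reward + cfg.gamma * (1.0 - float(done)) * q_next


def soft_update(target: list[float], online: list[float], tau: float) -> list[float]:
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def actor_update_due(step: int, cfg: MATD3Config) -> bool:
    """Actors and target networks are updated once every C critic updates."""
    return step % cfg.policy_delay == 0
```

With C = 10, the actors and all target networks change ten times less often than the critics, which is the mechanism TD3 uses to keep policy updates from chasing a noisy value estimate.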

Table 3
Algorithm	Average path length/m	Average formation deformation rate/%	Reward convergence interval	Average reward
MADDPG	10 763	55.50	[139, 223]	211
MATD3	10 438	22.60	[187, 228]	220
MATD3-IDFRF	10 536	0.68	[567, 586]	575
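The 97% reduction in formation deformation quoted in the abstract can be checked against the results table above. The short Python calculation below computes signed relative changes from the table values; the reward columns are left out, since MATD3-IDFRF is trained with an additional formation reward term and its totals are presumably not on the same scale as the baselines'.

```python
# Values transcribed from the results table.
results = {
    "MADDPG": {"path_m": 10763, "deformation_pct": 55.50},
    "MATD3": {"path_m": 10438, "deformation_pct": 22.60},
    "MATD3-IDFRF": {"path_m": 10536, "deformation_pct": 0.68},
}


def relative_change(baseline: float, value: float) -> float:
    """Signed relative change in percent; negative values are reductions."""
    return 100.0 * (value - baseline) / baseline


for base in ("MADDPG", "MATD3"):
    ours = results["MATD3-IDFRF"]
    d_def = relative_change(results[base]["deformation_pct"], ours["deformation_pct"])
    d_path = relative_change(results[base]["path_m"], ours["path_m"])
    print(f"vs {base}: deformation {d_def:+.1f}%, path length {d_path:+.2f}%")
```

Against MATD3 the deformation rate falls by about 97.0%, matching the abstract; against MADDPG the drop is about 98.8%. The MATD3-IDFRF path is about 0.94% longer than MATD3's and about 2.11% shorter than MADDPG's.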
