Online route planning decision-making method of aircraft in complex environment

doi:10.12305/j.issn.1001-506X.2024.09.28

Abstract

To address the problem of online route planning for aircraft, an online autonomous decision-making method based on deep reinforcement learning (DRL) is proposed. First, the motion model and detection model of the aircraft are described, and the deep deterministic policy gradient (DDPG) algorithm is used to construct the framework of the aircraft flight control policy model. On this basis, a curriculum learning (CL) based algorithm, CL-DDPG, is proposed. It decomposes the online route planning task and guides the aircraft to learn strategies for target approach, threat avoidance, and route optimization, with a corresponding Gaussian noise set for each to help the aircraft explore and optimize its policy. This achieves adaptive learning and decision-making control of the aircraft in complex scenarios. Simulation experiments show that the CL-DDPG algorithm effectively improves the training efficiency of the model, achieves a higher task success rate, shows good generalization and robustness, and is better suited to online route planning tasks in complex dynamic environments.

Key words: online route planning, deep reinforcement learning (DRL), autonomous decision-making, curriculum learning (CL), threat avoidance

CLC number: TJ765

Zhipeng YANG, Zihao CHEN, Chang ZENG, Song LIN, Jindi MAO, Kai ZHANG. Online route planning decision-making method of aircraft in complex environment[J]. Systems Engineering and Electronics, 2024, 46(9): 3166-3175.

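Route planning task parameter	Value
Maximum speed/(m·s^-1)	300
Maximum acceleration/(m·s^-2)	20
Maximum angular velocity/((°)/s)	1
Detection range/km	10
Number of threat zones	0~10
Threat zone radius/km	3~12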

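Algorithm model parameter	Value
Experience replay buffer capacity M	100 000
Sampling batch size N_batch	128
Maximum number of training episodes E	2 000
Maximum steps per episode T	600
Network update frequency T_train/(times per training step)	30
Actor network learning rate l_a	0.01~0.000 1
Critic network learning rate l_c	0.02~0.000 1
Discount factor γ	0.97
Soft update rate τ	0.01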
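
The discount factor γ and soft update rate τ in the table above enter DDPG through the critic target and the target-network update. The following is a minimal sketch of those two standard DDPG steps, not the authors' implementation: the class and function names are illustrative, and network weights are stood in for by plain Python lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DDPGConfig:
    """Values copied from the parameter table; field names are illustrative."""
    replay_capacity: int = 100_000  # M
    batch_size: int = 128           # N_batch
    max_episodes: int = 2_000       # E
    max_steps: int = 600            # T
    gamma: float = 0.97             # discount factor
    tau: float = 0.01               # soft update rate


def td_target(reward, next_q, done, gamma):
    """Critic target y = r + gamma * Q'(s', mu'(s')); no bootstrap at terminal steps."""
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target, element-wise."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]


cfg = DDPGConfig()
y = td_target(reward=1.0, next_q=10.0, done=False, gamma=cfg.gamma)
new_target = soft_update(target=[0.0, 1.0], online=[1.0, 1.0], tau=cfg.tau)
```

With τ = 0.01 each update moves the target weights only 1% of the way toward the online weights, which is what keeps the critic target slowly varying during training.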

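CL parameter	Value
Sub-curriculum 1 pre-training episodes E₁	200
Sub-curriculum 1 initial Gaussian noise variance σ₁(0)	3
Sub-curriculum 2 pre-training episodes E₂	300
Sub-curriculum 2 initial Gaussian noise variance σ₂(0)	v
Sub-curriculum 3 pre-training episodes E₃	500
Sub-curriculum 3 initial Gaussian noise variance σ₃(0)	0.5
Noise decay coefficient	0.999 95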
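
The per-sub-curriculum exploration noise in the table above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's procedure: it assumes the noise variance is multiplied by the decay coefficient once per training step, and that actions are clipped to a hypothetical range of [-1, 1]. The sub-curriculum 2 initial variance is not readable in the table, so it is left unset.

```python
import math
import random

DECAY = 0.99995  # noise decay coefficient from the table

# (pre-training episodes, initial noise variance) per sub-curriculum.
# The sub-curriculum 2 variance is unreadable in the source, so it stays None.
CURRICULUM = [(200, 3.0), (300, None), (500, 0.5)]


def variance_after(sigma0, steps, decay=DECAY):
    """Assumed schedule: the variance is multiplied by `decay` once per step."""
    return sigma0 * decay ** steps


def noisy_action(action, variance, low=-1.0, high=1.0, rng=random):
    """Add zero-mean Gaussian noise, then clip to the assumed bounds [low, high]."""
    noisy = action + rng.gauss(0.0, math.sqrt(variance))
    return max(low, min(high, noisy))


# Number of steps for the variance to halve under the assumed schedule.
half_life = math.log(0.5) / math.log(DECAY)
```

Under this assumed schedule the variance halves only after somewhat under 14 000 steps, so exploration fades gradually over many episodes rather than within one.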

Algorithm	0 moving threat zones	3 moving threat zones	6 moving threat zones	9 moving threat zones
CL-DDPG	307	318	334	371
DDPG	311	326	379	412
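
The gap between the two algorithms in the table above can be expressed as a relative difference. The sketch below only does that arithmetic. It assumes the first value in each cell belongs to CL-DDPG and the second to DDPG, and it makes no claim about what the reported quantity is, since the table does not name the metric.

```python
# Values transcribed from the table; the metric itself is not named there.
THREATS = [0, 3, 6, 9]            # number of moving threat zones
CL_DDPG = [307, 318, 334, 371]
DDPG = [311, 326, 379, 412]


def relative_reduction(baseline, value):
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


# Percentage by which the CL-DDPG value is below the DDPG value, per scenario.
reductions = {
    n: round(100 * relative_reduction(d, c), 2)
    for n, c, d in zip(THREATS, CL_DDPG, DDPG)
}
print(reductions)
```

The difference is small with 0 or 3 moving threat zones and grows to roughly 10% or more with 6 and 9.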
