Systems Engineering and Electronics ›› 2024, Vol. 46 ›› Issue (9): 3166-3175. doi: 10.12305/j.issn.1001-506X.2024.09.28

• Guidance, Navigation and Control •

Online route planning decision-making method of aircraft in complex environment

Zhipeng YANG, Zihao CHEN, Chang ZENG, Song LIN, Jindi MAO, Kai ZHANG

  1. System Design Institute of Hubei Aerospace Technology Academy, Wuhan 430040, Hubei, China
  • Received: 2023-05-11 Online: 2024-08-30 Published: 2024-09-12
  • Contact: Song LIN
  • About the authors: Zhipeng YANG (b. 1995), male, engineer, master's degree; main research interest: aircraft mission planning
    Zihao CHEN (b. 1995), male, engineer, master's degree; main research interest: aircraft route planning
    Chang ZENG (b. 1987), male, senior engineer, master's degree; main research interest: overall design of aircraft systems
    Song LIN (b. 1986), male, senior engineer, master's degree; main research interest: aircraft mission planning
    Jindi MAO (b. 1988), female, senior engineer, master's degree; main research interest: aircraft route planning
    Kai ZHANG (b. 1990), male, senior engineer, doctoral degree; main research interest: overall design of aircraft systems
  • Funding:
    National Natural Science Foundation of China (62003267)

Abstract:

To address the problem of online route planning for aircraft, an online autonomous decision-making method based on deep reinforcement learning (DRL) is proposed. First, the motion model and detection model of the aircraft are described. The deep deterministic policy gradient (DDPG) algorithm, a DRL method, is then used to construct the framework of the aircraft's flight control policy model. On this basis, a CL-DDPG algorithm based on curriculum learning (CL) is proposed. It decomposes the online route planning task and guides the aircraft to learn, in turn, strategies for target approach, threat avoidance, and route optimization, with corresponding Gaussian noise set to help the aircraft explore and optimize its policy. In this way, adaptive learning and decision-making control of the aircraft in complex scenarios are realized. Simulation experiments show that the CL-DDPG algorithm effectively improves the training efficiency of the model, achieves a higher task success rate, and has good generalization and robustness, so it is better suited to online route planning tasks in complex dynamic environments.

Key words: online route planning, deep reinforcement learning (DRL), autonomous decision-making, curriculum learning (CL), threat avoidance
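
The curriculum idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names only the three sub-tasks and the use of Gaussian exploration noise, so the stage names as identifiers, the noise standard deviations, the promotion threshold, and the helper functions `explore` and `next_stage` are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    noise_sigma: float  # std-dev of Gaussian exploration noise (assumed value)
    promote_at: float   # success rate needed to advance (assumed value)


# The three sub-tasks come from the abstract; every number is an assumption.
CURRICULUM = [
    Stage("target_approach", 0.30, 0.80),
    Stage("threat_avoidance", 0.20, 0.80),
    Stage("route_optimization", 0.10, 0.80),
]


def explore(action, sigma, rng, low=-1.0, high=1.0):
    """DDPG-style exploration: add Gaussian noise to a deterministic
    action, then clip the result to the valid control range."""
    noisy = action + rng.gauss(0.0, sigma)
    return max(low, min(high, noisy))


def next_stage(index, success_rate):
    """Move to the next curriculum stage once the current stage's
    success rate reaches its threshold; stay put on the last stage."""
    if index < len(CURRICULUM) - 1 and success_rate >= CURRICULUM[index].promote_at:
        return index + 1
    return index


# Usage: perturb a policy output during stage 0, then try to advance.
rng = random.Random(0)
stage = 0
noisy_action = explore(0.5, CURRICULUM[stage].noise_sigma, rng)
stage = next_stage(stage, success_rate=0.85)  # threshold met, so stage becomes 1
```

Using a separate noise level per stage reflects the abstract's statement that "corresponding" Gaussian noise is set for each sub-task; whether the noise shrinks, grows, or follows some other schedule across stages in the paper is not stated in the abstract.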

CLC number: