Systems Engineering and Electronics ›› 2024, Vol. 46 ›› Issue (10): 3506-3518.doi: 10.12305/j.issn.1001-506X.2024.10.27

• Guidance, Navigation and Control •

UAV formation path planning approach incorporating dynamic reward strategy

Heng TANG1, Wei SUN1,*, Lei LYU1, Ruofei HE2, Jianjun WU3, Changhao SUN4, Tianye SUN1   

  1. School of Aerospace Science and Technology, Xidian University, Xi'an 710118, China
    2. The 365th Research Institute, Northwestern Polytechnical University, Xi'an 710072, China
    3. Xi'an ASN UAV Technology Co. Ltd, Xi'an 710065, China
    4. Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094, China
  • Received:2023-11-01 Online:2024-09-25 Published:2024-10-22
  • Contact: Wei SUN

Abstract:

For the unmanned aerial vehicle (UAV) formation path planning problem in unknown dynamic environments, an intelligent decision scheme based on the multi-agent twin delayed deep deterministic policy gradient algorithm incorporating a dynamic formation reward function (MATD3-IDFRF) is proposed. Firstly, the sparse reward function is extended for the obstacle-free environment. Then, the dynamic formation problem, which is the focus of attention in UAV formation path planning, is analyzed in depth. It is described as a UAV formation flying in a stable formation structure while fine-tuning that formation in time according to the surrounding environment. In essence, the spacing between each pair of UAVs remains relatively stable while being fine-tuned in response to the external environment. Accordingly, a reward function based on the optimal and current distance between each pair of UAVs is designed, yielding a dynamic formation reward function, which is then combined with the multi-agent twin delayed deep deterministic (MATD3) algorithm to form the proposed MATD3-IDFRF algorithm. Finally, comparison experiments show that in a complex obstacle environment, the proposed dynamic formation reward function improves the algorithm success rate by 6.8%, raises the converged average reward by 2.3%, and reduces the formation deformation rate by 97%.
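The abstract describes a reward built from the optimal and current distance between each pair of UAVs, peaking when the formation holds its desired spacing and decaying as it deforms. The paper's exact functional form is not given here; the following is a minimal illustrative sketch, assuming a Gaussian-shaped pairwise term and a hypothetical `formation_reward` helper (the function name, `sigma` parameter, and normalization are assumptions, not the authors' definition).

```python
import numpy as np

def formation_reward(positions, optimal_dist, sigma=1.0):
    """Illustrative pairwise formation-keeping reward (assumed form).

    positions    : (n, 2) array of UAV planar positions
    optimal_dist : (n, n) matrix of desired pairwise distances
    sigma        : width of the Gaussian tolerance band (assumption)

    Returns a value in (0, 1]: 1 when every pairwise distance equals
    its desired value, approaching 0 as the formation deforms.
    """
    n = len(positions)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(positions[i] - positions[j])
            # Gaussian-shaped term: 1 at the optimal spacing,
            # smoothly decaying as the pair drifts apart or together.
            total += np.exp(-((d - optimal_dist[i, j]) ** 2) / (2 * sigma**2))
    # Average over the n*(n-1)/2 unordered pairs.
    return total / (n * (n - 1) / 2)
```

For example, an equilateral-triangle formation with unit desired spacing scores exactly 1, and displacing one UAV lowers the reward; such a shaped term can then be added to the task reward used by MATD3-style training.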

Key words: reinforcement learning (RL), reward function, unmanned aerial vehicle (UAV), dynamic formation, path planning
