Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (9): 3076-3085. doi: 10.12305/j.issn.1001-506X.2025.09.29

• Guidance, Navigation and Control •

Research on UAV cooperative interception maneuver decision-making based on multi-agent reinforcement learning

Dapeng YANG1,2, Zihao GONG3, Xiaoye WANG2,*, Zhengyu GUO4,5, Delin LUO3

  1. Department of Aeronautics and Astronautics, Fudan University, Shanghai 200433, China
  2. Shenyang Aircraft Design and Research Institute, Shenyang 110035, China
  3. School of Aerospace Engineering, Xiamen University, Xiamen 361102, China
  4. China Airborne Missile Academy, Luoyang 471000, China
  5. National Key Laboratory of Air-based Information Perception and Fusion, Luoyang 471000, China
  • Received: 2023-09-11 Online: 2025-09-25 Published: 2025-09-16
  • Contact: Xiaoye WANG
  • About the authors: Dapeng YANG (born 1987), male, senior engineer, master's degree; main research interest: flight vehicle control
    Zihao GONG (born 1998), male, master's student; main research interests: UAV cooperative decision-making and control
    Zhengyu GUO (born 1982), male, senior engineer, Ph.D.; main research interests: cooperative flight vehicle control, information perception and fusion
    Delin LUO (born 1968), male, professor, Ph.D.; main research interests: flight vehicle guidance and control, UAV cooperative decision-making and control, computational intelligence
  • Supported by:
    Joint project of the National Key Laboratory of Air-based Information Perception and Fusion and the Aeronautical Science Foundation of China (20220001068001)

Abstract:

Intelligent cooperative interception in an adversarial game between unmanned aerial vehicles (UAVs) is an important combat scenario in future air warfare. To address the UAV cooperative tactical interception problem, a tactical interception decision-making framework based on multi-agent reinforcement learning is constructed. First, the relative situational geometry of the interception air combat process is analyzed to form the interception air combat state space. Next, the interception air combat reward function is set according to the interception air combat situational threat model. Finally, an independent action value network for the UAVs, a joint action value network for the formation, and a state value network are established to form cooperative UAV interception tactics and generate the optimal interception strategy, and an interception boundary is introduced to evaluate the effectiveness of that strategy. Simulation results show that, when facing multi-target interception tasks under dynamic game conditions, the framework can autonomously assign interception targets and form intelligent cooperative interception tactics.

Key words: multi-target cooperative interception, interception tactics, unmanned aerial vehicle (UAV), multi-agent reinforcement learning (MARL)

CLC number: