Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (7): 2216-2223. doi: 10.12305/j.issn.1001-506X.2025.07.15

• Systems Engineering •

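Reinforcement learning-based resilience optimization method of equipment system-of-systems

Jiahao LIU1, Renjie XU1,2,*, Maotong SUN2, Jiuyao JIANG1, Jichao LI1, Kewei YANG1

  1. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
  2. School of Management, Technical University of Munich, Heilbronn 74076, Germany
  • Received: 2023-05-29  Online: 2025-07-16  Published: 2025-07-22
  • Contact: Renjie XU
  • About the authors: Jiahao LIU (born 2002), male, Ph.D. candidate. Research interests: reinforcement learning, system-of-systems resilience optimization.
    Renjie XU (born 1998), male, Ph.D. candidate. Research interests: management science and complex systems management, system-of-systems resilience evaluation and optimization, complex systems and complex networks.
    Maotong SUN (born 1997), male, Ph.D. candidate. Research interests: management science and healthcare management, applications of reinforcement learning.
    Jiuyao JIANG (born 1998), female, Ph.D. candidate. Research interests: complex networks, resilience evaluation of combat system-of-systems.
    Jichao LI (born 1990), male, associate professor, doctoral supervisor, Ph.D. Research interests: heterogeneous information network modeling and analysis, intelligent decision-making for military complex systems.
    Kewei YANG (born 1977), male, professor, doctoral supervisor, Ph.D. Research interests: system-of-systems requirements modeling, system-of-systems architecture design and optimization, system modeling and simulation.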

Abstract:

In actual operation, an equipment system-of-systems (ESoS) is inevitably affected by disturbance events such as external attacks and internal failures, which cause multiple equipment nodes to fail. Formulating recovery strategies in a scientific and rational way, so as to restore system-of-systems capability quickly and enhance ESoS resilience, is therefore of considerable military value. To this end, this paper proposes an ESoS resilience optimization method based on reinforcement learning. First, an ESoS resilience metric is established by integrating network topology and network performance parameters. Second, a Q-Learning-based reinforcement learning algorithm for determining the node recovery sequence is proposed, and different disturbance scenarios are used to test how resilience changes. Finally, a typical case is used to verify the feasibility and effectiveness of the proposed algorithm. Comparative experiments against empirical recovery strategies and a genetic algorithm show that, under deliberate attack, the resilience value obtained with reinforcement learning is 37.46%, 52.28%, and 85.65% higher than that of the node-capability-importance-first, degree-first, and random recovery strategies, respectively; compared with the genetic algorithm, the resilience value obtained after optimization is 28.72% higher. These results indicate the advantage of the proposed method and model.

Key words: equipment system-of-systems (ESoS), reinforcement learning, resilience optimization, recovery strategy
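
The idea of learning a node recovery sequence with Q-Learning can be illustrated with a minimal tabular sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the four node names and their capability weights, the toy performance function (share of total capability restored), the toy resilience value (mean performance over the recovery steps), and the hyperparameters. The paper's own metric combines network topology and network performance parameters, which this sketch does not attempt to reproduce.

```python
import random

# Hypothetical capability weights of the failed nodes (illustrative only).
WEIGHTS = {"A": 1.0, "B": 4.0, "C": 2.0, "D": 3.0}
NODES = sorted(WEIGHTS)
TOTAL = sum(WEIGHTS.values())


def performance(recovered):
    """Toy performance: share of total capability that is back online."""
    return sum(WEIGHTS[n] for n in recovered) / TOTAL


def resilience(order):
    """Toy resilience: mean performance over the recovery steps."""
    recovered, area = set(), 0.0
    for node in order:
        recovered.add(node)
        area += performance(recovered)
    return area / len(order)


def train(episodes=10000, alpha=0.5, gamma=1.0, epsilon=0.3, seed=0):
    """Tabular Q-learning; state = set of recovered nodes, action = next node."""
    rng = random.Random(seed)
    q = {}  # (frozenset of recovered nodes, next node) -> estimated return
    for _ in range(episodes):
        state = frozenset()
        while len(state) < len(NODES):
            actions = [n for n in NODES if n not in state]
            if rng.random() < epsilon:  # explore
                action = rng.choice(actions)
            else:  # exploit the current estimates
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            nxt = state | {action}
            reward = performance(nxt)  # performance after this recovery step
            future = max(
                (q.get((nxt, a), 0.0) for a in NODES if a not in nxt),
                default=0.0,  # terminal state: nothing left to recover
            )
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = nxt
    return q


def greedy_order(q):
    """Read the recovery sequence off the learned Q-table."""
    state, order = frozenset(), []
    while len(state) < len(NODES):
        actions = [n for n in NODES if n not in state]
        best = max(actions, key=lambda a: q.get((state, a), 0.0))
        order.append(best)
        state = state | {best}
    return order
```

Calling greedy_order(train()) returns a recovery sequence, and resilience(order) scores it. Because the per-step reward is the performance after each recovery, the undiscounted return of an episode is proportional to the toy resilience value, so in this simplified setting the best sequence restores nodes in descending order of weight. A realistic ESoS model would replace performance() with a network-based measure.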

CLC Number: