

Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (10): 3300-3312. doi: 10.12305/j.issn.1001-506X.2025.10.17
• Systems Engineering •

Continual Learning Mechanism for Intelligent Flight Conflict Resolution Algorithm

Dong SUI, Xiangrong CAI

About the author: SUI Dong (1972—), male, associate professor, Ph.D.; his main research interests are air traffic intelligence and airspace planning.
Received: 2024-04-11
Online: 2025-10-25
Published: 2025-10-23
Contact: Xiangrong CAI
Abstract:
To keep intelligent flight conflict resolution methods effective over time under dynamically changing conditions such as the airspace environment, a continual learning mechanism is introduced into an intelligent flight conflict resolution algorithm. First, the flight conflict resolution problem is modeled as a Markov decision process (MDP). The model is then trained with deep reinforcement learning so that it can resolve flight conflicts effectively. Finally, two continual learning methods, one based on parameter isolation and one based on meta-learning, are introduced so that the model can adapt quickly to new conflict scenarios. Experimental results show that with continual learning the model's conflict resolution success rate approaches 70% in the early stage of training on a new scenario and ultimately exceeds 90%, while its retention of the initial training scenarios exceeds 87%; catastrophic forgetting is effectively avoided and the model's continual learning capability is improved. The model is of practical significance for ensuring flight safety, reducing controller workload, and improving the efficiency of air traffic operations.
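For context on the two continual learning approaches used in the experiments (instantiated as ACKTR+EWC and ACKTR+MAML in Table 2 below): elastic weight consolidation (EWC) penalizes movement of weights that were important for earlier scenarios, while MAML meta-learns an initialization that adapts quickly to new scenarios. A standard form of the EWC objective for training on a new scenario B after an old scenario A is sketched below; this is the textbook formulation, and the penalty strength λ is an assumption, as the paper's exact variant is not reproduced on this page.

```latex
% EWC objective: the new-scenario loss plus a quadratic penalty that
% anchors each parameter theta_i to its post-scenario-A value, weighted
% by the Fisher information F_i (how important theta_i was for A).
\mathcal{L}(\theta) \;=\; \mathcal{L}_{B}(\theta)
  \;+\; \sum_{i} \frac{\lambda}{2}\, F_{i}\,\bigl(\theta_{i} - \theta^{*}_{A,i}\bigr)^{2}
```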
Dong SUI, Xiangrong CAI. Continual learning mechanism for intelligent flight conflict resolution algorithm[J]. Systems Engineering and Electronics, 2025, 47(10): 3300-3312.
Table 2  Experimental model settings

| Base model | Algorithm | Training dataset | Resulting model |
| --- | --- | --- | --- |
| None | ACKTR | | |
| None | ACKTR | | |
| None | ACKTR | | |
| None | ACKTR | | |
| | ACKTR+EWC | | |
| | ACKTR+EWC | | |
| | ACKTR+MAML | | |
| | ACKTR+MAML | | |
Table 5  Hyperparameter settings for the MAML-based conflict resolution experiment

| Hyperparameter | Value | Hyperparameter | Value |
| --- | --- | --- | --- |
| Policy network type | MLP | Policy network layers | 6 |
| Policy network hidden units | 64 | Policy network update frequency | 1 |
| Value network type | MLP | Value network layers | 6 |
| Value network hidden units | 64 | Value network update frequency | 1 |
| Policy network learning rate | 5e-4 | Value network learning rate | 5e-4 |
| Discount factor | 0.99 | Batch size | 32 |
| Inner-loop training steps | 1e6 | Outer-loop training steps | 1e6 |
| Value loss coefficient | 0.1 | Entropy loss coefficient | 0.01 |
| Policy loss coefficient | 0.1 | Inner-loop task count | 2 |
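To make the inner/outer loop structure behind Table 5 concrete, below is a minimal MAML sketch. It is a toy sine-regression example, not the paper's conflict resolution environment; it loosely follows the table (MLP networks, learning rate 5e-4, 2 inner-loop tasks), while the network sizes, step counts, and task distribution are assumptions scaled down so the example runs in seconds.

```python
import math
import torch

# Minimal MAML sketch (toy sine regression, NOT the paper's conflict
# resolution setup). Each "task" stands in for a new conflict scenario.
torch.manual_seed(0)
INNER_LR = 5e-4            # inner-loop (adaptation) learning rate, per Table 5
OUTER_LR = 5e-4            # outer-loop (meta) learning rate, per Table 5
TASKS_PER_META_STEP = 2    # inner-loop task count, per Table 5

def init_params():
    # A small MLP 1 -> 64 -> 64 -> 1, kept as an explicit parameter list
    # so the inner-loop update can be written functionally.
    sizes = [1, 64, 64, 1]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        params.append((0.1 * torch.randn(n_in, n_out)).requires_grad_())
        params.append(torch.zeros(n_out, requires_grad=True))
    return params

def forward(params, x):
    for i in range(0, len(params) - 2, 2):
        x = torch.tanh(x @ params[i] + params[i + 1])
    return x @ params[-2] + params[-1]

def sample_task():
    # A sine wave with a random phase; returns a batch sampler for the task.
    phase = torch.rand(1).item() * math.pi
    def batch(n=32):
        x = torch.rand(n, 1) * 2 * math.pi
        return x, torch.sin(x + phase)
    return batch

params = init_params()
for meta_step in range(100):               # outer loop
    meta_loss = 0.0
    for _ in range(TASKS_PER_META_STEP):   # inner loop over sampled tasks
        batch = sample_task()
        x, y = batch()
        loss = ((forward(params, x) - y) ** 2).mean()
        # One adaptation step; create_graph=True keeps the graph so the
        # outer update can differentiate through the inner update.
        grads = torch.autograd.grad(loss, params, create_graph=True)
        fast = [p - INNER_LR * g for p, g in zip(params, grads)]
        # Evaluate the adapted parameters on fresh data from the same task.
        xq, yq = batch()
        meta_loss = meta_loss + ((forward(fast, xq) - yq) ** 2).mean()
    meta_grads = torch.autograd.grad(meta_loss, params)
    with torch.no_grad():                  # outer (meta) update
        for p, g in zip(params, meta_grads):
            p -= OUTER_LR * g
```

In the paper's setting the tasks would correspond to conflict scenarios and the losses to the ACKTR policy and value objectives; the sketch only illustrates the two-level gradient structure that the inner- and outer-loop step counts in Table 5 refer to.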