

Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (10): 3300-3312. doi: 10.12305/j.issn.1001-506X.2025.10.17
• Systems Engineering •

Continual Learning Mechanism for Intelligent Flight Conflict Resolution Algorithm

Dong SUI, Xiangrong CAI

About the author: SUI Dong (1972—), male, associate professor, Ph.D.; his main research interests are air traffic intelligence and airspace planning.
Received: 2024-04-11
Online: 2025-10-25
Published: 2025-10-23
Contact: Xiangrong CAI
Abstract:
To keep intelligent flight conflict resolution methods effective over time under dynamically changing conditions such as the airspace environment, a continual learning mechanism is introduced into an intelligent flight conflict resolution algorithm. First, the flight conflict resolution problem is modeled as a Markov decision process (MDP). The model is then trained with deep reinforcement learning so that it can resolve flight conflicts effectively. Finally, two continual learning methods, one based on parameter isolation and one based on meta-learning, are introduced so that the model can adapt quickly to new conflict scenarios. Experimental results show that with continual learning the model's conflict resolution success rate approaches 70% in the early stage of training on a new scenario and ultimately exceeds 90%, while its retention of the initial training scenarios exceeds 87%; catastrophic forgetting is effectively avoided and the model's continual learning capability is improved. The model is of practical significance for ensuring flight safety, reducing controller workload, and improving the efficiency of air traffic operations.
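For context on the two continual learning approaches used in the experiments (instantiated as ACKTR+EWC and ACKTR+MAML in Table 2 below): elastic weight consolidation (EWC) penalizes movement of weights that were important for earlier scenarios, while MAML meta-learns an initialization that adapts quickly to new scenarios. A standard form of the EWC objective for training on a new scenario B after an old scenario A is sketched below; this is the textbook formulation, and the penalty strength λ is an assumption, as the paper's exact variant is not reproduced on this page.

```latex
% EWC objective: the new-scenario loss plus a quadratic penalty that
% anchors each parameter theta_i to its post-scenario-A value, weighted
% by the Fisher information F_i (how important theta_i was for A).
\mathcal{L}(\theta) \;=\; \mathcal{L}_{B}(\theta)
  \;+\; \sum_{i} \frac{\lambda}{2}\, F_{i}\,\bigl(\theta_{i} - \theta^{*}_{A,i}\bigr)^{2}
```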
Dong SUI, Xiangrong CAI. Continual learning mechanism for intelligent flight conflict resolution algorithm[J]. Systems Engineering and Electronics, 2025, 47(10): 3300-3312.
Table 2  Experimental model settings

| Base model | Algorithm | Training dataset | Resulting model |
| --- | --- | --- | --- |
| None | ACKTR | | |
| None | ACKTR | | |
| None | ACKTR | | |
| None | ACKTR | | |
| | ACKTR+EWC | | |
| | ACKTR+EWC | | |
| | ACKTR+MAML | | |
| | ACKTR+MAML | | |
Table 5  Hyperparameter settings for the MAML-based conflict resolution experiment

| Hyperparameter | Value | Hyperparameter | Value |
| --- | --- | --- | --- |
| Policy network type | MLP | Policy network layers | 6 |
| Policy network hidden units | 64 | Policy network update frequency | 1 |
| Value network type | MLP | Value network layers | 6 |
| Value network hidden units | 64 | Value network update frequency | 1 |
| Policy network learning rate | 5e-4 | Value network learning rate | 5e-4 |
| Discount factor | 0.99 | Batch size | 32 |
| Inner-loop training steps | 1e6 | Outer-loop training steps | 1e6 |
| Value loss coefficient | 0.1 | Entropy loss coefficient | 0.01 |
| Policy loss coefficient | 0.1 | Inner-loop task count | 2 |
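To make the inner/outer loop structure behind Table 5 concrete, below is a minimal MAML sketch. It is a toy sine-regression example, not the paper's conflict resolution environment; it loosely follows the table (MLP networks, learning rate 5e-4, 2 inner-loop tasks), while the network sizes, step counts, and task distribution are assumptions scaled down so the example runs in seconds.

```python
import math
import torch

# Minimal MAML sketch (toy sine regression, NOT the paper's conflict
# resolution setup). Each "task" stands in for a new conflict scenario.
torch.manual_seed(0)
INNER_LR = 5e-4            # inner-loop (adaptation) learning rate, per Table 5
OUTER_LR = 5e-4            # outer-loop (meta) learning rate, per Table 5
TASKS_PER_META_STEP = 2    # inner-loop task count, per Table 5

def init_params():
    # A small MLP 1 -> 64 -> 64 -> 1, kept as an explicit parameter list
    # so the inner-loop update can be written functionally.
    sizes = [1, 64, 64, 1]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        params.append((0.1 * torch.randn(n_in, n_out)).requires_grad_())
        params.append(torch.zeros(n_out, requires_grad=True))
    return params

def forward(params, x):
    for i in range(0, len(params) - 2, 2):
        x = torch.tanh(x @ params[i] + params[i + 1])
    return x @ params[-2] + params[-1]

def sample_task():
    # A sine wave with a random phase; returns a batch sampler for the task.
    phase = torch.rand(1).item() * math.pi
    def batch(n=32):
        x = torch.rand(n, 1) * 2 * math.pi
        return x, torch.sin(x + phase)
    return batch

params = init_params()
for meta_step in range(100):               # outer loop
    meta_loss = 0.0
    for _ in range(TASKS_PER_META_STEP):   # inner loop over sampled tasks
        batch = sample_task()
        x, y = batch()
        loss = ((forward(params, x) - y) ** 2).mean()
        # One adaptation step; create_graph=True keeps the graph so the
        # outer update can differentiate through the inner update.
        grads = torch.autograd.grad(loss, params, create_graph=True)
        fast = [p - INNER_LR * g for p, g in zip(params, grads)]
        # Evaluate the adapted parameters on fresh data from the same task.
        xq, yq = batch()
        meta_loss = meta_loss + ((forward(fast, xq) - yq) ** 2).mean()
    meta_grads = torch.autograd.grad(meta_loss, params)
    with torch.no_grad():                  # outer (meta) update
        for p, g in zip(params, meta_grads):
            p -= OUTER_LR * g
```

In the paper's setting the tasks would correspond to conflict scenarios and the losses to the ACKTR policy and value objectives; the sketch only illustrates the two-level gradient structure that the inner- and outer-loop step counts in Table 5 refer to.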