基于深度强化学习的带约束车辆路径分层优化研究

doi:10.12305/j.issn.1001-506X.2025.03.15

摘要/Abstract

摘要：

针对带容量约束的车辆路径问题(capacitated vehicle routing problem, CVRP), 提出一种利用层次结构对容量约束进行解耦的方法, 将复杂的CVRP拆分为约束规划和路径规划, 并分别进行深度强化学习(deep reinforcement learning, DRL)优化求解。首先, 上层基于注意力模型和采样机制对配送任务进行分配, 规划出满足容量约束的子回路集。其次, 下层采用预训练的无约束的注意力机制模型, 对子回路集进行路径规划。最后, 通过Reinforce算法反馈训练和迭代优化上层的网络参数。实验结果表明, 该方法对不同规模的CVRP和异构CVRP任务具有泛化性, 性能优于最先进的DRL方法; 并且与其他启发式方法相比, 在批量运算任务中, 求解速度提升10倍以上, 且保持具有竞争力的解。

关键词: 深度强化学习, 车辆路径问题, 注意力模型, 分层优化

Abstract:

For the capacitated vehicle routing problem (CVRP), a method is proposed to decouple the capacity constraints using a hierarchical structure, split the complex CVRP into constraint planning and path planning, and perform deep reinforcement learning (DRL) optimisation for solving the problem respectively. Firstly, the upper layer allocates the vehicle distribution tasks based on the attention model and sampling mechanism to plan the set of subpaths that satisfy the constraints. Secondly, the lower layer adopts the pre-trained unconstrained attention model to plan the paths for the set of subpaths. Finally, the network parameters of the upper layer are optimized through the feedback training and iteration of the Reinforce algorithm. Experimental results show that the method generalizes to CVRP and heterogeneous CVRP tasks of different sizes, outperforms the state-of-the-art DRL method. Moreover, compared with other heuristic methods, in batch computing tasks, the solution speed improved by more than 10 times, while maintaining competitive solutions.

Key words: deep reinforcement learning (DRL), vehicle routing problem (VRP), attention model, hiera-rchical optimization

中图分类号:

TP301

唐开强, 傅汇乔, 刘佳生, 邓归洲, 陈春林. 基于深度强化学习的带约束车辆路径分层优化研究[J]. 系统工程与电子技术, 2025, 47(3): 827-841.

Kaiqiang TANG, Huiqiao FU, Jiasheng LIU, Guizhou DENG, Chunlin CHEN. Hierarchical optimization research of constrained vehicle routing based on deep reinforcement learning[J]. Systems Engineering and Electronics, 2025, 47(3): 827-841.

图/表 12

图1

图2

图3

图4

图5

图6

表1

图7

图8

图9

表2

表3

参考文献 34

1	DANTZIG G , FULKERSON R , JOHNSON S . Solution of a large-scale traveling-salesman problem[J]. Journal of the Operations Research Society of America, 1954, 2 (4): 393- 410. doi: 10.1287/opre.2.4.393
2	TOTH P , VIGO D . The vehicle routing problem[M]. Philadelphia: Society for Industrial and Applied Mathematics, 2002.
3	李坚强, 蔡俊创, 孙涛, 等. 面向复杂物流配送场景的车辆路径规划多任务辅助进化算法[J]. 自动化学报, 2024, 50 (3): 544- 559.
	LI J Q , CAI J C , SUN T , et al. Multitask-based assisted evolutionary algorithm for vehicle routing problems in complex logistics distribution scenarios[J]. Acta Automatica Sinica, 2024, 50 (3): 544- 559.
4	QIN G Y , TAO F M , LI L X . A vehicle routing optimization problem for cold chain logistics considering customer satisfaction and carbon emissions[J]. International Journal of Environmental Research and Public Health, 2019, 16 (4): 576. doi: 10.3390/ijerph16040576
5	陈春良, 昝翔, 张仕新, 等. 基于改进MMAS的装备维修任务路径规划方法[J]. 系统工程与电子技术, 2017, 39 (12): 2716- 2720. doi: 10.3969/j.issn.1001-506X.2017.12.13
	CHEN C L , ZAN X , ZHANG S X , et al. Routing plan method for equipment maintenance task based on improved MMAS[J]. Systems Engineering and Electronics, 2017, 39 (12): 2716- 2720. doi: 10.3969/j.issn.1001-506X.2017.12.13
6	LI B J , WU G H , HE Y M , et al. An overview and experimental study of learning-based optimization algorithms for the vehicle routing problem[J]. IEEE/CAA Journal of Automatica Sinica, 2022, 9 (7): 1115- 1138. doi: 10.1109/JAS.2022.105677
7	KONSTANTAKOPOULOS G D , GAYIALIS S P , KECHAGIAS E P . Vehicle routing problem and related algorithms for logistics distribution: a literature review and classification[J]. Operational Research, 2022, 22 (3): 2033- 2062. doi: 10.1007/s12351-020-00600-7
8	LIU T , LUO Z X , QIN H , et al. A branch-and-cut algorithm for the two-echelon capacitated vehicle routing problem with grouping constraints[J]. European Journal of Operational Research, 2018, 266 (2): 487- 497. doi: 10.1016/j.ejor.2017.10.017
9	ZHANG Z , LUO Z X , QIN H , et al. Exact algorithms for the vehicle routing problem with time windows and combinatorial auction[J]. Transportation Science, 2019, 53 (2): 427- 441. doi: 10.1287/trsc.2018.0835
10	MAHMOUDI M , ZHOU X S . Finding optimal solutions for vehicle routing problem with pickup and delivery services with time windows: a dynamic programming approach based on state-space-time network representations[J]. Transportation Research Part B: Methodological, 2016, 89, 19- 42. doi: 10.1016/j.trb.2016.03.009
11	TAN S Y , YEH W C . The vehicle routing problem: State-of-the-art classification and review[J]. Applied Sciences, 2021, 11 (21): 10295. doi: 10.3390/app112110295
12	LAPORTE G . The vehicle routing problem: an overview of exact and approximate algorithms[J]. European journal of operational research, 1992, 59 (3): 345- 358. doi: 10.1016/0377-2217(92)90192-C
13	KAYA E , GORKEMLI B , AKAY B , et al. A review on the studies employing artificial bee colony algorithm to solve combinatorial optimization problems[J]. Engineering Applications of Artificial Intelligence, 2022, 115, 105311. doi: 10.1016/j.engappai.2022.105311
14	ELSHAER R , AWAD H . A taxonomic review of metaheuristic algorithms for solving the vehicle routing problem and its variants[J]. Computers & Industrial Engineering, 2020, 140, 106242.
15	吴冯国, 陶伟, 李辉, 等. 基于深度强化学习算法的无人机智能规避决策[J]. 系统工程与电子技术, 2023, 45 (6): 1702- 1711. doi: 10.12305/j.issn.1001-506X.2023.06.14
	WU F G , TAO W , LI H , et al. Intelligent avoidance decision making for unmanned aerial vehicles based on deep reinforcement learning algorithm[J]. Systems Engineering and Electronics, 2023, 45 (6): 1702- 1711. doi: 10.12305/j.issn.1001-506X.2023.06.14
16	夏雨奇, 黄炎焱, 陈恰. 基于深度Q网络的无人车侦察路径规划[J]. 系统工程与电子技术, 2024, 46 (9): 3070- 3082. doi: 10.12305/j.issn.1001-506X.2024.09.19
	XIA Y Q , HUANG Y Y , CHEN Q . Path planning for unmanned vehicle reconnaissance mission based on deep Q-Network[J]. Systems Engineering and Electronics, 2024, 46 (9): 3070- 3082. doi: 10.12305/j.issn.1001-506X.2024.09.19
17	VINYALS O, FORTUNATO M, JAITLY N. Pointer networks[C]//Proc. of the 28th International Conference on Neural Information Processing Systems, 2015: 2692-2700.
18	BELLO I, PHAM H, LE Q V, et al. Neural combinatorial optimization with reinforcement learning[EB/OL]. [2024-02-07]. https://arxiv.org/abs/1611.09940.
19	GOLDEN B , ASSAD A , LEVY L , et al. The fleet size and mix vehicle routing problem[J]. Computers & Operations Research, 1984, 11 (1): 49- 66.
20	KOOL W, VAN H H, WELLING M. Attention, learn to solve routing problems[C]//Proc. of the 6th International Conference on Learning Representations, 2018.
21	KWON Y D, CHOO J, KIM B, et al. Pomo: policy optimization with multiple optima for reinforcement learning[C]//Proc. of the Advances in Neural Information Processing Systems, 2020.
22	MA Y N, LI J W, CAO Z G, et al. Learning to iteratively solve routing problems with dual-aspect collaborative transformer[C]//Proc. of the Advances in Neural Information Processing Systems, 2021.
23	XIN L, SONG W, CAO Z G, et al. Multi-decoder attention model with embedding glimpse for solving vehicle routing problems[C]//Proc. of the 35th AAAI Conference on Artificial Intelligence, 2021: 12042-12049.
24	CHEN X Y, TIAN Y D. Learning to perform local rewriting for combinatorial optimization[C]//Proc. of the Advances in Neural Information Processing systems, 2019.
25	QIN W , ZHUANG Z L , HUANG Z Z , et al. A novel reinforcement learning-based hyper-heuristic for heterogeneous vehicle routing problem[J]. Computers & Industrial Engineering, 2021, 156, 107252.
26	LI J W , MA Y N , GAO R Z , et al. Deep reinforcement learning for solving the heterogeneous capacitated vehicle routing problem[J]. IEEE Trans. on Cybernetics, 2022, 52 (12): 13572- 13585. doi: 10.1109/TCYB.2021.3111082
27	MA Q, GE S W, HE D Y, et al. Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning[EB/OL]. [2024-02-07]. https://arxiv.org/abs/1911.04936.
28	WANG X P , FU H Q , DENG G Z , et al. Hierarchical free gait motion planning for hexapod robots using deep reinforcement learning[J]. IEEE Trans. on Industrial Informatics, 2023, 19 (1): 10901- 10912.
29	HOU Q C, YANG J W, SU Y Q, et al. Generalize learned heuristics to solve large-scale vehicle routing problems in real-time[C]//Proc. of the 11th International Conference on Learning Representations, 2022.
30	KINGMA D P. Adam: a method for stochastic optimization[C]// Proc. of the 3rd International Conference on Learning Representation, 2014.
31	WU Y X , SONG W , CAO Z G , et al. Learning improvement heuristics for solving routing problems[J]. IEEE Trans. on Neural Networks and Learning Systems, 2021, 33 (9): 5057- 5069.
32	HELSGAUN K . An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems[J]. Roskilde: Roskilde University, 2017, 12, 966- 980.
33	FURNON V, PERRON L. OR-tools routing library[EB/OL]. [2024-02-07]. https://developers.google.com/optimization/routing/.
34	CHRISTIAENS J , VANDEN BERGHE G . Slack induction by string removals for vehicle routing problems[J]. Transportation Science, 2020, 54 (2): 417- 433. doi: 10.1287/trsc.2019.0914

算法名称	参数	CVRP20			CVRP50			CVRP100			CVRP400
算法名称	参数	Obj.	Gap/%	Time/s	Obj.	Gap/%	Time/s	Obj.	Gap/%	Time/s	Obj.	Gap/%	Time/s
LKH	—	6.14	0	280	10.40	0	1 887	15.71	0	4818	24.67	0	4.1
OR-Tools	—	6.26	1.95	1 018	11.05	6.25	1 094	17.08	8.72	1381	27.01	9.49	30.4
HGPN	greedy	7.7	25.41	1.2	14.65	40.87	2.9	23.51	49.65	6.7	—	—	—
AM	greedy	6.43	4.72	0.8	11.06	6.35	1.6	16.80	6.94	2.6	29.33	18.89	0.2
POMO	greedy	6.35	3.42	1	10.74	3.27	1	16.15	2.80	3	—	—	—
MDAM	greedy	6.24	1.62	2	10.71	2.98	15	16.52	5.16	26	—	—	—
MDAM	BS=30	6.16	0.03	33	10.56	1.54	92	16.15	2.80	407	—	—	—
DACT	T=1k	6.18	0.06	93	10.88	4.62	255	16.62	5.79	1 061	—	—	—
DACT	T=5K	6.15	0.02	251	10.55	1.35	1 282	16.26	3.50	3 280	—	—	—
TAM-AM	—	—	—	—	—	—	—	16.19	3.06	9	27.03	9.57	0.3
HDRL (本文)	M=100	6.17	0.05	16	10.53	1.25	31	16.02	1.97	66	25.84	4.74	44
HDRL (本文)	M=300	6.15	0.02	29	10.49	0.87	71	15.96	1.59	143	25.06	1.58	168

算法名称	V3-40			V3-60			V3-80			V3-100			V3-120
算法名称	Obj.	Gap/%	Time/s	Obj.	Gap/%	Time/s	Obj.	Gap/%	Time/s	Obj.	Gap/%	Time/s	Obj.	Gap/%	Time/s
Exact-solver^*	55.43	0	71	78.47	0	214	102.42	0	793	124.61	0	2 512	-	-	-
SISR	55.79	0.65	(254)	79.12	0.83	(478)	103.41	0.97	(763)	126.19	1.27	(1 140)	149.10	0	(1 667)
AM (sample12800)	60.26	8.71	1.35	84.96	8.27	2.31	110.94	8.32	3.61	134.72	8.11	5.19	158.19	6.10	6.86
Li (sample12800)	57.07	2.54	1.65	80.53	2.37	2.99	105.43	2.69	4.63	128.60	2.87	6.74	151.10	1.09	9.11
HDRL (sample12800)	56.75	2.38	1.70	80.10	2.08	3.05	104.79	2.31	4.68	127.71	2.49	6.80	150.22	0.75	9.18

[1]	张庭瑜, 曾颖, 李楠, 黄洪钟. 基于深度强化学习的航天器功率-信号复合网络优化算法[J]. 系统工程与电子技术, 2024, 46(9): 3060-3069.
[2]	夏雨奇, 黄炎焱, 陈恰. 基于深度Q网络的无人车侦察路径规划[J]. 系统工程与电子技术, 2024, 46(9): 3070-3081.
[3]	杨志鹏, 陈子浩, 曾长, 林松, 毛金娣, 张凯. 复杂环境下的飞行器在线航路规划决策方法[J]. 系统工程与电子技术, 2024, 46(9): 3166-3175.
[4]	郭宏达, 娄静涛, 徐友春, 叶鹏, 李永乐, 陈晋生. 基于MADDPG的多无人车协同事件触发通信[J]. 系统工程与电子技术, 2024, 46(7): 2525-2533.
[5]	张梦钰, 豆亚杰, 陈子夷, 姜江, 杨克巍, 葛冰峰. 深度强化学习及其在军事领域中的应用综述[J]. 系统工程与电子技术, 2024, 46(4): 1297-1308.
[6]	李彦铃, 罗飞舟, 葛致磊. 基于鲁棒观测器的深度强化学习垂直起降运载器姿态稳定研究[J]. 系统工程与电子技术, 2024, 46(3): 1038-1047.
[7]	吴冯国, 陶伟, 李辉, 张建伟, 郑成辰. 基于深度强化学习算法的无人机智能规避决策[J]. 系统工程与电子技术, 2023, 45(6): 1702-1711.
[8]	唐进, 梁彦刚, 白志会, 黎克波. 基于DQN的旋翼无人机着陆控制算法[J]. 系统工程与电子技术, 2023, 45(5): 1451-1460.
[9]	唐斯琪, 潘志松, 胡谷雨, 吴炀, 李云波. 深度强化学习在天基信息网络中的应用——现状与前景[J]. 系统工程与电子技术, 2023, 45(3): 886-901.
[10]	李信, 李勇军, 赵尚弘. 基于深度强化学习的卫星光网络波长路由算法[J]. 系统工程与电子技术, 2023, 45(1): 264-270.
[11]	王冠, 茹海忠, 张大力, 马广程, 夏红伟. 弹性高超声速飞行器智能控制系统设计[J]. 系统工程与电子技术, 2022, 44(7): 2276-2285.
[12]	孟泠宇, 郭秉礼, 杨雯, 张欣伟, 赵柞青, 黄善国. 基于深度强化学习的网络路由优化方法[J]. 系统工程与电子技术, 2022, 44(7): 2311-2318.
[13]	杨清清, 高盈盈, 郭玙, 夏博远, 杨克巍. 基于深度强化学习的海战场目标搜寻路径规划[J]. 系统工程与电子技术, 2022, 44(11): 3486-3495.
[14]	高昂, 董志明, 李亮, 宋敬华, 段莉. MADDPG算法并行优先经验回放机制[J]. 系统工程与电子技术, 2021, 43(2): 420-433.
[15]	马文, 李辉, 王壮, 黄志勇, 吴昭欣, 陈希亮. 基于深度随机博弈的近距空战机动决策[J]. 系统工程与电子技术, 2021, 43(2): 443-451.