Intelligent ship dynamic autonomous obstacle avoidance decision based on DQN and rule

doi:10.12305/j.issn.1001-506X.2025.06.27

Abstract:

Collision avoidance decision-making for intelligent ships currently requires repeated training and adapts poorly to diverse encounter scenarios. To address these problems, a dynamic autonomous obstacle avoidance decision-making algorithm for intelligent ships that combines a deep Q-network (DQN) with rules is proposed. A partially observable autonomous obstacle avoidance model that incorporates rule evaluation is designed, and the deep network is improved and trained using deep reinforcement learning (DRL). By training with randomly selected start and end points, the algorithm enables an intelligent ship to avoid collisions autonomously in environments that combine dynamic and static scenarios, without repeated training. Simulation experiments show that the algorithm achieves autonomous collision avoidance decision-making without retraining, which reduces training cost, and that it has a certain degree of generalization capability and robustness. It offers a solution for autonomous collision avoidance of intelligent ships in complex navigation environments.

Key words: dynamic autonomous obstacle avoidance, intelligent ship, no repeated training, deep reinforcement learning (DRL)

CLC number:

U675.73

Kangjie ZHENG, Xinyu ZHANG, Weisong WANG, Zhensheng LIU. Intelligent ship dynamic autonomous obstacle avoidance decision based on DQN and rule[J]. Systems Engineering and Electronics, 2025, 47(6): 1994-2001.



Hyperparameter	Value
Learning rate	0.0001
Replay buffer size	1,000,000
Discount factor	0.98
Target network update interval (steps)	10,000
Optimizer	Adam
Activation function	ReLU
Mini-batch size	32
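
The hyperparameters above can be collected into a single configuration object, as in the minimal sketch below. The numeric values come from the table; the class name `DQNConfig`, its field names, and the helpers `td_target` and `should_sync_target` are illustrative assumptions, not the authors' code. The target shown is the standard DQN temporal-difference target, which may differ from the improved network described in the abstract.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DQNConfig:
    """Values copied from the hyperparameter table; field names are illustrative."""
    learning_rate: float = 1e-4
    replay_buffer_size: int = 1_000_000
    discount_factor: float = 0.98
    target_update_steps: int = 10_000
    optimizer: str = "Adam"
    activation: str = "ReLU"
    batch_size: int = 32


def td_target(reward: float, next_q_values: Sequence[float], done: bool,
              cfg: DQNConfig) -> float:
    """Standard DQN target: r + gamma * max_a' Q_target(s', a'), or r if terminal."""
    if done:
        return reward
    return reward + cfg.discount_factor * max(next_q_values)


def should_sync_target(step: int, cfg: DQNConfig) -> bool:
    """True on the steps where the target network would be refreshed."""
    return step > 0 and step % cfg.target_update_steps == 0


cfg = DQNConfig()
# Hypothetical transition: reward 1.0, three candidate actions in the next state.
example_target = td_target(1.0, [0.5, 2.0, -1.0], done=False, cfg=cfg)
```

With a discount factor of 0.98, the bootstrapped term in `example_target` is 0.98 times the largest next-state Q-value; a terminal transition returns the reward alone.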

Input size	Operator	exp size	#out	SE	NL	s
150²×3	Conv2D	3	16	√	RE	2
75²×16	Conv2D	3	16	-	RE	2
75²×16	Bottleneck	3	72	-	RE	2
38²×72	Bottleneck	3	88	√	RE	1
38²×88	Bottleneck	5	96	√	HS	2
19²×96	Bottleneck	5	240	√	HS	1
19²×240	Bottleneck	5	240	√	HS	1
19²×240	Bottleneck	5	120	√	HS	1
19²×120	Bottleneck	5	144	√	HS	1
19²×144	Bottleneck	5	288	√	HS	2
10²×288	Bottleneck	5	576	√	HS	1
10²×576	Bottleneck	5	576	√	HS	1
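
The input resolutions listed in the table (150, 75, 38, 19, 10) are consistent with each stride-2 reduction halving the side length and rounding up. The short sketch below reproduces that arithmetic. It assumes "same"-style padding, which the table does not state, and the helper name `same_padding_output` is hypothetical.

```python
import math


def same_padding_output(size: int, stride: int) -> int:
    """Output side length of a strided layer under 'same'-style padding."""
    return math.ceil(size / stride)


# Apply four successive stride-2 reductions to a 150 x 150 input.
sizes = [150]
for _ in range(4):
    sizes.append(same_padding_output(sizes[-1], 2))

print(sizes)  # [150, 75, 38, 19, 10]
```

Rounding up rather than down is what yields 38 from 75 and 10 from 19; with floor division the sequence would instead run 150, 75, 37, 18, 9.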
