Unsupervised monocular depth estimation based on edge enhancement

doi:10.12305/j.issn.1001-506X.2024.01.08

Abstract:

To address the inaccurate depth estimates that unsupervised monocular depth estimation produces near object edges, an unsupervised monocular depth estimation network based on edge enhancement is proposed. The model consists of a single-view depth network and a pose network, both of which adopt an encoder-decoder structure. The encoder of the single-view depth network uses the high-resolution net (HRNet) as its backbone, which maintains high-resolution representations throughout the whole process and is conducive to extracting accurate spatial features. The decoder of the single-view depth network introduces strip convolutions to refine the depth variations near depth edges and uses the classical Laplacian of Gaussian operator to enhance edge details, so that depth edge information is fully exploited to improve the quality of the depth estimation. Experimental results on the KITTI dataset show that the proposed model achieves good depth estimation performance and produces depth maps with clearer object edges and richer details.

Key words: monocular depth estimation, unsupervised learning, strip convolutions, edge enhancement
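
The abstract names the Laplacian of Gaussian (LoG) operator as the edge-enhancement tool but does not say how the decoder applies it. As a rough illustration only, the sketch below builds a discrete LoG kernel and sharpens a 2D map by subtracting the scaled LoG response. The function names (`log_kernel`, `convolve2d`, `enhance_edges`), the kernel size, `sigma`, and `strength` are all assumptions made for this example, and plain Python lists stand in for the feature tensors a real network would use.

```python
import math


def log_kernel(size=5, sigma=1.0):
    """Build a zero-sum Laplacian of Gaussian kernel with odd side length `size`."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            # LoG up to a constant factor: (r^2 - 2*sigma^2) / sigma^4 * exp(-r^2 / (2*sigma^2))
            row.append((r2 - 2 * sigma**2) / sigma**4 * math.exp(-r2 / (2 * sigma**2)))
        kernel.append(row)
    # Subtract the mean so the kernel sums to zero and flat regions give no response.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def convolve2d(image, kernel):
    """'Same'-size 2D correlation with edge-replication padding."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    # Clamp coordinates so border pixels are replicated.
                    y = min(max(i + a - ph, 0), h - 1)
                    x = min(max(j + b - pw, 0), w - 1)
                    acc += image[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def enhance_edges(feature, strength=0.5, size=5, sigma=1.0):
    """Sharpen a 2D map by subtracting its scaled LoG response (assumed scheme)."""
    response = convolve2d(feature, log_kernel(size, sigma))
    return [
        [f - strength * r for f, r in zip(f_row, r_row)]
        for f_row, r_row in zip(feature, response)
    ]
```

On a flat region the LoG response is zero, so the map is left unchanged. Across a step, the low side is pushed down and the high side is pushed up, which steepens the edge.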

CLC number: TP391

Yi QU, Ying CHEN. Unsupervised monocular depth estimation based on edge enhancement[J]. Systems Engineering and Electronics, 2024, 46(1): 71-79.

Figures and tables (10): Figs. 1-5, Tables 1-4, Fig. 6

References (34)

1	KIRAN B R, SOBH I, TALPAERT V, et al. Deep reinforcement learning for autonomous driving: a survey[J]. IEEE Trans. on Intelligent Transportation Systems, 2022, 23(6): 4909-4926. doi: 10.1109/TITS.2021.3054625
2	ZENG A, SONG S, NIESSNER M, et al. 3DMatch: learning local geometric descriptors from RGB-D reconstructions[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 199-208.
3	MACARIO B A, MICHEL M, MOLINE Y, et al. A comprehensive survey of visual SLAM algorithms[J]. Robotics, 2022. doi: 10.3390/robotics11010024
4	CHEN K Q, ZHU Z L, DENG X M, et al. A survey of deep learning research on multi-scale target detection[J]. Journal of Software, 2021, 32(4): 1201-1227.
5	CAO Z Q, SAI B, LYU X. A survey of pedestrian tracking algorithms and applications[J]. Acta Physica Sinica, 2020, 69(8): 41-58.
6	HOU L, LUO X Y, WANG Z Y, et al. Representation learning via a semi-supervised stacked distance autoencoder for image classification[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(7): 1005-1019.
7	EIGEN D, PUHRSCH C, FERGUS R. Depth map prediction from a single image using a multi-scale deep network[J]. Advances in Neural Information Processing Systems, 2014, 27: 2366-2374.
8	LAINA I, RUPPRECHT C, BELAGIANNIS V, et al. Deeper depth prediction with fully convolutional residual networks[C]//Proc. of the 4th IEEE International Conference on 3D Vision, 2016: 239-248.
9	FU H, GONG M M, WANG C H, et al. Deep ordinal regression network for monocular depth estimation[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 2002-2011.
10	YUAN W H, GU X D, DAI Z Z, et al. Neural window fully-connected CRFs for monocular depth estimation[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 3906-3915.
11	GODARD C, AODHA O M, BROSTOW G J. Unsupervised monocular depth estimation with left-right consistency[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 270-279.
12	ZHOU T H, BROWN M, SNAVELY N, et al. Unsupervised learning of depth and ego-motion from video[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 6612-6619.
13	GODARD C, AODHA O M, FIRMAN M, et al. Digging into self-supervised monocular depth estimation[C]//Proc. of the IEEE/CVF International Conference on Computer Vision, 2019: 3827-3837.
14	SHU C, YU K, DUAN Z X, et al. Feature-metric loss for self-supervised learning of depth and egomotion[C]//Proc. of the European Conference on Computer Vision, 2020: 572-588.
15	LEE S, IM S, LIN S, et al. Learning monocular depth in dynamic scenes via instance aware projection consistency[C]//Proc. of the AAAI Conference on Artificial Intelligence, 2021: 1863-1872.
16	ZHU S J, BRAZIL G, LIU X M. The edge of depth: explicit constraints between segmentation and depth[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 13113-13122.
17	YE X Y, HE Y L, RU S N. Unsupervised monocular depth estimation and visual odometer based on generative adversarial networks and self-attention mechanism[J]. Robot, 2021, 43(2): 203-213.
18	RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]//Proc. of the International Conference on Medical Image Computing and Computer-assisted Intervention, 2015: 234-241.
19	XUE F, CAO J F, ZHOU Y, et al. Boundary-induced and scene-aggregated network for monocular depth prediction[J]. Pattern Recognition, 2021, 115: 107901. doi: 10.1016/j.patcog.2021.107901
20	HOU Q B, ZHANG L, CHENG M M, et al. Strip pooling: rethinking spatial pooling for scene parsing[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 4003-4012.
21	HUANG G X, BORS A G. Busy-quiet video disentangling for video classification[C]//Proc. of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022: 1341-1350.
22	WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Trans. on Image Processing, 2004, 13(4): 600-612. doi: 10.1109/TIP.2003.819861
23	GEIGER A, LENZ P, STILLER C, et al. Vision meets robotics: the KITTI dataset[J]. The International Journal of Robotics Research, 2013, 32(11): 1231-1237. doi: 10.1177/0278364913491297
24	LI B, DAI Y C, HE M Y. Monocular depth estimation with hierarchical fusion of dilated CNNs and soft-weighted-sum inference[J]. Pattern Recognition, 2018, 83: 328-339. doi: 10.1016/j.patcog.2018.05.029
25	AKADA H, BHAT S F, ALHASHIM I, et al. Self-supervised learning of domain invariant features for depth estimation[C]//Proc. of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022: 3377-3387.
26	ZHOU H, GREENWOOD D, TAYLOR S. Self-supervised monocular depth estimation with internal feature fusion[C]//Proc. of the 32nd British Machine Vision Conference, 2021: 378-391.
27	KLINGNER M, TERMOHLEN J A, MIKOLAJCZYK J, et al. Self-supervised monocular depth estimation: solving the dynamic object problem by semantic guidance[C]//Proc. of the European Conference on Computer Vision, 2020: 582-600.
28	CHOI J, JUNG D, LEE D, et al. SAFENet: self-supervised monocular depth estimation with semantic-aware feature extraction[EB/OL]. [2022-10-01]. https://arxiv.org/abs/2010.02893.
29	LYU X Y, LIU L, WANG M M, et al. HR-depth: high resolution self-supervised monocular depth estimation[C]//Proc. of the AAAI Conference on Artificial Intelligence, 2021, 35(3): 2294-2301.
30	LIU H, ZHU Y, HUA G L, et al. Adaptive weighted network with edge enhancement module for monocular self-supervised depth estimation[C]//Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022: 2340-2344.
31	CHEN Z, YE X Q, YANG W, et al. Revealing the reciprocal relations between self-supervised stereo and monocular depth estimation[C]//Proc. of the IEEE/CVF International Conference on Computer Vision, 2021: 15529-15538.
32	BIAN J W, ZHAN H Y, WANG N Y, et al. Unsupervised scale-consistent depth learning from video[J]. International Journal of Computer Vision, 2021, 129(9): 2548-2564. doi: 10.1007/s11263-021-01484-6
33	ZHANG S, ZHANG J, TAO D C. Towards scale-aware, robust, and generalizable unsupervised monocular depth estimation by integrating IMU motion dynamics[C]//Proc. of the European Conference on Computer Vision, 2022: 143-160.
34	KINGMA D P, BA J. Adam: a method for stochastic optimization[C]//Proc. of the International Conference on Learning Representations, 2015: 6980-6995.

Error metrics (Abs Rel, Sq Rel, RMSE, RMSE log): lower is better. Accuracy metrics (δ<1.25, δ<1.25², δ<1.25³): higher is better.
Method	Resolution	Supervision	Abs Rel	Sq Rel	RMSE	RMSE log	δ<1.25	δ<1.25²	δ<1.25³
Ref. [24]	620×188	D	0.113	-	4.687	-	0.856	0.962	0.988
Ref. [13]	640×192	M	0.115	0.903	4.863	0.193	0.877	0.959	0.981
Ref. [27]	640×192	D+M	0.113	0.835	4.639	0.191	0.879	0.945	0.977
Ref. [28]	640×192	D+M	0.112	0.788	4.582	0.187	0.878	0.963	0.983
Ref. [29]	640×192	M	0.109	0.792	4.632	0.185	0.889	0.962	0.982
Ref. [30]	640×192	M	0.105	0.765	4.598	0.185	0.888	0.963	0.982
Ref. [33]	640×192	M	0.109	0.787	4.705	0.195	0.869	0.958	0.981
Proposed method	640×192	M	0.106	0.745	4.510	0.183	0.891	0.964	0.983
Ref. [15]	832×256	M	0.112	0.777	4.772	0.191	0.872	0.959	0.982
Ref. [32]	832×256	M	0.114	0.813	4.485	0.185	0.885	0.958	0.979
Ref. [25]	960×288	D	0.168	1.288	5.498	0.235	0.771	0.921	0.973
Ref. [27]	1280×384	D+M	0.107	0.768	4.468	0.186	0.891	0.963	0.982
Ref. [28]	1024×320	D+M	0.106	0.743	4.489	0.181	0.884	0.965	0.983
Ref. [29]	1024×320	M	0.106	0.755	4.472	0.181	0.892	0.966	0.984
Ref. [14]	1024×320	M	0.104	0.729	4.481	0.179	0.893	0.965	0.984
Ref. [30]	1024×320	M	0.104	0.732	4.427	0.181	0.894	0.965	0.984
Ref. [31]	1024×320	M	0.094	0.681	4.392	0.185	0.892	0.962	0.981
Proposed method	1024×320	M	0.093	0.669	4.255	0.172	0.911	0.968	0.984
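
The error and accuracy columns in the comparison table above match the metric names commonly used for KITTI depth evaluation. The source does not spell out their formulas, so the sketch below assumes the usual definitions: relative absolute and squared error, RMSE in linear and log space, and the fraction of pixels whose ratio max(gt/pred, pred/gt) falls under 1.25, 1.25², and 1.25³. The function name `depth_metrics` and the dictionary keys are hypothetical, and the inputs are assumed to be equal-length lists of strictly positive depths.

```python
import math


def depth_metrics(gt, pred):
    """Compute seven common depth metrics over paired, strictly positive depths."""
    n = len(gt)
    pairs = list(zip(gt, pred))

    # Error metrics: lower is better.
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n)

    # Accuracy metrics: fraction of pixels whose ratio is under each threshold.
    ratios = [max(g / p, p / g) for g, p in pairs]
    a1, a2, a3 = (sum(r < 1.25**k for r in ratios) / n for k in (1, 2, 3))

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "a1": a1,
        "a2": a2,
        "a3": a3,
    }


# Usage with made-up depths (not data from the paper).
metrics = depth_metrics([10.0, 10.0, 10.0, 10.0], [10.0, 8.5, 7.0, 5.0])
```

Because each threshold is looser than the one before it, the three accuracy values can never decrease from δ<1.25 to δ<1.25³ under these definitions.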

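Complexity metrics: parameters and computation. Error metrics (Abs Rel, Sq Rel, RMSE, RMSE log): lower is better. Accuracy metrics (δ<1.25, δ<1.25², δ<1.25³): higher is better.
Method	Strip convolution	Edge enhancement	Parameters/MB	Computation/GB	Abs Rel	Sq Rel	RMSE	RMSE log	δ<1.25	δ<1.25²	δ<1.25³
Baseline	-	-	10.8	23.3	0.107	0.773	4.540	0.184	0.889	0.963	0.983
Proposed method	√	-	12.1	31.6	0.107	0.754	4.541	0.184	0.889	0.963	0.983
Proposed method	-	√	10.8	23.3	0.107	0.772	4.539	0.184	0.890	0.963	0.983
Proposed method	√	√	12.1	31.6	0.106	0.745	4.510	0.183	0.891	0.964	0.983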

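Error metrics (Abs Rel, Sq Rel, RMSE, RMSE log): lower is better. Accuracy metrics (δ<1.25, δ<1.25², δ<1.25³): higher is better.
Method	Position of the edge enhancement module	Abs Rel	Sq Rel	RMSE	RMSE log	δ<1.25	δ<1.25²	δ<1.25³
Baseline	None	0.107	0.773	4.540	0.184	0.889	0.963	0.983
Proposed method	Decoder node that outputs the smallest depth map	0.108	0.793	4.577	0.185	0.888	0.964	0.983
Proposed method	All decoder nodes that output depth maps	0.107	0.756	4.524	0.184	0.889	0.964	0.983
Proposed method	Decoder node that outputs the largest depth map	0.106	0.745	4.510	0.183	0.891	0.964	0.983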

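Error metrics (Abs Rel, Sq Rel, RMSE, RMSE log): lower is better. Accuracy metrics (δ<1.25, δ<1.25², δ<1.25³): higher is better.
Method	Strip convolution shape	Abs Rel	Sq Rel	RMSE	RMSE log	δ<1.25	δ<1.25²	δ<1.25³
Baseline	None	0.107	0.773	4.540	0.184	0.889	0.963	0.983
Proposed method	9×1	0.108	0.808	4.605	0.185	0.891	0.903	0.982
Proposed method	9×3	0.107	0.784	4.574	0.185	0.889	0.963	0.982
Proposed method	11×1	0.108	0.789	4.587	0.186	0.887	0.963	0.982
Proposed method	11×5	0.109	0.795	4.568	0.186	0.887	0.963	0.982
Proposed method	11×3	0.106	0.745	4.510	0.183	0.891	0.964	0.983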
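
The shapes compared in the table above are elongated ("strip") kernels rather than square ones. The source does not say whether a shape such as 11×3 is height×width or how strips of different orientation are combined, so the sketch below is only a generic illustration. It applies a fixed averaging kernel of arbitrary rectangular shape, where a trained network would learn the weights. The names `strip_conv2d`, `box_kernel`, and `weights_per_channel_pair` are hypothetical.

```python
def strip_conv2d(image, kernel):
    """'Same'-size 2D correlation with zero padding; kernel is any odd-sized rectangle."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    # Positions outside the image contribute zero.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def box_kernel(kh, kw):
    """Fixed averaging kernel used here as a stand-in for learned weights."""
    return [[1.0 / (kh * kw)] * kw for _ in range(kh)]


def weights_per_channel_pair(kh, kw):
    """Number of weights one kernel needs per input/output channel pair."""
    return kh * kw


# A thin vertical structure (for example a pole) in a 15x15 map.
image = [[1.0 if col == 7 else 0.0 for col in range(15)] for _ in range(15)]

# Assumed convention: shape is (height, width).
tall = strip_conv2d(image, box_kernel(11, 3))  # strip aligned with the structure
wide = strip_conv2d(image, box_kernel(3, 11))  # strip across the structure
```

At the centre of the map, the strip aligned with the structure collects a larger response than the strip that crosses it. An 11×3 kernel also covers 11 pixels along its long side with 33 weights per channel pair, where a square 11×11 kernel would need 121.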