Lightweight target detection algorithm based on deep learning

doi:10.12305/j.issn.1001-506X.2022.09.03

Abstract

Deep convolutional neural networks perform well in many fields, but at the cost of heavy computation and large parameter counts. To address the high computational demands and memory consumption of current object detection algorithms based on deep convolutional neural networks, a high-performance lightweight network model is proposed. First, the Stem module and ShuffleNet V2 are fused to improve feature extraction, and the fused network replaces the original YOLOv5 backbone, which significantly reduces the network's computational cost and memory consumption. In addition, deformable convolution is introduced to improve detection performance. Experimental results on road monitoring images and on the VOC and COCO data sets show that the proposed model reduces the parameter count and model size by 90% and requires only 18% of the original model's computation, while still maintaining detection accuracy. The proposed lightweight detection model is therefore better suited to deployment in scenarios with limited computational resources and strict real-time requirements.

Key words: object detection, convolutional neural network, lightweight network, single-stage detection algorithm, deformable convolution

CLC Number: TP391

Shuang SONG, Yue ZHANG, Linna ZHANG, Yigang CEN, Yidong LI. Lightweight target detection algorithm based on deep learning[J]. Systems Engineering and Electronics, 2022, 44(9): 2716-2725.
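ShuffleNet V2, which the abstract fuses with the Stem module to rebuild the backbone, relies on a channel shuffle step to mix information between channel groups. The following is a minimal sketch of that operation on a flat list of channel labels, in plain Python. The function name `channel_shuffle` and the list representation are illustrative assumptions; this is not the authors' implementation, which would operate on feature tensors.

```python
def channel_shuffle(channels, groups):
    """Interleave channels drawn from `groups` equal-sized groups.

    Equivalent to viewing the flat channel list as a (groups, per_group)
    grid, transposing it, and flattening it again.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Read the (groups, per_group) grid column by column.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


# Hypothetical example: 8 channels split into 2 groups.
shuffled = channel_shuffle(list(range(8)), groups=2)
print(shuffled)  # [0, 4, 1, 5, 2, 6, 3, 7]
```

After the shuffle, each consecutive pair of channels holds one channel from each original group, so a following grouped or split operation sees features from both branches.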

Figures/Tables: Fig. 1 to Fig. 11, Table 1 to Table 4

References

1	SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Proc. of the 31st AAAI Conference on Artificial Intelligence, 2017: 4278-4284.
2	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
3	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25(2): 1097-1105.
4	RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. doi: 10.1007/s11263-015-0816-y
5	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1409.1556.
6	SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1-9.
7	HE K M, ZHANG X, REN S Q, et al. Identity mappings in deep residual networks[C]//Proc. of the European Conference on Computer Vision, 2016: 630-645.
8	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141.
9	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proc. of the European Conference on Computer Vision, 2016: 21-37.
10	FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1701.06659.
11	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
12	REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 7263-7271.
13	REDMON J, FARHADI A. Yolov3: an incremental improvement[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1804.02767.
14	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149.
15	DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[C]//Proc. of the Advances in Neural Information Processing Systems, 2016: 379-387.
16	ZHANG X Y, GAO H B, ZHAO J H, et al. Overview of deep learning intelligent driving methods[J]. Journal of Tsinghua University (Science and Technology), 2018, 58(4): 438-444. (in Chinese)
17	CHEN C Y, SEFF A, KORNHAUSER A, et al. Deepdriving: learning affordance for direct perception in autonomous driving[C]//Proc. of the IEEE International Conference on Computer Vision, 2015: 2722-2730.
18	WANG Y F, LI Z P. Application research of target detection algorithm in edge environment[J]. Computer Engineering and Applications, 2021, 57(16): 220-227. doi: 10.3778/j.issn.1002-8331.2008-0280 (in Chinese)
19	CHEN H, SUN D Z. Target detection algorithm based on CS optimized deep learning convolutional neural network[J]. Machine Tool & Hydraulics, 2020, 48(6): 187-192. doi: 10.3969/j.issn.1001-3881.2020.06.028 (in Chinese)
20	HAN K, WANG Y H, TIAN Q, et al. Ghostnet: more features from cheap operations[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1580-1589.
21	XIONG Y Y, LIU H X, GUPTA S, et al. Mobiledets: searching for object detection architectures for mobile accelerators[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 3825-3834.
22	WU B C, DAI X L, ZHANG P Z, et al. FBNet: hardware-aware efficient convnet design via differentiable neural architecture search[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 10734-10742.
23	ZHANG X Y, ZHOU X Y, LIN M X, et al. Shufflenet: an extremely efficient convolutional neural network for mobile devices[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 6848-6856.
24	MA N N, ZHANG X Y, ZHENG H T, et al. Shufflenet v2: practical guidelines for efficient CNN architecture design[C]//Proc. of the European Conference on Computer Vision, 2018: 116-131.
25	WANG R J, LI X, LING C X. Pelee: a real-time object detection system on mobile devices[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1804.06882.
26	DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]//Proc. of the IEEE International Conference on Computer Vision, 2017: 764-773.
27	ZHU X Z, HU H, LIN S, et al. Deformable convnets v2: more deformable, better results[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 9308-9316.
28	EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338. doi: 10.1007/s11263-009-0275-4
29	HAN S, MAO H Z, DALLY W J. Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1510.00149.
30	HAN S, POOL J, TRAN J, et al. Learning both weights and connections for efficient neural networks[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1506.02626.
31	LIU Z, LI J G, SHEN Z Q, et al. Learning efficient convolutional networks through network slimming[C]//Proc. of the IEEE International Conference on Computer Vision, 2017: 2736-2744.
32	HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1503.02531.
33	LUO P, ZHU Z Y, LIU Z W, et al. Face model compression by distilling knowledge from neurons[C]//Proc. of the 30th AAAI Conference on Artificial Intelligence, 2016: 3560-3566.
34	WANG C Y, LIAO H Y M, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 390-391.
35	HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. doi: 10.1109/TPAMI.2015.2389824
36	CHETLUR S, WOOLLEY C, VANDERMERSCH P, et al. cuDNN: efficient primitives for deep learning[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1410.0759.
37	HOWARD A G, ZHU M L, CHEN B, et al. Mobilenets: efficient convolutional neural networks for mobile vision applications[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1704.04861.
38	SANDLER M, HOWARD A, ZHU M L, et al. Mobilenetv2: inverted residuals and linear bottlenecks[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 4510-4520.
39	QIN Z, LI Z M, ZHANG Z N, et al. ThunderNet: towards real-time generic object detection on mobile devices[C]//Proc. of the IEEE/CVF International Conference on Computer Vision, 2019: 6718-6727.
40	HUANG Z C, WANG J L, FU X, et al. DC-SPP-YOLO: dense connection and spatial pyramid pooling based YOLO for object detection[J]. Information Sciences, 2020, 522: 241-258. doi: 10.1016/j.ins.2020.02.067
41	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. Yolov4: optimal speed and accuracy of object detection[EB/OL]. [2021-11-10]. https://arxiv.org/abs/2004.10934.
42	WONG A, FAMUORI M, SHAFIEE M J, et al. YOLO nano: a highly compact you only look once convolutional neural network for object detection[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1910.01271.
43	LONG X, DENG K P, WANG G D, et al. PP-YOLO: an effective and efficient implementation of object detector[EB/OL]. [2021-11-10]. https://arxiv.org/abs/2007.12099.
44	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proc. of the European Conference on Computer Vision, 2014: 740-755.
45	GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. [2021-11-10]. https://arxiv.org/abs/2107.08430.
46	ZHANG Y M, LEE C C, HSIEH J W, et al. CSL-YOLO: a new lightweight object detection system for edge computing[EB/OL]. [2021-11-10]. https://arxiv.org/abs/2107.04829.

Data distribution by category
Category	Training	Test
Person	14 591	1 389
Motorcycle	3 468	372
Car	14 691	1 527
Bus	494	51
Pickup truck	4 379	414
Truck	7 538	752
Large truck	765	76
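The class distribution above is strongly imbalanced, which a few lines of arithmetic make concrete. The sketch below copies the training and test counts from the table, with class names rendered in English, and computes the overall test share and the ratio between the largest and smallest training class. Whether the counts refer to images or annotated objects is not stated here, and the variable names are illustrative.

```python
# (training, test) counts per class, copied from the table above.
counts = {
    "Person": (14591, 1389),
    "Motorcycle": (3468, 372),
    "Car": (14691, 1527),
    "Bus": (494, 51),
    "Pickup truck": (4379, 414),
    "Truck": (7538, 752),
    "Large truck": (765, 76),
}

total_train = sum(train for train, _ in counts.values())
total_test = sum(test for _, test in counts.values())

# Fraction of all counted samples that fall in the test split.
test_share = total_test / (total_train + total_test)

# Ratio between the most and least frequent training class.
train_sizes = [train for train, _ in counts.values()]
imbalance = max(train_sizes) / min(train_sizes)

print(total_train, total_test)   # 45926 4581
print(round(test_share, 3))      # 0.091
print(round(imbalance, 1))       # 29.7
```

Roughly 9% of the samples are held out for testing, and the most common class (Car) has about 30 times as many training samples as the rarest (Bus).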

Model	Image size	FLOPs/B	Model size/MB	mAP
MobileNet-SSD^[25]	300×300	1.15	13.2	0.680
Pelee-SSD	304×304	2.4	21.68	0.709
Ours (320×320)	320×320	0.8	1.32	0.665
Tiny-YOLO	416×416	5.52	33.4	0.584
YOLO-Nano^[42]	416×416	4.57	4.0	0.691
ThunderNet_MM^[39]	416×416	-	32.9	0.738
PP-YOLO^[43]	416×416	-	269	0.843
Ours (416×416)	416×416	1.3	1.32	0.681
YOLOv5s	512×512	10.9	13.73	0.852
Ours (512×512)	512×512	2.0	1.32	0.696

Model	Image size	FLOPs/B	Model size/MB	AP^val
PP-YOLO_MBV3_S	320×320	-	16	0.172
PP-YOLO-Tiny	416×416	-	4.2	0.227
YOLOX-Nano^[45]	416×416	1.08	7.3	0.253
YOLOX-Tiny^[45]	416×416	6.45	38.8	0.317
Tiny-YOLO	416×416	5.52	33.4	0.166
YOLOv4-Tiny	416×416	6.9	23.1	0.217
MM-YOLO-MBV2	416×416	-	14.5	0.239
CSL-YOLO^[46]	416×416	1.47	14.6	0.245
YOLOv5s	640×640	17.1	13.73	0.367
Ours	416×416	1.3	1.32	0.231

Model	Parameters	Model size/MB	Inference time/ms	FLOPs (416×416)/B	mAP@.5
YOLOv5s-prune	2 665 659	4.97	5.5	3.1	0.868
YOLOv5-Shuffle	364 925	0.78	2.6	0.5	0.834
YOLOv5-ShuffleS	367 629	0.80	2.8	0.6	0.846
YOLOv5s	7 276 605	13.73	5.6	7.2	0.934
Ours	643 509	1.32	4.9	1.3	0.908
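The headline figures in the abstract (90% fewer parameters and smaller model size, 18% of the computation) can be checked against the table above. The sketch below assumes that the YOLOv5s row is the "original model" and that the last row is the proposed model; the dictionary keys and the helper name `reduction` are illustrative.

```python
# Values copied from the YOLOv5s row and the last (proposed-model) row above.
baseline = {"params": 7276605, "size_mb": 13.73, "flops_b": 7.2, "map50": 0.934}
proposed = {"params": 643509, "size_mb": 1.32, "flops_b": 1.3, "map50": 0.908}


def reduction(before, after):
    """Fractional reduction going from `before` to `after`."""
    return 1.0 - after / before


param_cut = reduction(baseline["params"], proposed["params"])
size_cut = reduction(baseline["size_mb"], proposed["size_mb"])
flops_ratio = proposed["flops_b"] / baseline["flops_b"]
map_drop = baseline["map50"] - proposed["map50"]

print(f"parameter reduction:  {param_cut:.1%}")    # 91.2%
print(f"model size reduction: {size_cut:.1%}")     # 90.4%
print(f"FLOPs vs. baseline:   {flops_ratio:.1%}")  # 18.1%
print(f"mAP@.5 drop:          {map_drop:.3f}")     # 0.026
```

Under that assumption the table is consistent with the abstract: parameters and model size each fall by a little over 90%, computation is about 18% of the baseline, and mAP@.5 drops by 0.026.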

