Target tracking network based on dual-modal interactive fusion under attention mechanism

doi:10.12305/j.issn.1001-506X.2022.02.07

Abstract

To address the difficulty that current object trackers have with low illumination, motion blur, and fast target motion, a dual-modal interactive fusion tracking network for infrared and visible images under spatial-channel attention is proposed. First, the infrared and visible (RGB) images pass through the three convolutional layers of the backbone to extract hierarchical features, which are reduced to a common resolution; the three layers of features are then concatenated to form the features of each modality. Next, the modality features are fed to the designed spatial-channel self-attention module and cross-modal interactive attention module, which make each modality focus on global spatial features and high-response channels and thereby improve the complementarity of the dual-modal information; the resulting features are concatenated into the fused feature. Finally, the fused feature is sent to three fully connected layers to complete the tracking task. Experiments on RGBT234, currently the largest RGB-Thermal (RGB-T) tracking dataset, show that the proposed network effectively extracts dual-modal interactive features and improves tracking accuracy: its precision rate and success rate are 5.3% and 4.2% higher, respectively, than those of the baseline network.

Key words: RGB-Thermal (RGB-T), object tracking, deep learning, attention fusion
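The abstract does not give the equations of the attention modules, so the following is only a minimal NumPy sketch of the general idea, under stated assumptions: spatial attention is taken to be scaled dot-product attention over all positions, channel attention is taken to be a softmax over channel-to-channel affinities, and cross-modal interaction is modelled by letting one modality query the other. All function names are hypothetical, and the three-layer backbone, the resolution reduction, and the fully connected head are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(query_feat, value_feat):
    """Every position of query_feat attends over all positions of value_feat.

    Assumed form (not taken from the paper): scaled dot-product attention
    with a residual connection. Inputs have shape (C, H, W).
    """
    c, h, w = query_feat.shape
    q = query_feat.reshape(c, h * w).T           # (N, C)
    k = value_feat.reshape(c, h * w)             # (C, N)
    v = value_feat.reshape(c, h * w).T           # (N, C)
    attn = softmax(q @ k / np.sqrt(c), axis=-1)  # (N, N), rows sum to 1
    return (attn @ v).T.reshape(c, h, w) + query_feat

def channel_attention(feat):
    """Reweight channels by channel-to-channel affinities (assumed form)."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)                # (C, N)
    affinity = softmax(flat @ flat.T, axis=-1)   # (C, C)
    return (affinity @ flat).reshape(c, h, w) + feat

def fuse(rgb_feat, tir_feat):
    """Self-attention per modality, cross-modal interaction, then concatenation."""
    rgb_sa = channel_attention(spatial_attention(rgb_feat, rgb_feat))
    tir_sa = channel_attention(spatial_attention(tir_feat, tir_feat))
    rgb_x = spatial_attention(rgb_sa, tir_sa)    # RGB queries thermal
    tir_x = spatial_attention(tir_sa, rgb_sa)    # thermal queries RGB
    return np.concatenate([rgb_x, tir_x], axis=0)

# Toy per-modality feature maps: 8 channels on a 5x5 grid.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 5, 5))
tir = rng.standard_normal((8, 5, 5))
fused = fuse(rgb, tir)
print(fused.shape)
```

In this sketch the fused feature has twice the channel count of a single modality, which is the shape a subsequent fully connected head would consume after flattening.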

CLC number:

TP391

Yunxiang YAO, Ying CHEN. Target tracking network based on dual-modal interactive fusion under attention mechanism[J]. Systems Engineering and Electronics, 2022, 44(2): 410-419.



Challenge attribute	MDNet+RGB-T		RT-MDNet+RGB-T		CFNet+RGB-T		CMRT		SiamDW+RGB-T		DAPNet		M5L		Ours
Challenge attribute	PR	SR	PR	SR	PR	SR	PR	SR	PR	SR	PR	SR	PR	SR	PR	SR
BC	0.644	0.432	0.725	0.455	0.463	0.308	0.631	0.398	0.519	0.323	0.717	0.484	0.766	0.498	0.753	0.483
CM	0.640	0.454	0.644	0.455	0.417	0.318	0.629	0.447	0.562	0.382	0.668	0.474	0.716	0.500	0.711	0.498
DEF	0.668	0.473	0.670	0.466	0.523	0.367	0.667	0.473	0.558	0.390	0.717	0.578	0.727	0.500	0.723	0.507
FM	0.586	0.363	0.637	0.387	0.454	0.299	0.613	0.384	0.597	0.365	0.670	0.443	0.659	0.420	0.740	0.467
HO	0.619	0.421	0.618	0.404	0.417	0.290	0.563	0.377	0.520	0.337	0.660	0.444	0.662	0.457	0.682	0.456
LI	0.670	0.455	0.737	0.474	0.523	0.369	0.742	0.498	0.600	0.399	0.775	0.530	0.761	0.495	0.766	0.576
LR	0.759	0.493	0.760	0.483	0.551	0.365	0.687	0.420	0.605	0.370	0.750	0.510	0.762	0.496	0.784	0.513
MB	0.654	0.463	0.612	0.429	0.357	0.271	0.600	0.427	0.494	0.340	0.653	0.467	0.670	0.472	0.709	0.505
NO	0.862	0.611	0.894	0.586	0.764	0.563	0.895	0.616	0.783	0.534	0.900	0.644	0.904	0.619	0.870	0.639
PO	0.761	0.518	0.780	0.517	0.597	0.417	0.777	0.536	0.608	0.396	0.817	0.544	0.821	0.574	0.826	0.572
SV	0.735	0.505	0.735	0.482	0.596	0.433	0.710	0.493	0.609	0.405	0.772	0.513	0.780	0.542	0.773	0.545
TC	0.756	0.517	0.786	0.513	0.457	0.327	0.675	0.443	0.569	0.368	0.768	0.538	0.781	0.543	0.748	0.536
All	0.722	0.495	0.734	0.483	0.551	0.390	0.711	0.486	0.604	0.397	0.766	0.537	0.770	0.521	0.775	0.537
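The "All" row can be used to check how the 5.3% and 4.2% gains quoted in the abstract are measured. The short sketch below assumes that the baseline is MDNet+RGB-T (the first column of the table) and that the last column is the proposed network; under that assumption the quoted gains match absolute differences in percentage points rather than relative improvements. The helper names are illustrative only.

```python
# "All" row of the table above (assumed baseline: MDNet+RGB-T).
baseline = {"PR": 0.722, "SR": 0.495}
proposed = {"PR": 0.775, "SR": 0.537}

def point_gain(new, old):
    """Absolute gain in percentage points."""
    return round((new - old) * 100, 1)

def relative_gain(new, old):
    """Relative gain in percent of the baseline value."""
    return round((new - old) / old * 100, 1)

points = {k: point_gain(proposed[k], baseline[k]) for k in baseline}
relative = {k: relative_gain(proposed[k], baseline[k]) for k in baseline}

print(points)    # {'PR': 5.3, 'SR': 4.2}
print(relative)  # {'PR': 7.3, 'SR': 8.5}
```

Read this way, the reported improvement is 5.3 points in precision rate and 4.2 points in success rate, which corresponds to roughly 7.3% and 8.5% in relative terms.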

Method	GTOT dataset
Method	PR	SR
MDNet+RGB-T	0.800	0.637
Our-AGG	0.842	0.685
Our-SCIF	0.856	0.693
Our-SC	0.861	0.687
Our-AGGS	0.857	0.690
Ours	0.878	0.714
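The page does not define PR and SR. In the tracking literature they usually denote precision rate (the fraction of frames whose predicted box centre lies within a pixel threshold of the ground truth) and success rate (the area under the curve of the fraction of frames whose overlap exceeds a threshold). The following is a minimal sketch of those common definitions, not the authors' evaluation code; the box format `(x, y, w, h)`, the default 20-pixel threshold, and the 21 overlap thresholds are assumptions.

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between box centres; boxes are (x, y, w, h)."""
    pred_c = pred[:, :2] + pred[:, 2:] / 2
    gt_c = gt[:, :2] + gt[:, 2:] / 2
    return np.linalg.norm(pred_c - gt_c, axis=1)

def iou(pred, gt):
    """Intersection over union of axis-aligned boxes given as (x, y, w, h)."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

def precision_rate(pred, gt, threshold=20.0):
    """Fraction of frames with centre error within `threshold` pixels."""
    return float(np.mean(center_error(pred, gt) <= threshold))

def success_rate(pred, gt, thresholds=np.linspace(0, 1, 21)):
    """Area under the success curve, approximated by averaging over thresholds."""
    overlaps = iou(pred, gt)
    return float(np.mean([np.mean(overlaps > t) for t in thresholds]))

# Two toy frames: a perfect prediction and a complete miss.
gt_boxes = np.array([[10.0, 10.0, 20.0, 20.0], [50.0, 50.0, 20.0, 20.0]])
pred_boxes = np.array([[10.0, 10.0, 20.0, 20.0], [100.0, 100.0, 20.0, 20.0]])

pr = precision_rate(pred_boxes, gt_boxes)
sr = success_rate(pred_boxes, gt_boxes)
print(pr, round(sr, 3))  # 0.5 0.476
```

Because GTOT targets are generally small, a tighter centre threshold than 20 pixels is often used on that dataset; the threshold is therefore left as a parameter here.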
