Lightweight object detection method based on visible and infrared feature fusion

doi:10.12305/j.issn.1001-506X.2026.05.04

Abstract:

To address the low runtime efficiency and high computational complexity of dual-stream object detection models, a lightweight object detection method based on visible and infrared feature fusion is proposed. First, you only look once (YOLO) v8 is extended into a dual-stream object detection model, and its dual-stream backbone is optimized with group convolution: the two independent backbone networks are merged into a single backbone that extracts the features of both modalities synchronously, which greatly improves runtime efficiency. Second, a faster cross-stage partial bottleneck with two convolutions with cross-modal feature interaction (C2f-CMFI) module and a spatial pyramid pooling fast with cross-modal feature fusion (SPPF-CMFF) module are designed, which reduce model complexity while enabling fusion and interaction of the two modalities' features during feature extraction. Finally, experimental results on a public visible-infrared image dataset show that, compared with the conventional dual-stream object detection model, the proposed method reduces the parameter count and computational complexity by 19.5% and 17.7%, respectively, and improves the mean average precision (mAP) 50:95 by 1.9 percentage points. On an NVIDIA RTX 2080Ti graphics processing unit, the inference speed reaches 140 frames per second, which demonstrates the effectiveness of the proposed method.

Key words: visible-infrared image, you only look once (YOLO), lightweighting, object detection, dual-stream structure
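The backbone merge described in the abstract rests on a property of group convolution: a convolution with two groups over channel-concatenated inputs computes the same outputs as two independent convolutions, but in a single layer call. The following is a minimal pure-Python sketch of that property for a 1×1 convolution at one pixel, not the authors' implementation. The helper names (`conv1x1`, `grouped_conv1x1`, `conv_params`, `rand_matrix`), the channel count `C`, and the random integer weights are all illustrative assumptions.

```python
import random

def conv1x1(x, w):
    """1x1 convolution at one pixel: x has C_in values, w is C_out x C_in."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def grouped_conv1x1(x, group_weights):
    """Grouped 1x1 convolution: each group sees only its own channel slice."""
    size = len(x) // len(group_weights)
    out = []
    for g, w in enumerate(group_weights):
        out.extend(conv1x1(x[g * size:(g + 1) * size], w))
    return out

def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution with the given groups (no bias)."""
    return (c_in // groups) * c_out * k * k

def rand_matrix(rows, cols, rng):
    """Small random integer matrix, so the equality check below is exact."""
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
C = 4  # hypothetical number of channels per modality
x_vis = [rng.randint(-3, 3) for _ in range(C)]  # visible-light features
x_ir = [rng.randint(-3, 3) for _ in range(C)]   # infrared features
w_vis = rand_matrix(C, C, rng)                  # weights of the visible stream
w_ir = rand_matrix(C, C, rng)                   # weights of the infrared stream

# Two independent streams versus one grouped layer on concatenated channels.
two_stream = conv1x1(x_vis, w_vis) + conv1x1(x_ir, w_ir)
merged = grouped_conv1x1(x_vis + x_ir, [w_vis, w_ir])
assert merged == two_stream

# Weight counts for 3x3 kernels: two separate layers, one grouped layer,
# and one ordinary (dense) layer over the concatenated channels.
separate = 2 * conv_params(C, C, k=3)
grouped = conv_params(2 * C, 2 * C, k=3, groups=2)
dense = conv_params(2 * C, 2 * C, k=3)
print(separate, grouped, dense)  # 288 288 576
```

Because the grouped layer has the same weight count as the two separate layers, this step on its own would not be expected to shrink the model; any speed gain would come from executing one layer instead of two.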

CLC number: TP 391.4

Jie ZHANG, Tianqing CHANG, Xiaowei WANG, Wenlong HAO, Xin TANG. Lightweight object detection method based on visible and infrared feature fusion[J]. Systems Engineering and Electronics, 2026, 48(5): 1481-1491.

Figures and tables (18): Figures 1-14 and Tables 1-4.

References (35)

[1] CHENG G, YUAN X, YAO X W, et al. Towards large-scale small object detection: survey and benchmark[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2023, 45(11): 13467-13488.
[2] GHAHREMANNEZHAD H, SHI H, LIU C J. Object detection in traffic videos: a survey[J]. IEEE Trans. on Intelligent Transportation Systems, 2023, 24(7): 6780-6799. doi: 10.1109/TITS.2023.3258683
[3] WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 7464-7475.
[4] LI R M, XIANG J J, SUN F X, et al. Multiscale cross-modal homogeneity enhancement and confidence-aware fusion for multispectral pedestrian detection[J]. IEEE Trans. on Multimedia, 2024, 26: 852-863. doi: 10.1109/TMM.2023.3272471
[5] SHAO Y H, HUANG Q M, MEI Y Y, et al. MOD-YOLO: multispectral object detection based on transformer dual-stream YOLO[J]. Pattern Recognition Letters, 2024, 183: 26-34. doi: 10.1016/j.patrec.2024.05.001
[6] CHEN Y Y, JHONG S Y, LO Y J. Reinforcement-and-alignment multispectral object detection using visible-thermal vision sensors in intelligent vehicles[J]. IEEE Sensors Journal, 2023, 23(21): 26873-26886. doi: 10.1109/JSEN.2023.3319230
[7] ZHANG R, LI Y C, WANG J B, et al. Multiscale feature fusion approach for dual-modal object detection[J]. Computer Engineering and Applications, 2024, 60(17): 233-242.
[8] LI Q, ZHANG C Q, HU Q H, et al. Stabilizing multispectral pedestrian detection with evidential hybrid fusion[J]. IEEE Trans. on Circuits and Systems for Video Technology, 2024, 34(4): 3017-3029. doi: 10.1109/TCSVT.2023.3306870
[9] YOU S A, XIE X D, FENG Y J, et al. Multi-scale aggregation transformers for multispectral object detection[J]. IEEE Signal Processing Letters, 2023, 30: 1172-1176. doi: 10.1109/LSP.2023.3309578
[10] LI Q, ZHANG C Q, HU Q H, et al. Confidence-aware fusion using Dempster-Shafer theory for multispectral pedestrian detection[J]. IEEE Trans. on Multimedia, 2023, 25: 3420-3431. doi: 10.1109/TMM.2022.3160589
[11] ZHU J H, CHEN X, ZHANG H, et al. Transformer based remote sensing object detection with enhanced multispectral feature extraction[J]. IEEE Geoscience and Remote Sensing Letters, 2023, 20: 5001405.
[12] SONG K H, ZHAO Y, HUANG L M, et al. RGB-T image analysis technology and application: a survey[J]. Engineering Applications of Artificial Intelligence, 2023, 120: 105919. doi: 10.1016/j.engappai.2023.105919
[13] LIU J Y, FAN X, HUANG Z B, et al. Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 5802-5811.
[14] HUANG L, PENG Z J, CHEN F, et al. Cross-modality interaction for few-shot multispectral object detection with semantic knowledge[J]. Neural Networks, 2024, 173: 106156. doi: 10.1016/j.neunet.2024.106156
[15] CHEN K Y, LIU J Q, ZHANG H. IGT: Illumination-guided RGB-T object detection with transformers[J]. Knowledge-Based Systems, 2023, 268: 110423. doi: 10.1016/j.knosys.2023.110423
[16] HAN Z S, FAN X Q, FU Q, et al. Target detection based on multi-source information fusion from the perspective of drones[J]. Systems Engineering and Electronics, 2025, 47(1): 52-61.
[17] HU W J, FU C L, CAO R L, et al. Joint dual-stream interaction and multi-scale feature extraction network for multi-spectral pedestrian detection[J]. Applied Soft Computing, 2023, 147: 110768. doi: 10.1016/j.asoc.2023.110768
[18] AN Z J, LIU C L, HAN Y Q. Effectiveness guided cross-modal information sharing for aligned RGB-T object detection[J]. IEEE Signal Processing Letters, 2022, 29: 2562-2566. doi: 10.1109/LSP.2022.3229571
[19] FU H L, WANG S X, DUAN P H, et al. LRAF-Net: long-range attention fusion network for visible-infrared object detection[J]. IEEE Trans. on Neural Networks and Learning Systems, 2023, 35(10): 13232-13245.
[20] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//Proc. of the IEEE/CVF International Conference on Computer Vision, 2021: 10012-10022.
[21] GONG Y, WANG L, XU L S. A feature aggregation network for multispectral pedestrian detection[J]. Applied Intelligence, 2023, 53(19): 22117-22131. doi: 10.1007/s10489-023-04628-y
[22] XIE Y M, ZHANG L W, YU X Y, et al. YOLO-MS: multispectral object detection via feature interaction and self-attention guided fusion[J]. IEEE Trans. on Cognitive and Developmental Systems, 2023, 15(4): 2132-2143. doi: 10.1109/TCDS.2023.3238181
[23] SHEN J F, CHEN Y F, LIU Y, et al. ICAFusion: iterative cross-attention guided feature fusion for multispectral object detection[J]. Pattern Recognition, 2024, 145: 109913. doi: 10.1016/j.patcog.2023.109913
[24] ZHOU K L, CHEN L S, CAO X. Improving multispectral pedestrian detection by addressing modality imbalance problems[C]//Proc. of the 16th European Conference on Computer Vision, 2020: 787-803.
[25] SUN Y M, CAO B, ZHU P F, et al. Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning[J]. IEEE Trans. on Circuits and Systems for Video Technology, 2022, 32(10): 6700-6713. doi: 10.1109/TCSVT.2022.3168279
[26] ZHAO T Y, YUAN M X, JIANG F, et al. Removal and selection: improving RGB-infrared object detection via coarse-to-fine fusion[EB/OL]. [2025-02-12]. https://arxiv.org/abs/2401.10731.
[27] WANG H Y, WANG C P, FU Q, et al. Cross-modal oriented object detection of UAV aerial images based on image feature[J]. IEEE Trans. on Geoscience and Remote Sensing, 2024, 62: 5403021. doi: 10.1109/TGRS.2024.3367934
[28] WANG H Y, WANG C P, FU Q, et al. YOLOFIV: object detection algorithm for around-the-clock aerial remote sensing images by fusing infrared and visible features[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024, 17: 15269-15287. doi: 10.1109/JSTARS.2024.3447649
[29] WANG X J, CHEN G Y, LI X H. Lightweight YOLOv8 pedestrian detection algorithm using dynamic activation function[J]. Computer Engineering and Applications, 2024, 60(15): 221-233.
[30] WU L, CHU Y K, YANG H G, et al. Sim-YOLOv8 object detection model for DR image defects in aluminum alloy welds[J]. Chinese Journal of Lasers, 2024, 51(16): 29-38. doi: 10.3788/CJL231485
[31] QU J L, LI Q, PAN J, et al. SS-YOLOv8: small-size object detection algorithm based on improved YOLOv8 for UAV imagery[J]. Multimedia Systems, 2025, 31(1): 42. doi: 10.1007/s00530-024-01622-3
[32] HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1580-1589.
[33] ZHANG H, FROMONT E, LEFEVRE S, et al. Multispectral fusion for object detection with cyclic fuse-and-refine blocks[C]//Proc. of the IEEE International Conference on Image Processing, 2020: 276-280.
[34] FANG Q Y, HAN D P, WANG Z K. Cross-modality fusion transformer for multispectral object detection[EB/OL]. [2025-02-04]. https://arxiv.org/abs/2111.00273.
[35] XIE Y M, ZHANG L W, YU X Y, et al. YOLOv5 object detection algorithm with visible-infrared feature interaction and fusion[J]. Control Theory and Applications, 2024, 41(5): 914-922.

Table 1
Detection model	Backbone: DSB	Backbone: DSBBGCO	$w$	Params (×10⁶)	FLOPs (×10⁹)	FPS (frames/s)	mAP50:95 (%)
DSODM-n	√	-	0.25	4.4	9.2	145	40.0
DSODM-n	-	√	0.25	4.4	9.2	166 (+21)	40.0 (+0.0)
DSODM-s	√	-	0.5	16.9	33.8	124	41.1
DSODM-s	-	√	0.5	16.9	33.8	135 (+11)	41.2 (+0.1)

Table 2
Detection model	$w$	Backbone: GC+DSSPPF+DSC2f	Backbone: GC+DSSPPF+C2f-CMFI	Params (×10⁶)	FLOPs (×10⁹)	FPS (frames/s)	mAP50:95 (%)
DSODM-n	0.25	√	-	4.4	9.2	166 (+0)	40.0
DSODM-n	0.25	-	√	3.9	8.0	168 (+2)	40.8 (+0.8)
DSODM-s	0.5	√	-	16.9	33.8	135 (+0)	41.2
DSODM-s	0.5	-	√	14.6	28.5	138 (+3)	42.8 (+1.6)

Table 3
Detection algorithm	$w$	Backbone: GC+C2f-CMFI+DSSPPF	Backbone: GC+C2f-CMFI+SPPF-CMFF	Params (×10⁶)	FLOPs (×10⁹)	FPS (frames/s)	mAP50:95 (%)
DSODM-n	0.25	√	-	3.9	8.0	167	40.8
DSODM-n	0.25	-	√	3.6 (-0.3)	7.8 (-0.2)	170 (+3)	40.9 (+0.1)
DSODM-s	0.5	√	-	14.6	28.5	138	42.8
DSODM-s	0.5	-	√	13.6 (-1.0)	27.8 (-0.7)	140 (+2)	43.0 (+0.2)
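Read together, the three ablation tables above trace how each backbone change affects the DSODM-s model. The sketch below only recomputes the step-to-step differences from the tabulated values. The `stages` list and the `step_deltas` helper are illustrative names, and the stage labels reuse the table abbreviations as printed.

```python
# DSODM-s rows copied from the ablation tables above.
# Fields: (backbone, Params x10^6, FLOPs x10^9, FPS, mAP50:95 %)
stages = [
    ("DSB", 16.9, 33.8, 124, 41.1),
    ("DSBBGCO", 16.9, 33.8, 135, 41.2),
    ("GC+DSSPPF+C2f-CMFI", 14.6, 28.5, 138, 42.8),
    ("GC+C2f-CMFI+SPPF-CMFF", 13.6, 27.8, 140, 43.0),
]

def step_deltas(rows):
    """Difference of each stage from the previous one, rounded to 1 decimal."""
    result = []
    for prev, cur in zip(rows, rows[1:]):
        diffs = tuple(round(c - p, 1) for p, c in zip(prev[1:], cur[1:]))
        result.append((cur[0],) + diffs)
    return result

deltas = step_deltas(stages)
for name, d_params, d_flops, d_fps, d_map in deltas:
    print(f"{name}: params {d_params:+.1f}, FLOPs {d_flops:+.1f}, "
          f"FPS {d_fps:+.0f}, mAP {d_map:+.1f}")

# End-to-end change from the first stage to the last one.
total_fps_gain = stages[-1][3] - stages[0][3]
total_map_gain = round(stages[-1][4] - stages[0][4], 1)
print(total_fps_gain, total_map_gain)
```

In these numbers, the group-convolution merge changes only the frame rate, while the C2f-CMFI step accounts for most of the parameter, FLOPs, and mAP change.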

Table 4
Detection algorithm	Modality	Params (×10⁶)	FLOPs (×10⁹)	mAP50:95 (%)
YOLOv8-n	Visible	3.0	6.5	28.7
YOLOv8-n	Infrared	3.0	6.5	39.3
YOLOv8-s	Visible	11.1	22.8	30.3
YOLOv8-s	Infrared	11.1	22.8	40.9
CFT[34]	Visible + infrared	206.0	224.4	40.2
YOLO-MS[22]	Visible + infrared	15.3	36.9	38.3
ICAFusion[23]	Visible + infrared	120.21	-	41.4
Ref. [35]	Visible + infrared	13.01	33	37.9
LRAF-Net[19]	Visible + infrared	18.8	40.5	42.8
DSODM-n	Visible + infrared	4.4	9.2	40.0
Ours-n	Visible + infrared	3.6	7.8	40.9
DSODM-s	Visible + infrared	16.9	33.8	41.1
Ours-s	Visible + infrared	13.6	27.8	43.0
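The headline figures in the abstract can be checked against the last rows of the comparison table above. The following sketch assumes they refer to the -s variant (the DSODM-s row versus the proposed method's -s row) and recomputes the relative reductions and the absolute mAP gain. `relative_reduction` and the variable names are illustrative, not taken from the paper.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is smaller than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# (Params x10^6, FLOPs x10^9, mAP50:95 %) copied from the table above.
dsodm_s = (16.9, 33.8, 41.1)  # dual-stream baseline, -s variant
ours_s = (13.6, 27.8, 43.0)   # proposed method, -s variant

params_cut = relative_reduction(dsodm_s[0], ours_s[0])
flops_cut = relative_reduction(dsodm_s[1], ours_s[1])
map_gain = ours_s[2] - dsodm_s[2]  # absolute difference in percentage points

print(f"params -{params_cut:.2f}%, FLOPs -{flops_cut:.2f}%, mAP +{map_gain:.1f}")
```

This gives reductions of about 19.53% and 17.75% and a gain of 1.9 points, in line with the 19.5%, 17.7%, and 1.9 reported in the abstract. The mAP figure is therefore an absolute difference between the two table entries, not a relative improvement.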
