Systems Engineering and Electronics ›› 2026, Vol. 48 ›› Issue (5): 1481-1491. doi: 10.12305/j.issn.1001-506X.2026.05.04

• Electronic Technology •

Lightweight object detection method based on visible and infrared feature fusion

Jie ZHANG1,*, Tianqing CHANG2, Xiaowei WANG1, Wenlong HAO1, Xin TANG1

  1. Army Aviation Institute, Beijing 101123, China
  2. Army Arms University of PLA, Beijing 100072, China
  • Received: 2025-03-19 Accepted: 2025-07-22 Online: 2026-05-27 Published: 2026-05-27
  • Contact: Jie ZHANG E-mail: zjwhy_8@163.com
  • About the authors: Tianqing CHANG (born 1963), male, professor, Ph.D.; main research interests: pattern recognition and military intelligentization
    Xiaowei WANG (born 1976), female, professor, Ph.D.; main research interest: pattern recognition
    Wenlong HAO (born 1987), male, associate researcher, M.S.; main research interest: pattern recognition
    Xin TANG (born 1988), male, engineer, M.S.; main research interest: computer vision

Abstract:

To address the low running efficiency and high computational complexity of dual-stream object detection models, a lightweight object detection method based on visible and infrared feature fusion is proposed. First, you only look once (YOLO) v8 is extended into a dual-stream object detection model, and its dual-stream backbone is optimized with group convolution: the two independent backbone networks are merged into a single backbone that extracts the features of both modalities synchronously, which greatly improves running efficiency. Second, a faster cross-stage partial bottleneck with two convolutions with cross-modal feature interaction (C2f-CMFI) module and a spatial pyramid pooling fast with cross-modal feature fusion (SPPF-CMFF) module are designed; they reduce model complexity while fusing and exchanging the features of the two modalities during feature extraction. Finally, experiments on a public visible-infrared image dataset show that, compared with traditional dual-stream object detection models, the proposed method reduces the parameter count and computational complexity by 19.5% and 17.7%, respectively, and improves the mean average precision 50:95 by 1.9%. On an NVIDIA RTX 2080Ti graphics processing unit, the inference speed is 140 frames per second, which demonstrates the effectiveness of the proposed method.
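The backbone merging described in the abstract rests on a general property of group convolution: concatenating two inputs along the channel axis and applying one convolution with two groups gives the same outputs as running two independent convolutions. The sketch below illustrates only that property, and it is not the authors' implementation. It assumes pointwise (1x1) convolutions in pure Python for brevity, and every name and size in it (`conv1x1`, `grouped_conv1x1`, `c_in`, `c_out`, `n_pix`) is hypothetical.

```python
import random


def conv1x1(x, w):
    """Pointwise (1x1) convolution.

    x: list of C_in channel maps, each a flat list of pixel values.
    w: C_out x C_in weight matrix (list of rows).
    Returns a list of C_out channel maps.
    """
    n_pix = len(x[0])
    return [
        [sum(w_row[c] * x[c][p] for c in range(len(x))) for p in range(n_pix)]
        for w_row in w
    ]


def grouped_conv1x1(x, group_weights):
    """Grouped 1x1 convolution.

    The input channels are split evenly into len(group_weights) groups and
    each group is convolved with its own weight matrix only.
    """
    groups = len(group_weights)
    c_per_group = len(x) // groups
    out = []
    for g, w in enumerate(group_weights):
        out.extend(conv1x1(x[g * c_per_group:(g + 1) * c_per_group], w))
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
c_in, c_out, n_pix = 3, 4, 6              # hypothetical toy sizes
visible = rand_matrix(c_in, n_pix, rng)   # stand-in for visible-light features
infrared = rand_matrix(c_in, n_pix, rng)  # stand-in for infrared features
w_vis = rand_matrix(c_out, c_in, rng)
w_ir = rand_matrix(c_out, c_in, rng)

# Dual-stream form: two independent branches, outputs concatenated.
two_stream = conv1x1(visible, w_vis) + conv1x1(infrared, w_ir)

# Merged form: concatenate the inputs, then apply one layer with groups=2.
merged = grouped_conv1x1(visible + infrared, [w_vis, w_ir])

max_diff = max(
    abs(a - b) for ra, rb in zip(two_stream, merged) for a, b in zip(ra, rb)
)

# Weight counts: two separate layers versus one grouped layer.
params_two_stream = 2 * c_in * c_out
params_grouped = (2 * c_in) * (2 * c_out) // 2
```

In this toy setting the grouped layer holds exactly as many weights as the two separate branches, so merging alone changes how the computation is scheduled (one pass instead of two) rather than how many parameters there are. The abstract attributes the reduction in model complexity to the C2f-CMFI and SPPF-CMFF modules, which this sketch does not attempt to reproduce.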

Key words: visible-infrared image, you only look once (YOLO), lightweight design, object detection, dual-stream structure

CLC number: