Systems Engineering and Electronics ›› 2024, Vol. 46 ›› Issue (4): 1174-1184. doi: 10.12305/j.issn.1001-506X.2024.04.05

• Electronic Technology •

Multi-teacher joint knowledge distillation based on CenterNet

Shaohua LIU, Kang DU, Chundong SHE, Ao YANG   

  1. School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100080, China
  • Received: 2022-12-05 Online: 2024-03-25 Published: 2024-03-25
  • Contact: Chundong SHE

Abstract:

This paper proposes a multi-teacher joint knowledge distillation scheme for a lightweight CenterNet. The proposed scheme effectively mitigates the performance degradation caused by model lightweighting and significantly narrows the performance gap between the teacher model and the student model. Large, complex models serve as teacher models to guide the training of the lightweight student model. Compared with the conventional training scheme, the proposed knowledge distillation training scheme achieves better detection performance for the lightweight model after the same number of training epochs. The main contribution of this paper is the multi-teacher joint knowledge distillation scheme, a new knowledge distillation training scheme for the CenterNet object detection network. In subsequent experiments, a distillation attention mechanism is further introduced to improve the training effect of multi-teacher joint knowledge distillation. On the Visual Object Classes 2007 dataset (VOC2007), taking the lightweight MobileNetV2 backbone as an example, the number of parameters is reduced by 74.7% and the inference speed is increased by 70.5% compared with the traditional CenterNet (with a ResNet50 backbone), while the mean average precision (mAP) drops by only 1.99, achieving a better "performance-speed" balance. In addition, the experimental results show that, after the same 100 training epochs, the mAP of the lightweight model trained with the multi-teacher joint knowledge distillation scheme is 11.30 higher than that of the model trained with the ordinary scheme.
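
To make the idea of multi-teacher joint distillation with attention-based weighting concrete, the following is a minimal PyTorch sketch of a distillation loss over CenterNet-style center heatmaps. It assumes the student and each teacher output heatmaps of shape (B, C, H, W); the attention weighting shown here (a softmax over each teacher's agreement with the ground-truth heatmap) is an illustrative assumption, not the paper's exact distillation attention mechanism.

# Sketch of a multi-teacher joint distillation loss for CenterNet-style heatmaps.
# The per-teacher attention weights below are an assumed weighting scheme.
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(student_hm, teacher_hms, gt_hm, temperature=1.0):
    """student_hm:  (B, C, H, W) student center heatmap
    teacher_hms: list of (B, C, H, W) teacher center heatmaps
    gt_hm:       (B, C, H, W) ground-truth Gaussian heatmap
    """
    per_teacher_losses = []
    teacher_quality = []
    for t_hm in teacher_hms:
        # How closely the student matches this teacher (distillation term).
        per_teacher_losses.append(F.mse_loss(student_hm, t_hm.detach()))
        # How closely this teacher matches the ground truth (used for weighting).
        teacher_quality.append(-F.mse_loss(t_hm.detach(), gt_hm))

    # Attention-style weights: teachers closer to the ground truth get more influence.
    weights = torch.softmax(torch.stack(teacher_quality) / temperature, dim=0)
    return sum(w * l for w, l in zip(weights, per_teacher_losses))

if __name__ == "__main__":
    B, C, H, W = 2, 20, 128, 128          # e.g. 20 VOC object classes
    student = torch.rand(B, C, H, W)
    teachers = [torch.rand(B, C, H, W) for _ in range(3)]
    gt = torch.rand(B, C, H, W)
    print(multi_teacher_distill_loss(student, teachers, gt))

In practice such a distillation term would be added to the standard CenterNet training losses (heatmap focal loss, size and offset regression) with a weighting coefficient, so that the teachers guide but do not replace the ground-truth supervision.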

Key words: lightweight, knowledge distillation, attention mechanism, joint training

CLC Number: 
