Systems Engineering and Electronics, 2023, Vol. 45, Issue (12): 3915-3923. doi: 10.12305/j.issn.1001-506X.2023.12.21

• Systems Engineering •

Fine grained cross-modal retrieval algorithm for IETM with attention mechanism fused

Yichen ZHAI, Jiaojiao GU, Fuqiang ZONG, Wenzhi JIANG   

  1. Coastal Defense College, Naval Aviation University, Yantai 264001, China
  • Received: 2022-04-11 Online: 2023-11-25 Published: 2023-12-05
  • Contact: Jiaojiao GU

Abstract:

The interactive electronic technical manual (IETM) is an important technology for making equipment support more informatized and intelligent. To address the limitation that retrieval is restricted to a single modality, an improved fine-grained cross-modal retrieval algorithm with a fused attention mechanism is proposed, taking the image-text descriptions in the data as the research object. Because the images in the data are mostly sketches in a single color, the feature extraction module uses a Vision Transformer model for the images and a Transformer encoder for the text to obtain the global and local features of each modality. The attention mechanism is then applied to mine fine-grained information both between and within the image and text modalities, and adversarial training on the text is added to improve the model's generalization ability. In addition, a cross-modal joint loss function is used to constrain the model. On the Pascal Sentence dataset and a self-built dataset, the average accuracy of the proposed method reaches 0.964 and 0.959, respectively, which is 0.248 and 0.214 higher than that of the benchmark model, deep supervised cross-modal retrieval (DSCMR).
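The abstract does not specify how the attention between image and text features is computed. As a rough illustration only, the sketch below shows one common form, scaled dot-product cross-attention, in which each text token attends over image patch features and the attended result is compared with the token by cosine similarity. The function names (`cross_attend`, `fine_grained_similarity`) and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def cross_attend(queries, keys):
    """Scaled dot-product attention with the keys also used as values.

    Returns one (weights, context) pair per query vector.
    """
    d = len(keys[0])
    results = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        context = [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)]
        results.append((weights, context))
    return results

def fine_grained_similarity(text_tokens, image_patches):
    """Assumed scoring rule: mean cosine similarity between each text token
    and the image context it attends to. Hypothetical, not the paper's loss."""
    attended = cross_attend(text_tokens, image_patches)
    sims = [cosine(t, ctx) for t, (_, ctx) in zip(text_tokens, attended)]
    return sum(sims) / len(sims)

# Toy 2-dimensional features standing in for encoder outputs.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
matching_patches = [[1.0, 0.0], [0.0, 1.0]]
mismatched_patches = [[-1.0, 0.0], [0.0, -1.0]]

score_match = fine_grained_similarity(text_tokens, matching_patches)
score_mismatch = fine_grained_similarity(text_tokens, mismatched_patches)
```

In this toy setup the matching pair scores higher than the mismatched pair, which is the property a retrieval ranking relies on. A full model would replace the toy vectors with features from the Vision Transformer and Transformer encoder and learn projection matrices for the queries, keys, and values.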

Key words: interactive electronic technical manual, image-text retrieval, cross-modal, attention mechanism
