Systems Engineering and Electronics ›› 2022, Vol. 44 ›› Issue (7): 2241-2250. doi: 10.12305/j.issn.1001-506X.2022.07.20

• Systems Engineering

Research on structural knowledge extraction and organization for multi-modal governmental documents

Ruilin XU1,2, Boying GENG2,*, Shukan LIU2,3   

  1. School of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China
    2. Unit 91001 of the PLA, Beijing 100036, China
    3. School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
  • Received: 2020-12-16 Online: 2022-06-22 Published: 2022-06-28
  • Contact: Boying GENG

Abstract:

Triplet-based knowledge in large-scale knowledge graphs lacks structural logic and is difficult to organize into a knowledge system. To address this, the paper presents GovDoc-CN, a multi-modal dataset of governmental documents. A multi-modal extraction model is proposed that uses both the text and image modalities to extract knowledge structure elements from documents, including titles, abstracts, authors, completion times, and document numbers, among others. A document structure tree (DST) model is designed to organize the extracted elements, and a structured graph network is built on it to support their organization and management. Experiments show that the multi-modal extraction model significantly outperforms single-modal extraction models. The DST model and the DST-based structured graph network offer a new way to organize and manage document knowledge and have significant application value.
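
As a rough illustration of how extracted elements might be arranged in a tree, the following Python sketch groups elements by type under a document root. It is not the paper's DST model: the abstract does not specify the tree's layout, so the `DSTNode` class, the `build_dst` helper, the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class DSTNode:
    """One node of a hypothetical document structure tree (assumed layout)."""
    label: str                     # element type, e.g. "title" or "authors"
    value: Optional[str] = None    # extracted text; None for container nodes
    children: List["DSTNode"] = field(default_factory=list)

    def add(self, child: "DSTNode") -> "DSTNode":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator["DSTNode"]:
        """Yield this node and all descendants in depth-first order."""
        yield self
        for child in self.children:
            yield from child.walk()


def build_dst(doc_id: str, elements: Dict[str, List[str]]) -> DSTNode:
    """Build a tree: document root -> one group per element type -> values."""
    root = DSTNode(label="document", value=doc_id)
    for label, values in elements.items():
        group = root.add(DSTNode(label=label))
        for text in values:
            group.add(DSTNode(label=f"{label}_item", value=text))
    return root


# Made-up sample values standing in for the output of an extraction model.
elements = {
    "title": ["Notice on annual reporting"],
    "authors": ["Office A", "Office B"],
    "document_number": ["No. 12"],
}
tree = build_dst("doc-001", elements)
author_values = [n.value for n in tree.walk() if n.label == "authors_item"]
```

Grouping values under one node per element type keeps multi-valued elements such as authors together, which is one plausible way to make the tree easy to query or to link into a larger graph.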

Key words: multi-modal, information extraction, knowledge organization, document structuring, governmental documents automation

CLC Number: 
