Systems Engineering and Electronics ›› 2024, Vol. 46 ›› Issue (2): 740-750. doi: 10.12305/j.issn.1001-506X.2024.02.38

• Communications and Networks •

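Intention mining for civil aviation radiotelephony communication based on BERT and generative adversarial

Lan MA1,*, Shijun MENG2, Zhijun WU3

  1. School of Air Traffic Management, Civil Aviation University of China, Tianjin 300300, China
  2. School of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
  3. School of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2022-12-05 Online: 2024-01-25 Published: 2024-02-06
  • Corresponding author: Lan MA
  • About the authors: Lan MA (b. 1966), female, professor, Ph.D.; research interests: air traffic management information processing, control and decision-making
    Shijun MENG (b. 1998), female, master's student; research interests: natural language processing, radiotelephony communication information mining
    Zhijun WU (b. 1965), male, professor, Ph.D.; research interests: aeronautical telecommunication network and information security
  • Funding:
    National Natural Science Foundation of China (62172418)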



Abstract:

Civil aviation radiotelephony communication suffers from a corpus that is difficult to obtain, an uneven distribution of entities, and intention information extraction with insufficiently standardized entities and limited accuracy. To better extract intention information from radiotelephony communication, this paper proposes an ontology-fused intention mining method based on bidirectional encoder representations from transformers (BERT) and a generative adversarial network (GAN). Flight pool information is introduced to check and correct part of the extracted information, forming structured information that the air traffic control (ATC) system can understand. Firstly, an improved GAN model generates radiotelephony communication text, which effectively augments the data, balances the distribution of the various entity types, and expands the dataset. Then, intentions are classified and annotated according to the ontology rules defined by the European Single Sky Air Traffic Management project. After that, the BERT pre-trained model generates word vectors and resolves the problem of polysemy, a bidirectional long short-term memory (BiLSTM) network extracts contextual semantic features through bidirectional encoding, and these features are fed into a conditional random field (CRF) model for inference and prediction, which learns the dependencies between labels and constrains them to obtain the globally optimal result. Finally, the plausibility of the intention information is checked and corrected using the edit distance (ED) algorithm. Comparative experimental results show that the proposed method achieves a macro-F1 value of 98.75% and outperforms other mainstream models in intention mining on the civil aviation radiotelephony communication dataset, laying a foundation for bringing radiotelephony communication into the digitization process.

Key words: civil aviation radiotelephony communication, information extraction, generative adversarial network (GAN), ontology, bidirectional encoder representations from transformers (BERT)
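
The final step of the abstract, checking extracted information against the flight pool with the edit distance (ED) algorithm, can be illustrated with a minimal sketch. The abstract does not say which fields are checked or what threshold is used, so the choice of callsigns as the checked field, the function names `edit_distance` and `correct_callsign`, the `max_distance` threshold, and the sample callsigns are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct_callsign(extracted: str,
                     flight_pool: List[str],
                     max_distance: int = 2) -> Optional[str]:
    """Return the closest callsign in the pool, or None if none is close enough.

    max_distance is a hypothetical plausibility threshold.
    """
    if not flight_pool:
        return None
    best = min(flight_pool, key=lambda c: edit_distance(extracted, c))
    return best if edit_distance(extracted, best) <= max_distance else None


# Hypothetical flight pool and a recognized callsign with one wrong character.
pool = ["CCA101", "CSN3102", "CES5137"]
print(correct_callsign("CCA1O1", pool))  # letter O instead of digit 0
print(correct_callsign("XYZ999", pool))  # nothing plausible in the pool
```

In this sketch an extracted value is replaced by the nearest pool entry only when the distance stays within the threshold; otherwise it is flagged (returned as `None`) rather than silently corrected.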

CLC number: