Systems Engineering and Electronics, 2021, Vol. 43, Issue (9): 2526-2534. doi: 10.12305/j.issn.1001-506X.2021.09.20

• Systems Engineering •

Label order optimization method of classifier chains based on co-occurrence analysis

Dedi LAI, Zhihui LUO, Yinglong MA*

  1. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
  • Received: 2020-07-29 Online: 2021-08-20 Published: 2021-08-26
  • Contact: Yinglong MA
  • About the authors: Dedi LAI (born 1998), male, master's student, research interests: multi-label classification and natural language processing | Zhihui LUO (born 1999), male, master's student, research interests: multi-label classification and natural language processing | Yinglong MA (born 1976), male, professor, Ph.D., research interests: artificial intelligence and knowledge engineering, big data analysis and processing, and software engineering
  • Supported by:
    National Key Research and Development Program of China (2018YFC0831404); National Key Research and Development Program of China (2018YFC0830605)

Abstract:

Classifier chain models usually determine the label order at random, which can greatly affect their performance. To address this problem, co-occurrence analysis is used to mine the latent relationships between labels, and two label order optimization strategies are proposed to improve classifier chain performance: one based on a greedy algorithm and one based on the n-gram model. The greedy strategy generates an optimized label order by computing and ranking the co-occurrence rates between labels, while the n-gram strategy generates an optimized label order by maximizing the conditional probabilities between labels. Experiments on multiple multi-label benchmark datasets show that, compared with currently popular classifier chain models, the two proposed strategies are highly competitive and clearly improve multi-label classification performance.

Key words: multi-label classification, classifier chain, co-occurrence analysis, n-gram, binary relevance
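
The abstract names the two strategies but does not give their formulas, so the following is only an illustrative sketch of how such label orders could be derived from a binary label matrix. Every specific choice here is an assumption rather than the authors' method: the co-occurrence rate is taken to be the Jaccard ratio, the n-gram model is reduced to a bigram (n = 2), the chain starts from the most frequent label, and the function names `greedy_order` and `bigram_order` are hypothetical.

```python
import numpy as np


def cooccurrence_counts(Y):
    """Return the label-by-label co-occurrence counts of a binary label matrix.

    Entry (i, j) counts samples carrying both label i and label j;
    the diagonal holds each label's own frequency.
    """
    Y = np.asarray(Y, dtype=float)
    return Y.T @ Y


def _chain_from_scores(score, start):
    """Build an order by repeatedly appending the unused label that scores
    highest against the most recently chosen label (ties go to the lowest index)."""
    n = score.shape[0]
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = order[-1]
        nxt = max(sorted(remaining), key=lambda j: score[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def greedy_order(Y):
    """Greedy ordering by co-occurrence rate.

    Assumption: rate(i, j) = |i and j| / |i or j| (Jaccard ratio).
    """
    C = cooccurrence_counts(Y)
    freq = np.diag(C)
    union = freq[:, None] + freq[None, :] - C
    rate = np.divide(C, union, out=np.zeros_like(C), where=union > 0)
    return _chain_from_scores(rate, int(np.argmax(freq)))


def bigram_order(Y):
    """Bigram ordering: the next label maximizes the estimated
    P(next | last) = count(last, next) / count(last)."""
    C = cooccurrence_counts(Y)
    freq = np.diag(C)
    denom = np.tile(freq[:, None], (1, C.shape[0]))
    cond = np.divide(C, denom, out=np.zeros_like(C), where=denom > 0)
    return _chain_from_scores(cond, int(np.argmax(freq)))


# Toy label matrix (5 samples, 3 labels), made up purely for illustration.
Y_demo = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]

if __name__ == "__main__":
    print("greedy order:", greedy_order(Y_demo))
    print("bigram order:", bigram_order(Y_demo))
```

An order produced this way could then be handed to any classifier chain implementation that accepts an explicit label order, for example through the `order` parameter of scikit-learn's `ClassifierChain`, in place of a randomly generated order.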

CLC Number: