Systems Engineering and Electronics, 2024, Vol. 46, Issue (5): 1703-1711. doi: 10.12305/j.issn.1001-506X.2024.05.23

• Systems Engineering •

Full process parallel genetic algorithm for Bayesian network structure learning

Yiming CAI, Li MA, Hengyang LU, Wei FANG

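  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, Jiangsu, China
  • Received: 2023-03-21; Published: 2024-04-30; Online: 2024-04-30
  • Corresponding author: Wei FANG
  • About the authors: Yiming CAI (born 1998), male, master's student; research interests: parallel computing and Bayesian network structure learning
    Li MA (born 2002), male; research interest: Bayesian network structure learning
    Hengyang LU (born 1991), male, associate professor, Ph.D.; research interest: machine learning
    Wei FANG (born 1980), male, professor, Ph.D.; research interest: intelligent optimization algorithms
  • Funding:
    National Natural Science Foundation of China (62073155, 62002137, 62106088, 62206113)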

Abstract:

To address the sharp drop in algorithm performance when Bayesian network (BN) structures are learned from massive data, this paper proposes SparkGA-BN, a full-process parallel genetic algorithm (GA) for BN structure learning built on the Spark framework. SparkGA-BN parallelizes three parts: mutual information computation, the genetic operators, and fitness evaluation. Computing mutual information in parallel efficiently shrinks the search space. Before evolution begins, the population information and the selection information are broadcast so that selection can be performed over the entire population. The selection and crossover operators share this selection information and are executed in parallel, which speeds up evolution and reduces the time spent writing data to disk. Intermediate data produced in the constraint and scoring phases are memoized, which improves data reuse and overall execution efficiency. Experimental results show that the proposed algorithm outperforms the comparison algorithms in both execution efficiency and learning accuracy.
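The abstract names four mechanisms: parallel mutual information (MI) pruning, selection information shared by the selection and crossover operators, parallel fitness scoring, and memoized intermediate results. The single-machine Python sketch below shows one way these pieces could fit together. It is not the authors' SparkGA-BN implementation. `ThreadPoolExecutor.map` stands in for a Spark map, and a plain shared list stands in for a broadcast variable. The following are all illustrative assumptions rather than choices taken from the paper: the BIC score, the MI threshold of 0.05, the three-state edge encoding (absent, i to j, j to i), the negative-infinity penalty for cyclic graphs, binary tournament selection, uniform crossover, and elitism. Every function name is hypothetical.

```python
import math
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from graphlib import CycleError, TopologicalSorter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (nats) between discrete columns i and j."""
    n = len(data)
    joint = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    return sum(c / n * math.log(c * n / (pi[a] * pj[b])) for (a, b), c in joint.items())


def candidate_edges(data, n_vars, threshold, pool):
    """Constraint phase: score all pairs in parallel, keep those above threshold."""
    pairs = list(combinations(range(n_vars), 2))
    scores = pool.map(lambda p: mutual_information(data, *p), pairs)
    return [p for p, s in zip(pairs, scores) if s > threshold]


def make_family_score(data):
    """BIC local score; lru_cache memoizes each (node, parents) family once."""
    n = len(data)
    card = [len({row[v] for row in data}) for v in range(len(data[0]))]

    @lru_cache(maxsize=None)
    def family_score(node, parents):
        pa_counts = Counter(tuple(row[p] for p in parents) for row in data)
        joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
        loglik = sum(c * math.log(c / pa_counts[pa]) for (pa, _), c in joint.items())
        n_params = (card[node] - 1) * math.prod(card[p] for p in parents)
        return loglik - 0.5 * math.log(n) * n_params
    return family_score


def decode(genes, edges, n_vars):
    """Assumed encoding, one gene per candidate edge (i, j): 0 none, 1 i->j, 2 j->i."""
    parents = {v: [] for v in range(n_vars)}
    for g, (i, j) in zip(genes, edges):
        if g == 1:
            parents[j].append(i)
        elif g == 2:
            parents[i].append(j)
    # Sorted tuples are hashable and let equal families share one cache entry.
    return {v: tuple(sorted(ps)) for v, ps in parents.items()}


def fitness(genes, edges, n_vars, family_score):
    parents = decode(genes, edges, n_vars)
    try:
        TopologicalSorter(parents).prepare()  # raises CycleError if not a DAG
    except CycleError:
        return float("-inf")  # assumption: penalize cycles instead of repairing
    return sum(family_score(v, ps) for v, ps in parents.items())


def learn_structure(data, threshold=0.05, pop_size=20, generations=30,
                    mutation_rate=0.05, seed=0):
    n_vars = len(data[0])
    rng = random.Random(seed)
    family_score = make_family_score(data)
    with ThreadPoolExecutor(max_workers=4) as pool:  # stand-in for Spark executors
        edges = candidate_edges(data, n_vars, threshold, pool)
        # Seed the population with the empty graph so at least one DAG exists.
        pop = [[0] * len(edges)] + [[rng.randrange(3) for _ in edges]
                                    for _ in range(pop_size - 1)]
        best, best_fit = pop[0], float("-inf")
        for _ in range(generations):
            # Fitness evaluation in parallel, one task per individual.
            fits = list(pool.map(lambda g: fitness(g, edges, n_vars, family_score), pop))
            top = max(range(pop_size), key=fits.__getitem__)
            if fits[top] > best_fit:
                best, best_fit = pop[top], fits[top]
            def tournament():
                a, b = rng.randrange(pop_size), rng.randrange(pop_size)
                return a if fits[a] >= fits[b] else b
            # Selection information computed once on the driver and shared
            # (broadcast-like) with every breeding task: parent pair + RNG seed.
            plan = [(tournament(), tournament(), rng.getrandbits(32))
                    for _ in range(pop_size)]
            def breed(entry):
                # Selection and crossover fused: each task reads its parents by index.
                ia, ib, child_seed = entry
                local = random.Random(child_seed)
                child = [x if local.random() < 0.5 else y
                         for x, y in zip(pop[ia], pop[ib])]
                return [local.randrange(3) if local.random() < mutation_rate else g
                        for g in child]

            pop = list(pool.map(breed, plan))
            pop[0] = list(best)  # elitism: keep the best DAG found so far
    return edges, decode(best, edges, n_vars), best_fit
```

Because of CPython's global interpreter lock, the thread pool here illustrates the data flow rather than delivering a real speedup. On Spark, the same per-pair and per-individual functions would run as distributed tasks. As a usage example, consider `data = [(k % 2, k % 2, (k // 2) % 2) for k in range(200)]`, where column 1 copies column 0 and column 2 is independent of both. On this dataset only the pair (0, 1) should survive the MI filter, and the learned graph should link variables 0 and 1 in one direction.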

Key words: Bayesian network (BN), structure learning, genetic algorithm (GA), parallel structure learning, Spark
