Systems Engineering and Electronics ›› 2021, Vol. 43 ›› Issue (8): 2197-2208. doi: 10.12305/j.issn.1001-506X.2021.08.22

• Systems Engineering •

Unsupervised feature selection based on matrix factorization and adaptive graph

Langcai CAO1,2,*, Xiaochang LIN1,2, Sixing SU1,2

  1. School of Aerospace Engineering, Xiamen University, Xiamen, Fujian 361005, China
  2. Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Xiamen, Fujian 361005, China
  • Received: 2020-10-26 Online: 2021-07-23 Published: 2021-08-05
  • Corresponding author: Langcai CAO
  • About the authors: Langcai CAO (1970-), male, associate professor, Ph.D.; research interests include enterprise informatization, process optimization design, and recommender systems | Xiaochang LIN (1995-), male, master's student; research interest: machine learning | Sixing SU (1999-), male, master's student; research interest: community detection
  • Supported by:
    National Natural Science Foundation of China (61772442)

Abstract:

The curse of dimensionality is an unavoidable and difficult problem in high-dimensional data analysis, so reducing high-dimensional data to a low-dimensional representation through feature selection is of great importance. To this end, an unsupervised feature selection model based on robust matrix factorization and adaptive graph (MFAGFS) is proposed, which performs robust matrix factorization, feature selection, and local structure learning within a unified learning framework. The model first obtains cluster labels through robust matrix factorization, uses the cluster labels together with local structure information to guide the feature selection process, and then adaptively learns the local structure of the data from the feature selection result. Through the interaction between the two basic tasks of local structure learning and feature selection, MFAGFS can accurately capture the structural information of the data and select discriminative features. The optimization method for solving the model is then described in detail, and the convergence of the algorithm is proved. Finally, comparative experiments and a parameter sensitivity analysis on six public data sets verify the effectiveness of the proposed model. The experimental results show that the proposed method improves performance to varying degrees compared with the other methods.

Key words: feature selection, graph embedding, adaptation, matrix factorization

CLC Number: