Journal of Systems Engineering and Electronics ›› 2009, Vol. 31 ›› Issue (10): 2485-2488.

• 软件、算法与仿真 • 上一篇    下一篇

基于包含索引的频繁闭序列模式挖掘的新算法

李晋宏1,2, 杨炳儒1, 宋威2, 侯伟1   

  1. 1. 北京科技大学信息工程学院, 北京, 100083;
    2. 北方工业大学信息工程学院, 北京, 100144
  • 收稿日期:2008-05-26 修回日期:2008-10-20 出版日期:2009-10-20 发布日期:2010-01-03
  • 作者简介:李晋宏(1965- ),男,教授,博士,主要研究方向为数据挖掘,模糊专家系统.E-mail:ljh@ncut.edu.cn
  • 基金资助:
    国家自然科学基金(60675030);北京市属市管高等学校人才强教计划资助课题

New mining algorithm for frequent closed sequential pattern based on subsume index

LI Jin-hong1,2, YANG Bing-ru1, SONG Wei2, HOU Wei1   

  1. 1. School of Information Engineering, Univ. of Science and Technology Beijing, Beijing 100083, China;
    2. Coll. of Information Engineering, North China Univ. of Technology, Beijing 100144, China
  • Received:2008-05-26 Revised:2008-10-20 Online:2009-10-20 Published:2010-01-03

摘要: 频繁闭序列模式惟一确定全体频繁序列模式,且规模小得多.传统的闭序列模式挖掘算法对每个频繁项目都进行扩展,往往会产生大量的非闭合序列.为解决这一问题,提出了一种新的基于包含索引的频繁闭序列模式挖掘算法,其主要思想是只对闭项集进行扩展,大大减少了非闭合序列的产生.首先,论证了闭序列模式只能由闭项集组成;其次,说明了如何利用包含索引来快速发现闭项集;最后,给出了一种深度优先的挖掘频繁闭序列模式的新算法.实验结果表明,该算法具有较高的效率.

Abstract: The set of frequent closed sequential pattern determines exactly the complete set of all frequent sequential patterns and is usually much smaller than the latter.Traditional closed sequential pattern mining algorithms extend a frequent sequence with every frequent single item,which leads to the generation of a lot of non-closed sequence.To solve these problems,a new mining algorithm for frequent closed sequential pattern based on subsume index is proposed.The main idea of the proposed algorithm is to extend a frequent sequence with closed itemsets only.Thus,the generation of non-closed sequences is avoided greatly.Firstly,it is proved that a closed sequential pattern is only composed of closed itemsets.Then,it is explained that the closed itemsets can be discovered efficiently by using a subsume index.Finally,a depth-first algorithm for mining frequent closed sequential pattern is presented.The experimental results show that the proposed algorithm is efficient.

中图分类号: