Journal of Systems Engineering and Electronics, 2009, Vol. 31, Issue (1): 195-199.

• Software, Algorithms and Simulation •

Efficient discretization algorithm for continuous attributes

ZHAO Jing-xian1,2, NI Chun-peng1, ZHAN Yuan-rui1, DU Zi-ping2

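  1. School of Management, Tianjin University, Tianjin 300072, China;
    2. School of Economics and Management, Tianjin University of Science and Technology, Tianjin 300222, China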
  • Received: 2007-10-14  Revised: 2008-05-21  Published: 2009-01-20  Posted online: 2010-01-03
  • About the author: ZHAO Jing-xian (born 1981), female, Ph.D. candidate; research interests: financial engineering and data mining. E-mail: nzjx2005@163.com
  • Funding:
    Supported by the National Natural Science Foundation of China (grants 70573076 and 70671074)

Abstract: Based on an analysis of the cut-point properties of entropy-based discretization, this paper proposes an attribute discretization algorithm that merges the attribute values of boundary points and checks the inconsistency rate, and proves its validity. Compared with traditional discretization algorithms, the proposed method merges only the attribute values at boundary points, generates the number of cut points automatically instead of requiring it to be set in advance, and uses simple rules to merge intervals, which greatly reduces the computational cost. It is therefore suitable for discretizing large-scale, high-dimensional databases. Because the inconsistency rate is used to adjust the set of candidate cut points, the algorithm has a global character. Experiments show that the method improves both the conciseness and the prediction accuracy of the resulting classification rules.
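The abstract names the ingredients (boundary points as candidate cuts, merging of interval values, an inconsistency check over the whole data set) but not the exact merging rules, so the Python sketch below is only an illustration under stated assumptions, not the authors' algorithm. It assumes that a candidate cut is the midpoint between two adjacent distinct values that do not all share one class (the boundary points where entropy-minimizing cuts can occur), that the inconsistency rate is the fraction of instances outside the majority class of their group of identical discretized patterns, and that merging simply drops a cut whenever the inconsistency rate over all attributes stays at or below a threshold, taken here as the rate of the undiscretized data. The names `boundary_cut_points`, `inconsistency_rate`, `encode` and `merge_cuts` are hypothetical.

```python
import bisect
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Tuple[int, ...]


def boundary_cut_points(values: Sequence[float], labels: Sequence[Hashable]) -> List[float]:
    """Candidate cuts: midpoints between adjacent distinct values whose
    examples do not all belong to one single class (boundary points)."""
    classes_at: Dict[float, Set[Hashable]] = defaultdict(set)
    for v, y in zip(values, labels):
        classes_at[v].add(y)
    distinct = sorted(classes_at)
    cuts = []
    for a, b in zip(distinct, distinct[1:]):
        if len(classes_at[a] | classes_at[b]) > 1:  # class labels differ across this gap
            cuts.append((a + b) / 2)
    return cuts


def inconsistency_rate(rows: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of instances not in the majority class of their identical-pattern group."""
    groups: Dict[Hashable, Counter] = defaultdict(Counter)
    for r, y in zip(rows, labels):
        groups[r][y] += 1
    bad = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return bad / len(labels)


def encode(data: Sequence[Sequence[float]], cut_sets: Sequence[List[float]]) -> List[Row]:
    """Map each row to a tuple of interval indices, one index per attribute."""
    return [tuple(bisect.bisect_right(cuts, x) for cuts, x in zip(cut_sets, row)) for row in data]


def merge_cuts(data, labels, cut_sets, max_rate: float) -> List[List[float]]:
    """Assumed merge rule: greedily drop a cut (merging its two neighbouring
    intervals) while the inconsistency rate over ALL attributes stays <= max_rate."""
    cut_sets = [list(c) for c in cut_sets]
    for j in range(len(cut_sets)):
        i = 0
        while i < len(cut_sets[j]):
            trial = cut_sets[:j] + [cut_sets[j][:i] + cut_sets[j][i + 1:]] + cut_sets[j + 1:]
            if inconsistency_rate(encode(data, trial), labels) <= max_rate:
                cut_sets = trial  # merge accepted; index i now points at the next cut
            else:
                i += 1  # this cut is needed; keep it
    return cut_sets


if __name__ == "__main__":
    data = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
    labels = ["a", "b", "a", "b"]
    candidates = [boundary_cut_points([row[j] for row in data], labels) for j in range(2)]
    # Assumed threshold: inconsistency rate of the raw, undiscretized data (0.0 here).
    threshold = inconsistency_rate([tuple(row) for row in data], labels)
    print(candidates)                                       # [[1.5, 2.5, 3.5], [0.5]]
    print(merge_cuts(data, labels, candidates, threshold))  # [[], [0.5]]
```

In the demo the first attribute loses all of its candidate cuts because the second attribute already separates the classes; checking inconsistency across all attributes jointly is what allows this, which is one plausible reading of the "global" property. The greedy result depends on the order in which attributes and cuts are tried, and re-encoding the full table for every trial costs O(N·d) per trial, so a practical implementation would update the group counts incrementally.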
