Journal of Systems Engineering and Electronics ›› 2009, Vol. 31 ›› Issue (10): 2521-2526.

• 软件、算法与仿真 • 上一篇    下一篇

隐私保护的快速聚类算法

薛安荣1, 姜冬洁1,2, 鞠时光1, 陈伟鹤1, 马汉达1   

  1. 1. 江苏大学计算机科学与通信工程学院, 江苏, 镇江, 212013;
    2. 炎黄职业技术学院, 江苏, 淮安, 223400
  • 收稿日期:2008-05-22 修回日期:2009-05-21 出版日期:2009-10-20 发布日期:2010-01-03
  • 作者简介:薛安荣(1964- ),男,副教授,博士,主要研究方向为数据挖掘、信息安全.E-mail:xuear@ujs.edu.cn
  • 基金资助:
    国家自然科学基金(60773049,60603041);江苏大学高级人才启动基金(09JDG041)资助课题

Fast privacy-preserving clustering algorithm

XUE An-rong1, JIANG Dong-jie1,2, JU Shi-guang1, CHEN Wei-he1, MA Han-da1   

  1. 1. School of Computer Science and Telecommunication Engineering, Jiangsu Univ., Zhenjiang 212013, China;
    2. Progenitor Vocational and Technical Coll., Huai'an 223400, China
  • Received:2008-05-22 Revised:2009-05-21 Online:2009-10-20 Published:2010-01-03

摘要: 针对基于安全多方计算聚类算法的低效问题,提出了基于聚类特征树结构的隐私保护的层次k-means聚类算法.算法基于半诚信模型,在第三方内存中保留对各记录的索引信息及聚类特征树的当前层信息,减少了I/O次数和通信量,克服了难以适应多数据方和因过于信赖第三方导致隐私泄漏等缺陷.算法通过基于安全多方计算的标准化协议、距离计算协议和聚类中心计算协议,实现了数据的有效保护,综合层次和k-means聚类算法的优点,提高了计算精度和算法的可伸缩性.理论证明了算法的安全性和高效性,实验结果表明所提算法优于同类算法.

Abstract: To improve the efficiency of a clustering algorithm based on secure multi-party computation,a privacy-preserving hierarchical k-means algorithm based on clustering feature tree and semi-honest model is proposed.The algorithm stores the index information of every record and current hierarchical information of the clustering feature tree in the third party's memory and reduces the I/O times and communication cost,it also overcomes the drawbacks of applying to multi-party difficultly and leaking privacy due to depending on the third party excessively.The algorithm introduces three secure protocols such as distance computation,clustering center computation and standardization to protect data privacy effectively and accurately,and improves the precision and flexibility by combining the merits of hierarchical and k-means clustering algorithms.Theoretic argument demonstrates that the algorithm is secure and completes with good efficiency.The experimental results show that the proposed algorithm outperforms the other existing algorithms in communication cost and computation cost.

中图分类号: