Journal of Systems Engineering and Electronics, 2010, Vol. 32, Issue (11): 2489-2492. doi: 10.3969/j.issn.1001-506X.2010.11.49

• Software, Algorithms and Simulation •

Classification detection method for uncompleted records based on bit operation

CAO Jian-jun (曹建军)1, DIAO Xing-chun (刁兴春)1, WU Jian-ming (吴建明)2, YUAN Zhen (袁震)1, PENG Cong (彭琮)1

    1. Nanjing Telecommunication Technology Institute, Nanjing 210007, Jiangsu, China;
    2. Department of Equipment Command and Management Engineering, Ordnance Engineering College, Shijiazhuang 050003, Hebei, China
  • Online: 2010-11-23 Published: 2010-01-03

Abstract:

Handling missing data is an important part of data cleaning. A classification detection method for incomplete records based on bit operations is proposed. Incomplete records are defined, and records are divided into four classes: complete; incomplete but acceptable; incomplete, to be repaired; and incomplete, to be deleted. A hierarchical classification flow for these classes is given. A binary representation of a record is defined. Standard binary representation sets for each class are generated from samples of incomplete records, and the priority of each standard binary representation is determined by the number of times it appears in the samples. The binary representations in the standard set for the to-be-deleted class are merged into expressions. Records are classified by bit operations, and the binary representation sets are refined step by step by handling binary representations that were not detected. The binary representation of an incomplete record determines how the record is processed further. An application example verifies the effectiveness of the method.
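As a rough illustration of the idea summarized in the abstract, the following Python sketch encodes each record as a bit mask and classifies it with bitwise tests. The abstract does not specify the encoding, the order of the hierarchical checks, or the form of the merged expressions. The field list `FIELDS`, the convention that a set bit marks a missing field, the check order in `classify`, and the subset test used for merged delete patterns are therefore all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

# Hypothetical schema; bit i of a mask corresponds to FIELDS[i].
FIELDS = ["name", "phone", "email", "address"]


def to_mask(record):
    """Encode a record as an integer whose set bits mark missing fields (assumed convention)."""
    mask = 0
    for i, field in enumerate(FIELDS):
        if record.get(field) in (None, ""):
            mask |= 1 << i
    return mask


def build_standard_set(sample_masks):
    """Order distinct sample masks by how often they occur, most frequent first."""
    return [mask for mask, _ in Counter(sample_masks).most_common()]


def classify(mask, accept, repair, delete_patterns):
    """Classify a mask with bit operations; the check order is an assumption."""
    if mask == 0:
        return "complete"
    # Assumed form of a merged delete expression: every bit in the pattern is missing.
    for pattern in delete_patterns:
        if (mask & pattern) == pattern:
            return "incomplete-delete"
    # Exact match against a standard representation: XOR is zero when all bits agree.
    for std in accept:
        if (mask ^ std) == 0:
            return "incomplete-accept"
    for std in repair:
        if (mask ^ std) == 0:
            return "incomplete-repair"
    return "unseen"  # candidate for review and for extending the standard sets


# Illustrative sample data, not taken from the paper.
accept = build_standard_set([0b1000, 0b0100, 0b1000])  # address or email missing
repair = build_standard_set([0b0010])                  # phone missing
delete_patterns = [0b0001]                             # name missing

record = {"name": "Li", "phone": "", "email": "li@example.com", "address": "Nanjing"}
label = classify(to_mask(record), accept, repair, delete_patterns)  # "incomplete-repair"
```

Sorting each standard set by sample frequency means the most common patterns are compared first, which is one plausible reading of the priority rule. Masks that fall through to `"unseen"` correspond to the undetected binary representations that the abstract says are handled to refine the sets over time.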