Journal of Systems Engineering and Electronics ›› 2009, Vol. 31 ›› Issue (12): 2994-2997.

• Software, Algorithms and Simulation •

Improved method of data mining preprocessing based on Web log

SUN Yu-hang, SUN Ying-fei   

  1. Graduate University of Chinese Academy of Sciences, Beijing 100049, China
  • Online:2009-12-24 Published:2010-01-03

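Summary:

Preprocessing techniques for Web log data mining are studied, with an analysis of Frame page filtering methods and the setting of the timeout threshold. The ID3 algorithm is applied to improve the Frame page filtering process, which otherwise loses SubFrame page information and requires a site promotion step. For the timeout threshold, a dynamic correction method is adopted to improve the ability of preprocessing to identify long sessions. Experiments verify that the method effectively reduces information loss during preprocessing and improves the accuracy of the mining results.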

Abstract:

Data preprocessing for Web log mining is studied, with an analysis of Frame page filtering and the setting of the timeout threshold. An improved method is proposed that combines the induction of decision trees (ID3) algorithm with a dynamic amendment of the threshold value. The method addresses the information loss caused by Frame page filtering and by a fixed threshold, and it also enhances the ability to identify transaction sessions. Experiments show that the method is effective in improving the accuracy of the mining results.
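
To illustrate the idea of a dynamically amended timeout threshold, the following is a minimal sketch of timeout-based session identification. The abstract does not give the amendment rule, so the rule used here is purely an assumption: the threshold for the current session is the larger of a base timeout and a multiple of the mean gap observed so far in that session. The names `split_sessions`, `base_timeout` and `factor`, and the 30-minute base value, are hypothetical and are not taken from the paper.

```python
from typing import List


def split_sessions(
    timestamps: List[float],
    base_timeout: float = 1800.0,  # assumed base threshold in seconds (30 minutes)
    factor: float = 3.0,           # assumed multiplier for the mean gap
) -> List[List[float]]:
    """Split one visitor's sorted request times (in seconds) into sessions.

    Assumed rule, not the paper's: a new session starts when the gap to the
    previous request exceeds max(base_timeout, factor * mean gap so far).
    """
    sessions: List[List[float]] = []
    current: List[float] = []
    gaps: List[float] = []  # gaps between requests inside the current session

    for t in timestamps:
        if current:
            gap = t - current[-1]
            threshold = base_timeout
            if gaps:
                # Slow-paced sessions raise their own threshold.
                threshold = max(base_timeout, factor * sum(gaps) / len(gaps))
            if gap > threshold:
                # Gap too long: close the current session and start a new one.
                sessions.append(current)
                current, gaps = [], []
            else:
                gaps.append(gap)
        current.append(t)

    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    times = [0, 1000, 2000, 3000, 5500, 20000]
    print(split_sessions(times))              # dynamic threshold
    print(split_sessions(times, factor=0.0))  # behaves like a fixed threshold
```

With `factor=0.0` the rule reduces to a fixed threshold, which cuts the slow-paced session at the 2500-second gap; under the assumed dynamic rule that gap stays inside the same session, which is the kind of long-session behaviour the abstract refers to.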