Systems Engineering and Electronics, 2025, Vol. 47, Issue (6): 2065-2075. doi: 10.12305/j.issn.1001-506X.2025.06.34

• Communications and Networks •

Two-stage novel method for imbalanced data distribution in network intrusion detection

Bo WEI1, Caifu HU2, Ruibin REN1,*

  1. School of Mathematics, Southwest Jiaotong University, Chengdu 611756, China
  2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
  • Received: 2024-06-21 Online: 2025-06-25 Published: 2025-07-09
  • Contact: Ruibin REN
  • About the authors: Bo WEI (born 2000), male, master's student; main research interest: network intrusion detection
    Caifu HU (born 1999), male, master's student; main research interests: network security and deep learning
    Ruibin REN (born 1990), female, associate professor, Ph.D.; main research interests: deep learning and network security
  • Supported by:
    National Natural Science Foundation of China (U20B2070); National Natural Science Foundation of China (U23B2013)

Abstract:

Although many current network traffic intrusion detection models achieve relatively high detection rates, they still suffer from low detection rates and poor generalization on imbalanced abnormal network traffic. Therefore, a two-stage network intrusion detection method for imbalanced data is proposed. In the first stage, a random forest ensemble model is trained to perform a preliminary binary classification of network traffic into normal and abnormal, which initially alleviates the impact of the imbalance between normal and abnormal traffic on model training. In the second stage, the original abnormal traffic data are used to train a one-dimensional convolutional neural network-bidirectional long short-term memory model, so that training focuses on learning the key features of abnormal traffic, and the focal loss function is introduced. This enables the model to attend simultaneously to hard-to-classify samples and minority-class samples in abnormal traffic, further alleviating the impact of the imbalanced distribution of abnormal traffic on detection accuracy. To verify the effectiveness of the proposed method, experiments are conducted on the UNSW2015 and CIC-IDS2017 datasets. The experimental results show that the proposed method extracts data features better and alleviates data imbalance to a certain extent. Compared with other similar methods proposed in recent years, the proposed method has better overall performance, with the weighted-average F1 score improved by 0.9% and the macro-average F1 score improved by 2.7%.
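The abstract does not give the loss configuration, so the following is only a minimal sketch of the standard focal loss form, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), written for a single sample in plain Python. The function names, the example logits, the optional per-class weights `alpha`, and the default `gamma=2.0` are illustrative assumptions, not values taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, alpha=None, gamma=2.0):
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).

    alpha: optional per-class weights (hypothetical; e.g. larger for rare attack classes).
    gamma: focusing parameter; gamma = 0 reduces to (weighted) cross-entropy.
    """
    p_t = softmax(logits)[target]
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Assumed toy inputs: an easy sample (confident and correct) and a hard one
# (scores nearly uniform), both with true class 0.
easy_logits = [4.0, 0.0, 0.0]
hard_logits = [0.2, 0.0, 0.0]

ce_easy = focal_loss(easy_logits, target=0, gamma=0.0)  # plain cross-entropy
ce_hard = focal_loss(hard_logits, target=0, gamma=0.0)
fl_easy = focal_loss(easy_logits, target=0)             # gamma = 2
fl_hard = focal_loss(hard_logits, target=0)

print(f"cross-entropy: easy={ce_easy:.6f} hard={ce_hard:.6f}")
print(f"focal loss:    easy={fl_easy:.6f} hard={fl_hard:.6f}")
```

In this sketch the factor (1 - p_t)^gamma shrinks the loss of the well-classified sample far more than that of the hard one, while `alpha` lets minority classes be weighted up. Together these two terms are how focal loss attends to both hard samples and minority-class samples.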

Key words: intrusion detection, imbalanced samples, neural network, focal loss

CLC number: