Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (6): 2065-2075. doi: 10.12305/j.issn.1001-506X.2025.06.34

• Communications and Networks •

Two-stage novel method for imbalanced data distribution in network intrusion detection

Bo WEI1, Caifu HU2, Ruibin REN1,*   

  1. School of Mathematics, Southwest Jiaotong University, Chengdu 611756, China
  2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
  • Received: 2024-06-21 Online: 2025-06-25 Published: 2025-07-09
  • Contact: Ruibin REN

Abstract:

Although many current network traffic intrusion detection models achieve relatively high overall detection rates, they still suffer from low detection rates and poor generalization on imbalanced abnormal network traffic. Therefore, a two-stage network intrusion detection method for imbalanced data is proposed. In the first stage, a random forest ensemble model is trained to perform an initial binary classification of network traffic into normal and abnormal, which alleviates the impact of the imbalance between normal and abnormal traffic on model training. In the second stage, the initial abnormal traffic data is used to train a one-dimensional convolutional neural network-bidirectional long short-term memory (1D CNN-BiLSTM) model that learns the key features of abnormal traffic, and the focal loss function is introduced during model training. This mechanism lets the model focus on both hard-to-classify samples and minority-class samples in abnormal traffic, further alleviating the impact of data imbalance within abnormal traffic on detection accuracy. To verify the effectiveness of the proposed method, experiments are conducted on the UNSW-NB15 and CIC-IDS2017 datasets. The experimental results show that the proposed method extracts data features better and alleviates data imbalance to a certain extent. Compared with similar methods proposed in recent years, the proposed model achieves better overall performance, with the weighted F1 score increased by 0.9% and the macro F1 score increased by 2.7%.
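
The role of the focal loss in the second stage can be illustrated with a minimal sketch. It assumes the standard multi-class form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the function name, the values gamma = 2 and the per-class alpha weights, and the toy probabilities are all assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def focal_loss(probs, labels, gamma=2.0, alpha=None, eps=1e-12):
    """Per-sample multi-class focal loss (illustrative, not the paper's code).

    probs  : (n_samples, n_classes) predicted class probabilities
    labels : (n_samples,) integer class labels
    gamma  : focusing parameter; gamma = 0 reduces to cross-entropy
    alpha  : optional (n_classes,) per-class weights, e.g. larger for
             minority attack classes (values here are assumptions)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Probability assigned to the true class of each sample.
    p_t = probs[np.arange(len(labels)), labels]
    p_t = np.clip(p_t, eps, 1.0)
    # Modulating factor: close to 0 for well-classified samples.
    weight = (1.0 - p_t) ** gamma
    if alpha is not None:
        weight = weight * np.asarray(alpha, dtype=float)[labels]
    return -weight * np.log(p_t)


# Toy example: sample 0 is easy (p_t = 0.9), sample 1 is hard (p_t = 0.1).
probs = np.array([[0.9, 0.1],
                  [0.9, 0.1]])
labels = np.array([0, 1])

ce = focal_loss(probs, labels, gamma=0.0)              # plain cross-entropy
fl = focal_loss(probs, labels, gamma=2.0)              # focal loss
fl_weighted = focal_loss(probs, labels, gamma=2.0,
                         alpha=[0.25, 0.75])           # class 1 as minority
```

In this sketch the modulating factor (1 - p_t)^gamma shrinks the loss of the easy sample far more than that of the hard one, while alpha additionally raises the relative weight of the class treated as the minority. This is the sense in which the loss attends to hard samples and minority samples at the same time.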

Key words: intrusion detection, imbalanced samples, neural network, focal loss
