Two-stage novel method for imbalanced data distribution in network intrusion detection

doi:10.12305/j.issn.1001-506X.2025.06.34

Abstract

Although many network traffic intrusion detection models already achieve high detection rates, they still suffer from low detection rates on imbalanced abnormal traffic and from poor generalization. This paper therefore proposes a two-stage network intrusion detection method for imbalanced data. In the first stage, a random forest ensemble model is trained to perform a preliminary binary classification of network traffic into normal and abnormal, which initially mitigates the effect of the imbalance between normal and abnormal traffic on model training. In the second stage, the original abnormal traffic data are used to train a one-dimensional convolutional neural network-bidirectional long short-term memory (1D-CNN-BiLSTM) model, so that training concentrates on the key features of abnormal traffic. A focal loss function is also introduced so that the model attends to both hard-to-classify samples and minority-class samples within the abnormal traffic, further mitigating the effect of imbalance among abnormal traffic classes on detection accuracy. To verify the method, experiments are conducted on the UNSW-NB15 and CIC-IDS2017 datasets. The results show that the proposed method extracts data features more effectively and alleviates the data imbalance problem to a certain extent. Compared with other similar methods, the proposed method has better overall performance: its weighted-average F1 score is 0.9% higher and its macro-average F1 score is 2.7% higher.

Keywords: intrusion detection, imbalanced samples, neural network, focal loss
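
The abstract does not state the focal loss formula or its hyperparameters. As a rough illustration only, the sketch below uses the standard multi-class form from Ref. [19], FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The value γ = 2, the optional per-class weights `alpha`, and the example logits are assumptions chosen for demonstration, not settings reported by the authors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample.

    gamma: focusing parameter (assumed value; gamma = 0 gives cross-entropy).
    alpha: optional list of per-class weights (hypothetical, e.g. larger
           values for minority attack classes).
    """
    p_t = max(softmax(logits)[target], 1e-12)  # probability of the true class
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, i.e. focal loss with gamma = 0 and no alpha."""
    return focal_loss(logits, target, gamma=0.0)


# Two made-up samples whose true class is index 0.
easy = [4.0, 0.0, 0.0]  # confidently correct prediction
hard = [0.2, 0.0, 0.1]  # nearly uniform prediction

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.4f}")
```

With these example values the modulating factor (1 - p_t)^γ is on the order of 0.001 for the easy sample and about 0.4 for the hard one, which is how the loss shifts the training signal toward hard samples.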

CLC number:

TP393

Bo WEI, Caifu HU, Ruibin REN. Two-stage novel method for imbalanced data distribution in network intrusion detection[J]. Systems Engineering and Electronics, 2025, 47(6): 2065-2075.

References

1	LAN J H, LIU X D, LI B, et al. A novel hierarchical attention-based triplet network with unsupervised domain adaptation for network intrusion detection[J]. Applied Intelligence, 2023, 53(10): 11705-11726. doi: 10.1007/s10489-022-04076-0
2	THAKKAR A, LOHIYA R. A survey on intrusion detection system: feature selection, model, performance measures, application perspective, challenges, and future research directions[J]. Artificial Intelligence Review, 2022, 55(1): 453-463. doi: 10.1007/s10462-021-10037-9
3	CUI J Y, ZONG L S, XIE J H, et al. A novel multi-module integrated intrusion detection system for high-dimensional imbalanced data[J]. Applied Intelligence, 2023, 53(1): 272-288. doi: 10.1007/s10489-022-03361-2
4	KHAN S H, HAYAT M, BENNAMOUN M, et al. Cost-sensitive learning of deep feature representations from imbalanced data[J]. IEEE Trans. on Neural Networks and Learning Systems, 2018, 29(8): 3573-3587. doi: 10.1109/TNNLS.2017.2732482
5	LI Y X, CHAI Y, HU Y Q, et al. Review of imbalanced data classification methods[J]. Control and Decision, 2019, 34(4): 673-688. (in Chinese)
6	BEDI P, GUPTA N, JINDAL V. I-SiamIDS: an improved SiamIDS for handling class imbalance in network-based intrusion detection systems[J]. Applied Intelligence, 2021, 51: 1133-1151. doi: 10.1007/s10489-020-01886-y
7	PAN C S, LI Z X, YANG W S, et al. Anomaly detection method of network traffic based on secondary feature extraction and BiLSTM-attention[J]. Journal of Electronics & Information Technology, 2023, 45(12): 4539-4547. doi: 10.11999/JEIT221296 (in Chinese)
8	LAN Y, TRUONG-HUU T, WU J, et al. Cascaded multi-class network intrusion detection with decision tree and self-attentive model[C]//Proc. of the IEEE International Conference on Data Mining Workshops, 2022.
9	DENNING D E . An intrusion-detection model[J]. IEEE Trans. on Software Engineering, 1987, 13 (2): 222- 232.
10	PORRAS P A, KEMMERER R A. Penetration state transition analysis: a rule-based intrusion detection approach[C]//Proc. of the 8th Annual Computer Security Application Conference, 1992: 220-229.
11	SHEU T F, HUANG N F, LEE H P. NIS04-6: a time-and memory-efficient string matching algorithm for intrusion detection systems[C]//Proc. of the IEEE Global Communications Conference, 2006.
12	PAN Z S, LIAN H, HU G Y, et al. An integrated model of intrusion detection based on neural network and expert system[C]// Proc. of the 17th IEEE International Conference on Tools with Artificial Intelligence, 2005.
13	LUNT T F, JAGANNATHAN R. A prototype real-time intrusion-detection expert system[C]//Proc. of the IEEE Symposium on Security & Privacy, 1988.
14	GU J, LU S. An effective intrusion detection approach using SVM with naive Bayes feature embedding[J]. Computers & Security, 2021, 103: 102158.
15	GUEZZAZ A, BENKIRANE S, AZROUR M, et al. A reliable network intrusion detection approach using decision tree with enhanced data quality[J]. Security and Communication Networks, 2021, 2021: 123059.
16	AZIZJON M, JUMABEK A, KIM W. 1D CNN based network intrusion detection with normalization on imbalanced data[C]//Proc. of the International Conference on Artificial Intelligence in Information and Communication, 2020: 218-224.
17	TIAN Q T, HAN D Z, LI K C, et al. An intrusion detection approach based on improved deep belief network[J]. Applied Intelligence, 2020, 50: 3162-3178. doi: 10.1007/s10489-020-01694-4
18	FOTIADOU K, VELIVASSAKI T H, VOULKIDIS A, et al. Network traffic anomaly detection via deep learning[J]. Information, 2021, 12(5): 215. doi: 10.3390/info12050215
19	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proc. of the IEEE International Conference on Computer Vision, 2017: 2980-2988.
20	DING D F, ZHU L, XIE J Y, et al. In-vehicle network intrusion detection system based on Bi-LSTM[C]//Proc. of the 7th International Conference on Intelligent Computing and Signal Processing, 2022: 580-583.
21	MOUSTAFA N, SLAY J. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)[C]//Proc. of the Military Communications and Information Systems Conference, 2015.
22	SHARAFALDIN I, LASHKARI A H, GHORBANI A A. Toward generating a new intrusion detection dataset and intrusion traffic characterization[C]//Proc. of the International Conference on Information Systems Security & Privacy, 2018: 108-116.
23	AL-TURAIKI I, ALTWAIJRY N. A convolutional neural network for improved anomaly-based network intrusion detection[J]. Big Data, 2021, 9(3): 233-252. doi: 10.1089/big.2020.0263
24	HALBOUNI A, GUNAWAN T S, HABAEBI M H, et al. CNN-LSTM: hybrid deep neural network for network intrusion detection system[J]. IEEE Access, 2022, 10: 99837-99849. doi: 10.1109/ACCESS.2022.3206425
25	UDAS P B, KARIM M E, ROY K S. SPIDER: a shallow PCA based network intrusion detection system with enhanced recurrent neural networks[J]. Journal of King Saud University-Computer and Information Sciences, 2022, 34(10): 10246-10272. doi: 10.1016/j.jksuci.2022.10.019
26	REN H J, TANG Y H, DONG W Y, et al. DUEN: dynamic ensemble handling class imbalance in network intrusion detection[J]. Expert Systems with Applications, 2023, 229: 120420. doi: 10.1016/j.eswa.2023.120420

Traffic type	Traffic subtype	Count
Benign	Normal	93 000
Intrusion	Generic	58 871
	Exploits	44 525
	Fuzzers	24 246
	DoS	16 353
	Reconnaissance	13 987
	Analysis	2 677
	Backdoor	2 329
	Shellcode	1 511
	Worms	174
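
To make the degree of imbalance in the table above concrete, the following sketch computes each class's share of the records and the ratio between the largest and smallest attack classes. The counts are copied from the table; the variable names are illustrative only.

```python
# Record counts copied from the table above.
counts = {
    "Normal": 93000,
    "Generic": 58871,
    "Exploits": 44525,
    "Fuzzers": 24246,
    "DoS": 16353,
    "Reconnaissance": 13987,
    "Analysis": 2677,
    "Backdoor": 2329,
    "Shellcode": 1511,
    "Worms": 174,
}

total = sum(counts.values())

# Attack classes only: this is the data the second stage would see.
attack = {name: n for name, n in counts.items() if name != "Normal"}
attack_total = sum(attack.values())

# Ratio between the most and least frequent attack classes.
imbalance_ratio = max(attack.values()) / min(attack.values())

print(f"total records: {total}, attack records: {attack_total}")
for name, n in attack.items():
    print(f"{name:15s} {n:6d}  {n / attack_total:7.3%} of attacks")
print(f"largest/smallest attack class: {imbalance_ratio:.1f}")
```

The counts sum to 257 673 records, of which 164 673 are attacks, and Generic outnumbers Worms by roughly 338 to 1. Removing normal traffic therefore does not remove the imbalance among the attack classes.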

Traffic type	Traffic subtype	Count
Benign	Benign	2 273 097
Intrusion	DoS Hulk	231 073
	PortScan	158 930
	DDoS	128 027
	DoS GoldenEye	10 293
	FTP-Patator	7 938
	SSH-Patator	5 897
	DoS slowloris	5 796
	DoS Slowhttptest	5 499
	Bot	1 966
	Infiltration	36
	Heartbleed	11
	Web Attack Brute Force	1 507
	Web Attack Sql Injection	652
	Web Attack XSS	21

	Predicted positive	Predicted negative
Actual positive	TP	FN
Actual negative	FP	TN
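
The evaluation metrics reported in the result tables follow from these four confusion-matrix counts. The sketch below uses the standard definitions of accuracy, precision, recall, and F1, and then contrasts macro and weighted averaging of per-class F1. The per-class counts are hypothetical and chosen only to show how the two averages diverge on imbalanced data; they are not results from the paper.

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of all samples that are classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def precision_recall_f1(tp, fn, fp):
    """Standard precision, recall and F1 for one class (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical one-vs-rest counts (tp, fn, fp) for a two-class problem.
per_class = {
    "majority": (900, 100, 40),
    "minority": (10, 40, 100),
}

f1 = {name: precision_recall_f1(*c)[2] for name, c in per_class.items()}
support = {name: tp + fn for name, (tp, fn, fp) in per_class.items()}

# Macro average: every class counts equally.
macro_f1 = sum(f1.values()) / len(f1)
# Weighted average: each class is weighted by its number of true samples.
weighted_f1 = sum(f1[n] * support[n] for n in f1) / sum(support.values())

print(f"per-class F1: {f1}")
print(f"macro F1 = {macro_f1:.3f}, weighted F1 = {weighted_f1:.3f}")
```

In this toy case macro F1 is about 0.53 while weighted F1 is about 0.89, so macro averaging exposes poor minority-class performance that weighted averaging largely hides.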

Method	Year	Accuracy	Weighted-average F1	Macro-average F1
Ref. [8]	2022	-	-	0.559
Ref. [23]	2021	0.805	0.810	-
Ref. [24]	2022	0.818	0.809	-
Ref. [25]	2022	0.729	0.737	0.525
Ref. [26]	2023	-	-	0.501
Proposed method	-	0.818	0.822	0.586

Model (method)	Accuracy	Precision	Recall	Macro-average F1
1D-CNN-BiLSTM	0.802	0.608	0.510	0.524
SMOTE-NC+1D-CNN-BiLSTM	0.788	0.517	0.603	0.521
RUS+1D-CNN-BiLSTM	0.785	0.597	0.532	0.528
Proposed method	0.818	0.645	0.575	0.586
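
For context on the resampling baselines in this table, the sketch below shows one simple form of random undersampling, assuming that "RUS" carries its conventional meaning. The function name, the policy of shrinking every class to the size of the smallest one, and the toy labels are all illustrative assumptions; the source does not describe how the baseline was configured.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class size.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Target size is the number of samples in the rarest class.
    target = min(len(items) for items in by_class.values())

    out_x, out_y = [], []
    for y, items in by_class.items():
        for x in rng.sample(items, target):  # sample without replacement
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data: sample ids 0..60 with a heavily skewed label distribution.
labels = ["DoS"] * 50 + ["Backdoor"] * 8 + ["Worms"] * 3
samples = list(range(len(labels)))

bal_x, bal_y = random_undersample(samples, labels)
print("before:", Counter(labels))
print("after: ", Counter(bal_y))
```

Undersampling to the smallest class discards most majority-class records, a cost that loss-level reweighting such as focal loss does not incur.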