Server KPI anomaly detection based on CVAE-LSTM

doi:10.12305/j.issn.1001-506X.2025.03.34

Abstract:

Anomaly detection on key performance indicators (KPIs) is a cornerstone of intelligent operation and maintenance for Internet services, and it is important for fault alerting and server security. Deep generative models largely overcome the weak deep-feature representation ability of machine learning models, but they handle the time information in KPI data poorly and fall short in capturing long-term information. To address this, a KPI anomaly detection model is proposed that combines a conditional variational autoencoder (CVAE) with a long short-term memory (LSTM) network. The model exploits the strong representation ability of the CVAE, adds time information to the deep autoencoder, and uses the long-term memory of the LSTM to improve the learning and handling of long-term anomalies. The trained CVAE network is then used to further train the LSTM. In comparison experiments against other deep learning models on three public datasets, the proposed model achieves a higher F1 score than a standalone LSTM and several well-performing deep learning models.

Key words: key performance indicator (KPI) anomaly detection, conditional variational autoencoder (CVAE), long short-term memory (LSTM) network, KPI, deep learning
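
The abstract does not state the training objective. As a hedged point of reference, a CVAE is commonly trained by maximizing the conditional evidence lower bound shown below. The symbols are assumptions introduced here for illustration: x is a KPI window, c is the conditioning (time) information, z is the latent variable, and q_phi and p_theta are the encoder and decoder distributions.

```latex
\log p_\theta(x \mid c) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x, c)}\bigl[\log p_\theta(x \mid z, c)\bigr]
\;-\; D_{\mathrm{KL}}\bigl(q_\phi(z \mid x, c)\,\|\,p_\theta(z \mid c)\bigr)
```

The first term rewards accurate reconstruction of the window given the latent code and the condition; the second keeps the approximate posterior close to the conditional prior. Whether the paper uses exactly this form, or a modified one, cannot be determined from this page.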

Xiarun SHEN, Ruonan LI, Haotian ZHANG. Server KPI anomaly detection based on CVAE-LSTM[J]. Systems Engineering and Electronics, 2025, 47(3): 1019-1027.

Figures and tables (9): Figures 1-5, Tables 1-4


Dataset	Data points	Missing points	Anomalous points
A	129 035	2 754	7 666
B	128 854	2 941	9 581
C	147 669	28	2 945
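
To make the class imbalance in the dataset table concrete, the following minimal sketch computes the share of missing and anomalous points per dataset. It assumes that both ratios are taken against the listed number of data points; the source does not say whether missing points are included in that total, so the percentages are indicative only.

```python
# Counts copied from the dataset table: (data points, missing points, anomalous points).
DATASETS = {
    "A": (129_035, 2_754, 7_666),
    "B": (128_854, 2_941, 9_581),
    "C": (147_669, 28, 2_945),
}


def ratios(total, missing, anomalous):
    """Return (missing %, anomalous %) relative to the total number of points.

    ASSUMPTION: the listed total is the right denominator for both ratios.
    """
    return 100.0 * missing / total, 100.0 * anomalous / total


for name, counts in DATASETS.items():
    missing_pct, anomaly_pct = ratios(*counts)
    print(f"{name}: missing {missing_pct:.2f}%, anomalous {anomaly_pct:.2f}%")
```

Under this assumption anomalies make up only a few percent of each series, which is one reason F1 rather than accuracy is the natural headline metric.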

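Parameter	Meaning	Value
optimizer	Optimizer	Adam
learning_rate	Learning rate	0.001
gamma	Decay factor of the StepLR learning-rate schedule	0.75
CVAE_epochs	Number of training epochs for the CVAE network	20
LSTM_epochs	Number of training epochs for the LSTM network	50
latent_dims	Latent (hidden-layer) dimension	8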
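
The parameter table gives the initial learning rate and the step-decay factor but not the step interval. The sketch below shows how a step-decay schedule of this kind evolves, using the usual rule lr = base_lr * gamma ** (epoch // step_size). The value of `STEP_SIZE` is a hypothetical choice made only for illustration; it does not come from the source.

```python
BASE_LR = 0.001  # learning_rate from the parameter table
GAMMA = 0.75     # gamma from the parameter table
STEP_SIZE = 5    # ASSUMPTION: not given in the source; illustrative only


def step_lr(epoch, base_lr=BASE_LR, gamma=GAMMA, step_size=STEP_SIZE):
    """Learning rate at `epoch` under step decay.

    The rate is multiplied by `gamma` once every `step_size` epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


# 20 epochs corresponds to CVAE_epochs in the table.
for epoch in (0, 5, 10, 19):
    print(f"epoch {epoch:2d}: lr = {step_lr(epoch):.6f}")
```

With a different step interval the curve keeps the same shape but decays faster or slower, so the printed values should not be read as the rates actually used in the experiments.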

Module	Layer	Parameters	Output shape	Previous layer
Encoder	input(InputLayer)	-	(batch, 139)	-
	linear_1(Linear)	(139, 100, bias=True)	(batch, 100)	input
	activation_1(ReLU)	-	(batch, 100)	linear_1
	linear_2(Linear)	(100, 100, bias=True)	(batch, 100)	activation_1
	activation_2(ReLU)	-	(batch, 100)	linear_2
	linear_3(Linear)	(100, 8, bias=True)	(batch, 8)	activation_2
	linear_4(Linear)	(100, 8, bias=True)	(batch, 8)	activation_2
	activation_3(Softplus)	beta=1, threshold=20	(batch, 8)	linear_4
LSTM	lstm_1(LSTM)	(8, 64, batch_first=True)	(batch, k, 64)	linear_3
LSTM	lstm_2(LSTM)	(64, 8, batch_first=True)	(batch, k, 8)	lstm_1
Decoder	linear_5(Linear)	(99, 100, bias=True)	(batch, 100)	lstm_2
	activation_4(ReLU)	-	(batch, 100)	linear_5
	linear_6(Linear)	(100, 100, bias=True)	(batch, 100)	activation_4
	activation_5(ReLU)	-	(batch, 100)	linear_6
	linear_7(Linear)	(100, 48, bias=True)	(batch, 48)	activation_5
	linear_8(Linear)	(100, 48, bias=True)	(batch, 48)	activation_5
	activation_6(Softplus)	beta=1, threshold=20	(batch, 48)	linear_8
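
The layer widths in the structure table can be checked for internal consistency with a few lines of plain Python. The sketch below uses only the (input, output) sizes listed in the parameter column. The interpretation in the last step is an inference, not something the source states: if the encoder input is a reconstructed window concatenated with a condition vector, and the decoder input is the latent code concatenated with the same condition vector, both would imply the same condition width.

```python
# (layer name, in_features, out_features), copied from the parameter column.
ENCODER = [("linear_1", 139, 100), ("linear_2", 100, 100)]
ENCODER_HEADS = [("linear_3", 100, 8), ("linear_4", 100, 8)]    # both read activation_2
LSTM_STACK = [("lstm_1", 8, 64), ("lstm_2", 64, 8)]
DECODER = [("linear_5", 99, 100), ("linear_6", 100, 100)]
DECODER_HEADS = [("linear_7", 100, 48), ("linear_8", 100, 48)]  # both read activation_5


def chain_ok(layers):
    """True if each layer's input width equals the previous layer's output width."""
    return all(prev[2] == cur[1] for prev, cur in zip(layers, layers[1:]))


def heads_ok(trunk, heads):
    """True if every head reads the width produced by the last trunk layer."""
    return all(head[1] == trunk[-1][2] for head in heads)


WINDOW = DECODER_HEADS[0][2]  # 48 values reconstructed per window
LATENT = LSTM_STACK[-1][2]    # 8-dimensional code leaving the LSTM stack

# ASSUMPTION: encoder input = window + condition, decoder input = latent + condition.
cond_from_encoder = ENCODER[0][1] - WINDOW  # 139 - 48
cond_from_decoder = DECODER[0][1] - LATENT  # 99 - 8

print("encoder consistent:", chain_ok(ENCODER) and heads_ok(ENCODER, ENCODER_HEADS))
print("lstm consistent:", chain_ok(LSTM_STACK))
print("decoder consistent:", chain_ok(DECODER) and heads_ok(DECODER, DECODER_HEADS))
print("implied condition width:", cond_from_encoder, cond_from_decoder)
```

Both subtractions give the same width, which supports the concatenation reading but does not prove it. The paired heads (linear_3 and linear_4, linear_7 and linear_8), with Softplus on the second of each pair, are consistent with one head for a mean and one for a positive scale parameter; that reading is likewise an interpretation.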

Dataset	Metric (%)	LSTM-AD	VAE	CVAE	PUAD	Proposed model
A	Precision	78.67	94.19	92.57	97.39	100.00
	Recall	82.31	82.71	91.78	92.12	92.35
	F1	80.44	88.08	92.17	94.68	96.02
B	Precision	58.25	91.96	93.51	95.67	99.91
	Recall	54.27	100.00	100.00	99.94	99.05
	F1	56.19	95.81	96.65	97.76	99.47
C	Precision	72.00	99.21	97.91	96.42	98.32
	Recall	69.83	79.09	88.36	88.10	88.36
	F1	70.90	88.02	92.89	92.07	93.07
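
F1 is the harmonic mean of precision and recall, so the F1 rows of the results table can be recomputed from the two rows above them. The sketch below does this for the proposed model's column; the function name `f1_score` is introduced here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the proposed model, copied from the table.
PROPOSED = {
    "A": (100.00, 92.35, 96.02),
    "B": (99.91, 99.05, 99.47),
    "C": (98.32, 88.36, 93.07),
}

for name, (precision, recall, reported) in PROPOSED.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed F1 = {recomputed:.2f}, reported = {reported:.2f}")
```

Small differences in the last digit are expected, since the tabulated precision and recall are themselves rounded to two decimals.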
