

Systems Engineering and Electronics, 2025, Vol. 47, Issue 11: 3739-3753. doi: 10.12305/j.issn.1001-506X.2025.11.22
• Systems Engineering •
Received: 2025-04-24
Online: 2025-11-25
Published: 2025-12-08
Corresponding author: Ruifeng LI
E-mail: 910073134@qq.com
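About the author: Xi TANG (born 1992), male, lecturer and doctoral candidate; main research interests: intelligent testing of avionics equipment and machine learning.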
Xi TANG, Wenhai LI, Zhenhao TANG, Ruifeng LI, Gen LI
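Abstract:
To improve classifier accuracy on imbalanced data, an oversampling method based on density-based spatial clustering of applications with noise (DBSCAN) and a conditional generative adversarial network (CGAN) is proposed. First, DBSCAN clusters the positive-class and negative-class samples separately; the sample set is rebuilt using the cluster labels, and noise samples are identified and removed according to their safe level, which improves data quality. Next, the new sample set is used to train the CGAN model. To counter the training instability and mode collapse of CGAN, the Wasserstein distance with a gradient penalty term is adopted as the loss function, and the Wasserstein distance is adapted to the classification setting so that high-quality minority-class samples can be generated. Finally, the proposed method is compared with five classical oversampling methods on three typical classifiers, using nine public imbalanced datasets and one dataset measured from an analog circuit. The results show that the proposed method outperforms the other oversampling algorithms on most datasets, and that its advantage is most pronounced when the class imbalance ratio is high. The proposed method offers a new approach to handling imbalanced data.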
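The first stage described in the abstract (per-class DBSCAN clustering followed by safe-level noise removal) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names (`dbscan`, `safe_level`, `clean_with_dbscan`), the parameter values, and in particular the removal rule (drop a sample only if DBSCAN marks it as noise within its own class and none of its `k` nearest neighbours shares its class) are assumptions made for illustration; the abstract does not state the exact rule.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one cluster label per point, NOISE (-1) for noise."""
    n = len(points)
    labels = [None] * n                      # None means "not visited yet"

    def neighbours(i):
        # eps-neighbourhood of point i, including i itself
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE                # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:         # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels

def safe_level(idx, X, y, k):
    """Number of same-class samples among the k nearest neighbours of X[idx]."""
    others = sorted((j for j in range(len(X)) if j != idx),
                    key=lambda j: math.dist(X[idx], X[j]))
    return sum(1 for j in others[:k] if y[j] == y[idx])

def clean_with_dbscan(X, y, eps=1.0, min_pts=3, k=3):
    """Cluster each class separately, then drop DBSCAN noise whose safe level is 0.

    Assumed rule, for illustration only. Returns (X_kept, y_kept, tags), where
    each tag is a (class, cluster id) pair that could serve as a condition label.
    """
    tags = [None] * len(X)
    for cls in sorted(set(y)):
        idx = [i for i in range(len(X)) if y[i] == cls]
        for i, lab in zip(idx, dbscan([X[i] for i in idx], eps, min_pts)):
            tags[i] = (cls, lab)
    keep = [i for i in range(len(X))
            if not (tags[i][1] == NOISE and safe_level(i, X, y, k) == 0)]
    return [X[i] for i in keep], [y[i] for i in keep], [tags[i] for i in keep]

# Toy data: two well-separated classes plus one minority point inside class 0.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),   # majority class 0
     (5, 5), (5, 6), (6, 5), (6, 6),               # minority class 1
     (0.4, 0.6)]                                   # mislabelled-looking minority point
y = [0] * 5 + [1] * 5
X_kept, y_kept, tags = clean_with_dbscan(X, y, eps=1.5, min_pts=3, k=3)
print(len(X), "->", len(X_kept))                   # 10 -> 9
```

In this sketch the surviving `(class, cluster id)` tags play the role of the rebuilt sample labels; whether the paper conditions the CGAN on exactly this pairing is not stated in the abstract.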
CLC number:
Xi TANG, Wenhai LI, Zhenhao TANG, Ruifeng LI, Gen LI. Imbalanced data oversampling method based on DBSCAN and CGAN[J]. Systems Engineering and Electronics, 2025, 47(11): 3739-3753.
Table 3
Comparative results of the different algorithms with the RF classifier
| Dataset | Oversampling algorithm | Recall | | F-measure | | AUC | | Avg-precision | | G-mean | |
| | | MEAN | STD | MEAN | STD | MEAN | STD | MEAN | STD | MEAN | STD |
| Spambase | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Ionosphere | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Iris | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Credit | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Wine | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Vehicle | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Segmentation | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| CTG | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Page-blocks | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
Table 4
Comparative results of the different algorithms with the SVM classifier
| Dataset | Oversampling algorithm | Recall | | F-measure | | AUC | | Avg-precision | | G-mean | |
| | | MEAN | STD | MEAN | STD | MEAN | STD | MEAN | STD | MEAN | STD |
| Spambase | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Ionosphere | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Iris | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Credit | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Wine | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Vehicle | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Segmentation | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| CTG | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Page-blocks | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
Table 5
Comparative results of the different algorithms with the KNN classifier
| Dataset | Oversampling algorithm | Recall | | F-measure | | AUC | | Avg-precision | | G-mean | |
| | | MEAN | STD | MEAN | STD | MEAN | STD | MEAN | STD | MEAN | STD |
| Spambase | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Ionosphere | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Iris | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Credit | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Wine | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Vehicle | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Segmentation | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| CTG | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| Page-blocks | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
Table 6
Comparative results of the different algorithms on the Regulator dataset
| Classifier | Oversampling algorithm | Recall | | F-measure | | AUC | | Avg-precision | | G-mean | |
| | | MEAN | STD | MEAN | STD | MEAN | STD | MEAN | STD | MEAN | STD |
| RF | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| SVM | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| KNN | ADASYN | | | | | | | | | | |
| | SMOTE | | | | | | | | | | |
| | Borderline | | | | | | | | | | |
| | SVMSMOTE | | | | | | | | | | |
| | SMOTETk | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
Table 7
Ablation results of the OBDC algorithm on the Regulator dataset
| Classifier | Method | Recall | | F-measure | | AUC | | Avg-precision | | G-mean | |
| | | MEAN | STD | MEAN | STD | MEAN | STD | MEAN | STD | MEAN | STD |
| RF | GAN | | | | | | | | | | |
| | CGAN | | | | | | | | | | |
| | WCGAN_GP | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| SVM | GAN | | | | | | | | | | |
| | CGAN | | | | | | | | | | |
| | WCGAN_GP | | | | | | | | | | |
| | OBDC | | | | | | | | | | |
| KNN | GAN | | | | | | | | | | |
| | CGAN | | | | | | | | | | |
| | WCGAN_GP | | | | | | | | | | |
| | OBDC | | | | | | | | | | |