Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (11): 3739-3753. doi: 10.12305/j.issn.1001-506X.2025.11.22
• Systems Engineering •
Xi TANG, Wenhai LI, Zhenhao TANG, Ruifeng LI, Gen LI
Received: 2025-04-24
Online: 2025-11-25
Published: 2025-12-08
Contact: Ruifeng LI
E-mail: 910073134@qq.Com
Xi TANG, Wenhai LI, Zhenhao TANG, Ruifeng LI, Gen LI. Imbalanced data oversampling method based on DBSCAN and CGAN[J]. Systems Engineering and Electronics, 2025, 47(11): 3739-3753.
Table 2
Description of experimental datasets
| Dataset | Feature dimensions | Total samples | Minority samples | Majority samples | Imbalance ratio |
| Spambase | 57 | 4 601 | 1 813 | 2 788 | 1.538 |
| Ionosphere | 34 | 351 | 125 | 226 | 1.808 |
| Iris | 4 | 150 | 50 | 100 | 2.000 |
| Credit | 20 | 1 000 | 300 | 700 | 2.333 |
| Wine | 13 | 178 | 48 | 130 | 2.708 |
| Vehicle | 18 | 846 | 218 | 628 | 2.881 |
| Segment | 19 | 2 310 | 330 | 1 980 | 6.000 |
| CTGs | 21 | 1 831 | 176 | 1 655 | 9.403 |
| Pageblocks | 10 | 5 242 | 329 | 4 913 | 14.933 |
| Regulator | 9 | 221 | 45 | 176 | 3.911 |
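As a quick consistency check on Table 2, the imbalance ratio column appears to be the majority-class count divided by the minority-class count, rounded to three decimals. The sketch below assumes that definition (the helper name `imbalance_ratio` is hypothetical and not taken from the paper) and recomputes the ratio for the rows whose counts are fully listed.

```python
def imbalance_ratio(minority: int, majority: int) -> float:
    """Majority-to-minority sample ratio, rounded to three decimals.

    Assumption: this is the definition behind the table's last column.
    """
    if minority <= 0:
        raise ValueError("minority count must be positive")
    return round(majority / minority, 3)


# (minority, majority) counts copied from the fully listed rows of Table 2.
datasets = {
    "Ionosphere": (125, 226),
    "Iris": (50, 100),
    "Wine": (48, 130),
    "Vehicle": (218, 628),
    "Regulator": (45, 176),
}

ratios = {name: imbalance_ratio(mn, mj) for name, (mn, mj) in datasets.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

Under this assumption the recomputed values agree with the listed ratios for these rows, which suggests the column is a plain majority/minority quotient rather than a share of the total.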
Table 3
Comparative experimental results of different algorithms on the RF classifier
[Table layout: one row per dataset (Spambase, Ionosphere, Iris, Credit, Wine, Vehicle, Segmentation, CTG, Page-blocks) and oversampling algorithm (ADASYN, SMOTE, Borderline, SVMSMOTE, SMOTETk, OBDC); columns give the MEAN and STD of Recall, F-measure, AUC, Avg-precision, and G-mean. Cell values are not included here.]
Table 4
Comparative experimental results of different algorithms on the SVM classifier
[Table layout: one row per dataset (Spambase, Ionosphere, Iris, Credit, Wine, Vehicle, Segmentation, CTG, Page-blocks) and oversampling algorithm (ADASYN, SMOTE, Borderline, SVMSMOTE, SMOTETk, OBDC); columns give the MEAN and STD of Recall, F-measure, AUC, Avg-precision, and G-mean. Cell values are not included here.]
Table 5
Comparative experimental results of different algorithms on the KNN classifier
[Table layout: one row per dataset (Spambase, Ionosphere, Iris, Credit, Wine, Vehicle, Segmentation, CTG, Page-blocks) and oversampling algorithm (ADASYN, SMOTE, Borderline, SVMSMOTE, SMOTETk, OBDC); columns give the MEAN and STD of Recall, F-measure, AUC, Avg-precision, and G-mean. Cell values are not included here.]
Table 6
Comparative experimental results of different algorithms on the Regulator dataset
[Table layout: one row per classifier (RF, SVM, KNN) and oversampling algorithm (ADASYN, SMOTE, Borderline, SVMSMOTE, SMOTETk, OBDC); columns give the MEAN and STD of Recall, F-measure, AUC, Avg-precision, and G-mean. Cell values are not included here.]
Table 7
Ablation experimental results of the OBDC algorithm on the Regulator dataset
[Table layout: one row per classifier (RF, SVM, KNN) and method (GAN, CGAN, WCGAN_GP, OBDC); columns give the MEAN and STD of Recall, F-measure, AUC, Avg-precision, and G-mean. Cell values are not included here.]