Design of lightweight incremental ensemble learning algorithm

doi:10.12305/j.issn.1001-506X.2021.04.01

Abstract:

A conventional classification and regression tree (CART) can learn new categories only by retraining the entire model, which greatly increases the training cost when the number of sample categories is large. To address this problem, a lightweight incremental ensemble learning algorithm is proposed. When new categories enter the training set, samples of those categories can be classified simply by adding CART base classifiers with open set recognition capability to the existing ensemble. No retraining is required, so the computational complexity is reduced and the learning process is simplified. Simulation experiments on emitter classification show that the algorithm maintains a classification accuracy above 90% when the signal-to-noise ratio is greater than or equal to -4 dB. When the number of categories is large, the algorithm significantly reduces the training cost of adding new categories compared with conventional CART.

Key words: classification and regression tree (CART), computational complexity, open set recognition, ensemble learning, emitter classification
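
The incremental idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the base classifiers perform open set recognition or how their outputs are combined, so everything specific here is an assumption: rejection uses a per-class feature bounding box with a small margin, the ensemble returns the first label that any member accepts, and the names `OpenSetCart`, `IncrementalEnsemble`, and the two-feature emitter samples (pulse width, center frequency) are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_cart(X, y, depth=0, max_depth=4):
    """Grow a binary CART by greedy minimization of weighted Gini impurity."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]          # leaf: majority label
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2                           # midpoint threshold
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:                                    # all rows identical
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    lx = [row for row in X if row[f] <= t]
    ly = [lab for row, lab in zip(X, y) if row[f] <= t]
    rx = [row for row in X if row[f] > t]
    ry = [lab for row, lab in zip(X, y) if row[f] > t]
    return (f, t, build_cart(lx, ly, depth + 1, max_depth),
            build_cart(rx, ry, depth + 1, max_depth))

def predict_cart(tree, x):
    """Walk the tree until a leaf (a plain string label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

class OpenSetCart:
    """CART base classifier with an ASSUMED bounding-box rejection rule."""
    def __init__(self, X, y, margin=0.1):
        self.tree = build_cart(X, y)
        self.boxes = {}
        for label in set(y):
            rows = [row for row, lab in zip(X, y) if lab == label]
            box = []
            for col in zip(*rows):
                pad = margin * (max(col) - min(col))
                box.append((min(col) - pad, max(col) + pad))
            self.boxes[label] = box

    def predict(self, x):
        """Return a known label, or None if x lies outside that class's box."""
        label = predict_cart(self.tree, x)
        inside = all(lo <= v <= hi for v, (lo, hi) in zip(x, self.boxes[label]))
        return label if inside else None

class IncrementalEnsemble:
    """Ensemble that grows by one base classifier per batch of new classes."""
    def __init__(self):
        self.members = []

    def add_classes(self, X, y):
        # Only the new member is trained; existing members are left untouched.
        self.members.append(OpenSetCart(X, y))

    def predict(self, x):
        for member in self.members:
            label = member.predict(x)
            if label is not None:
                return label
        return "unknown"

# Hypothetical samples: (pulse width in us, center frequency in MHz).
X_ab = [(1.0, 100.0), (1.2, 105.0), (0.9, 98.0), (1.1, 102.0),
        (5.0, 200.0), (5.5, 210.0), (4.8, 195.0), (5.2, 205.0)]
y_ab = ["A"] * 4 + ["B"] * 4
X_c = [(9.0, 400.0), (9.4, 410.0), (8.8, 395.0), (9.1, 405.0)]
y_c = ["C"] * 4

ensemble = IncrementalEnsemble()
ensemble.add_classes(X_ab, y_ab)
before = ensemble.predict((9.0, 400.0))   # class C not yet known
ensemble.add_classes(X_c, y_c)            # add one base classifier, no retraining
after = ensemble.predict((9.0, 400.0))
print(before, after)
```

In this sketch the cost of adding class C depends only on the class C samples, which is the property the abstract attributes to the proposed algorithm; the rejection rule and voting scheme in the paper itself may differ.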

CLC number: TP301.6

Jiahui DING, Jianlong TANG, Zhengyang YU. Design of lightweight incremental ensemble learning algorithm[J]. Systems Engineering and Electronics, 2021, 43(4): 861-867.



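Feature parameter	SNR = 2 dB	SNR = 0 dB	SNR = -2 dB	SNR = -4 dB
Pulse width/μs	0.010	0.015	0.025	0.040
Center frequency/MHz	0.122	0.155	0.191	0.235
Pulse start frequency/MHz	0.311	0.393	0.477	0.588
Pulse end frequency/MHz	0.307	0.381	0.462	0.568
Bandwidth/MHz	0.381	0.447	0.528	0.603
Frequency modulation slope/(MHz/μs)	0.324	0.367	0.423	0.476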

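SNR/dB	In-set classification accuracy before open set recognition improvement/%	In-set classification accuracy after open set recognition improvement/%	Out-of-set anomaly detection rate after open set recognition improvement/%
2	99.81	99.81	99.52
0	99.43	99.43	99.49
-2	98.20	98.11	99.47
-4	95.38	94.85	99.47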
