Acoustic scene classification based on joint optimization of NMF and CNN

doi:10.12305/j.issn.1001-506X.2022.05.01

Abstract

To address the problem of representing complex acoustic environments in the acoustic scene classification task, an optimization algorithm that jointly trains the feature extraction and classification models is proposed. To learn more discriminative, supervised features, non-negative matrix factorization (NMF) is combined with the training of a convolutional neural network (CNN), and the network loss is used to update both the feature extraction and the network parameters. The logarithmic spectrogram extracted from the TUT2017 dataset serves as the basic feature, and a deep convolutional neural network is built for experimental verification. The simulation results show that the recognition accuracy of the proposed algorithm is 3.9% higher than before optimization and exceeds that of two other commonly used acoustic features, which shows that the algorithm can effectively improve the overall classification performance.
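As a point of reference for the factorization step mentioned in the abstract, the following is a minimal, dependency-free sketch of plain unsupervised NMF with the standard multiplicative updates for the squared Frobenius error. It is not the jointly optimized, supervised variant proposed in the paper, where the network loss also drives the feature extraction. The toy 6×8 matrix, the function names, and the iteration count are all illustrative assumptions; a real input would be a spectrogram made non-negative beforehand.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    """Transpose a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))

def nmf(v, k, n_iter=200, eps=1e-9, seed=0):
    """Factor non-negative V (F x T) into W (F x k) and H (k x T)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(k)]
    errors = [frob_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_rows)]
        errors.append(frob_error(v, w, h))
    return w, h, errors

# Hypothetical toy "spectrogram": 6 frequency bins x 8 frames,
# built from two non-negative spectral patterns (not real audio).
basis = [[1.0, 0.0], [0.8, 0.1], [0.5, 0.5], [0.1, 0.9], [0.0, 1.0], [0.3, 0.3]]
acts = [[1, 0, 2, 0, 1, 0, 3, 0], [0, 1, 0, 2, 0, 1, 0, 3]]
V = matmul(basis, acts)

W, H, errors = nmf(V, k=2)
print(f"reconstruction error: {errors[0]:.4f} -> {errors[-1]:.6f}")
```

In this sketch the columns of H play the role of per-frame features that a classifier could consume; in a joint scheme such as the one the abstract describes, the dictionary W would additionally be adjusted using the classification loss rather than the reconstruction error alone.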

Key words: feature learning, non-negative matrix factorization, convolutional neural network, joint optimization

Juan WEI, Huangwei YANG, Fangli NING. Acoustic scene classification based on joint optimization of NMF and CNN[J]. Systems Engineering and Electronics, 2022, 44(5): 1433-1438.

Name	CNN8	CNN10	CNN12
Input layer	256×108×1	256×108×1	256×108×1
Batch normalization, convolution	BN, 3×3@64	BN, 3×3@64	BN, 3×3@64
Batch normalization, activation, convolution	BN, ReLU, 3×3@64	BN, ReLU, 3×3@64	BN, ReLU, 3×3@64
Pooling	4×2 AvgPooling	4×2 AvgPooling	4×2 AvgPooling
Batch normalization, activation, convolution	(BN, ReLU, 3×3@128) × 2	(BN, ReLU, 3×3@128) × 2	(BN, ReLU, 3×3@128) × 2
Pooling	4×2 AvgPooling	4×2 AvgPooling	4×2 AvgPooling
Batch normalization, activation, convolution	(BN, ReLU, 3×3@256) × 2	(BN, ReLU, 3×3@256) × 2	(BN, ReLU, 3×3@256) × 2
Pooling	—	2×1 AvgPooling	2×1 AvgPooling
Batch normalization, activation, convolution	—	(BN, ReLU, 3×3@512) × 2	(BN, ReLU, 3×3@512) × 2
Pooling	—	—	2×1 AvgPooling
Batch normalization, activation, convolution	—	—	(BN, ReLU, 3×3@1024) × 2
Batch normalization, activation, convolution	BN, ReLU, 1×1@1024	BN, ReLU, 1×1@1024	BN, ReLU, 1×1@1024
Batch normalization, convolution, global pooling	BN, 1×1@15, Global AvgPooling	BN, 1×1@15, Global AvgPooling	BN, 1×1@15, Global AvgPooling
Fully connected layer, output layer	Dense(15), Softmax	Dense(15), Softmax	Dense(15), Softmax
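To make the effect of the pooling stages in the architecture table concrete, the sketch below traces the feature-map size of each model from the 256×108 input. It assumes that the 3×3 convolutions preserve spatial size ("same" padding, stride 1) and that each average pooling uses a stride equal to its window; neither detail is stated in the table, and the helper name `trace_shapes` is purely illustrative.

```python
def trace_shapes(input_hw, pools):
    """Return the (height, width) of the feature map after each pooling stage.

    Assumes size-preserving convolutions and non-overlapping pooling,
    so only the pooling windows change the spatial size.
    """
    h, w = input_hw
    shapes = [(h, w)]
    for ph, pw in pools:
        h, w = h // ph, w // pw  # floor division: window == stride
        shapes.append((h, w))
    return shapes

# Pooling windows per model, read from the architecture table.
POOLS = {
    "CNN8": [(4, 2), (4, 2)],
    "CNN10": [(4, 2), (4, 2), (2, 1)],
    "CNN12": [(4, 2), (4, 2), (2, 1), (2, 1)],
}

for name, pools in POOLS.items():
    print(name, trace_shapes((256, 108), pools))
```

Under these assumptions the deeper models only shrink the first axis further, while the second axis stays at 27 after the first two pooling stages; the global average pooling at the end then collapses whatever spatial size remains.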

SNMF	Fold1	Fold2	Fold3	Fold4	Average
K=64	0.781	0.795	0.771	0.824	0.793
K=128	0.805	0.837	0.793	0.854	0.822
K=256	0.827	0.839	0.814	0.863	0.836
K=512	0.818	0.831	0.807	0.855	0.828
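The averages in the table above can be reproduced from the per-fold accuracies, and the same computation picks out the best setting. The sketch below assumes the "Average" column is the unweighted mean over the four folds and that K is the NMF dictionary size; both are readings of the table rather than statements from the source.

```python
# Per-fold accuracies and reported averages, copied from the SNMF table.
FOLDS = {
    64: [0.781, 0.795, 0.771, 0.824],
    128: [0.805, 0.837, 0.793, 0.854],
    256: [0.827, 0.839, 0.814, 0.863],
    512: [0.818, 0.831, 0.807, 0.855],
}
REPORTED = {64: 0.793, 128: 0.822, 256: 0.836, 512: 0.828}

def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)

averages = {k: mean(v) for k, v in FOLDS.items()}
best_k = max(averages, key=averages.get)

for k, avg in averages.items():
    print(f"K={k}: mean={avg:.4f}, reported={REPORTED[k]:.3f}")
print("best K:", best_k)
```

The recomputed means agree with the reported column to within rounding, and accuracy peaks at K=256 before dropping again at K=512.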

Model	Fold1	Fold2	Fold3	Fold4	Average
CNN8	0.808	0.807	0.778	0.806	0.800
CNN10	0.827	0.839	0.814	0.863	0.836
CNN12	0.811	0.815	0.788	0.861	0.819

Scene	Baseline system	NMF	TNMF	SNMF	CQT	LM
Beach	0.753	0.751	0.747	0.835	0.895	0.887
Bus	0.718	0.893	0.813	0.928	0.930	0.922
Cafe/restaurant	0.577	0.618	0.544	0.793	0.611	0.628
Car	0.971	0.962	0.945	0.942	0.978	0.941
City center	0.907	0.943	0.867	0.893	0.778	0.920
Forest path	0.795	0.769	0.892	0.925	0.881	0.855
Grocery store	0.587	0.801	0.828	0.920	0.883	0.929
Home	0.686	0.702	0.662	0.792	0.820	0.663
Library	0.571	0.725	0.691	0.658	0.783	0.685
Metro station	0.917	0.742	0.826	0.815	0.852	0.747
Office	0.998	0.965	0.950	0.941	0.875	0.942
Park	0.702	0.695	0.712	0.705	0.545	0.723
Residential area	0.641	0.874	0.774	0.738	0.691	0.764
Train	0.580	0.657	0.768	0.802	0.685	0.712
Tram	0.817	0.852	0.847	0.851	0.864	0.876
Overall	0.748	0.797	0.791	0.836	0.805	0.813
Prediction time/s	-	2.6	1.1	2.7	3.1	3.3
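The 3.9% improvement quoted in the abstract can be checked against the per-scene table. The sketch below assumes that "before optimization" corresponds to the NMF column and "after" to the SNMF column, and that the overall row is approximately the unweighted mean over the 15 scenes; both are inferences from the numbers, not statements made in the source.

```python
# Per-scene accuracies in table order (first scene row to last scene row).
NMF_ACC = [0.751, 0.893, 0.618, 0.962, 0.943, 0.769, 0.801, 0.702,
           0.725, 0.742, 0.965, 0.695, 0.874, 0.657, 0.852]
SNMF_ACC = [0.835, 0.928, 0.793, 0.942, 0.893, 0.925, 0.920, 0.792,
            0.658, 0.815, 0.941, 0.705, 0.738, 0.802, 0.851]
OVERALL = {"NMF": 0.797, "SNMF": 0.836, "CQT": 0.805, "LM": 0.813}

def macro_average(acc):
    """Unweighted mean accuracy over scenes."""
    return sum(acc) / len(acc)

# Absolute gain of SNMF over NMF, in accuracy points.
gain = OVERALL["SNMF"] - OVERALL["NMF"]
# Number of scenes where SNMF is strictly better than NMF.
wins = sum(s > n for s, n in zip(SNMF_ACC, NMF_ACC))

print(f"macro NMF:  {macro_average(NMF_ACC):.4f}")
print(f"macro SNMF: {macro_average(SNMF_ACC):.4f}")
print(f"overall gain: {gain:.3f}")
print(f"scenes improved: {wins} of {len(SNMF_ACC)}")
```

Read this way, the gain is an absolute difference of 0.039 in overall accuracy, and it is not uniform: some scenes improve substantially while others are slightly worse under SNMF.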
