Systems Engineering and Electronics ›› 2022, Vol. 44 ›› Issue (5): 1433-1438. doi: 10.12305/j.issn.1001-506X.2022.05.01
• Electronic Technology •
Juan WEI1,*, Huangwei YANG1, Fangli NING2
Received: 2021-05-28
Online: 2022-05-01
Published: 2022-05-16
Contact: Juan WEI
Juan WEI, Huangwei YANG, Fangli NING. Acoustic scene classification based on joint optimization of NMF and CNN[J]. Systems Engineering and Electronics, 2022, 44(5): 1433-1438.
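Table 1
CNN model structure
Name | CNN8 | CNN10 | CNN12 |
Input layer | 256×108×1 | 256×108×1 | 256×108×1 |
BN layer, convolution layer | BN, 3×3@64 | BN, 3×3@64 | BN, 3×3@64 |
BN layer, activation layer, convolution layer | BN, ReLU, 3×3@64 | BN, ReLU, 3×3@64 | BN, ReLU, 3×3@64 |
Pooling layer | 4×2 AvgPooling | 4×2 AvgPooling | 4×2 AvgPooling |
BN layer, activation layer, convolution layer | |||
Pooling layer | 4×2 AvgPooling | 4×2 AvgPooling | 4×2 AvgPooling |
BN layer, activation layer, convolution layer | |||
Pooling layer | — | 2×1 AvgPooling | 2×1 AvgPooling |
BN layer, activation layer, convolution layer | — — | ||
Pooling layer | — | — | 2×1 AvgPooling |
BN layer, activation layer, convolution layer | — — | — — | |
BN layer, activation layer, convolution layer | BN, ReLU, 1×1@1024 | ||
BN layer, convolution layer, global pooling layer | BN, 1×1@15, Global AvgPooling | ||
Fully connected layer, output layer | Dense(15), Softmax |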
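The pooling rows of Table 1 determine how far each model shrinks the 256×108 input before the 1×1 convolutions. The following is a minimal sketch of that arithmetic, not the authors' code. It assumes that every convolution uses stride 1 with "same" padding (so only pooling changes the map size) and that an "a×b" pooling window applies a to the 256-sized axis and b to the 108-sized axis; neither assumption is stated in the table. The names `POOLS` and `pooled_shape` are hypothetical.

```python
# Pooling stacks read from Table 1, one (a, b) window per pooling row.
# Assumption: convolutions are stride-1 with "same" padding, so they do not
# change the feature-map size; only the average-pooling layers do.
POOLS = {
    "CNN8": [(4, 2), (4, 2)],
    "CNN10": [(4, 2), (4, 2), (2, 1)],
    "CNN12": [(4, 2), (4, 2), (2, 1), (2, 1)],
}


def pooled_shape(height, width, pools):
    """Return the feature-map size after applying each pooling window in turn."""
    for pool_h, pool_w in pools:
        height //= pool_h
        width //= pool_w
    return height, width


# Input size 256x108 is taken from the input-layer row of Table 1.
shapes = {name: pooled_shape(256, 108, pools) for name, pools in POOLS.items()}
for name, shape in shapes.items():
    print(name, shape)
```

Under these assumptions the deeper models differ from CNN8 only along the 256-sized axis, which each extra 2×1 pooling layer halves, while the 108-sized axis stays at 27 in all three.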
Table 4
Comparison of recognition accuracy of different features
Scene | Baseline system | NMF | TNMF | SNMF | CQT | LM |
Beach | 0.753 | 0.751 | 0.747 | 0.835 | 0.895 | 0.887 |
Bus | 0.718 | 0.893 | 0.813 | 0.928 | 0.930 | 0.922 |
Restaurant | 0.577 | 0.618 | 0.544 | 0.793 | 0.611 | 0.628 |
Car | 0.971 | 0.962 | 0.945 | 0.942 | 0.978 | 0.941 |
City center | 0.907 | 0.943 | 0.867 | 0.893 | 0.778 | 0.920 |
Forest path | 0.795 | 0.769 | 0.892 | 0.925 | 0.881 | 0.855 |
Grocery store | 0.587 | 0.801 | 0.828 | 0.920 | 0.883 | 0.929 |
Home | 0.686 | 0.702 | 0.662 | 0.792 | 0.820 | 0.663 |
Library | 0.571 | 0.725 | 0.691 | 0.658 | 0.783 | 0.685 |
Metro station | 0.917 | 0.742 | 0.826 | 0.815 | 0.852 | 0.747 |
Office | 0.998 | 0.965 | 0.950 | 0.941 | 0.875 | 0.942 |
Park | 0.702 | 0.695 | 0.712 | 0.705 | 0.545 | 0.723 |
Residential area | 0.641 | 0.874 | 0.774 | 0.738 | 0.691 | 0.764 |
Train | 0.580 | 0.657 | 0.768 | 0.802 | 0.685 | 0.712 |
Tram | 0.817 | 0.852 | 0.847 | 0.851 | 0.864 | 0.876 |
Overall | 0.748 | 0.797 | 0.791 | 0.836 | 0.805 | 0.813 |
Prediction time/s | - | 2.6 | 1.1 | 2.7 | 3.1 | 3.3 |
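The overall figures in the last accuracy row of Table 4 appear to be unweighted means of the 15 per-scene accuracies. The sketch below checks this for every column using the values copied from the table. It is an illustrative consistency check, not part of the authors' evaluation; the names `ACCURACY`, `REPORTED_OVERALL`, and `overall` are hypothetical.

```python
from statistics import mean

# Per-scene accuracies copied from Table 4, in the table's row order (15 scenes).
ACCURACY = {
    "Baseline": [0.753, 0.718, 0.577, 0.971, 0.907, 0.795, 0.587, 0.686,
                 0.571, 0.917, 0.998, 0.702, 0.641, 0.580, 0.817],
    "NMF": [0.751, 0.893, 0.618, 0.962, 0.943, 0.769, 0.801, 0.702,
            0.725, 0.742, 0.965, 0.695, 0.874, 0.657, 0.852],
    "TNMF": [0.747, 0.813, 0.544, 0.945, 0.867, 0.892, 0.828, 0.662,
             0.691, 0.826, 0.950, 0.712, 0.774, 0.768, 0.847],
    "SNMF": [0.835, 0.928, 0.793, 0.942, 0.893, 0.925, 0.920, 0.792,
             0.658, 0.815, 0.941, 0.705, 0.738, 0.802, 0.851],
    "CQT": [0.895, 0.930, 0.611, 0.978, 0.778, 0.881, 0.883, 0.820,
            0.783, 0.852, 0.875, 0.545, 0.691, 0.685, 0.864],
    "LM": [0.887, 0.922, 0.628, 0.941, 0.920, 0.855, 0.929, 0.663,
           0.685, 0.747, 0.942, 0.723, 0.764, 0.712, 0.876],
}

# Overall accuracies as reported in Table 4.
REPORTED_OVERALL = {
    "Baseline": 0.748, "NMF": 0.797, "TNMF": 0.791,
    "SNMF": 0.836, "CQT": 0.805, "LM": 0.813,
}

# Unweighted (macro) average over scenes for each feature type.
overall = {name: mean(values) for name, values in ACCURACY.items()}

for name, value in overall.items():
    # A difference below 0.0005 means the mean rounds to the reported figure.
    matches = abs(value - REPORTED_OVERALL[name]) < 5e-4
    print(f"{name}: computed {value:.3f}, reported {REPORTED_OVERALL[name]:.3f}, match={matches}")
```

Because the average is unweighted, each scene contributes equally regardless of how many test clips it has, so a weak class such as Park under CQT (0.545) pulls the overall score down as much as any other.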