Acoustic scene classification based on joint optimization of NMF and CNN
Juan WEI, Huangwei YANG, Fangli NING

Table 1 CNN model structure
| Layer | CNN8 | CNN10 | CNN12 |
| --- | --- | --- | --- |
| Input layer | 256×108×1 | 256×108×1 | 256×108×1 |
| Batch normalization, convolution | BN, 3×3@64 | BN, 3×3@64 | BN, 3×3@64 |
| Batch normalization, activation, convolution | BN, ReLU, 3×3@64 | BN, ReLU, 3×3@64 | BN, ReLU, 3×3@64 |
| Pooling | 4×2 AvgPooling | 4×2 AvgPooling | 4×2 AvgPooling |
| Batch normalization, activation, convolution | (BN, ReLU, 3×3@128) ×2 | (BN, ReLU, 3×3@128) ×2 | (BN, ReLU, 3×3@128) ×2 |
| Pooling | 4×2 AvgPooling | 4×2 AvgPooling | 4×2 AvgPooling |
| Batch normalization, activation, convolution | (BN, ReLU, 3×3@256) ×2 | (BN, ReLU, 3×3@256) ×2 | (BN, ReLU, 3×3@256) ×2 |
| Pooling | - | 2×1 AvgPooling | 2×1 AvgPooling |
| Batch normalization, activation, convolution | - | (BN, ReLU, 3×3@512) ×2 | (BN, ReLU, 3×3@512) ×2 |
| Pooling | - | - | 2×1 AvgPooling |
| Batch normalization, activation, convolution | - | - | (BN, ReLU, 3×3@1024) ×2 |
| Batch normalization, activation, convolution | BN, ReLU, 1×1@1024 | BN, ReLU, 1×1@1024 | BN, ReLU, 1×1@1024 |
| Batch normalization, convolution, global pooling | BN, 1×1@15, Global AvgPooling | BN, 1×1@15, Global AvgPooling | BN, 1×1@15, Global AvgPooling |
| Fully connected layer, output layer | Dense(15), Softmax | Dense(15), Softmax | Dense(15), Softmax |
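
As a rough cross-check of Table 1, the sketch below counts the convolutional layers in each variant and traces the feature-map size up to the global average pooling step. It relies on assumptions that the table does not state: convolutions use stride 1 with "same" padding, each pooling window is applied with a matching stride and floor division, and the pooling factors are ordered like the input dimensions (256 first, 108 second). The names `STEM`, `BLOCK_512`, `BLOCK_1024`, `HEAD`, and `trace` are illustrative only.

```python
# Sketch: trace shapes through the CNN8 / CNN10 / CNN12 variants of Table 1.
# Assumptions (not in the table): stride-1 "same" convolutions, pooling with
# stride equal to the window size, floor division on both axes.

# Each entry is ("conv", kernel_size, out_channels) or ("pool", h_factor, w_factor).
STEM = [
    ("conv", 3, 64), ("conv", 3, 64), ("pool", 4, 2),
    ("conv", 3, 128), ("conv", 3, 128), ("pool", 4, 2),
    ("conv", 3, 256), ("conv", 3, 256),
]
BLOCK_512 = [("pool", 2, 1), ("conv", 3, 512), ("conv", 3, 512)]
BLOCK_1024 = [("pool", 2, 1), ("conv", 3, 1024), ("conv", 3, 1024)]
HEAD = [("conv", 1, 1024), ("conv", 1, 15)]  # 1x1 convolutions before global pooling

VARIANTS = {
    "CNN8": STEM + HEAD,
    "CNN10": STEM + BLOCK_512 + HEAD,
    "CNN12": STEM + BLOCK_512 + BLOCK_1024 + HEAD,
}


def trace(layers, shape=(256, 108, 1)):
    """Return ((height, width, channels), number_of_conv_layers)."""
    h, w, c = shape
    n_conv = 0
    for kind, a, b in layers:
        if kind == "conv":
            c = b          # "same" padding keeps height and width unchanged
            n_conv += 1
        else:
            h, w = h // a, w // b  # pooling shrinks each axis by its factor
    return (h, w, c), n_conv


for name, layers in VARIANTS.items():
    shape, n_conv = trace(layers)
    print(name, "conv layers:", n_conv, "map before global pooling:", shape)
```

Under these assumptions the convolution counts come out as 8, 10, and 12, which matches the model names, and the map entering global average pooling shrinks from 16×27 (CNN8) to 8×27 (CNN10) and 4×27 (CNN12), each with 15 channels.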