Systems Engineering and Electronics ›› 2025, Vol. 47 ›› Issue (10): 3148-3154. doi: 10.12305/j.issn.1001-506X.2025.10.03

• Electronic Technology •

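Acoustic scene classification based on adaptive multi-branch convolution

Juan WEI1,*, Dehua HE1, Fangli NING2

  1. School of Communication Engineering, Xidian University, Xi'an 710071, Shaanxi, China
  2. School of Mechanical Engineering, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China
  • Received: 2024-08-16 Online: 2025-10-25 Published: 2025-10-23
  • Contact: Juan WEI E-mail: weijuan@xidian.edu.cn
  • About the authors: Dehua HE (b. 2000), female, master's student; main research interest: acoustic scene classification
    Fangli NING (b. 1974), male, professor, Ph.D.; main research interest: sound source localization
  • Funding:
    Supported by the National Natural Science Foundation of China (52475132); the Key Research and Development Program of Shaanxi Province (2023-YBGY-219); the Aeronautical Science Foundation of China (20200015053001); the Xi'an Key Industrial Chain Technology Research Project (23ZDCYJSGG0006-2023)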

Abstract:

To address the insufficient feature representation ability of models in acoustic scene classification, a network architecture based on adaptive multi-branch convolution is proposed. First, multiple branches extract features separately, and dynamic weights are introduced to adaptively balance the contribution of each branch, which improves feature perception. Second, because existing models ignore the relationships between classes during classification, a coarse-grained classifier is introduced to assist in training the original classification model, and the classification results are fused to strengthen the final decision. The model is trained and tested on the TUT2020 mobile development dataset. Experimental results show that the proposed model improves accuracy by 6.5% over the algorithm before optimization, which indicates that the proposed method effectively improves overall classification performance.

Key words: acoustic scene classification, convolutional neural networks, adaptive feature fusion, hierarchical structure
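
The two ideas named in the abstract, adaptive weighting of parallel branches and fusion of coarse-grained with fine-grained predictions, can be illustrated with a small sketch. The abstract does not say how the dynamic weights are computed or how the results are fused, so everything below is an assumption made for illustration: the weights are a softmax over per-branch scores, the fusion multiplies each fine-class probability by the probability of its coarse group, and the function names, the three toy branches, and the class-to-group mapping are all hypothetical. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(branch_features, branch_scores):
    """Weighted sum of per-branch feature vectors.

    branch_scores stand in for learnable logits (an assumption); softmax
    turns them into positive weights that sum to 1.
    """
    weights = softmax(branch_scores)
    fused = [0.0] * len(branch_features[0])
    for weight, feature in zip(weights, branch_features):
        for i, value in enumerate(feature):
            fused[i] += weight * value
    return fused, weights


def fuse_coarse_fine(fine_probs, coarse_probs, fine_to_coarse):
    """Scale each fine-class probability by its coarse group's probability,
    then renormalise (one possible fusion rule, assumed here)."""
    scaled = [p * coarse_probs[fine_to_coarse[k]] for k, p in enumerate(fine_probs)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Three toy branches with equal scores, so each branch gets weight 1/3.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = fuse_branches(features, [0.0, 0.0, 0.0])

# The fine classifier is tied between classes 0 and 1; the coarse classifier
# favours group 0, which contains only class 0 (hypothetical mapping).
fine = [0.4, 0.4, 0.2]
coarse = [0.9, 0.1]
mapping = [0, 1, 1]
fused_probs = fuse_coarse_fine(fine, coarse, mapping)
best_class = fused_probs.index(max(fused_probs))
```

In this toy case the fine-grained classifier alone cannot separate classes 0 and 1, and the coarse-grained prediction breaks the tie in favour of class 0. In a trained network the branch scores would be learned or computed from the input rather than fixed.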

CLC number: