Bi-level Acoustic Scene Classification Using Lightweight Deep Learning Model

dc.contributor.author: Spoorthy, V.
dc.contributor.author: Koolagudi, S.G.
dc.date.accessioned: 2026-02-04T12:25:42Z
dc.date.issued: 2024
dc.description.abstract: Acoustic scene classification (ASC) is the task of identifying a scene from the environment in which the associated audio was recorded. In this paper, a bi-level lightweight Convolutional Neural Network (CNN)-based model is presented to perform ASC. The proposed approach performs classification at two levels. At the first level, the scenes are classified into three broad categories: indoor, outdoor, and transportation. At the second level, each of the three broad categories is further divided into individual scenes. The approach is implemented using three features: log Mel band energies, harmonic spectrograms, and percussive spectrograms. Three CNN classifiers are used for classification: MobileNetV2, Squeeze-and-Excitation Net (SENet), and a combination of these two architectures, known as SE-MobileNet. The proposed combined model draws on the advantages of both the MobileNetV2 and SENet architectures. Extensive experiments are conducted on the DCASE 2020 (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset and the DCASE 2016 ASC dataset. The proposed SE-MobileNet model achieved classification accuracies of 96.9% and 86.6% for the first and second levels, respectively, on the DCASE 2020 dataset, and 97.6% and 88.4%, respectively, on the DCASE 2016 dataset. The proposed model is reported to outperform state-of-the-art low-complexity ASC systems in terms of both complexity and accuracy. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
dc.identifier.citation: Circuits, Systems, and Signal Processing, 2024, 43, 1, pp. 388-407
dc.identifier.issn: 0278081X
dc.identifier.uri: https://doi.org/10.1007/s00034-023-02478-0
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/21497
dc.publisher: Birkhauser
dc.subject: Audio acoustics
dc.subject: Complex networks
dc.subject: Convolution
dc.subject: Convolutional neural networks
dc.subject: Deep learning
dc.subject: Network architecture
dc.subject: Spectrographs
dc.subject: Acoustic scene classification
dc.subject: Bi-level classification
dc.subject: Convolutional neural network
dc.subject: Harmonic–percussive decomposition
dc.subject: Learning models
dc.subject: Light weight
dc.subject: Light-weight convolutional neural network
dc.subject: Scene classification
dc.subject: Second level
dc.subject: Spectrograms
dc.subject: Classification (of information)
dc.title: Bi-level Acoustic Scene Classification Using Lightweight Deep Learning Model
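
The two-level scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scene grouping, the function `classify_bilevel`, and the stand-in scoring functions are all hypothetical, and in the paper the classifiers are CNNs (MobileNetV2, SENet, SE-MobileNet) operating on log Mel, harmonic, and percussive spectrogram features. The sketch only shows the routing logic: a first-level model picks a broad category, and the second-level model for that category picks the individual scene.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed grouping of individual scenes under the three broad categories.
# The names are illustrative; the record itself does not list the scenes.
SCENES_BY_CATEGORY: Dict[str, List[str]] = {
    "indoor": ["airport", "shopping_mall", "metro_station"],
    "outdoor": ["park", "public_square", "street_pedestrian", "street_traffic"],
    "transportation": ["bus", "metro", "tram"],
}
CATEGORIES: List[str] = list(SCENES_BY_CATEGORY)

# A "model" here is any callable mapping features to one score per class.
Model = Callable[[Sequence[float]], Sequence[float]]


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def classify_bilevel(
    features: Sequence[float],
    coarse_model: Model,
    fine_models: Dict[str, Model],
) -> Tuple[str, str]:
    """Level 1 picks a broad category; level 2 picks a scene within it."""
    category = CATEGORIES[argmax(coarse_model(features))]
    fine_scores = fine_models[category](features)
    scene = SCENES_BY_CATEGORY[category][argmax(fine_scores)]
    return category, scene


# Stand-in models returning fixed scores, purely for demonstration.
coarse = lambda x: [0.1, 0.7, 0.2]            # favours "outdoor"
fine = {
    "indoor": lambda x: [0.5, 0.3, 0.2],
    "outdoor": lambda x: [0.6, 0.2, 0.1, 0.1],  # favours "park"
    "transportation": lambda x: [0.2, 0.3, 0.5],
}

result = classify_bilevel([0.0], coarse, fine)
print(result)
```

One consequence of this design is that each second-level model only has to separate scenes within a single broad category, while a first-level error cannot be recovered at the second level.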
