Acoustic Scene Classification using Deep Fisher network

dc.contributor.authorVenkatesh, S.
dc.contributor.authorMulimani, M.
dc.contributor.authorKoolagudi, S.G.
dc.date.accessioned2026-02-04T12:26:25Z
dc.date.issued2023
dc.description.abstractAcoustic Scene Classification (ASC) is the task of assigning a semantic label to an audio recording based on its surrounding environment. In this work, a Fisher network is introduced for ASC. The proposed method mimics the working mechanism of a feed-forward Convolutional Neural Network (CNN), in which the output of one layer is fed as input to the succeeding layer. The Fisher network consists of a feature extraction step followed by a Fisher layer. The Fisher layer has three sub-layers: a Fisher Vector (FV) encoder, a temporal pyramid, and a normalization sub-layer combined with a feature reduction step. Gammatone Time Cepstral Coefficients (GTCCs) and Mel-spectrograms are the features encoded as Fisher vectors in the FV encoder sub-layer. The temporal pyramid sub-layer retains the temporal information of the Fisher vectors, and the resulting temporal pyramids serve as the feature vector. Irrelevant dimensions of the temporal pyramids are then removed using Principal Component Analysis (PCA) in the normalization and PCA sub-layer. The proposed model is evaluated on five DCASE datasets: TUT Urban Acoustic Scenes 2018, TUT Urban Acoustic Scenes 2018 Mobile, DCASE 2019 Acoustic Scene Classification Task 1(a) and Task 1(b), and TAU Urban Acoustic Scenes 2020. The overall classification accuracy is 93%, 91%, 92%, 91% and 89% for the TUT 2018, TUT Mobile 2018, DCASE Task 1(a) 2019, DCASE Task 1(b) 2019, and TAU Urban Acoustic Scenes 2020 datasets, respectively. The proposed model performs better than the state-of-the-art ASC systems on these datasets. © 2023 Elsevier Inc.
dc.identifier.citationDigital Signal Processing: A Review Journal, 2023, 139
dc.identifier.issn10512004
dc.identifier.urihttps://doi.org/10.1016/j.dsp.2023.104062
dc.identifier.urihttps://idr.nitk.ac.in/handle/123456789/21839
dc.publisherElsevier Inc.
dc.subjectClassification (of information)
dc.subjectMultilayer neural networks
dc.subjectNetwork coding
dc.subjectNetwork layers
dc.subjectPrincipal component analysis
dc.subjectVectors
dc.subjectAcoustic scene classification
dc.subjectFisher layer
dc.subjectFisher network
dc.subjectFisher vector encoding
dc.subjectFisher vectors
dc.subjectPrincipal component analyse
dc.subjectPrincipal-component analysis
dc.subjectScene classification
dc.subjectSub-layers
dc.subjectVectors encoding
dc.subjectSemantics
dc.titleAcoustic Scene Classification using Deep Fisher network
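
The Fisher layer described in the abstract (FV encoder, temporal pyramid, normalization and PCA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the standard Fisher vector formulation (gradients with respect to the means and variances of a diagonal-covariance GMM), a pyramid built by encoding equal-length temporal segments at levels 1, 2 and 4, and power plus L2 normalization. The GMM parameters, the random "frames" standing in for GTCC or Mel-spectrogram features, and all function names are hypothetical placeholders chosen for illustration.

```python
import numpy as np


def fisher_vector(frames, weights, means, variances):
    """Encode (T, D) frames as a 2*K*D Fisher vector under a diagonal GMM."""
    T = frames.shape[0]
    diff = frames[:, None, :] - means[None, :, :]  # (T, K, D)
    # Log-density of every frame under every diagonal Gaussian component.
    log_prob = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )
    log_post = np.log(weights) + log_prob  # (T, K)
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)  # soft assignments per frame

    norm_diff = diff / np.sqrt(variances)
    # Gradients with respect to the component means and variances.
    grad_mu = np.einsum("tk,tkd->kd", post, norm_diff)
    grad_var = np.einsum("tk,tkd->kd", post, norm_diff ** 2 - 1.0)
    grad_mu /= T * np.sqrt(weights)[:, None]
    grad_var /= T * np.sqrt(2.0 * weights)[:, None]
    return np.concatenate([grad_mu.ravel(), grad_var.ravel()])


def temporal_pyramid(frames, encode, levels=(1, 2, 4)):
    """Concatenate encodings of the whole clip and of its temporal segments."""
    parts = []
    for n_segments in levels:
        for segment in np.array_split(frames, n_segments, axis=0):
            parts.append(encode(segment))
    return np.concatenate(parts)


def power_l2_normalize(v):
    """Signed square-root (power) normalization followed by L2 normalization."""
    v = np.sign(v) * np.sqrt(np.abs(v))
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


# Hypothetical setup: a fixed 4-component GMM over 13-dimensional frames.
# In practice the GMM would be fitted to training features (for example by EM).
rng = np.random.default_rng(0)
K, D = 4, 13
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))

# Six synthetic "recordings" of 40 frames each, standing in for real features.
clips = [rng.normal(size=(40, D)) for _ in range(6)]


def encode(segment):
    return fisher_vector(segment, weights, means, variances)


features = np.stack(
    [power_l2_normalize(temporal_pyramid(clip, encode)) for clip in clips]
)
reduced = pca_reduce(features, n_components=3)
print(features.shape, reduced.shape)
```

Each clip yields 7 segment encodings (1 + 2 + 4) of length 2*K*D, which are concatenated, normalized and then reduced by PCA. The reduced vectors would be passed to a classifier or, in a deeper network, to a further Fisher layer. Every segment must contain at least one frame, so clips need at least as many frames as the finest pyramid level.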
