Journal Articles

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/19884

  • Item
    Acoustic Event and Scene Classification: A Review
    (Springer, 2025) Mulimani, M.; Venkatesh, S.; Koolagudi, S.G.
    This paper gives deeper insight into the range of recent approaches developed and reported in the literature specifically for monophonic acoustic event classification (AEC), polyphonic acoustic event detection (AED) and acoustic scene classification (ASC) concerning datasets, features and classifiers. A list of datasets used for monophonic AEC, polyphonic AED and ASC is introduced. The features and classifiers used for monophonic AEC, polyphonic AED and ASC are reviewed with their successes and failures. A list of research issues is derived from the critical review of the available literature at the end of the paper. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
  • Item
    Extraction of MapReduce-based features from spectrograms for audio-based surveillance
    (Elsevier Inc., 2019) Mulimani, M.; Koolagudi, S.G.
    In this paper, we propose a novel parallel method for extracting significant information from spectrograms using the MapReduce programming model for an audio-based surveillance system, which effectively recognizes critical acoustic events in the surrounding environment. Extracting reliable information as features from the spectrograms of a large, noisy audio event dataset demands high computational time. Parallelizing the feature extraction using the MapReduce programming model on Hadoop improves the efficiency of the overall system. The acoustic events with real-time background noise from the Mivia lab audio event dataset are used for surveillance applications. The proposed approach is time efficient and achieves high performance in recognizing critical acoustic events, with an average recognition rate of 96.5% in different noisy conditions. © 2019 Elsevier Inc.
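The map and reduce roles described in the abstract above can be illustrated with a minimal single-process sketch. Everything specific here is an assumption made for illustration: the feature (mean magnitude per frequency bin), the event labels, and the helper names `map_clip` and `reduce_by_label` are hypothetical and are not taken from the paper, which runs its own feature extraction on Hadoop.

```python
from functools import reduce

import numpy as np


def map_clip(record):
    """Map step: turn one (label, spectrogram) record into (label, feature vector).

    The feature is a placeholder (mean magnitude per frequency bin), not the
    paper's feature. Each call is independent, which is what lets a MapReduce
    framework run the map step on different nodes in parallel.
    """
    label, spectrogram = record
    return label, np.asarray(spectrogram, dtype=float).mean(axis=1)


def reduce_by_label(accumulator, pair):
    """Reduce step: accumulate a running feature sum and clip count per label."""
    label, feature = pair
    total, count = accumulator.get(label, (0.0, 0))
    accumulator[label] = (total + feature, count + 1)
    return accumulator


# Toy spectrograms: rows are frequency bins, columns are time frames.
records = [
    ("scream", np.array([[1.0, 3.0], [2.0, 4.0]])),
    ("glass", np.array([[0.0, 2.0], [4.0, 4.0]])),
    ("scream", np.array([[3.0, 5.0], [0.0, 2.0]])),
]

mapped = map(map_clip, records)
sums = reduce(reduce_by_label, mapped, {})
mean_features = {label: total / count for label, (total, count) in sums.items()}
```

Because the map step never shares state between clips, replacing the built-in `map` with a distributed runner changes where the work happens but not the result.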
  • Item
    Segmentation and characterization of acoustic event spectrograms using singular value decomposition
    (Elsevier Ltd, 2019) Mulimani, M.; Koolagudi, S.G.
    The traditional frame-based speech features such as Mel-frequency cepstral coefficients (MFCCs) are specifically developed for speech/speaker recognition tasks. Speech is different from acoustic events when one considers its phonetic structure. Hence, frame-based speech features may not be suitable for Acoustic Event Classification (AEC). In this paper, a novel method is proposed for the extraction of robust acoustic event specific features from the spectrogram using a left singular vector for AEC. It consists of two main stages: segmentation and characterization of acoustic event spectrograms. In the first stage, the symmetric Laplacian matrix of an acoustic event spectrogram is decomposed into singular values and vectors. Then, the reliable region (spectral shape) of an acoustic event is segmented from the spectrogram using a left singular vector. The prominent values of a left singular vector, selected using the proposed threshold, automatically segment the reliable region of an acoustic event from the spectrogram. In the second stage, the segmented region of the spectrogram is used as a feature vector for AEC. Characteristics of the values of the singular vector belonging to reliable (event) and unreliable (non-event) regions of the spectrogram are determined. To evaluate the proposed approach, different categories of ‘home’ acoustic events are considered from the Freiburg-106 dataset. The results show significantly improved performance in acoustic event segmentation and classification. A singular vector effectively segments the reliable region of the acoustic event from the spectrogram for a Support Vector Machine (SVM) based AEC system. The proposed AEC system is robust to noise and achieves a higher recognition rate in clean and noisy conditions compared to the traditional speech feature based systems. © 2018 Elsevier Ltd
  • Item
    Robust Acoustic Event Classification using Fusion Fisher Vector features
    (Elsevier Ltd, 2019) Mulimani, M.; Koolagudi, S.G.
    In this paper, novel Fusion Fisher Vector (FFV) features are proposed for Acoustic Event Classification (AEC) in meeting room environments. The monochrome images of a pseudo-color spectrogram of an acoustic event are represented as Fisher vectors. First, irrelevant feature dimensions of each Fisher vector are discarded using Principal Component Analysis (PCA), and then the resulting Fisher vectors are fused to get FFV features. Performance of the FFV features is evaluated on acoustic events of the UPC-TALP dataset in clean and different noisy conditions. Results show that the proposed FFV features are robust to noise and achieve an overall recognition accuracy of 94.32% in clean and different noisy conditions. © 2019 Elsevier Ltd
  • Item
    Acoustic scene classification using projection Kervolutional neural network
    (Springer, 2023) Mulimani, M.; Nandi, R.; Koolagudi, S.G.
    In this paper, a novel Projection Kervolutional Neural Network (ProKNN) is proposed for Acoustic Scene Classification (ASC). ProKNN is a combination of two special filters known as the left and right projection layers and a Kervolutional Neural Network (KNN). KNN replaces the linearity of the Convolutional Neural Network (CNN) with a non-linear polynomial kernel. We extend the ProKNN to learn from the features of two channels of audio recordings in the initial stage. The performance of the ProKNN is evaluated on two publicly available datasets: the TUT Urban Acoustic Scenes 2018 and TUT Urban Acoustic Scenes Mobile 2018 development datasets. Results show that the proposed ProKNN outperforms the existing systems with an absolute accuracy improvement of 8% and 14% on the TUT Urban Acoustic Scenes 2018 and TUT Urban Acoustic Scenes Mobile 2018 development datasets, respectively, compared to the baseline model of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 challenge. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
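The idea of replacing the linear convolution response with a polynomial kernel, as mentioned in the abstract above, can be sketched in one dimension. This is a minimal illustration under assumptions: the function name `polynomial_kervolution_1d`, the 1-D setting, and the `degree` and `offset` values are chosen here for clarity and do not describe the ProKNN architecture or its projection layers.

```python
import numpy as np


def polynomial_kervolution_1d(signal, weights, degree=2, offset=1.0):
    """Slide `weights` over `signal` and apply a polynomial kernel to each patch.

    Each output is (patch . weights + offset) ** degree. With degree=1 and
    offset=0 this reduces to ordinary valid-mode cross-correlation, i.e. the
    linear operation a convolutional layer performs.
    """
    signal = np.asarray(signal, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # All length-k windows of the signal, shape (n - k + 1, k).
    patches = np.lib.stride_tricks.sliding_window_view(signal, weights.size)
    linear_response = patches @ weights
    return (linear_response + offset) ** degree


x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 1.0])

linear = polynomial_kervolution_1d(x, w, degree=1, offset=0.0)
nonlinear = polynomial_kervolution_1d(x, w, degree=2, offset=1.0)
```

Here `linear` matches `np.correlate(x, w, mode="valid")`, while `nonlinear` passes the same patch responses through the degree-2 polynomial, so the filter itself responds non-linearly without a separate activation function.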
  • Item
    Acoustic Scene Classification using Deep Fisher network
    (Elsevier Inc., 2023) Venkatesh, S.; Mulimani, M.; Koolagudi, S.G.
    Acoustic Scene Classification (ASC) is the task of assigning a semantic label to an audio recording, based on the surrounding environment. In this work, a Fisher network is introduced for ASC. The proposed method mimics the working mechanism of a feed-forward Convolutional Neural Network (CNN), where the output of a layer is fed as input to the succeeding layer. The Fisher network consists of a feature extraction step followed by a Fisher layer. The Fisher layer has three sub-layers, namely, the Fisher Vector (FV) encoder, temporal pyramid and normalization layers, along with a feature reduction layer. Gammatone Time Cepstral Coefficients (GTCCs) and Mel-spectrograms are the features encoded as Fisher vector representations in the FV encoder sub-layer. Temporal information of the Fisher vectors is retained using the temporal pyramid sub-layer. After temporal pyramids are extracted from Fisher vectors, they are available as a feature vector. Irrelevant dimensions of the temporal pyramids are reduced further using Principal Component Analysis (PCA) in the normalization and PCA sub-layers. The proposed model is evaluated on five DCASE datasets: TUT Urban Acoustic Scenes 2018 and Mobile, DCASE 2019 Acoustic Scene Classification Task 1(a) and Task 1(b), and TAU Urban Acoustic Scenes 2020. The overall classification accuracy is 93%, 91%, 92%, 91% and 89% for the TUT 2018, TUT Mobile 2018, DCASE Task 1(a) 2019, DCASE Task 1(b) 2019, and TAU Urban Acoustic Scenes 2020 datasets, respectively. The proposed model performed much better than the state-of-the-art ASC systems. © 2023 Elsevier Inc.
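The role of a temporal pyramid followed by normalization, as described in the abstract above, can be sketched generically. This is an illustration under assumptions, not the paper's Fisher layer: the pyramid levels (1, 2, 4), mean pooling within each segment, L2 normalization, and the function name `temporal_pyramid` are all choices made here, and the input is a plain matrix of per-frame vectors rather than Fisher vectors.

```python
import numpy as np


def temporal_pyramid(frames, levels=(1, 2, 4)):
    """Pool a (num_frames, dim) sequence over progressively finer time segments.

    For each level the sequence is split into that many contiguous segments,
    each segment is mean-pooled, and all pooled vectors are concatenated and
    L2-normalized. Assumes num_frames >= max(levels) so no segment is empty.
    """
    frames = np.asarray(frames, dtype=float)
    pooled = []
    for num_segments in levels:
        for segment in np.array_split(frames, num_segments, axis=0):
            pooled.append(segment.mean(axis=0))
    vector = np.concatenate(pooled)
    norm = np.linalg.norm(vector)
    return vector / norm if norm > 0 else vector


# Toy input: 8 time frames with 2 features each.
frames = np.arange(16, dtype=float).reshape(8, 2)
descriptor = temporal_pyramid(frames)
```

The coarsest level summarizes the whole recording, while the finer levels keep a rough record of when things happen, which is the temporal information a single global pooling step would discard. The descriptor length is `sum(levels) * dim`, so a dimensionality reduction step such as PCA is a natural follow-up.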