Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
6 results
Search Results
Item: Device Robust Acoustic Scene Classification Using Adaptive Noise Reduction and Convolutional Recurrent Attention Neural Network (Springer Science and Business Media Deutschland GmbH, 2022) Venkatesh, S.; Koolagudi, S.G.
Acoustic Scene Classification (ASC) is the task of identifying a scene using sound cues and assigning a label to the identified scene. Over the past two years, the datasets released for ASC have consisted of audio samples recorded with multiple devices, bringing the problem closer to real-world scenarios. Therefore, we aim to develop a device-robust ASC model using audio samples recorded with three different devices. The dataset considered is DCASE 2019 ASC Task 1a, which consists of a primary recording device (Device A) and two mobile devices (Devices B and C). This work introduces the Adaptive Noise Reduction (ANR) technique to reduce the device distortion present in the audio samples of Devices B and C. Spectrograms are extracted from all audio samples and normalized to remove biased values in the input signal. The normalized features are fed to a lightweight Convolutional Recurrent Attention Neural Network to perform ASC. The key contributions of this work are the reduction of device distortion in mismatched devices and the introduction of an attention layer in the Convolutional Recurrent Neural Network (CRANN). The results of the proposed method show a considerable improvement in accuracy for mismatched-device ASC. © 2022, Springer Nature Switzerland AG.

Item: Audio Fingerprinting System to Detect and Match Audio Recordings (Springer Science and Business Media Deutschland GmbH, 2023) Kishor, K.; Venkatesh, S.; Koolagudi, S.G.
The emergence of a sizable volume of audio data has increased the need for audio retrieval that can identify the required information rapidly and reliably. Audio fingerprint retrieval is a preferable alternative due to its improved performance.
The task of song identification from an audio recording has been an ongoing research problem in the field of music information retrieval. This work presents a robust and efficient audio fingerprinting method for song detection. The proposed system utilizes a combination of spectral and temporal features extracted from the audio signal to generate a compact and unique fingerprint for each song. A matching algorithm is then used to compare the fingerprint of a query recording against those in a reference database and identify the closest match. The system is evaluated on a diverse dataset of commercial songs and a standardized dataset. The results demonstrate the superior identification accuracy of the proposed method compared to existing approaches on the standardized dataset. Additionally, the method shows comparable identification performance for recordings, particularly for short segments of 1 s, with a 14% improvement in accuracy. Moreover, the proposed method achieves a 10% reduction in storage space in terms of the number of fingerprints required. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.

Item: Polyphonic Sound Event Detection Using Modified Recurrent Temporal Pyramid Neural Network (Springer Science and Business Media Deutschland GmbH, 2024) Venkatesh, S.; Koolagudi, S.G.
In this paper, a novel approach to performing polyphonic Sound Event Detection (SED) is presented. A new deep learning architecture named "Modified Recurrent Temporal Pyramid Neural Network (MR-TPNN)" is introduced. The input features fed to the network are spectrograms generated from the Constant-Q Transform (CQT). CQT spectrograms provide better sound event information in the audio recording than the Short-Time Fourier Transform (STFT) and Fast Fourier Transform (FFT) methods. Temporal information is essential for detecting the onset and offset of events in an audio recording.
Capturing the temporal information is ensured by fusing temporal pyramids and bidirectional Long Short-Term Memory (LSTM) recurrent layers in the deep learning architecture. Extensive experiments are carried out on three benchmark datasets, and the results of the proposed method are superior to those of the existing polyphonic SED systems. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Item: Acoustic Event and Scene Classification: A Review (Springer, 2025) Mulimani, M.; Venkatesh, S.; Koolagudi, S.G.
This paper gives a deeper insight into the range of recent approaches developed and reported in the literature specifically for monophonic acoustic event classification (AEC), polyphonic acoustic event detection (AED), and acoustic scene classification (ASC), concerning datasets, features, and classifiers. A list of datasets used for monophonic AEC, polyphonic AED, and ASC is introduced. The features and classifiers used for monophonic AEC, polyphonic AED, and ASC are reviewed along with their successes and failures. A list of open research issues, derived from a critical review of the available literature, is given at the end of the paper. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.

Item: Acoustic Scene Classification using Deep Fisher network (Elsevier Inc., 2023) Venkatesh, S.; Mulimani, M.; Koolagudi, S.G.
Acoustic Scene Classification (ASC) is the task of assigning a semantic label to an audio recording based on the surrounding environment. In this work, a Fisher network is introduced for ASC. The proposed method mimics the working mechanism of a feed-forward Convolutional Neural Network (CNN), where the output of a layer is fed as input to the succeeding layer. The Fisher network consists of a feature extraction step followed by a Fisher layer. The Fisher layer has three sub-layers, namely a Fisher Vector (FV) encoder, a temporal pyramid layer, and a normalization layer, along with a feature reduction layer.
Gammatone Time Cepstral Coefficients (GTCCs) and Mel-spectrograms are the features encoded as Fisher vector representations in the FV encoder sub-layer. Temporal information of the Fisher vectors is retained using the temporal pyramid sub-layer. After the temporal pyramids are extracted from the Fisher vectors, they serve as the feature vector. Irrelevant dimensions of the temporal pyramids are further reduced using Principal Component Analysis (PCA) in the normalization and PCA sub-layers. The proposed model is evaluated on five DCASE datasets: TUT Urban Acoustic Scenes 2018, TUT Urban Acoustic Scenes 2018 Mobile, DCASE 2019 Acoustic Scene Classification Task 1(a) and Task 1(b), and TAU Urban Acoustic Scenes 2020. The overall classification accuracy is 93%, 91%, 92%, 91%, and 89% for TUT 2018, TUT Mobile 2018, DCASE Task 1(a) 2019, DCASE Task 1(b) 2019, and TAU Urban Acoustic Scenes 2020, respectively. The proposed model performed much better than the state-of-the-art ASC systems. © 2023 Elsevier Inc.

Item: DBNLP: detecting bias in natural language processing system for India-centric languages (Springer Science and Business Media B.V., 2025) Keerthan Kumar, K.K.; Mendke, S.; Parihar, R.; Mayya, S.; Venkatesh, S.; Koolagudi, S.G.
Natural language processing (NLP) is gaining widespread interest and advancing rapidly due to its attractive applications. NLP models are being developed for real-world scenarios such as search engines, language translation, sentiment analysis, chatbots such as ChatGPT, and auto-completion. These models are trained on vast corpora of online data, exposing them to harmful biases and stereotypes towards various communities. The models learn these biases, making harmful and undesirable predictions about particular genders, religions, races, and professions. Biases in NLP systems can perpetuate societal biases and discrimination, leading to unfair and unequal treatment of individuals or groups.
Identifying these biases is a crucial first step towards mitigating them. Most prior work in this area has been Western-centric and focused on the English language, making it difficult to apply to Indian models and languages. In this work, we propose a model called Detecting Bias in Natural Language Processing System for India-Centric Languages (DBNLP), which aims to identify biases relevant to the Indian context in text-based language models, particularly for the English and Hindi languages. DBNLP presents three techniques for bias identification, based on (1) a Context Association Test (CAT), (2) a template-based perturbation technique for various co-domain associations, and (3) a co-occurrence count-based corpus analysis technique. Further, this work showcases how India-centric models such as IndicBERT and MuRIL, and datasets such as IndicCorp, are biased toward various demographic categories. Detecting bias in natural language processing systems for India-centric languages is essential to creating fair, diverse, and inclusive models that benefit society. © Bharati Vidyapeeth's Institute of Computer Applications and Management 2025.
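As a rough illustration of the third DBNLP technique above, a co-occurrence count-based corpus analysis can be sketched as follows. The mini-corpus, word lists, and scoring below are hypothetical stand-ins for demonstration only, not the paper's actual implementation or data:

```python
from collections import Counter
from itertools import product

# Hypothetical mini-corpus and term lists; the actual DBNLP analysis
# uses far larger corpora (e.g. IndicCorp) and curated term lists.
corpus = [
    "the doctor said he would operate tomorrow",
    "the nurse said she would assist during surgery",
    "the engineer said he fixed the turbine",
    "the teacher said she graded the papers",
]
identity_terms = {"he", "she"}                                # demographic terms
attribute_terms = {"doctor", "nurse", "engineer", "teacher"}  # professions

def cooccurrence_counts(sentences, identities, attributes):
    """Count how often each (identity, attribute) pair appears in the
    same sentence -- the raw signal behind count-based bias analysis."""
    counts = Counter()
    for sentence in sentences:
        words = set(sentence.split())
        for ident, attr in product(identities & words, attributes & words):
            counts[(ident, attr)] += 1
    return counts

counts = cooccurrence_counts(corpus, identity_terms, attribute_terms)

# A skewed count between identity terms for the same profession
# suggests a stereotypical association in the corpus.
for attr in sorted(attribute_terms):
    print(f"{attr}: he={counts[('he', attr)]} she={counts[('she', attr)]}")
```

On this toy corpus, "doctor" and "engineer" co-occur only with "he" while "nurse" and "teacher" co-occur only with "she", which is exactly the kind of skew such an analysis is meant to surface.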
