
Browsing by Author "Koolagudi, S.G."

Now showing 1 - 20 of 237
  • Item
    A Deep Ensemble Learning-Based CNN Architecture for Multiclass Retinal Fluid Segmentation in OCT Images
    (Institute of Electrical and Electronics Engineers Inc., 2023) Rahil, M.; Anoop, B.N.; Girish, G.N.; Kothari, A.R.; Koolagudi, S.G.; Rajan, J.
    Retinal fluids (fluid collections) develop when fluid accumulates in the retina, which may be caused by several retinal disorders and can lead to loss of vision. Optical coherence tomography (OCT) provides non-invasive cross-sectional images of the retina and enables the visualization of different retinal abnormalities. The identification and segmentation of retinal cysts from OCT scans is gaining immense attention, since the manual analysis of OCT data is time-consuming and requires an experienced ophthalmologist. Identification and categorization of retinal cysts aids in establishing the pathophysiology of various retinal diseases, such as macular edema, diabetic macular edema, and age-related macular degeneration. Hence, an automatic algorithm for the segmentation and detection of retinal cysts would be of great value to ophthalmologists. In this study, we propose a convolutional neural network-based deep ensemble architecture that can segment three different types of retinal cysts from retinal OCT images. The quantitative and qualitative performance of the model was evaluated using the publicly available RETOUCH challenge dataset. The proposed model outperformed the state-of-the-art methods, with an overall improvement of 1.8%. © 2013 IEEE.
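The abstract does not specify how the ensemble members are fused; one common choice is to average the per-model class-probability maps and take the arg-max class per pixel. A minimal sketch under that assumption (the array shapes and the 4-class background-plus-three-fluids setup are illustrative):

```python
import numpy as np

def ensemble_segment(prob_maps):
    """Average per-model class-probability maps of shape (H, W, C),
    then take the arg-max class per pixel."""
    avg = np.mean(np.stack(prob_maps, axis=0), axis=0)  # (H, W, C)
    return np.argmax(avg, axis=-1)                      # (H, W) label map

# Three hypothetical model outputs for a 2 x 2 patch and 4 classes
# (background plus three fluid types, as in the RETOUCH task).
rng = np.random.default_rng(0)
maps = [rng.dirichlet(np.ones(4), size=(2, 2)) for _ in range(3)]
print(ensemble_segment(maps).shape)  # (2, 2)
```

Averaging probabilities before the arg-max lets models that are confident in different regions complement each other, which is the usual motivation for segmentation ensembles.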
  • Item
    A deep learning approach to detect drowsy drivers in real time
    (Institute of Electrical and Electronics Engineers Inc., 2019) Pinto, A.; Bhasi, M.; Bhalekar, D.; Hegde, P.; Koolagudi, S.G.
    Fatigue and microsleep are behind many severe road accidents, which can be avoided if the symptoms of fatigue are detected in time. This paper describes a real-time system for monitoring driver vigilance. Driver drowsiness detection algorithms in the past have proven to work in controlled environments but have not yet been deployed on a wide scale. Earlier algorithms calculate a scalar value known as the Eye Aspect Ratio (EAR) and detect drowsiness by comparing its instantaneous value with a previously configured threshold. We propose a generalised approach using Convolutional Neural Networks (CNNs) in this paper. Our algorithm tracks the driver's eyes and feeds them into a pre-trained model that predicts the state of the eye; once the prediction is obtained, we can detect whether the driver is drowsy. The main components of our system are a camera for real-time image acquisition, a processor running the algorithms that process the acquired images, and an alarm system to warn the driver when symptoms are detected, in order to avoid potential accidents. © 2019 IEEE.
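The EAR mentioned above has a standard closed form over the six eye landmarks p1..p6 returned by common facial-landmark detectors: EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|). A small sketch (the landmark coordinates below are made up):

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) array of landmarks p1..p6. Ratio of the two vertical
    eyelid distances to the horizontal eye width; it falls toward 0 as
    the eye closes."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

open_eye = np.array([[0, 0], [1, 1], [2, 1], [3, 0], [2, -1], [1, -1]], float)
print(round(eye_aspect_ratio(open_eye), 3))  # 0.667
```

Threshold-based detectors compare this scalar against a fixed cutoff; the CNN approach in the paper instead classifies the eye region directly.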
  • Item
    A Fully-Automated Framework for Mineral Identification on Martian Surface Using Supervised Learning Models
    (Institute of Electrical and Electronics Engineers Inc., 2023) Kumari, P.; Soor, S.; Shetty, A.; Koolagudi, S.G.
    The availability of various spectral libraries for CRISM (Compact Reconnaissance Imaging Spectrometer for Mars) data on the NASA PDS (Planetary Data System) has greatly facilitated research on the surface mineralogy of Mars. However, building supervised learning models for mineral mapping is challenging due to the lack of ground-truth training data. In this paper, an automated framework is presented that classifies the spectra in a CRISM hyperspectral image using supervised learning models, where the required training data is produced by augmenting the mineral spectra available in the MICA (Minerals Identified through CRISM Analysis) spectral library; the augmentation keeps the key absorption signatures in the mineral spectra intact while providing adequate variability. The framework contains a pre-processing pipeline that, in addition to some conventional pre-processing steps, includes a new feature extraction method to capture the most distinguishable absorption patterns in the spectra. The proposed framework is validated on a set of CRISM images captured from different locations on the Martian surface using several types of supervised learning models, such as random forests, support vector machines, and neural networks. An uncertainty analysis of the different steps in the pre-processing pipeline is provided, as well as a performance comparison with some previously used methods, showing that the framework performs comparably well with a mean accuracy of around 0.8. Interactive mineral maps are also provided for the detected dominant minerals. © 2013 IEEE.
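The exact augmentation applied to the MICA spectra is not described in the abstract; one simple scheme consistent with it perturbs amplitudes with a random gain and low-level noise so that absorption-band positions stay intact. A sketch (the band count, noise levels, and the synthetic absorption feature are all made up):

```python
import numpy as np

def augment_spectrum(spectrum, n_copies=5, gain_sd=0.02, noise_sd=0.005, seed=0):
    """Create training variants of a library spectrum: a random
    multiplicative gain plus low-level additive noise perturbs
    amplitudes while leaving the wavelength positions of absorption
    features untouched."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_copies):
        gain = 1.0 + rng.normal(0.0, gain_sd)
        noise = rng.normal(0.0, noise_sd, size=spectrum.shape)
        out.append(spectrum * gain + noise)
    return np.stack(out)

spec = np.ones(240)   # placeholder for a 240-band reflectance spectrum
spec[100:110] = 0.6   # a synthetic absorption feature
aug = augment_spectrum(spec)
print(aug.shape)  # (5, 240)
```

Because the perturbations are purely amplitude-domain, the minimum of every augmented copy still falls inside the original absorption band, which is the property the paper's augmentation relies on.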
  • Item
    A Novel Approach to Video Steganography using a 3D Chaotic Map
    (Institute of Electrical and Electronics Engineers Inc., 2019) Narayanan, G.; Narayanan, R.; Haneef, N.; Chittaragi, N.B.; Koolagudi, S.G.
    In this paper, we introduce a novel approach to data hiding in videos using 3-dimensional chaotic maps. A video is represented as a 3-dimensional image, with the third axis constituting the frames of the video. Existing chaotic-map-based data-hiding techniques for videos are confined to applications of 2-dimensional chaotic maps on a per-frame basis. In this paper, a 3-dimensional extension of the logistic chaos map is applied to identify the pixels that encode information in the video's 3-dimensional space, and 3-3-2 Least Significant Bit (LSB) substitution is used to encode 1 byte of information per pixel. We have implemented and presented a proof of concept that has been analyzed on a test video using various quality metrics. The chaotic-map-based data-hiding approach proposed in this paper is shown to be secure, and the results observed are in line with the standard results for a video steganographic algorithm using LSB substitution. © 2019 IEEE.
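The 3-3-2 LSB substitution step is concrete enough to sketch: the byte's top three bits replace the red channel's three LSBs, the next three go into green, and the last two into blue (pixel selection by the 3D chaotic map is a separate step, not shown here):

```python
def embed_byte(pixel, byte):
    """3-3-2 LSB substitution: hide one byte in an (R, G, B) pixel by
    replacing the 3 LSBs of R, the 3 LSBs of G and the 2 LSBs of B."""
    r, g, b = pixel
    return ((r & ~0b111) | (byte >> 5),
            (g & ~0b111) | ((byte >> 2) & 0b111),
            (b & ~0b11) | (byte & 0b11))

def extract_byte(pixel):
    """Recover the hidden byte from the pixel's LSBs."""
    r, g, b = pixel
    return ((r & 0b111) << 5) | ((g & 0b111) << 2) | (b & 0b11)

stego = embed_byte((200, 120, 37), ord('A'))
print(extract_byte(stego) == ord('A'))  # True
```

Each channel changes by at most 7 (R, G) or 3 (B) intensity levels, which is why LSB substitution is visually imperceptible at ordinary bit depths.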
  • Item
    A Transfer Learning Approach for Diabetic Retinopathy Classification Using Deep Convolutional Neural Networks
    (Institute of Electrical and Electronics Engineers Inc., 2018) Krishnan, A.S.; Clive, D.R.; Bhat, V.; Ramteke, P.B.; Koolagudi, S.G.
    Diabetic Retinopathy is a disease in which the retina is damaged due to diabetes mellitus, and it is a leading cause of blindness today. Detection and quantification of the disease from retinal images is tedious and requires expertise. In this paper, automatic identification of the severity of Diabetic Retinopathy using Convolutional Neural Networks (CNNs) with a transfer learning approach is proposed to aid the diagnostic process. Different CNN architectures, such as ResNet and Inception-ResNet-v2, are compared using the quadratic weighted kappa metric. The qualitative and quantitative evaluation of the proposed approach is carried out on the Diabetic Retinopathy detection dataset from Kaggle. From the results, we observe that the proposed model achieves a kappa score of 0.76. © 2018 IEEE.
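Quadratic weighted kappa, the metric used above, penalizes disagreements by the squared distance between the predicted and true severity grades (0-4 in the Kaggle DR dataset). A compact sketch:

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Quadratic weighted kappa between two ratings in 0..n_classes-1:
    1 - (weighted observed disagreement / weighted chance disagreement)."""
    O = np.zeros((n_classes, n_classes))          # observed confusion matrix
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    w = np.array([[(i - j) ** 2 for j in range(n_classes)]
                  for i in range(n_classes)]) / (n_classes - 1) ** 2
    hist_t, hist_p = O.sum(axis=1), O.sum(axis=0)
    E = np.outer(hist_t, hist_p) / O.sum()        # expected-by-chance matrix
    return 1.0 - (w * O).sum() / (w * E).sum()

print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))  # 1.0
```

Perfect agreement scores 1.0, chance-level agreement 0, and systematic maximal disagreement goes negative, which is why the metric suits ordered severity grades better than plain accuracy.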
  • Item
    A Transpose-SELDNet for Polyphonic Sound Event Localization and Detection
    (Institute of Electrical and Electronics Engineers Inc., 2023) Spoorthy, V.; Koolagudi, S.G.
    Human beings can identify a particular event occurring in their surroundings from sound cues alone, even when no visual scene is presented. Sound events are the auditory cues present in an environment. Sound event detection (SED) is the process of determining the beginning and end of sound events as well as a textual label for each event, while sound source localization (SSL) refers to identifying the spatial location of a sound occurrence in addition to the SED. The integrated task of SED and SSL is known as Sound Event Localization and Detection (SELD). In this work, three deep learning architectures are explored to perform SELD: SELDNet, D-SELDNet (Depthwise Convolution), and T-SELDNet (Transpose Convolution). Two sets of features are used to perform the SED and Direction-of-Arrival (DOA) estimation tasks. D-SELDNet uses a depthwise convolution layer, which reduces the model's complexity in terms of computation time. T-SELDNet uses transpose convolution, which helps in learning better discriminative features by retaining the input size, so that necessary information from the input is not lost. The proposed method is evaluated on the First-order Ambisonic (FOA) array format of the TAU-NIGENS Spatial Sound Events 2020 dataset. An improvement over existing SELD systems has been observed with the proposed T-SELDNet. © 2023 IEEE.
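Why a depthwise layer cuts complexity can be seen from the weight counts: a standard k x k convolution couples every input channel to every output channel, whereas a depthwise filter followed by a 1 x 1 pointwise mix does not. A sketch (the channel sizes are illustrative, not SELDNet's actual configuration):

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k (one filter per input channel) followed by a
    1 x 1 pointwise convolution that mixes channels."""
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 64, 128, 3
print(conv_params(c_in, c_out, k))                 # 73728
print(depthwise_separable_params(c_in, c_out, k))  # 8768
```

Here the separable variant needs roughly 8x fewer weights, which is the source of the computation-time saving the abstract mentions.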
  • Item
    Academic Curriculum Load Balancing using GA
    (2019) Chakradhar, M.; Charan, M.S.; Sai, R.U.; Kunal, M.; Murthy, Y.V.S.; Koolagudi, S.G.
    In this paper, we propose a genetic algorithm to find an optimal solution to the academic load-balancing problem: distributing the course load of an academic curriculum across semesters so that the deviation of each semester's credit load from the mean credits per semester is minimal. The proposed approach explores the solution space using only mutation operators; crossover is not used, as the solutions it produces do not create newer or better solutions in this space. The algorithm is applied to three data sets, and the results are compared with solutions obtained using existing approaches; the results are either better than, or on par with, the state-of-the-art optimal solutions. The solution set obtained using the proposed approach is well spread across all the periods, and every period carries close to the mean number of credits. © 2019 IEEE.
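The mutation-only search described above can be sketched as a minimal (1+1) evolutionary loop: keep the best assignment of courses to semesters and accept a random single-course move whenever it does not increase the squared deviation from the mean load. The credit values and semester count below are made up, and this is a simplification of the paper's full GA:

```python
import random

def deviation(assign, credits, n_semesters):
    """Sum of squared deviations of per-semester credit loads from the mean."""
    loads = [0.0] * n_semesters
    for course, sem in enumerate(assign):
        loads[sem] += credits[course]
    mean = sum(loads) / n_semesters
    return sum((l - mean) ** 2 for l in loads)

def balance_curriculum(credits, n_semesters, generations=2000, seed=1):
    rng = random.Random(seed)
    best = [i % n_semesters for i in range(len(credits))]  # round-robin start
    best_dev = deviation(best, credits, n_semesters)
    for _ in range(generations):
        child = best[:]
        child[rng.randrange(len(child))] = rng.randrange(n_semesters)  # mutate
        d = deviation(child, credits, n_semesters)
        if d <= best_dev:                                  # keep non-worse child
            best, best_dev = child, d
    return best, best_dev

credits = [4, 3, 3, 2, 4, 3, 2, 1, 4, 3, 3, 2]  # hypothetical course credits
assign, dev = balance_curriculum(credits, n_semesters=4)
print(dev)
```

Accepting equal-fitness children lets the search drift across plateaus, a standard trick when mutation is the only operator.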
  • Item
    Acoustic Event and Scene Classification: A Review
    (Springer, 2025) Mulimani, M.; Venkatesh, S.; Koolagudi, S.G.
    This paper gives deeper insight into the range of recent approaches developed and reported in the literature for monophonic acoustic event classification (AEC), polyphonic acoustic event detection (AED) and acoustic scene classification (ASC), concerning datasets, features and classifiers. A list of datasets used for monophonic AEC, polyphonic AED and ASC is introduced. The features and classifiers used for these tasks are reviewed along with their successes and failures. A list of research issues derived from the critical review of the available literature is given at the end of the paper. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
  • Item
    Acoustic event classification using graph signals
    (Institute of Electrical and Electronics Engineers Inc., 2017) Mulimani, M.; Jahnavi, U.P.; Koolagudi, S.G.
    In this paper, a graph signal is generated from the spectrogram of an audio signal, and features derived from this graph signal are investigated for Acoustic Event Classification (AEC). Different acoustic events are selected from the Sound Scene Database of the Real World Computing Partnership (RWCP). Three different noises are selected from the NOISEX'92 database and added to the test samples at different noise conditions separately. The recognition performance of acoustic events using the proposed features and Mel-frequency cepstral coefficients (MFCCs) on clean and noisy test samples is compared. The proposed features show significantly improved recognition accuracy over MFCCs in noisy conditions. © 2017 IEEE.
  • Item
    Acoustic Event Classification Using Spectrogram Features
    (Institute of Electrical and Electronics Engineers Inc., 2018) Mulimani, M.; Koolagudi, S.G.
    This paper investigates a new feature extraction method that extracts different features from the spectrogram of an audio signal for Acoustic Event Classification (AEC). A new set of features is formulated and extracted from local spectrogram regions named blocks. The average recognition performance of the proposed spectrogram-based features and Mel-frequency cepstral coefficients (MFCCs), with their deltas and accelerations, on Support Vector Machines (SVMs) is compared. In this work, different categories of acoustic events are considered from the Freiburg-106 dataset. The proposed features show significantly improved performance over conventional MFCCs for AEC. © 2018 IEEE.
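The paper's exact block statistics are not given in the abstract; a plausible sketch partitions the spectrogram into fixed-size, non-overlapping blocks and keeps simple per-block statistics. The block size and the mean/standard-deviation choice below are assumptions:

```python
import numpy as np

def block_features(spectrogram, block_shape=(8, 8)):
    """Split a (freq x time) spectrogram into non-overlapping blocks
    and summarize each block by its mean and standard deviation."""
    bf, bt = block_shape
    F, T = spectrogram.shape
    feats = []
    for i in range(0, F - bf + 1, bf):
        for j in range(0, T - bt + 1, bt):
            block = spectrogram[i:i + bf, j:j + bt]
            feats.extend([block.mean(), block.std()])
    return np.array(feats)

spec = np.abs(np.random.default_rng(0).normal(size=(64, 128)))
print(block_features(spec).shape)  # (256,)
```

Local block statistics are less sensitive to narrow-band noise than full-spectrum coefficients, which is one plausible reason such features outperform MFCCs in noisy conditions.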
  • Item
    Acoustic features based word level dialect classification using SVM and ensemble methods
    (Institute of Electrical and Electronics Engineers Inc., 2017) Chittaragi, N.B.; Koolagudi, S.G.
    In this paper, a word-based dialect classification system is proposed using acoustic characteristics of the speech signal. Dialects mainly represent the different pronunciation patterns of a language, and dialectal cues can exist at various levels of an utterance, such as the phoneme, syllable, word, sentence and phrase. Word-level dialectal traits are extracted to recognize dialects, since every word exhibits significant dialect-discriminating cues. The Intonational Variations in English (IViE) speech corpus, recorded in British English, has been considered; it includes nine dialects covering nine distinct regions of the British Isles. Acoustic properties such as spectral and prosodic features are derived at the word level to construct the feature vector. Two classification algorithms, a support vector machine (SVM) and tree-based extreme gradient boosting (XGB) ensembles, are then used to extract the prominent patterns that discriminate the dialects. From the experiments, better performance has been observed with word-level traits using the ensemble methods than with the SVM classifier. © 2017 IEEE.
  • Item
    Acoustic Scene Classification using Deep Fisher network
    (Elsevier Inc., 2023) Venkatesh, S.; Mulimani, M.; Koolagudi, S.G.
    Acoustic Scene Classification (ASC) is the task of assigning a semantic label to an audio recording based on the surrounding environment. In this work, a Fisher network is introduced for ASC. The proposed method mimics the working mechanism of a feed-forward Convolutional Neural Network (CNN), where the output of one layer is fed as input to the succeeding layer. The Fisher network consists of a feature extraction step followed by a Fisher layer. The Fisher layer has three sub-layers, namely a Fisher Vector (FV) encoder, a temporal pyramid sub-layer, and normalization sub-layers, along with a feature-reduction layer. Gammatone Time Cepstral Coefficients (GTCCs) and Mel-spectrograms are the features encoded as Fisher vector representations in the FV encoder sub-layer. Temporal information of the Fisher vectors is retained using the temporal pyramid sub-layer, whose output forms the feature vector. Irrelevant dimensions of the temporal pyramids are then reduced using Principal Component Analysis (PCA) in the normalization and PCA sub-layers. The proposed model is evaluated on five DCASE datasets: TUT Urban Acoustic Scenes 2018 and Mobile, DCASE 2019 Acoustic Scene Classification Task 1(a) and Task 1(b), and TAU Urban Acoustic Scenes 2020. The overall classification accuracy is 93%, 91%, 92%, 91% and 89% for TUT 2018, TUT Mobile 2018, DCASE Task 1(a) 2019, DCASE Task 1(b) 2019, and TAU Urban Acoustic Scenes 2020, respectively. The proposed model performed much better than the state-of-the-art ASC systems. © 2023 Elsevier Inc.
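Of the Fisher-layer sub-layers, the temporal pyramid is the easiest to make concrete: pool the frame-level vectors at several temporal resolutions and concatenate the segment means. The pyramid levels and dimensions below are illustrative, not the paper's:

```python
import numpy as np

def temporal_pyramid(frames, levels=(1, 2)):
    """Pool a (T x D) sequence of frame-level vectors at several
    temporal resolutions: level n splits the frames into n equal
    segments and averages each; all segment means are concatenated."""
    pooled = []
    for n in levels:
        for seg in np.array_split(frames, n, axis=0):
            pooled.append(seg.mean(axis=0))
    return np.concatenate(pooled)

fv = np.random.default_rng(0).normal(size=(20, 16))  # e.g. 20 frame-level FVs
print(temporal_pyramid(fv).shape)  # (48,)
```

The level-1 segment is a plain global average, while deeper levels preserve coarse temporal ordering that a single global pool would discard, which is the "temporal information is retained" claim above.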
  • Item
    Acoustic Scene Classification using Deep Learning Architectures
    (Institute of Electrical and Electronics Engineers Inc., 2021) Spoorthy, V.; Mulimani, M.; Koolagudi, S.G.
    Enabling devices to make sense of sound is known as Acoustic Scene Classification (ASC); the analysis of various scenes by computational algorithms is known as computational auditory scene analysis. The main aim of this paper is to classify audio recordings based on the scenes/environments in which they were recorded. Two deep learning architectures are used to classify acoustic scenes, namely a Convolutional Neural Network (CNN) and a Convolutional-Recurrent Neural Network (CRNN). The models are evaluated with three activation functions, namely ReLU, LeakyReLU and ELU. The highest recognition accuracy achieved for the ASC task is 90.96%, obtained with the CRNN model. The models performed well on a basic convolution architecture, with a 10.9% improvement over the baseline system for this task. © 2021 IEEE.
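The three activations compared above differ only in how they treat negative inputs: ReLU zeroes them, LeakyReLU keeps a small linear slope, and ELU saturates smoothly toward -alpha. Sketched in NumPy:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small slope for x <= 0

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))  # smooth saturation

x = np.array([-2.0, -0.5, 0.0, 1.5])
for f in (relu, leaky_relu, elu):
    print(f.__name__, f(x))
```

LeakyReLU and ELU keep gradients alive for negative pre-activations, which is the usual reason to try them when plain ReLU units "die" during training.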
  • Item
    Acoustic scene classification using projection Kervolutional neural network
    (Springer, 2023) Mulimani, M.; Nandi, R.; Koolagudi, S.G.
    In this paper, a novel Projection Kervolutional Neural Network (ProKNN) is proposed for Acoustic Scene Classification (ASC). ProKNN combines two special filters, known as the left and right projection layers, with a Kervolutional Neural Network (KNN), which replaces the linear operation of the Convolutional Neural Network (CNN) with a non-linear polynomial kernel. We extend the ProKNN to learn from the features of two channels of audio recordings in the initial stage. The performance of ProKNN is evaluated on two publicly available datasets: the TUT Urban Acoustic Scenes 2018 and TUT Urban Acoustic Scenes Mobile 2018 development datasets. Results show that the proposed ProKNN outperforms the existing systems, with an absolute accuracy improvement of 8% and 14% on the two datasets, respectively, compared to the baseline model of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 challenge. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
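The kernel trick that kervolution applies at each filter position can be sketched in one line: replace the patch-filter inner product x·w with a polynomial kernel (x·w + c)^d. The small patch and weights below are made up:

```python
import numpy as np

def kervolve_patch(patch, weights, degree=2, c=1.0):
    """Polynomial-kernel response for one patch: (x . w + c) ** d.
    With degree=1 and c=0 this reduces to an ordinary convolution step."""
    return (np.dot(patch.ravel(), weights.ravel()) + c) ** degree

patch = np.array([[1.0, 2.0], [0.5, 1.0]])
w = np.array([[0.2, -0.1], [0.4, 0.3]])
print(kervolve_patch(patch, w, degree=1, c=0.0))  # 0.5
```

Raising the inner product to a power injects non-linearity into the filtering operation itself, rather than leaving all non-linearity to the activation function, which is the core idea the abstract describes.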
  • Item
    Acoustic-phonetic feature based Kannada dialect identification from vowel sounds
    (Springer New York LLC, 2019) Chittaragi, N.B.; Koolagudi, S.G.
    In this paper, a dialect identification system is proposed for the Kannada language using vowel sounds. Dialectal cues are characterized through acoustic parameters such as formant frequencies (F1–F3) and prosodic features [energy, pitch (F0), and duration]. For this purpose, a vowel dataset is collected from native speakers of Kannada belonging to different dialectal regions. Global features representing frame-level statistics such as mean, minimum, maximum, standard deviation and variance are extracted from the vowel sounds, and local features representing temporal dynamic properties at the contour level are derived from the steady-state vowel region. Three decision-tree-based ensemble algorithms, namely random forest, extreme random forest (ERF) and extreme gradient boosting, are used for classification. The performance of the global and local features is evaluated individually, and the significance of each feature in dialect discrimination is analyzed using single-factor ANOVA (analysis of variance) tests. Global features with the ERF ensemble model show the best average dialect identification performance, around 76%. The roles of duration, energy, pitch, and the three formant features are found to be evidential in Kannada dialect classification. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
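The frame-level global statistics listed above (mean, minimum, maximum, standard deviation, variance) map directly onto a five-element feature vector per contour. A sketch with a made-up pitch contour:

```python
import numpy as np

def global_stats(contour):
    """Global statistics of a frame-level contour, in the order
    mean, minimum, maximum, standard deviation, variance."""
    c = np.asarray(contour, dtype=float)
    return np.array([c.mean(), c.min(), c.max(), c.std(), c.var()])

pitch = [118.0, 121.5, 125.0, 123.2, 119.8]  # hypothetical F0 contour (Hz)
print(global_stats(pitch).shape)  # (5,)
```

Concatenating such vectors for energy, pitch, duration and the formant tracks yields the fixed-length global feature vector a tree ensemble can consume, regardless of how many frames the vowel spans.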
  • Item
    Advertisement detection in commercial radio channels
    (Institute of Electrical and Electronics Engineers Inc., 2016) Koolagudi, S.G.; Sridhar, S.; Elango, N.; Kumar, K.; Afroz, F.
    In this paper, real-time identification of advertisement segments in a radio broadcast is performed. Advertisements have certain distinctive characteristics that distinguish them from the rest of the broadcast information, and speech technology for recognizing specific patterns in the speech signal can capture this distinction. Machine learning tools such as Hidden Markov Models, Artificial Neural Networks and ensemble methods are used to classify advertisement and non-advertisement patterns; the ensemble classification technique gave the best performance. The system uses blind audio segmentation to optimize real-time analysis. This work relies mainly on audio characteristics and can be extended to visual data. © 2015 IEEE.
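The abstract does not detail the blind audio segmentation step; a common energy-based sketch flags a segment boundary wherever the short-time frame energy jumps sharply. The frame length, ratio threshold, and synthetic signal below are all assumptions:

```python
import numpy as np

def frame_energies(signal, frame_len=400):
    """Mean-square energy of consecutive non-overlapping frames."""
    n = len(signal) // frame_len * frame_len
    frames = signal[:n].reshape(-1, frame_len)
    return (frames ** 2).mean(axis=1)

def segment_boundaries(energies, ratio=4.0):
    """Flag a boundary wherever frame energy jumps or drops by more
    than `ratio` relative to the previous frame."""
    bounds = []
    for i in range(1, len(energies)):
        prev, cur = energies[i - 1] + 1e-12, energies[i] + 1e-12
        if cur / prev > ratio or prev / cur > ratio:
            bounds.append(i)
    return bounds

# Synthetic broadcast: quiet speech, then a louder jingle, then speech again.
rng = np.random.default_rng(0)
sig = np.concatenate([0.1 * rng.normal(size=4000),
                      1.0 * rng.normal(size=4000),
                      0.1 * rng.normal(size=4000)])
print(segment_boundaries(frame_energies(sig)))
```

Segmenting first and then classifying each segment (with the HMM/ANN/ensemble models above) is what keeps the pipeline fast enough for real-time broadcast monitoring.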

Maintained by Central Library NITK | DSpace software copyright © 2002-2026 LYRASIS
