Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

Now showing 1 - 10 of 17

Robust Dialect Identification System using Spectro-Temporal Gabor Features
(Institute of Electrical and Electronics Engineers Inc., 2018) Chittaragi, N.B.; Mothukuri, S.P.; Hegde, P.; Koolagudi, S.G.
Automatic identification of the dialects of a language is gaining popularity in the field of automatic speech recognition (ASR) systems. The present work proposes an automatic dialect identification (ADI) system using 2D Gabor and spectral features. A comprehensive study of five dialects of Kannada, a Dravidian language, has been taken up. Gabor filters representing spectro-temporal modulations attempt to emulate the signal processing strategies of the human auditory system. Hence, they are able to perceive human voices well and, in turn, recognize dialectal variations effectively. In addition, Mel frequency cepstral coefficients (MFCC) are derived as spectral features. A single-classifier support vector machine (SVM) and an ensemble-based extreme random forest (ERF) are employed for recognition. The effectiveness of the Gabor features for the ADI system is demonstrated with the proposed Kannada dialect dataset along with the standard Intonation Variation in English (IViE) dataset for British English dialects. The Gabor features have shown better performance than MFCC features with both datasets. Recognition performance of 88.75% and 99.16% is achieved with the Kannada and IViE dialect datasets, respectively. The proposed Gabor features have demonstrated better performance even under noisy conditions. © 2018 IEEE.
NITK Kids' Speech Corpus
(International Speech Communication Association, 2019) Ramteke, P.B.; Supanekar, S.; Hegde, P.; Nelson, H.; Aithal, V.; Koolagudi, S.G.
This paper introduces a speech database for analyzing children's speech. The proposed database is recorded in the Kannada language (one of the South Indian languages) from children between 2½ and 6½ years of age. The database is named the National Institute of Technology Karnataka Kids' Speech Corpus (NITK Kids' Speech Corpus). The relevant design considerations for the database collection are discussed in detail. It is divided into four age groups with an interval of 1 year between each age group. The speech corpus includes nearly 10 hours of speech recordings from 160 children. For each age range, the data is recorded from 40 children (20 male and 20 female). Further, the effect of developmental changes on speech from 2½ to 6½ years is analyzed using pitch and formant analysis. Some of the potential applications of the NITK Kids' Speech Corpus, such as systematic study of the language learning ability of children, phonological process analysis, and children's speech recognition, are discussed. © 2019 ISCA
Spectral Feature Based Kannada Dialect Classification from Stop Consonants
(Springer, 2019) Chittaragi, N.B.; Hegde, P.; Mothukuri, S.K.P.; Koolagudi, G.K.
This study investigates the significance of stop consonants for the classification of Kannada dialects. The majority of the studies proposed have shown the existence of evident differences in the pronunciation of vowels across dialects. However, consonant-based studies on dialect processing are comparatively few. In this work, eight stop consonants are used for the characterization of five Kannada dialects. Acoustic characteristics such as cepstral coefficients, formant frequencies, spectral flux, and rolloff features are explored through spectral analysis of stops. The consonant dataset is derived from a standard Kannada dialect dataset and consists of 2417 consonants obtained from 16 native speakers of each dialect. Support vector machine (SVM) and decision tree-based extreme gradient boosting (XGB) ensemble classification methods are employed for automatic recognition of Kannada dialects. The research findings show that stops, despite their shorter duration, also convey dialectal linguistic cues. The combination of spectral properties has contributed to the identification of distinct dialect-specific information across Kannada dialects. © 2019, Springer Nature Switzerland AG.
A deep learning approach to detect drowsy drivers in real time
(Institute of Electrical and Electronics Engineers Inc., 2019) Pinto, A.; Bhasi, M.; Bhalekar, D.; Hegde, P.; Koolagudi, S.G.
Fatigue and microsleep are the reasons behind many severe road accidents. These can be avoided if the symptoms of fatigue are detected in time. This paper describes a real-time system for monitoring driver vigilance. Driver drowsiness detection algorithms in the past have proven to work in controlled environments but have not yet been implemented on a wide scale. Earlier algorithms calculate a scalar value known as the Eye Aspect Ratio (EAR) and detect drowsiness by comparing its instantaneous value with a previously configured value. We propose a generalised approach using Convolutional Neural Networks (CNN) in this paper. Our algorithm tracks the driver's eyes and feeds them into a pre-trained model that predicts the state of the eye. Once the prediction is obtained, we are able to detect whether the driver is drowsy. The main components of our system include a camera for real-time image acquisition, a processor for running algorithms to process the acquired image, and an alarm system to warn the driver when the symptoms are detected in order to avoid potential accidents. © 2019 IEEE.
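The EAR baseline mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the formula; the code below assumes the commonly used six-landmark definition, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), where p1 and p4 are the eye corners. The function names, the threshold of 0.2, and the consecutive-frame count are illustrative assumptions, not values taken from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def eye_aspect_ratio(eye):
    """Return the EAR for six (x, y) eye landmarks ordered p1..p6.

    Assumed ordering: p1 and p4 are the horizontal eye corners,
    p2/p3 lie on the upper lid, p5/p6 on the lower lid.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = dist(p2, p6) + dist(p3, p5)  # two lid-to-lid distances
    horizontal = dist(p1, p4)               # corner-to-corner width
    return vertical / (2.0 * horizontal)


def is_drowsy(ear_values, threshold=0.2, min_frames=3):
    """Flag drowsiness when EAR stays below `threshold` for
    `min_frames` consecutive frames (both values are assumptions)."""
    run = 0
    for ear in ear_values:
        run = run + 1 if ear < threshold else 0
        if run >= min_frames:
            return True
    return False


# Hypothetical landmarks for an open eye and a nearly closed eye.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
```

A fixed threshold of this kind is the "previously configured value" the abstract refers to; the CNN approach proposed in the paper replaces it with a learned eye-state prediction.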
Kannada Dialect Classification using Artificial Neural Networks
(Institute of Electrical and Electronics Engineers Inc., 2020) Mothukuri, S.K.P.; Hegde, P.; Chittaragi, N.B.; Koolagudi, S.G.
In this paper, an Automatic Dialect Classification (ADC) system is proposed for dialects of the Kannada language (the Dravidian language spoken in Southern Karnataka). The ADC system extracts spectral Mel Frequency Cepstral Coefficients (MFCCs) and log filter bank features along with linear predictive coefficients. In addition, prosodic pitch and energy features are extracted to capture dialect-specific cues. A Kannada dialect speech corpus consisting of five prominent dialects of the Kannada language is used for designing the ADC system. Artificial Neural Networks (ANNs) are used for classification of Kannada dialects, as ANNs and their variants have recently been gaining popularity in speech processing applications. Hyperparameter tuning of the ANN has resulted in an increase in performance. © 2020 IEEE.
Kannada Dialect Classification Using CNN
(Springer Science and Business Media Deutschland GmbH, 2020) Hegde, P.; Chittaragi, N.B.; Mothukuri, S.K.P.; Koolagudi, S.G.
Kannada is one of the prominent languages spoken in southern India. Since Kannada is a lingua franca spoken by more than 70 million people, it naturally has dialects. In this paper, we identified five major dialectal regions in Karnataka state. An attempt is made to classify these five dialects from sentence-level utterances. Sentences are segmented from continuous speech automatically by using spectral centroid and short-term energy features. Mel frequency cepstral coefficient (MFCC) features are extracted from these sentence units. These features are used to train the convolutional neural networks (CNN). Along with MFCCs, shifted delta and double delta coefficients are also used to train the CNN model. The proposed CNN-based dialect recognition system is also tested with the internationally known standard Intonation Variation in English (IViE) dataset. The CNN model has resulted in better performance. It is observed that the use of one convolution layer and three fully connected layers balances computational complexity and results in better accuracy with both Kannada and English datasets. © 2020, Springer Nature Switzerland AG.
An Improved Method for Speech Enhancement Using Convolutional Neural Network Approach
(Institute of Electrical and Electronics Engineers Inc., 2022) Mahesh Kumar, T.N.; Hegde, P.; Deepak, K.T.; Narasimhadhan, A.V.
Speech enhancement is one of the most widely used techniques in the speech processing domain. With the development of deep neural networks and the availability of powerful hardware, multiple deep learning-based speech enhancement models have come up in recent years. In this work, a speech enhancement technique using a Convolutional Neural Network (CNN) as a Denoising Autoencoder (DAE) is investigated and compared with the conventional feed-forward topology. Further, the proposed model is analyzed at various SNR levels to process corrupted English speech and is also tested on unseen speech data that includes additional SNR levels. It is observed from simulation results that the proposed model outperforms the existing model in terms of Perceptual Evaluation of Speech Quality (PESQ) and Log Spectral Distance (LSD). The network achieved 3% higher scores than feed-forward neural networks, and it is found that convolutional DAEs perform better than their feed-forward counterparts. © 2022 IEEE.
Crack Density and Length Detection using Machine Learning
(Avestia Publishing, 2024) Koushik, M.; Hegde, P.; Rudra, B.
This study presents a comprehensive approach for detecting and analyzing microscopic cracks in rock samples using computer vision techniques and machine learning algorithms. The proposed methodology involves image segmentation, crack detection, length, and density prediction, utilizing a combination of image processing techniques and linear regression modeling. Microscopic rock images captured at various temperatures were analyzed to detect and measure cracks accurately. The developed system demonstrated effective crack detection and length measurement capabilities, aided by image segmentation, edge detection, and feature extraction methods. Moreover, the application of linear regression facilitated the prediction of crack parameters, exhibiting a clear relationship between crack characteristics and temperature variations. The findings contribute to a deeper understanding of crack formation mechanisms in rocks under different temperature conditions, offering valuable insights for geological studies and infrastructure integrity assessments. © 2024, Avestia Publishing. All rights reserved.
Utilizing Deep Learning Methods for Cancer Detection through Analysis of MicroRNA Expression Profiles
(Institute of Electrical and Electronics Engineers Inc., 2024) Kantamneni, S.; Hegde, P.; Patil, N.
Integration of cutting-edge computational methods and genomic data analysis has become crucial in the quest for early cancer diagnosis and enhanced diagnostic accuracy. The genomic sequences of microRNAs (miRNAs), which are important cancer biomarkers, provide important information for this. In this study, we propose a novel deep learning-based framework for cancer detection with a focus on FNNs and a hybrid DNN model with an accuracy of over 90.7%. Our method aims to identify detailed genomic patterns and features that improve the sensitivity and specificity of cancer detection by painstakingly curating and preprocessing large miRNA datasets gathered from various patient cohorts. This research sets the stage for further exploration of deep learning methodologies within the context of miRNA-based cancer detection, promising advancements in personalized diagnosis and prognosis. Our approach seeks to improve sensitivity and specificity by deciphering complex genetic patterns. By utilizing these datasets, we demonstrate the effectiveness of our model and its clinical potential, giving an accuracy of 90.7% for our Hybrid Feedforward and Dense Neural Network model as compared to current state-of-the-art machine learning models. This research promises revolutionary advances in customized oncology, providing a route towards improved diagnostic accuracy and early intervention. It also shows that miRNA expression values are not sequential in nature and lays the groundwork for the development of deep learning in miRNA-based cancer detection. © 2024 IEEE.
The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
(International Speech Communication Association, 2024) Kalluri, S.B.; Singh, P.; Roy Chowdhuri, P.; Kulkarni, A.; Baghel, S.; Hegde, P.; Sontakke, S.; Deepak, K.T.; Mahadeva Prasanna, S.R.; Vijayasenan, D.; Ganapathy, S.
The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) 2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of speaker diarization (SD) and language diarization (LD) on a challenging multilingual conversational speech dataset. In the DISPLACE 2024 challenge, we also introduced the task of automatic speech recognition (ASR) on this dataset. The dataset, containing 158 hours of speech and consisting of both supervised and unsupervised mono-channel far-field recordings, was released for the LD and SD tracks. Further, 12 hours of close-field mono-channel recordings were provided for the ASR track conducted on 5 Indian languages. The details of the dataset, baseline systems, and the leaderboard results are highlighted in this paper. We have also compared the performance of our baseline models and the teams on the evaluation data of DISPLACE-2023 to emphasize the advancements made in this second version of the challenge. © 2024 International Speech Communication Association. All rights reserved.