Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
8 results
Search Results
Item Automatic text-independent Kannada dialect identification system (Springer Verlag, 2019) Chittaragi, N.B.; Limaye, A.; Chandana, N.T.; Annappa, B.; Koolagudi, S.G.
This paper proposes a dialect identification system for the Kannada language. A system that can automatically identify the dialect of the language being spoken has a wide variety of applications. However, not many Automatic Speech Recognition (ASR) and dialect identification tasks have been carried out for the majority of Indian languages, and only a few good-quality annotated audio datasets are available. In this paper, a new dataset covering 5 spoken dialects of the Kannada language is introduced. Spectral and prosodic features capture the most prominent cues for recognition of Kannada dialects. Support Vector Machine (SVM) and neural network algorithms are used for modeling the text-independent recognition system. A neural network model that attempts to identify dialects from sentence-level cues has also been built. Hyper-parameters for the SVM and neural network models are chosen using grid search. Neural network models have outperformed SVMs when complete utterances are considered. © Springer Nature Singapore Pte Ltd. 2019.

Item Kannada Dialect Identification from Case-Based Word Utterances Using Gradient Boosting Algorithm (Springer Science and Business Media Deutschland GmbH, 2022) Chittaragi, N.B.; Koolagudi, S.G.
Dialects or accents constitute the grammatical, phonological, and lexical variations, with minor and subtle differences, commonly observed in the usage of a language. These variations are mainly due to the unique speaking patterns followed by groups of speakers. Dialect processing systems are essential in the development of automatic speech recognition (ASR) systems for regional and resource-constrained languages in a country like India, which has rich linguistic diversity.
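The grid search used in the Kannada dialect identification abstract above to choose SVM and neural-network hyper-parameters can be sketched as follows. This is a toy illustration: the scoring function is made up, and the parameter names and values (C, gamma) are hypothetical, not the paper's actual search space.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Score every hyper-parameter combination exhaustively and
    return the best one, as done for the SVM / neural-network models."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for cross-validated accuracy: peaks at
# C=10, gamma=0.1 (a made-up optimum, not from the paper).
def toy_score(p):
    return -abs(p["C"] - 10) - 10 * abs(p["gamma"] - 0.1)

best, _ = grid_search({"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}, toy_score)
print(best)  # {'C': 10, 'gamma': 0.1}
```

In practice the score function would run cross-validation on the dialect dataset; the exhaustive loop is the whole idea of grid search.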
In this paper, a language-dependent dialect identification system is proposed for the Kannada language from word utterances, especially those carrying Kannada-specific case (Vibhakthi Prathyayas) information. The Kannada language has special morphological operations in terms of various cases, commonly called the grammatical functions of a noun or pronoun. These word utterances are used for the classification of five dialects of Kannada. Using such short word utterances, which carry dialect-specific information representing unique characteristics, is a novel idea. A case-based word-utterance dataset is prepared covering five Kannada dialects from the Kannada Dialect Speech Corpus (KDSC). Dynamic and static prosodic features are extracted to capture dialectal variations. In addition to these features, spectral MFCC features are also considered to evaluate differences among dialects from these word-level units. Initially, a multi-class Support Vector Machine (SVM) technique is used; later, extreme gradient boosting (XGB) ensemble algorithms are used for the development of an automatic Kannada dialect recognition system. The research findings demonstrate that words with case information convey dialect-specific linguistic cues effectively. The combination of dynamic and static prosodic cues, along with spectral features, has a significant effect on the characterization of dialects. © 2022, Springer Nature Switzerland AG.

Item Classification of vocal and non-vocal segments in audio clips using genetic algorithm based feature selection (GAFS) (Elsevier Ltd, 2018) Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.
Music information retrieval (MIR) is an emerging field that helps in tagging each portion of an audio clip. A majority of MIR subtasks need an application that segments the vocal and non-vocal portions.
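The gradient boosting used for dialect recognition in the abstract above (XGB) can be illustrated with a minimal boosted-stump sketch in plain NumPy: each round fits a one-split tree to the current residuals and adds a shrunken copy to the prediction. This is a toy stand-in for XGBoost on made-up one-dimensional data, not the paper's features or model.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-threshold split on one feature (least-squares)."""
    best = None
    for t in np.unique(x)[:-1]:               # largest value would leave the right side empty
        left, right = residual[x <= t], residual[x > t]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    return best[1:]

def boost(x, y, rounds=20, lr=0.3):
    """Fit additive stumps to residuals (toy stand-in for XGBoost)."""
    pred, stumps = np.zeros(len(y)), []
    for _ in range(rounds):
        t, lv, rv = fit_stump(x, y - pred)
        pred += lr * np.where(x <= t, lv, rv)
        stumps.append((t, lv, rv))
    return stumps, pred

# Made-up 1-D "prosodic feature" separating two dialect classes (-1 / +1).
x = np.array([0.0, 1, 2, 3, 10, 11, 12, 13])
y = np.array([-1.0, -1, -1, -1, 1, 1, 1, 1])
stumps, pred = boost(x, y)
```

Real XGBoost adds regularization, second-order gradients, and deeper trees, but the residual-fitting loop is the core of the ensemble.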
In this paper, an effort has been made to segment the vocal and non-vocal regions using some novel features based on formant structure, on top of standard features. Features such as Mel-frequency cepstral coefficients (MFCCs), linear prediction cepstral coefficients (LPCCs), frequency domain linear prediction (FDLP) values, statistical values of pitch, jitter, shimmer, formant attack slope (FAS), formant heights from base-to-peak (FH1) and peak-to-base (FH2), formant angle values at peak (FA1) and valley (FA2), and F5 have been considered. Classifiers such as artificial neural networks (ANN), support vector machines (SVM), and random forest (RF) have been considered for a comparative study, as they are powerful enough to discover large non-linear patterns. Genetic algorithms, with the support of neural networks, have been used to select the relevant features rather than considering all dimensions; this approach is named genetic algorithm based feature selection (GAFS). An accuracy of 89.23% before windowing and 95.16% after windowing is obtained with the optimal feature vector of length 32 using artificial neural networks. The system developed is capable of detecting singing voice segments with an accuracy of 98%. © 2018 Elsevier Ltd

Item Segmentation and characterization of acoustic event spectrograms using singular value decomposition (Elsevier Ltd, 2019) Mulimani, M.; Koolagudi, S.G.
Traditional frame-based speech features such as Mel-frequency cepstral coefficients (MFCCs) were developed specifically for speech/speaker recognition tasks. Speech differs from acoustic events in its phonetic structure, so frame-based speech features may not be suitable for Acoustic Event Classification (AEC). In this paper, a novel method is proposed for extracting robust, acoustic-event-specific features from the spectrogram using a left singular vector for AEC.
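The genetic-algorithm feature selection (GAFS) described in the abstract above can be sketched as evolving binary feature masks under a fitness function. Here the fitness function is hypothetical (it simply rewards three arbitrarily chosen "informative" features and penalizes the rest); the paper's fitness is the neural-network classification performance of the masked feature set.

```python
import random

def ga_feature_select(n_feats, fitness, pop=20, gens=30, seed=0):
    """Evolve binary feature masks; keep the fittest (a sketch of the
    GAFS idea, not the paper's exact GA configuration)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(pop)]
    for _ in range(gens):
        parents = sorted(population, key=fitness, reverse=True)[: pop // 2]
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_feats)       # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_feats)] ^= 1    # point mutation
            children.append(child)
        population = parents + children           # elitist: parents survive
    return max(population, key=fitness)

# Hypothetical fitness: features 0-2 are informative, the rest add cost.
relevant = {0, 1, 2}
def fitness(mask):
    gain = sum(m for i, m in enumerate(mask) if i in relevant)
    cost = 0.5 * sum(m for i, m in enumerate(mask) if i not in relevant)
    return gain - cost

best = ga_feature_select(10, fitness)
```

Because the top half of each generation survives unchanged, the best mask's fitness never decreases across generations.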
It consists of two main stages: segmentation and characterization of acoustic event spectrograms. In the first stage, the symmetric Laplacian matrix of an acoustic event spectrogram is decomposed into singular values and vectors. Then, the reliable region (spectral shape) of an acoustic event is segmented from the spectrogram using a left singular vector: the prominent values of the left singular vector, selected using the proposed threshold, automatically segment the reliable region of the acoustic event. In the second stage, the segmented region of the spectrogram is used as a feature vector for AEC. Characteristics of the singular-vector values belonging to the reliable (event) and unreliable (non-event) regions of the spectrogram are determined. To evaluate the proposed approach, different categories of ‘home’ acoustic events are considered from the Freiburg-106 dataset. The results show significantly improved performance in acoustic event segmentation and classification. A singular vector effectively segments the reliable region of the acoustic event from the spectrogram for a Support Vector Machine (SVM) based AEC system. The proposed AEC system is robust to noise and achieves a higher recognition rate in both clean and noisy conditions compared to traditional speech-feature-based systems. © 2018 Elsevier Ltd

Item Classification of aspirated and unaspirated sounds in speech using excitation and signal level information (Academic Press, 2020) Ramteke, P.B.; Supanekar, S.; Koolagudi, S.G.
In this work, the phenomena of consonant aspiration and unaspiration are studied. The pronunciation of aspirated and unaspirated consonants is characterized by the 'puff of air' released at the place of constriction in the vocal tract, also known as the burst. Here, properties of the vowel immediately after the burst are studied for characterization of the burst. The excitation source signal, estimated from speech as the low-pass-filtered linear prediction residual signal, is used for the task.
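The singular-vector segmentation idea in the abstract above, thresholding the leading left singular vector to pick out event frames, can be demonstrated on a synthetic spectrogram. The frame-affinity matrix and the simple mean-value threshold below are simplifications: the paper decomposes a symmetric Laplacian matrix and proposes its own threshold rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic spectrogram: 16 frequency bins x 20 frames, with a
# high-energy "event" occupying frames 5-14.
S = 0.05 * rng.random((16, 20))
S[4:12, 5:15] += 1.0

# Symmetric frame-affinity matrix (simplified stand-in for the
# paper's symmetric Laplacian of the spectrogram).
A = S.T @ S
U, sigma, Vt = np.linalg.svd(A)
u1 = np.abs(U[:, 0])                 # leading left singular vector

# Prominent singular-vector values mark the reliable (event) region.
event_frames = np.where(u1 > u1.mean())[0]
print(event_frames)                  # frames 5..14
```

The leading singular vector concentrates its weight on the correlated high-energy frames, so a threshold on its entries separates event from non-event regions.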
Signal characteristics such as the glottal pulse; the durations of the open, closed, and return phases; the slopes of the open and return phases; the duration of the burst; the ratio of the highest and lowest frame-wise energies of the signal; and the voice onset point are explored as features to characterize aspiration and unaspiration. Three datasets, namely TIMIT, IIIT Hyderabad Marathi, and IIIT Hyderabad Hindi (IIIT-H Indic Speech Databases), are used to verify the proposed approach. Random forest, support vector machine, and deep feed-forward neural networks (DFFNNs) are used as classifiers to test the effectiveness of the features. Optimal features are selected for the classification using correlation-based feature selection (CFS). From the results, it is observed that the proposed features are efficient in classifying aspirated and unaspirated consonants. The performance of the proposed features in recognizing aspirated and unaspirated phonemes is also evaluated, with IIIT Hyderabad Marathi considered for the analysis. Recognition of aspirated and unaspirated sounds using the proposed features improves in comparison with an MFCC-based phoneme recognition system. © 2020 Elsevier Ltd

Item Dialect Identification using Chroma-Spectral Shape Features with Ensemble Technique (Academic Press, 2021) Chittaragi, N.B.; Koolagudi, S.G.
The present work proposes a text-independent dialect identification system. Generally, dialects of a language exhibit varying pronunciation styles followed in a specific geographical region. In this paper, chroma features, familiar from music-related systems, are employed for the identification of dialects. In addition, eight significant spectral-shape-related features computed from short-term spectra are combined with the chroma features; the combination is named chroma-spectral shape features.
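The correlation-based feature selection (CFS) step in the abstract above can be approximated by a simple relevance/redundancy filter: rank features by correlation with the class label, then drop any feature that is strongly correlated with one already chosen. This is a simplified stand-in for CFS's merit-based subset search, run here on synthetic data with a made-up redundancy cap.

```python
import numpy as np

def cfs_rank(X, y, redundancy_cap=0.9):
    """Keep features correlated with the label but not with an already
    selected feature -- a simplified stand-in for CFS's merit search."""
    relevance = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    selected = []
    for j in np.argsort(relevance)[::-1]:        # most relevant first
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < redundancy_cap
               for k in selected):
            selected.append(int(j))
    return selected

# Synthetic data: feature 0 informative, 1 a redundant copy, 2 pure noise.
rng = np.random.default_rng(1)
y = np.tile([0.0, 1.0], 20)
X = np.column_stack([
    y + 0.05 * rng.normal(size=40),              # informative
    np.zeros(40),                                # placeholder, filled below
    rng.normal(size=40),                         # uninformative noise
])
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=40)   # redundant copy of feature 0
selected = cfs_rank(X, y)                        # keeps one of {0, 1}, plus 2
```

The redundant copy is filtered out because its correlation with the already-selected informative feature exceeds the cap.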
Chroma features aggregate spectral information and attempt to encapsulate the evidential variations in timbre, melody, rhythm, and intonation patterns found prominently among the dialects of some languages. The effectiveness of the proposed features and approach is evaluated on five prominent Kannada dialects spoken in Karnataka, India, and on the internationally known Intonation Variation in English (IViE) dataset with nine British English dialects. Discriminative models, such as a single-classifier Support Vector Machine (SVM) and ensemble-based support vector machines (ESVM), are employed for classification. The proposed features have shown better performance than state-of-the-art i-vector features on both datasets. The highest recognition performances of 95.6% and 97.52% are achieved on the Kannada and IViE dialect datasets, respectively, using ESVM. The proposed features have also demonstrated robust performance on small (limited-data) audio clips, even in noisy conditions. © 2021 Elsevier Ltd

Item A Fully-Automated Framework for Mineral Identification on Martian Surface Using Supervised Learning Models (Institute of Electrical and Electronics Engineers Inc., 2023) Kumari, P.; Soor, S.; Shetty, A.; Koolagudi, S.G.
The availability of various spectral libraries for CRISM (Compact Reconnaissance Imaging Spectrometer for Mars) data on the NASA PDS (Planetary Data System) has greatly facilitated research on the surface mineralogy of Mars. However, building supervised learning models for mineral mapping remains challenging due to the lack of ground-truth/training data.
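A chroma vector of the kind used in the dialect identification abstract above can be computed by folding an FFT magnitude spectrum into 12 pitch classes, with a spectral-shape statistic appended. This toy version appends only the spectral centroid, not the paper's eight shape features, and the sampling rate and frame length are made-up values.

```python
import numpy as np

def chroma_spectral_shape(frame, sr=16000):
    """Fold an FFT magnitude spectrum into 12 pitch classes (chroma)
    and append the spectral centroid as one shape statistic -- a toy
    version of a chroma-spectral shape vector."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    chroma = np.zeros(12)
    for f, m in zip(freqs[1:], mag[1:]):          # skip the DC bin
        pitch_class = int(round(12 * np.log2(f / 440.0))) % 12
        chroma[pitch_class] += m
    centroid = (freqs * mag).sum() / mag.sum()    # spectral centroid (Hz)
    return np.concatenate([chroma / chroma.sum(), [centroid]])

# A pure 440 Hz tone (pitch class A) concentrates chroma in class 0.
t = np.arange(1600) / 16000.0
feat = chroma_spectral_shape(np.sin(2 * np.pi * 440 * t))
```

Because chroma folds all octaves into 12 bins, it is insensitive to absolute pitch height, which is why it captures melodic and intonation patterns compactly.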
In this paper, an automated framework is presented that classifies the spectra in a CRISM hyperspectral image using supervised learning models. The required training data is produced by augmenting the mineral spectra available in the MICA (Minerals Identified in CRISM Analysis) spectral library, in a way that keeps the key absorption signatures in the mineral spectra intact while providing adequate variability. The framework contains a pre-processing pipeline that, in addition to some conventional pre-processing steps, includes a new feature extraction method to capture the most distinguishable absorption patterns in the spectra. The proposed framework is validated on a set of CRISM images captured from different locations on the Martian surface using different types of supervised learning models, such as random forests, support vector machines, and neural networks. An uncertainty analysis of the different steps in the pre-processing pipeline is provided, as well as a comparison with some previously used methods, showing that the framework performs comparably well, with a mean accuracy of around 0.8. Interactive mineral maps are also provided for the detected dominant minerals. © 2013 IEEE.

Item Automatic diagnosis of COVID-19 related respiratory diseases from speech (Springer, 2023) Shekhar, K.; Chittaragi, N.B.; Koolagudi, S.G.
In this work, an intelligent, automatic system is proposed to recognize COVID-19-related illnesses from speech samples alone, using automatic speech processing techniques. We used a standard crowd-sourced dataset collected by the University of Cambridge through a web-based application and Android/iPhone apps. We worked on the cough and breath datasets individually, and also on a combination of both.
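The augmentation of library spectra described in the mineral identification abstract above can be sketched as random gain, offset, and noise perturbations that leave the absorption-band positions untouched. The perturbation ranges and the toy spectrum below are made up for illustration, not the paper's settings.

```python
import numpy as np

def augment_spectrum(spec, n=100, seed=0):
    """Make n training variants of one library spectrum via random gain,
    offset, and noise; absorption-band positions are left untouched.
    (Perturbation ranges here are illustrative assumptions.)"""
    rng = np.random.default_rng(seed)
    gains = rng.uniform(0.8, 1.2, size=(n, 1))
    offsets = rng.uniform(-0.02, 0.02, size=(n, 1))
    noise = rng.normal(0.0, 0.002, size=(n, spec.size))
    return gains * spec + offsets + noise

# Toy reflectance spectrum with one absorption dip at index 50.
wl = np.linspace(1.0, 2.6, 100)                  # wavelength (microns)
spec = 0.3 - 0.1 * np.exp(-((wl - wl[50]) ** 2) / 0.001)
batch = augment_spectrum(spec)                   # 100 labeled variants
```

Every variant keeps its absorption minimum at the same wavelength, so a classifier trained on the batch still learns the diagnostic band position.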
We trained on two sets of features: one consisting only of standard audio features, such as spectral and prosodic features, and one combining excitation source features with the standard audio features. The models were trained with shallow classifiers such as ensemble classifiers and SVMs. Our models performed well on both the breath and cough datasets, but the best result in each case was obtained with a different combination of features and classifier. The best overall result was obtained using only standard audio features on the combined cough and breath data, achieving an accuracy of 84% and an Area Under the Curve (AUC) score of 84%. Intelligent systems have already started to make a mark in medical diagnosis, and this type of study can improve the health system by providing much-needed assistance to health workers. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
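The AUC metric reported in the abstract above can be computed directly from classifier scores via the rank-sum (Mann-Whitney) identity; the labels and scores below are made up for illustration.

```python
def auc_score(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity: the probability
    that a randomly chosen positive sample outscores a randomly
    chosen negative one (ties count as half a win)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and classifier scores, for illustration only.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
print(round(auc_score(labels, scores), 4))  # 0.8333
```

Unlike accuracy, this statistic is threshold-free, which is why papers on imbalanced medical data commonly report both.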
