Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Search Results

Now showing 1 - 3 of 3
  • Item
    Efficient audio segmentation in soccer videos
    (Institute of Electrical and Electronics Engineers Inc., 2016) Raghuram, M.A.; Chavan, N.R.; Koolagudi, S.G.; Ramteke, P.B.
Identifying different audio segments in videos is the first step for many important tasks such as event detection and speech transcription. Approaches using Mel-frequency cepstral coefficients (MFCCs) with Gaussian mixture models (GMMs) and hidden Markov models (HMMs) perform reasonably well in stationary conditions but do not scale to a broad range of environmental conditions. This paper focuses on segmenting the audio of broadcast soccer videos into classes such as Silence, Speech Only, Speech Over Crowd, Crowd Only, and Excited, using an alternative feature set that is simple as well as robust to changes in environmental conditions. Support Vector Machines (SVMs), Neural Networks, and Random Forest are used for classification. The accuracies achieved with SVMs, Neural Networks, and Random Forest are 83.80%, 86.07%, and 88.35%, respectively. The proposed features combined with the Random Forest classifier are found to achieve better accuracy than the other classifiers. © 2016 IEEE.
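    The abstract does not list the exact alternative feature set, so the sketch below uses two common, illustrative frame-level features (short-time energy and zero-crossing rate) to show the general framing-and-feature pipeline such a segmenter builds on; the frame sizes and the synthetic "silence vs. crowd" signal are assumptions for demonstration only.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (rows = frames)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def frame_features(frames):
    """Per-frame short-time energy and zero-crossing rate."""
    energy = np.mean(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.column_stack([energy, zcr])

# Toy signal: half a second of silence followed by a noisy "crowd" burst.
sr = 8000
rng = np.random.default_rng(0)
x = np.concatenate([np.zeros(sr // 2), rng.normal(0, 0.5, sr // 2)])
feats = frame_features(frame_signal(x, frame_len=400, hop=200))
```

    Feature vectors like these would then be fed to a classifier (SVM, neural network, or random forest, as in the paper) to label each frame or segment.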
  • Item
    Acoustic features based word level dialect classification using SVM and ensemble methods
    (Institute of Electrical and Electronics Engineers Inc., 2017) Chittaragi, N.B.; Koolagudi, S.G.
In this paper, a word-based dialect classification system is proposed using acoustic characteristics of the speech signal. Dialects mainly represent the different pronunciation patterns of a language. Dialectal cues can exist at various levels of an utterance, such as phoneme, syllable, word, sentence, and phrase. Word-level dialectal traits are extracted to recognize dialects, since every word exhibits significant dialect-discriminating cues. The Intonational Variations in English (IViE) speech corpus, recorded in British English, has been considered. The corpus includes nine dialects covering nine distinct regions of the British Isles. Acoustic properties, namely spectral and prosodic features, are derived at the word level to construct the feature vector. Further, two classification algorithms, support vector machine (SVM) and tree-based extreme gradient boosting (XGB) ensembles, are used to extract the prominent patterns that discriminate the dialects. From the experiments, better performance has been observed with word-level traits using the ensemble method than with the SVM classifier. © 2017 IEEE.
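    The abstract describes building a word-level feature vector from spectral and prosodic properties. A minimal sketch of that idea follows; the specific descriptors (spectral centroid, energy, duration) and the synthetic sine-tone "words" are illustrative stand-ins, not the paper's actual feature set.

```python
import numpy as np

def word_feature_vector(word, sr):
    """Word-level descriptor: one spectral cue (centroid) plus two simple
    prosodic cues (energy and duration). Illustrative only."""
    spectrum = np.abs(np.fft.rfft(word))
    freqs = np.fft.rfftfreq(len(word), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    energy = float(np.mean(word ** 2))
    duration = len(word) / sr
    return np.array([centroid, energy, duration])

# Two synthetic one-second "words" at different pitches.
t = np.arange(8000) / 8000.0
low_vec = word_feature_vector(np.sin(2 * np.pi * 200 * t), 8000)
high_vec = word_feature_vector(np.sin(2 * np.pi * 800 * t), 8000)
```

    Vectors of this kind, one per word, would then be passed to an SVM or a gradient-boosted tree ensemble such as XGBoost for dialect classification.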
  • Item
Identification of Phonological Process: Final Consonant Deletion from Children's Speech
    (Institute of Electrical and Electronics Engineers Inc., 2018) Ramteke, P.B.; Supanekar, S.; Koolagudi, S.G.
Children within the age range of 2 1/2 to 6 1/2 years face difficulties in pronunciation due to an underdeveloped vocal tract and neuromotor control. They tend to substitute a simpler class of sounds for sounds that are difficult for them to pronounce. These pronunciation error patterns are called phonological processes. Phonological processes disappear as the child advances in age, and their analysis gives a measure of a child's language-learning ability over time. The appearance of these processes after the specified age (8 years) indicates a phonological disorder. In this paper, final consonant deletion, one of the phonological processes in the Kannada language, is considered for analysis. In final consonant deletion, the consonant, part of a syllable, a syllable, or part of a word appearing at the end of the word is deleted. Since part of the word is deleted, features efficient in speech recognition, namely MFCCs and LPCCs, are explored for the analysis. The dynamic time warping (DTW) algorithm is used to compare the correctly pronounced and mispronounced words and identify the region of final consonant deletion. The DTW comparison path is observed to warp around the end of the mispronounced word, where part of the word has been deleted. A combination of 13 MFCCs and 13 LPCCs is observed to achieve the highest accuracy of 72.68% within a tolerance range of ±50 ms. The results show that features efficient in speech recognition are also effective for identifying final consonant deletion. © 2018 IEEE.
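    The DTW comparison described above can be sketched as follows. The toy "feature" sequences are scalars standing in for MFCC/LPCC frames (an illustrative assumption, not the paper's setup); the mispronounced word is modeled by simply truncating the final frames, and the alignment path visibly warps around the end of the shorter word, which is the cue the paper exploits.

```python
import numpy as np

def dtw_path(a, b):
    """Classic DTW between two frame sequences. Returns the total
    alignment cost and the warping path as (i, j) index pairs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end to recover the path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return D[n, m], path

# "Correct" word vs. the same word with its final frames deleted.
correct = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
mispronounced = correct[:5]
cost, path = dtw_path(correct, mispronounced)
# The path tail pins the deleted frames of `correct` to the last frame
# of `mispronounced` -- the warp-at-the-end behavior noted in the paper.
```

    In practice each frame would be a 13- or 26-dimensional MFCC/LPCC vector and the per-frame cost a Euclidean distance, but the path-warping behavior is the same.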