Conference Papers
Permanent URI for this collectionhttps://idr.nitk.ac.in/handle/123456789/28506
Browse
3 results
Search Results
Item Nitk Kids' speech corpus(International Speech Communication Association publication@isca-speech.org 4 Rue des Fauvettes - Lous Tourils Baixas 66390, 2019) Ramteke, P.B.; Supanekar, S.; Hegde, P.; Nelson, H.; Aithal, V.; Koolagudi, S.G.This paper introduces speech database for analyzing children's speech. The proposed database of children is recorded in Kannada language (one of the South Indian languages) from children between age 2 12 to 6 12 years. The database is named as National Institute of Technology Karnataka Kids' Speech Corpus (NITK Kids' Speech Corpus). The relevant design considerations for the database collection are discussed in detail. It is divided into four age groups with an interval of 1 year between each age group. The speech corpus includes nearly 10 hours of speech recordings from 160 children. For each age range, the data is recorded from 40 children (20 male and 20 female). Further, the effect of developmental changes on the speech from 2 12 to 6 12 years are analyzed using pitch and formant analysis. Some of the potential applications, of the NITK Kids' Speech Corpus, such as, systematic study on the language learning ability of children, phonological process analysis and children speech recognition are discussed. © © 2019 ISCAItem Identification of Nasalization and Nasal Assimilation from Children’s Speech(Springer Science and Business Media Deutschland GmbH, 2020) Ramteke, P.B.; Supanekar, S.; Aithal, V.; Koolagudi, S.G.In children, nasalization is a commonly observed phonological process where the non-nasal sounds are substituted with nasal sounds. Here, an attempt has been made for the identification of nasalization and nasal assimilation. The properties of nasal sounds and nasalized voiced sounds are explored using MFCCs extracted from Hilbert envelope of the numerator of group delay (HNGD) Spectrum. HNGD Spectrum highlights the formants in the speech and extra nasal formant in the vicinity of first formant in nasalized voiced sounds. Features extracted from correctly pronounced and mispronounced words are compared using Dynamic Time Warping (DTW) algorithm. The nature of the deviation of DTW comparison path from its diagonal behavior is analyzed for the identification of mispronunciation. The combination of FFT based MFCCs and HNGD spectrum based MFCCs are observed to achieve highest accuracy of 82.22% within the tolerance range of ±50 ms. © 2020, Springer Nature Switzerland AG.Item Identification of Palatal Fricative Fronting Using Shannon Entropy of Spectrogram(Springer Science and Business Media Deutschland GmbH, 2020) Ramteke, P.B.; Supanekar, S.; Aithal, V.; Koolagudi, S.G.In this paper, an attempt has been made to identify palatal fricative fronting in children speech, where postalveolar /sh/ is mispronounced as dental /s/. In children’s speech, the concentration of energy (darkest part) of spectrogram for /s/ ranges 4000 Hz to 8000 Hz, whereas it ranges 3000 Hz 8000 Hz for /sh/. Gammatonegram follows the frequency subbands of the ear (wider for higher frequencies). Various spectral properties such as spectral centroid, spectral crest factor, spectral decrease, spectral flatness, spectral flux, spectral kurtosis, spectral spread, spectral skewness, spectral slope and Shannon entropy of the spectrogram (interval of 2000 Hz), extracted from the Gammatonegram are proposed for the characterization of /sh/ and /s/. The dataset recorded from 60 native Kannada speaking children of age between 3 1/2 to 6 1/2 years is considered for the analysis from NITK Kids’ Speech Corpus. Support vector machine (SVMs) is considered for the classification. Various combinations of the proposed features are considered for the evaluation, along with the MFCCs(39) and LPCCs(39). Combination of MFCCs(39), LPCCs(39) and Entropy(4) is observed to achieve highest mispronunciation identification performance of 83.2983%. © 2020, Springer Nature Switzerland AG.
