Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Characterization of aspirated and unaspirated sounds in speech
(Institute of Electrical and Electronics Engineers Inc., 2017) Ramteke, P.B.; Sadanand, A.; Koolagudi, S.G.; Pai, V.
In this work, consonant aspiration and unaspiration phenomena are studied. The pronunciation of aspirated and unaspirated consonants is known to be characterized by the 'puff of air' released at the place of constriction in the vocal tract, known as the burst. Here, the properties of the vowel immediately after the burst are studied to characterize the burst. The excitation source signal, estimated from the linear prediction residual of the speech, is used for the task. Signal characteristics such as the glottal pulse, the durations of the open, closed and return phases, the slopes of the open and return phases, the duration of the burst, the ratio of the highest to the lowest signal energy, and the voice onset time (VOT) are explored to characterize aspiration and unaspiration. The TIMIT English speech corpus is used to test the proposed approach. Random forest (RF) and support vector machine (SVM) classifiers are used to test the effectiveness of the features. Accuracies of 99.93% and 94.03% are achieved, respectively. The results show that the proposed features are robust in classifying aspirated and unaspirated consonants. © 2017 IEEE.

Identification of Palatal Fricative Fronting Using Shannon Entropy of Spectrogram
(Springer Science and Business Media Deutschland GmbH, 2020) Ramteke, P.B.; Supanekar, S.; Aithal, V.; Koolagudi, S.G.
In this paper, an attempt is made to identify palatal fricative fronting in children's speech, where postalveolar /sh/ is mispronounced as dental /s/. In children's speech, the concentration of energy (the darkest part of the spectrogram) spans 4000 Hz to 8000 Hz for /s/, whereas it spans 3000 Hz to 8000 Hz for /sh/. The Gammatonegram follows the frequency subbands of the ear, which are wider at higher frequencies. Various spectral properties extracted from the Gammatonegram are proposed for the characterization of /sh/ and /s/: spectral centroid, spectral crest factor, spectral decrease, spectral flatness, spectral flux, spectral kurtosis, spectral spread, spectral skewness, spectral slope, and Shannon entropy of the spectrogram (at intervals of 2000 Hz). The dataset, taken from the NITK Kids' Speech Corpus, was recorded from 60 native Kannada-speaking children aged between 3 1/2 and 6 1/2 years. A support vector machine (SVM) is used for the classification. Various combinations of the proposed features are evaluated, along with MFCCs(39) and LPCCs(39). The combination of MFCCs(39), LPCCs(39) and Entropy(4) is observed to achieve the highest mispronunciation identification performance, 83.2983%. © 2020, Springer Nature Switzerland AG.
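The band-wise Shannon entropy feature mentioned in the second abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain Hann-windowed STFT power spectrogram as a stand-in for the Gammatonegram, and it assumes that Entropy(4) refers to four 2000 Hz bands between 0 and 8000 Hz. The function name `band_entropies`, the 16 kHz sample rate, and the frame settings are hypothetical choices made for illustration.

```python
import numpy as np


def band_entropies(signal, sample_rate=16000, frame_len=400, hop=160,
                   band_hz=2000.0, max_hz=8000.0):
    """Shannon entropy (in bits) of the spectrogram energy in each band.

    Assumption: a plain STFT power spectrogram replaces the Gammatonegram.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_bins)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)

    entropies = []
    lo = 0.0
    while lo < max_hz:
        hi = lo + band_hz
        mask = (freqs >= lo) & (freqs < hi)
        energy = power[:, mask].ravel()
        total = energy.sum()
        if total <= 0:
            entropies.append(0.0)  # silent band: define entropy as 0
        else:
            # Normalise the time-frequency cells of the band to a
            # probability distribution, then apply -sum(p * log2(p)).
            p = energy / total
            p = p[p > 0]
            entropies.append(float(-(p * np.log2(p)).sum()))
        lo = hi
    return np.array(entropies)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(16000)  # one second of white noise
    print(band_entropies(noise))
```

Energy spread evenly over a band gives a high entropy, while energy concentrated in a few time-frequency cells gives a low one, which is why such a feature could help separate fricatives whose energy occupies different frequency ranges.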
