Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
4 results
Search Results
Item: Feature analysis for mispronounced phonemes in the case of alveolar approximant (/r/) substituted with voiced dental consonant (/ð/) (Institute of Electrical and Electronics Engineers Inc., 2015) Ramteke, P.B.; Koolagudi, S.G.; Prabhakar, A.

Mispronunciation is commonly observed in children aged 2 to 8 years. Some of the common mispronunciations are stopping, fronting, backing and affrication; these are known as phonological processes. Identifying these processes is crucial for studying the vocal tract development pattern and for treating phonological disorders in children. Features that clearly discriminate a correctly pronounced phoneme from the corresponding mispronounced phoneme have to be compared to identify the phonological processes. This paper focuses on the analysis of the mispronounced alveolar approximant (/r/) substituted with the voiced dental fricative (/ð/). Spectral and pitch-related features are analyzed using scatter plots and histograms. From the analysis, it is observed that energy plotted against the 2nd and 4th cepstral coefficients achieves 75% and 65% discrimination, respectively. © 2015 IEEE. (A feature-plotting sketch for this paper appears after the result list.)

Item: Recognition of repetition and prolongation in stuttered speech using ANN (Springer Science and Business Media Deutschland GmbH, 2016) Savin, P.S.; Ramteke, P.B.; Koolagudi, S.G.

This paper focuses on the detection of repetition and prolongation in the stuttered speech signal. Acoustic and pitch-related features, namely Mel-frequency cepstral coefficients (MFCCs), formants, pitch, zero crossing rate (ZCR) and energy, are used to test their effectiveness in recognizing repetitions and prolongations in stuttered speech. An Artificial Neural Network (ANN) is used as the classifier, and the results are evaluated using combinations of the different features. The results show that the ANN classifier trained on MFCC features achieves an average accuracy of 87.39% for repetition and prolongation recognition. © Springer India 2016. (An ANN classifier sketch for this paper appears after the result list.)

Item: Repetition detection in stuttered speech (Springer Science and Business Media Deutschland GmbH, 2016) Ramteke, P.B.; Koolagudi, S.G.; Afroz, F.

This paper focuses on the detection of repetitions in stuttered speech. The stuttered speech signal is divided into isolated units based on energy. Mel-frequency cepstral coefficients (MFCCs), formants and shimmer are extracted from each isolated unit as features for repetition recognition. Using Dynamic Time Warping (DTW), the features of each isolated unit are compared with those of the subsequent units within a one-second interval of speech. Based on an analysis of the DTW scores, a threshold is set; if a score falls below this threshold, the corresponding units are identified as repeated events. Twenty-seven seconds of speech data, containing 50 repetition events, are used in this work. The results show that the combination of MFCCs, formants and shimmer can be used for the recognition of repetitions in stuttered speech: out of 50 repetitions, 47 are correctly identified. © Springer India 2016. (A DTW-scoring sketch for this paper appears after the result list.)

Item: Characterization of Consonant Sounds Using Features Related to Place of Articulation (Springer, 2020) Ramteke, P.B.; Hegde, S.; Koolagudi, S.G.

Speech sounds are classified into five classes grouped by place and manner of articulation: velar, palatal, retroflex, dental and labial.
In this paper, an attempt has been made to explore the role of place of articulation and vocal tract length in characterizing these classes of speech sounds. Formants and the vocal tract length available for the production of each class of sound are extracted from the region of transition from the consonant burst to the rising profile of the immediately following vowel. These features, along with their statistical variations, are considered for the analysis. Given the non-linear nature of the features, a Random Forest (RF) is used for classification. From the results, it is observed that the proposed features are effective in discriminating velar from palatal, palatal from retroflex, and palatal from labial sounds, with accuracies of 92.9%, 93.83% and 94.07%, respectively. © 2020, Springer Nature Singapore Pte Ltd. (A formant-and-VTL sketch for this paper appears below.)
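
For the first paper, here is a minimal sketch of the kind of feature analysis it describes: plotting frame energy against a cepstral coefficient for a correctly pronounced /r/ token versus a substituted /ð/ token. The file names, sampling rate and the use of librosa are assumptions, not details taken from the paper.

```python
# Illustrative sketch only: energy vs. a cepstral coefficient for two tokens.
import librosa
import numpy as np
import matplotlib.pyplot as plt

def energy_and_cepstra(path, coeff=2):
    """Return per-frame RMS energy and the chosen cepstral coefficient."""
    y, sr = librosa.load(path, sr=16000)
    energy = librosa.feature.rms(y=y)[0]              # frame-level energy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return energy, mfcc[coeff]

# Hypothetical token files: one correct /r/, one substituted /dh/.
e_r, c_r = energy_and_cepstra("r_token.wav", coeff=2)
e_dh, c_dh = energy_and_cepstra("dh_token.wav", coeff=2)

plt.scatter(c_r, e_r, label="/r/ (correct)", alpha=0.6)
plt.scatter(c_dh, e_dh, label="/dh/ (substituted)", alpha=0.6)
plt.xlabel("2nd cepstral coefficient")
plt.ylabel("frame energy (RMS)")
plt.legend()
plt.show()
```

With default frame settings, the RMS and MFCC tracks share the same hop length, so the two feature streams line up frame for frame in the scatter plot.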
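For the second paper, this is a minimal sketch of the ANN classification step, assuming precomputed MFCC feature vectors with binary labels (0 = repetition, 1 = prolongation). scikit-learn's MLPClassifier stands in for the ANN; the paper does not specify this library, the network size or any hyperparameters, and the data below are placeholders.

```python
# Illustrative sketch only: an MLP trained on segment-level MFCC features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Placeholder data: one 13-dim mean-MFCC vector per speech segment.
X = np.random.rand(200, 13)
y = np.random.randint(0, 2, 200)      # 0 = repetition, 1 = prolongation

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
ann.fit(X_tr, y_tr)
print(f"accuracy: {ann.score(X_te, y_te):.2%}")
```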
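For the third paper, this sketches the DTW comparison step: MFCCs of two energy-segmented units are aligned with DTW and the normalized alignment cost is thresholded. The threshold value, the file name, the fixed unit boundaries and the use of librosa's DTW are all assumptions for illustration.

```python
# Illustrative sketch only: thresholding a normalized DTW cost between units.
import librosa
import numpy as np

def unit_mfcc(y, sr):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

def dtw_score(feat_a, feat_b):
    D, wp = librosa.sequence.dtw(X=feat_a, Y=feat_b, metric="euclidean")
    return D[-1, -1] / len(wp)        # cost normalized by path length

THRESHOLD = 50.0                       # illustrative value, not from the paper

y, sr = librosa.load("stuttered.wav", sr=16000)   # hypothetical recording
# Suppose energy-based segmentation produced these (start, end) sample indices.
units = [(0, 8000), (8000, 16000)]
a = unit_mfcc(y[units[0][0]:units[0][1]], sr)
b = unit_mfcc(y[units[1][0]:units[1][1]], sr)
if dtw_score(a, b) < THRESHOLD:
    print("units flagged as a repetition event")
```

In practice each unit would be compared against every subsequent unit falling within the one-second window described in the abstract, not just its immediate neighbour.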
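For the fourth paper, this sketches one way to obtain the named features: formants estimated from LPC roots over the burst-to-vowel transition, a uniform-tube vocal tract length (VTL) estimate derived from F1, and a Random Forest over the resulting vectors. The frame selection, LPC order, quarter-wave tube approximation and placeholder training data are simplifying assumptions, not the paper's method.

```python
# Illustrative sketch only: LPC formants + tube-model VTL + Random Forest.
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier

C = 34300.0  # speed of sound in cm/s

def formants(frame, sr, order=12, n=4):
    """First n formant frequencies from the LPC roots of one frame."""
    a = librosa.lpc(frame, order=order)
    roots = [r for r in np.roots(a) if np.imag(r) > 0]
    freqs = sorted(np.angle(roots) * sr / (2 * np.pi))
    return freqs[:n]

def vtl_from_f1(f1):
    """Quarter-wave tube: F1 = c / (4L), so L = c / (4 * F1)."""
    return C / (4.0 * f1)

# Hypothetical consonant-vowel token; assumed transition region.
y, sr = librosa.load("cv_token.wav", sr=16000)
frame = y[1600:2400]
f = formants(frame, sr)
features = f + [vtl_from_f1(f[0])]

# Random Forest over such 5-dim vectors (placeholder training data).
X = np.random.rand(100, 5)
labels = np.random.randint(0, 5, 100)   # 5 place-of-articulation classes
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict([features]))
```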
