Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Search Results

Now showing 1 - 6 of 6
  • Item
    Feature analysis for mispronounced phonemes in the case of alveolar approximant (/r/) substituted with voiced dental consonant (/ð/)
    (Institute of Electrical and Electronics Engineers Inc., 2015) Ramteke, P.B.; Koolagudi, S.G.; Prabhakar, A.
    Mispronunciation is commonly observed in children between 2 and 8 years of age. Some of the common mispronunciations are stopping, fronting, backing and affrication; these processes are known as phonological processes. Identifying them is crucial for studying the vocal tract development pattern and treating phonological disorders in children. The features that clearly discriminate a correctly pronounced phoneme from its mispronounced counterpart have to be compared to identify the phonological processes. This paper focuses on the analysis of the mispronounced alveolar approximant (/r/) substituted with the voiced dental fricative (/ð/). In this work, spectral and pitch-related features are considered for the analysis using scatter plots and histograms. From the analysis, it is observed that the energy feature plotted against the 2nd and 4th cepstral coefficients achieves 75% and 65% discrimination, respectively. © 2015 IEEE.
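The features compared above can be illustrated with a minimal numpy sketch that computes per-frame log energy and real cepstral coefficients; the frame length, hop size and number of coefficients are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160, n_ceps=5):
    """Per-frame log energy and real cepstral coefficients (illustrative sketch)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hamming(frame_len)
        energy = np.log(np.sum(frame ** 2) + 1e-10)    # log frame energy
        spectrum = np.abs(np.fft.rfft(frame)) + 1e-10
        cepstrum = np.fft.irfft(np.log(spectrum))      # real cepstrum
        feats.append((energy, cepstrum[:n_ceps]))
    return feats

# Synthetic 1 s voiced-like signal at 16 kHz (hypothetical stand-in for a phoneme)
t = np.arange(16000) / 16000.0
voiced = np.sin(2 * np.pi * 120 * t)
feats = frame_features(voiced)
print(len(feats), len(feats[0][1]))
```

Pairs of (energy, coefficient) values per frame are what a scatter plot like the one described would display for the two phoneme classes.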
  • Item
    Recognition of repetition and prolongation in stuttered speech using ANN
    (Springer Science and Business Media Deutschland GmbH, 2016) Savin, P.S.; Ramteke, P.B.; Koolagudi, S.G.
    This paper mainly focuses on the detection of repetitions and prolongations in stuttered speech. Acoustic and pitch-related features, namely Mel-frequency cepstral coefficients (MFCCs), formants, pitch, zero-crossing rate (ZCR) and energy, are used to test their effectiveness in recognizing repetitions and prolongations in stammered speech. An Artificial Neural Network (ANN) is used as the classifier. The results are evaluated using combinations of the different features, and show that the ANN classifier trained on MFCC features achieves an average accuracy of 87.39% for repetition and prolongation recognition. © Springer India 2016.
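Two of the simpler features named above, ZCR and short-time energy, can be sketched in a few lines; the frame values and signals below are synthetic illustrations, not the paper's data.

```python
import numpy as np

def zcr_and_energy(frame):
    """Zero-crossing rate and short-time energy for one frame (sketch)."""
    signs = np.sign(frame)
    signs[signs == 0] = 1
    zcr = np.mean(signs[1:] != signs[:-1])      # fraction of sign changes
    energy = np.sum(frame ** 2) / len(frame)    # mean-square energy
    return zcr, energy

rng = np.random.default_rng(1)
t = np.arange(400) / 16000.0
tone = np.sin(2 * np.pi * 200 * t)     # steady tone: low ZCR (prolongation-like)
noise = rng.standard_normal(400)       # noise: high ZCR (fricative-like)
zcr_tone, _ = zcr_and_energy(tone)
zcr_noise, _ = zcr_and_energy(noise)
print(zcr_tone < zcr_noise)
```

Frame-wise trajectories of such features, alongside MFCCs, are the kind of input an ANN classifier like the one described would be trained on.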
  • Item
    Gender Identification from Children's Speech
    (Institute of Electrical and Electronics Engineers Inc., 2018) Ramteke, P.B.; Dixit, A.A.; Supanekar, S.; Dharwadkar, N.V.; Koolagudi, S.G.
    Children's speech is characterized by higher pitch and formant frequencies compared to adult speech. Gender identification from children's speech is difficult, as there is no significant difference between the acoustic properties of male and female children. Here, an attempt has been made to explore features efficient in discriminating gender from children's speech. Different combinations of spectral features such as Mel-frequency cepstral coefficients (MFCCs), ΔMFCCs and ΔΔMFCCs, formants, and linear predictive cepstral coefficients (LPCCs); shimmer and jitter; and prosodic features like pitch, its statistical variations and Δpitch-related features are explored. Features are evaluated using non-linear classifiers, namely Artificial Neural Networks (ANNs), Deep Neural Networks (DNNs) and Random Forests (RFs). From the results it is observed that the RF achieves the highest accuracy, 84.79%, among the classifiers. © 2018 IEEE.
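Jitter and shimmer, two of the voice-quality features listed above, measure cycle-to-cycle variation of the pitch period and amplitude. A minimal sketch using the common "local" definition (mean absolute difference between consecutive cycles, normalized by the mean) follows; the cycle measurements are hypothetical.

```python
import numpy as np

def jitter_shimmer(periods, amplitudes):
    """Local jitter and shimmer as mean relative cycle-to-cycle variation (sketch)."""
    periods = np.asarray(periods, float)
    amplitudes = np.asarray(amplitudes, float)
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)
    shimmer = np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)
    return jitter, shimmer

# Hypothetical per-cycle measurements (period in seconds, linear amplitude)
periods = [0.0040, 0.0041, 0.0039, 0.0040, 0.0042]
amps = [0.80, 0.78, 0.82, 0.79, 0.81]
j, s = jitter_shimmer(periods, amps)
print(round(j, 4), round(s, 4))
```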
  • Item
    NITK Kids' Speech Corpus
    (International Speech Communication Association, 2019) Ramteke, P.B.; Supanekar, S.; Hegde, P.; Nelson, H.; Aithal, V.; Koolagudi, S.G.
    This paper introduces a speech database for analyzing children's speech. The proposed database is recorded in Kannada (one of the South Indian languages) from children between 2½ and 6½ years of age, and is named the National Institute of Technology Karnataka Kids' Speech Corpus (NITK Kids' Speech Corpus). The relevant design considerations for the data collection are discussed in detail. The corpus is divided into four age groups with an interval of 1 year between groups, and includes nearly 10 hours of speech recordings from 160 children; for each age group, data is recorded from 40 children (20 male and 20 female). Further, the effect of developmental changes on speech from 2½ to 6½ years is analyzed using pitch and formant analysis. Some potential applications of the NITK Kids' Speech Corpus, such as systematic study of the language-learning ability of children, phonological process analysis and children's speech recognition, are discussed. © 2019 ISCA
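The pitch analysis mentioned above can be sketched with a simple autocorrelation F0 estimator; the sampling rate, search range and synthetic signal are assumptions for illustration, not the corpus settings.

```python
import numpy as np

def autocorr_pitch(frame, sr, fmin=120.0, fmax=500.0):
    """Estimate F0 of a voiced frame from the autocorrelation peak (sketch)."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # lag range for [fmin, fmax]
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr
child_like = np.sin(2 * np.pi * 320 * t)   # children's F0 is typically high
f0 = autocorr_pitch(child_like, sr)
print(round(f0))
```

Tracking such F0 estimates across the four age groups is one way developmental changes in pitch could be quantified.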
  • Item
    Gender Identification using Spectral Features and Glottal Closure Instants (GCIs)
    (Institute of Electrical and Electronics Engineers Inc., 2019) Ramteke, P.B.; Supanekar, S.; Koolagudi, S.G.
    Automatic identification of gender from speech may help improve the performance of systems such as speaker and speech recognition, forensic analysis and authentication. Differences in the physiological parameters of male and female vocal folds result in significant changes in their vibration pattern, which can be characterized by differences in the duration of glottal closure. In this paper, an attempt has been made at gender recognition from speech using spectral features such as MFCCs and LPCCs; pitch (F0); and excitation source features, namely glottal closure instants (GCIs) and their statistical variations. Western Michigan University's gender dataset, collected from 93 speakers (45 male and 48 female), is used for experimentation. Random Forests (RFs) and Support Vector Machines (SVMs) are used to measure the performance of the proposed features. The Random Forest achieves an average frame-level accuracy of 96.908% using 13 MFCCs, 13 LPCCs, pitch (F0) and the 5 GCI statistics; the SVM achieves an average accuracy of 98.607% using 13 MFCCs, 13 LPCCs and the 5 GCI statistics. From the results, it is observed that the proposed features are efficient in discriminating gender from speech. © 2019 IEEE.
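Given a stream of detected GCIs, simple statistics of the intervals between them capture the glottal-period differences described above. A minimal sketch follows; the choice of five statistics (mean, std, min, max, median) is an assumption for illustration, as the paper's exact "GCI Stats (5)" are not specified here, and the GCI streams are synthetic.

```python
import numpy as np

def gci_stats(gci_times):
    """Five summary statistics of glottal periods from GCI instants (sketch).
    The particular five stats here are an assumption, not the paper's list."""
    periods = np.diff(np.asarray(gci_times, float))
    return (periods.mean(), periods.std(), periods.min(),
            periods.max(), np.median(periods))

# Hypothetical GCI streams: ~120 Hz (male-like) vs ~220 Hz (female-like)
male_gcis = np.arange(0, 0.5, 1 / 120.0)
female_gcis = np.arange(0, 0.5, 1 / 220.0)
m = gci_stats(male_gcis)
f = gci_stats(female_gcis)
print(m[0] > f[0])   # longer mean glottal period for the lower-pitched voice
```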
  • Item
    Note Transcription from Carnatic Music
    (Springer, 2020) Suma, S.M.; Koolagudi, S.G.; Ramteke, P.B.; Sreenivasa Rao, K.S.
    In this work, an effort has been made to identify the note sequences of different ragas of Carnatic music. The proposed heuristic method makes use of standard just-intonation frequency ratios between notes for basic transcription of a music piece into a written sequence of notes. The notes present in a given piece are obtained using pitch histograms. The normalized pitch contour of the piece is segmented based on detection of note boundaries, and the segments are labeled using the note information already available. Without prior knowledge of the raga, 30 out of 64 sequences are identified accurately and an additional 18 sequences are identified with a one-note error. With prior raga knowledge, 76.56% accuracy is observed in note sequence identification. © 2020, Springer Nature Singapore Pte Ltd.
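The just-intonation labeling step can be sketched as mapping a pitch value to the nearest ratio relative to the tonic. The ratio table below is one common just-intonation set and the tonic is hypothetical; the paper's exact ratio table and note inventory may differ.

```python
import numpy as np

# A common just-intonation ratio set (assumption; the paper's table may differ)
JUST_RATIOS = {
    "Sa": 1.0, "Ri2": 9 / 8, "Ga3": 5 / 4, "Ma1": 4 / 3,
    "Pa": 3 / 2, "Da2": 5 / 3, "Ni3": 15 / 8,
}

def label_note(f0, tonic):
    """Label a pitch value with the nearest just-intonation note (sketch)."""
    ratio = f0 / tonic
    ratio /= 2 ** np.floor(np.log2(ratio))   # fold into one octave [1, 2)
    return min(JUST_RATIOS, key=lambda n: abs(JUST_RATIOS[n] - ratio))

tonic = 196.0                    # hypothetical tonic (G3)
print(label_note(294.0, tonic))  # 294 / 196 = 3/2, the Pa ratio
```

Applying this labeling to each segment of the normalized pitch contour yields the written note sequence the method produces.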