Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Search Results

Now showing 1 - 10 of 15
  • Item
    Raga classification for Carnatic music
    (Springer Verlag, 2015) Suma, S.M.; Koolagudi, S.G.
    In this work, an effort has been made to identify the raga of a given piece of Carnatic music. In the proposed method, direct raga classification, without the use of a note sequence, is performed using pitch as the primary feature. Primitive features extracted from the probability density function (pdf) of the pitch contour are used for classification: a 36-dimensional feature vector is obtained by extracting parameters from the pdf. Since non-sequential features are extracted from the signal, an artificial neural network (ANN) is used as the classifier. The database used for validating the system consists of 162 songs from 12 ragas. The average classification accuracy is found to be 89.5%. © Springer India 2015.
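    The pitch-pdf feature idea can be sketched roughly as follows. The abstract does not specify which parameters are extracted from the pdf, so this illustrative sketch simply uses the normalized histogram bins of the pitch contour as the feature vector; the bin count, frequency range and example contour are all assumptions:

```python
import numpy as np

def pitch_pdf_features(pitch_contour, n_bins=36, fmin=80.0, fmax=600.0):
    """Estimate the pdf of a pitch contour via a normalized histogram
    and return the bin probabilities as a fixed-length feature vector."""
    pitch = np.asarray(pitch_contour, dtype=float)
    pitch = pitch[(pitch >= fmin) & (pitch <= fmax)]  # drop unvoiced/outlier frames
    hist, _ = np.histogram(pitch, bins=n_bins, range=(fmin, fmax))
    total = hist.sum()
    return hist / total if total > 0 else np.zeros(n_bins)

# Example: a contour dwelling on two notes (220 Hz and 247 Hz)
contour = np.concatenate([np.full(100, 220.0), np.full(50, 247.0)])
vec = pitch_pdf_features(contour)
```

A vector like this is non-sequential by construction (note order is discarded), which is why a static classifier such as an ANN fits the task.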
  • Item
    Classification of vocal and non-vocal regions from audio songs using spectral features and pitch variations
    (Institute of Electrical and Electronics Engineers Inc., 2015) Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.
    In this work, an effort has been made to identify vocal and non-vocal regions in a given song using signal processing techniques and machine learning algorithms. Initially, spectral features such as mel-frequency cepstral coefficients (MFCCs) are used to develop the baseline system; statistical values of pitch, jitter and shimmer are then added to improve its performance. Artificial neural networks (ANNs) are used to capture the characteristics of the vocal and non-vocal segments of the songs. The experiment is conducted on 60 vocal and 60 non-vocal clips extracted from Telugu albums. An 11-point moving window is used to ensure the continuity of vocal and non-vocal segments, thus improving the accuracy of the system. With this approach, the system achieves 85.59% accuracy for vocal and 88.52% for non-vocal segment classification. © 2015 IEEE.
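    The 11-point moving-window step can be sketched as a majority vote over per-frame vocal/non-vocal decisions; the exact smoothing rule is an assumed reading of the abstract, and the example label sequence is invented:

```python
import numpy as np

def smooth_labels(labels, window=11):
    """Majority-vote smoothing of per-frame vocal(1)/non-vocal(0) decisions
    over a centered moving window, enforcing segment continuity."""
    labels = np.asarray(labels, dtype=int)
    half = window // 2
    padded = np.pad(labels, half, mode="edge")  # replicate edges for boundary frames
    smoothed = np.empty_like(labels)
    for i in range(len(labels)):
        win = padded[i:i + window]
        smoothed[i] = 1 if win.sum() > half else 0
    return smoothed

# An isolated misclassified frame (index 5) is absorbed by its neighbours:
raw = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
clean = smooth_labels(raw)
```

Isolated classifier errors shorter than half the window are flipped, which is what makes the resulting vocal/non-vocal segments contiguous.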
  • Item
    Identification of allied raagas in Carnatic music
    (Institute of Electrical and Electronics Engineers Inc., 2015) Upadhyaya, P.; Suma, S.M.; Koolagudi, S.G.
    In this work, an effort has been made to differentiate allied raagas in Carnatic music. Allied raagas are raagas composed using the same set of notes. Features derived from the pitch sequence are used to differentiate them: the coefficients of the Legendre polynomials fitted to the pitch contours of the song clips are used for identifying the raagas. The obtained features are validated using different classifiers, namely neural networks, naive Bayes, a multi-class classifier, bagging and random forest. The proposed system is tested on 4 sets of allied raagas. The naive Bayes classifier gives an average accuracy of 86.67% for the allied set Todi-Dhanyasi, and the multi-class classifier gives an average accuracy of 86.67% for the allied set Kharaharapriya-Anandabhairavi-Reethigoula. In general, the neural network classifier is found to perform better than the other classifiers. © 2015 IEEE.
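    The Legendre-coefficient features can be sketched as follows; the polynomial degree and the rescaling of the time axis to [-1, 1] are assumptions, since the abstract does not give them:

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_pitch_features(pitch_contour, degree=5):
    """Fit Legendre polynomials to a pitch contour (time axis rescaled to
    [-1, 1]) and return the fitted coefficients as the feature vector."""
    y = np.asarray(pitch_contour, dtype=float)
    x = np.linspace(-1.0, 1.0, len(y))      # Legendre basis is defined on [-1, 1]
    return legendre.legfit(x, y, degree)    # degree + 1 coefficients

# Example: a linear glide from 200 Hz to 240 Hz is captured entirely
# by the first two coefficients (constant term and linear term)
t = np.linspace(0.0, 1.0, 200)
contour = 200.0 + 40.0 * t
feats = legendre_pitch_features(contour)
```

Low-order coefficients summarize the overall shape of the contour, so two allied raagas with the same notes but different melodic movement get different feature vectors.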
  • Item
    Identifying gamakas in Carnatic music
    (Institute of Electrical and Electronics Engineers Inc., 2015) Vyas, H.M.; Suma, S.M.; Koolagudi, S.G.; Guruprasad, K.R.
    In this work, an effort has been made to identify the gamakas present in a given piece of Carnatic music. Gamakas are the ornamentation elements used to beautify the melody, and their identification is a very important stage in note transcription. In the proposed method, features that correspond to melodic variations, such as pitch and energy, are used to characterize the gamakas. The input pitch contour is modelled using a Hidden Markov Model with 3 states, namely Attack, Sustain and Decay; these states correspond to the ups and downs in the melody of the music. The system is validated using a comprehensive data set consisting of 160 songs from 8 different ragas. An average accuracy of 75.86% is achieved using this method. © 2015 IEEE.
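    A minimal sketch of the three-state (Attack/Sustain/Decay) modelling idea, using log-space Viterbi decoding over quantized pitch-slope observations; all transition and emission probabilities below are illustrative, not taken from the paper:

```python
import numpy as np

# Observations are the quantized frame-to-frame pitch slope:
# 0 = rising, 1 = flat, 2 = falling. Probabilities are invented for illustration.
STATES = ["Attack", "Sustain", "Decay"]
start = np.array([0.6, 0.3, 0.1])
trans = np.array([[0.6, 0.35, 0.05],   # Attack  -> Attack/Sustain/Decay
                  [0.05, 0.8, 0.15],   # Sustain -> ...
                  [0.05, 0.15, 0.8]])  # Decay   -> ...
emit = np.array([[0.7, 0.2, 0.1],      # Attack emits mostly "rising"
                 [0.1, 0.8, 0.1],      # Sustain emits mostly "flat"
                 [0.1, 0.2, 0.7]])     # Decay emits mostly "falling"

def viterbi(obs):
    """Most likely Attack/Sustain/Decay state sequence (log-space Viterbi)."""
    logd = np.log(start) + np.log(emit[:, obs[0]])
    back = []
    for o in obs[1:]:
        scores = logd[:, None] + np.log(trans)   # scores[i, j]: state i -> state j
        back.append(scores.argmax(axis=0))
        logd = scores.max(axis=0) + np.log(emit[:, o])
    path = [int(logd.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return [STATES[s] for s in reversed(path)]

# rising, rising, flat, flat, flat, falling, falling
decoded = viterbi([0, 0, 1, 1, 1, 2, 2])
```

The decoded state sequence segments the contour into an up-movement, a steady note and a down-movement, which is the shape information the gamaka identifier works from.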
  • Item
    Feature analysis for mispronounced phonemes in the case of alveolar approximant (/r/) substituted with voiced dental consonant (/ð/)
    (Institute of Electrical and Electronics Engineers Inc., 2015) Ramteke, P.B.; Koolagudi, S.G.; Prabhakar, A.
    Mispronunciation is commonly observed in children from 2 to 8 years of age. Some of the common mispronunciations are stopping, fronting, backing and affrication; these are known as phonological processes. Identifying them is crucial for studying the vocal tract development pattern and for treating phonological disorders in children. To identify a phonological process, the features that clearly discriminate a correctly pronounced phoneme from the corresponding mispronounced phoneme have to be compared. This paper focuses on the analysis of the mispronounced alveolar approximant (/r/) substituted with the voiced dental fricative (/ð/). In this work, spectral and pitch-related features are considered for the analysis using scatter plots and histograms. From the analysis, it is observed that the energy feature plotted against the 2nd and 4th cepstral coefficients achieves 75% and 65% discrimination, respectively. © 2015 IEEE.
  • Item
    Recognition of repetition and prolongation in stuttered speech using ANN
    (Springer Science and Business Media Deutschland GmbH, 2016) Savin, P.S.; Ramteke, P.B.; Koolagudi, S.G.
    This paper mainly focuses on the detection of repetition and prolongation in stuttered speech signals. Acoustic and pitch-related features, namely Mel-frequency cepstral coefficients (MFCCs), formants, pitch, zero crossing rate (ZCR) and energy, are used to test their effectiveness in recognizing repetitions and prolongations in stammered speech. Artificial Neural Networks (ANNs) are used as the classifier, and the results are evaluated using combinations of the different features. The results show that the ANN classifier trained using MFCC features achieves an average accuracy of 87.39% for repetition and prolongation recognition. © Springer India 2016.
  • Item
    Classification of Punjabi folk musical instruments based on acoustic features
    (Springer Verlag, 2017) Singh, I.; Koolagudi, S.G.
    Automatic musical instrument classification can be achieved using various features, such as pitch, skewness and energy, extracted from an extensive musical database. Various feature extraction methods have already been employed to represent such data sets. The crucial step in the feature extraction process is to find the features that best represent the characteristics of the data set and are suitable for classification. This paper focuses on the classification of Punjabi folk musical instruments from their audio segments. Five Punjabi folk musical instruments are considered for the study. Twelve acoustic features, such as entropy, kurtosis, brightness and event density, along with pitch, are used to characterize each musical instrument from 150 songs. The J48 classifier is used for the classification. Using these acoustic features, a recognition accuracy of 91% is achieved. © Springer Science+Business Media Singapore 2017.
  • Item
    Gender Identification from Children's Speech
    (Institute of Electrical and Electronics Engineers Inc., 2018) Ramteke, P.B.; Dixit, A.A.; Supanekar, S.; Dharwadkar, N.V.; Koolagudi, S.G.
    Children's speech is characterized by higher pitch and formant frequencies compared to adult speech. Gender identification from children's speech is difficult as there is no significant difference between the acoustic properties of male and female children. Here, an attempt has been made to explore the features that are efficient in discriminating gender from children's speech. Different combinations of spectral features such as Mel-frequency cepstral coefficients (MFCCs), ΔMFCCs and ΔΔMFCCs, formants and linear predictive cepstral coefficients (LPCCs); shimmer and jitter; and prosodic features like pitch and its statistical variations, along with Δpitch-related features, are explored. The features are evaluated using non-linear classifiers, namely Artificial Neural Networks (ANNs), Deep Neural Networks (DNNs) and Random Forests (RFs). From the results, it is observed that the RF achieves the highest accuracy of 84.79% among the classifiers. © 2018 IEEE.
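    The ΔMFCC and ΔΔMFCC features mentioned above are conventionally computed with the standard delta-regression formula over a window of neighbouring frames; a minimal sketch (the window parameter n=2 is an assumption):

```python
import numpy as np

def delta(features, n=2):
    """Compute delta features from a (frames x coefficients) matrix using
    the standard regression formula over +/- n neighbouring frames."""
    feats = np.asarray(features, dtype=float)
    T = feats.shape[0]
    padded = np.pad(feats, ((n, n), (0, 0)), mode="edge")  # replicate edge frames
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros_like(feats)
    for k in range(1, n + 1):
        out += k * (padded[n + k:n + k + T] - padded[n - k:n - k + T])
    return out / denom

# On a coefficient rising linearly by 1 per frame, the delta is the slope
mfccs = np.arange(10.0).reshape(10, 1)
d = delta(mfccs)    # ΔMFCC
dd = delta(d)       # ΔΔMFCC (acceleration)
```

Applying the same routine twice yields the acceleration (ΔΔ) coefficients, which is how the double-delta features are usually obtained.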
  • Item
    NITK Kids' Speech Corpus
    (International Speech Communication Association, 2019) Ramteke, P.B.; Supanekar, S.; Hegde, P.; Nelson, H.; Aithal, V.; Koolagudi, S.G.
    This paper introduces a speech database for analyzing children's speech. The proposed database is recorded in the Kannada language (one of the South Indian languages) from children between 2½ and 6½ years of age, and is named the National Institute of Technology Karnataka Kids' Speech Corpus (NITK Kids' Speech Corpus). The relevant design considerations for the database collection are discussed in detail. The database is divided into four age groups with an interval of 1 year between each group. The speech corpus includes nearly 10 hours of speech recordings from 160 children; for each age range, the data is recorded from 40 children (20 male and 20 female). Further, the effect of developmental changes on speech from 2½ to 6½ years is analyzed using pitch and formant analysis. Some potential applications of the NITK Kids' Speech Corpus, such as a systematic study of the language learning ability of children, phonological process analysis and children's speech recognition, are discussed. © 2019 ISCA
  • Item
    Gender Identification using Spectral Features and Glottal Closure Instants (GCIs)
    (Institute of Electrical and Electronics Engineers Inc., 2019) Ramteke, P.B.; Supanekar, S.; Koolagudi, S.G.
    Automatic identification of gender from speech may help improve the performance of systems for speaker and speech recognition, forensic analysis and authentication. The difference in the physiological parameters of male and female vocal folds results in significant differences in their vocal fold vibration patterns; these differences can be characterized from the durations of their glottal closures. In this paper, an attempt has been made at gender recognition from speech using spectral features such as MFCCs and LPCCs; pitch (F0); and excitation source features, namely glottal closure instants (GCIs) and their statistical variations. Western Michigan University's Gender dataset, collected from 93 speakers (45 male and 48 female), is used for experimentation. Random forests (RFs) and Support vector machines (SVMs) are used to measure the performance of the proposed features. The random forest is observed to achieve an average frame-level accuracy of 96.908% using 13 MFCCs, 13 LPCCs, pitch (F0) and GCI Stats (5), while the SVM achieves an average accuracy of 98.607% using 13 MFCCs, 13 LPCCs and GCI Stats (5). From the results, it is observed that the proposed features are efficient in discriminating gender from speech. © 2019 IEEE.
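    The GCI-derived statistics can be sketched as follows: consecutive glottal closure instants delimit glottal cycles, so the cycle durations give instantaneous F0 and its variation. Which five statistics make up the paper's "GCI Stats (5)" is not stated in the abstract, so the set below is an assumption:

```python
import numpy as np

def gci_stats(gci_times):
    """Derive pitch statistics from glottal closure instants (GCIs).
    Each interval between consecutive GCIs is one glottal cycle, so the
    instantaneous F0 is the reciprocal of that interval."""
    t = np.asarray(gci_times, dtype=float)
    periods = np.diff(t)                # glottal cycle durations (seconds)
    f0 = 1.0 / periods                  # instantaneous F0 per cycle (Hz)
    return {
        "mean_f0": float(f0.mean()),
        "std_f0": float(f0.std()),
        "min_f0": float(f0.min()),
        "max_f0": float(f0.max()),
        # relative average perturbation of cycle length (a jitter-like measure)
        "jitter": float(np.mean(np.abs(np.diff(periods))) / periods.mean()),
    }

# GCIs every 8 ms correspond to a steady 125 Hz voice (typical adult male range)
stats = gci_stats(np.arange(0.0, 0.1, 0.008))
```

Because male and female vocal folds differ in cycle duration, statistics like these separate the genders well even before spectral features are added.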