Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
4 results
Search Results
Sentence-Based Dialect Identification System Using Extreme Gradient Boosting Algorithm
(Springer, 2020) Chittaragi, N.B.; Koolagudi, S.G.
In this paper, a dialect identification system (DIS) is proposed that explores dialect-specific prosodic features and cepstral coefficients from sentence-level utterances. People belonging to a specific region commonly share a unique speaking style, known as a dialect. Sentence speech units are chosen for dialect identification because sentences are observed to follow unique intonation and energy patterns. Sentences are drawn from the standard Intonational Variations in English (IViE) speech dataset. Pitch and energy contours are used to derive intonation and energy features, respectively, using a Legendre polynomial fit function along with five statistical features. Further, Mel frequency cepstral coefficients (MFCCs) are added to capture dialect-specific spectral information. The Extreme Gradient Boosting (XGB) ensemble method is employed to evaluate the system with individual features and with feature combinations. The results indicate that both prosodic and spectral features influence dialect recognition, and the combined feature vectors show a better DIS performance of about 89.6%. © 2020, Springer Nature Singapore Pte Ltd.

Choice of a classifier, based on properties of a dataset: case study-speech emotion recognition
(Springer New York LLC, 2018) Koolagudi, S.G.; Vishnu Srinivasa Murthy, Y.V.S.; Bhaskar, S.P.
In this paper, a process for selecting a classifier based on the properties of a dataset is designed, since it is very difficult to run the data through n classifiers. Speech emotion recognition is considered as a case study. Different combinations of spectral and prosodic features relevant to emotions are explored. The best subset of the chosen features is recommended for each classifier based on the properties of the chosen dataset. Various statistical tests are used to estimate the properties of the dataset, and the nature of the dataset indicates which classifier is relevant. For a more precise assessment, three other clustering and classification techniques, namely K-means clustering, vector quantization and artificial neural networks, are used for experimentation, and their results are compared with those of the selected classifier. Prosodic features such as pitch, intensity, jitter and shimmer, and spectral features such as Mel frequency cepstral coefficients (MFCCs) and formants, are considered in this work. Statistical parameters of prosody such as minimum, maximum, mean (μ) and standard deviation (σ) are extracted from speech and combined with basic spectral (MFCC) features to improve performance. Five basic emotions are considered: anger, fear, happiness, neutral and sadness. To analyse performance across different datasets and classifiers, content- and speaker-independent emotional data collected from Telugu movies is used. The mean opinion score of fifty users is collected to label the emotional data. To generalize the conclusions, one of the benchmark IIT-Kharagpur emotional databases is also used. © 2018, Springer Science+Business Media, LLC, part of Springer Nature.

Acoustic-phonetic feature based Kannada dialect identification from vowel sounds
(Springer New York LLC, 2019) Chittaragi, N.B.; Koolagudi, S.G.
In this paper, a dialect identification system is proposed for the Kannada language using vowel sounds. Dialectal cues are characterized through acoustic parameters such as formant frequencies (F1–F3) and prosodic features [energy, pitch (F0), and duration]. For this purpose, a vowel dataset is collected from native speakers of Kannada belonging to different dialectal regions. Global features representing frame-level statistics such as mean, minimum, maximum, standard deviation and variance are extracted from vowel sounds. Local features representing temporal dynamic properties at the contour level are derived from the steady-state vowel region. Three decision tree-based ensemble algorithms, namely random forest, extreme random forest (ERF) and extreme gradient boosting, are used for classification. The performance of global and local features is evaluated individually. Further, the significance of every feature in dialect discrimination is analyzed using single-factor ANOVA (analysis of variance) tests. Global features with the ERF ensemble model show a better average dialect identification performance of around 76%. The contribution of every feature to dialect identification is also verified. Duration, energy, pitch, and the three formant features are all found to contribute evidence for Kannada dialect classification. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.

Automatic hate speech detection in audio using machine learning algorithms
(Springer, 2024) Imbwaga, J.L.; Chittaragi, N.B.; Koolagudi, S.G.
Even though every individual is entitled to freedom of speech, limitations exist when this freedom is used to target and harm another individual or a group of people, as it then becomes hate speech. This study deals with the detection of hate speech from audio in the English and Kiswahili languages. The dataset used in this work was collected manually from YouTube videos, which were then converted to audio. Audio-based features, namely spectral, temporal, prosodic and excitation source features, were extracted and used to train various machine learning classifiers. Initial experiments were conducted for English and later for Kiswahili. It is observed from the literature that research activity on the Kiswahili language is comparatively limited. The scores calculated for accuracy, recall, precision, AUC and F1 score in detecting hate speech suggest that the Random Forest classifier performed better for English, while the Extreme Gradient Boosting classifier performed better for Kiswahili. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
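
The last abstract above compares classifiers on accuracy, recall, precision, AUC and F1 score. As a minimal sketch of how four of these scores relate to each other (AUC is left out because it needs classifier scores rather than hard labels), the following computes them from binary predictions. The function name `binary_metrics` and the example labels are hypothetical and are not taken from the paper; the paper's own evaluation code is not shown in the abstract.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels.

    Illustrative helper only; the name and interface are assumptions.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = hate speech, 0 = not hate speech.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Reporting precision and recall alongside accuracy matters for a task like hate speech detection, where the two classes are often imbalanced and accuracy alone can look high even when few positive examples are found.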
