Journal Articles

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/19884

Dialect Identification Using Spectral and Prosodic Features on Single and Ensemble Classifiers
(Springer Verlag, 2018) Chittaragi, N.B.; Prakash, A.; Koolagudi, S.G.
This paper investigates the significance of spectral and prosodic characteristics of the speech signal for dialect identification. Spectral features such as cepstral coefficients, spectral flux, and entropy are extracted from shorter frames. Prosodic attributes such as pitch, energy, and duration are derived from longer frames. The IViE (Intonational Variations in English) speech corpus, covering nine dialectal regions of the British Isles, is used to evaluate the proposed approach. Since the corpus is available in both read and semi-spontaneous modes, the influence of spectral and prosodic behavior is reported separately for each mode. Two classification approaches, namely a support vector machine (SVM) and an ensemble of decision trees combined with the SVM, are used to identify the nine dialects. Dialect-discriminating information captured from both feature types is used to construct the feature vectors. Experiments have been conducted on individual features and on their combinations. Ensemble methods show better dialect recognition performance than a single independent SVM. © 2017, King Fahd University of Petroleum & Minerals.
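The abstract above names spectral flux and spectral entropy as frame-level features without giving formulas. The following is a minimal sketch under common textbook definitions, which are an assumption here and not necessarily the ones used in the paper: entropy is the Shannon entropy of the normalized power spectrum, and flux is the squared distance between successive sum-normalized magnitude spectra. The function names are hypothetical, and the naive DFT is only suitable for short illustrative frames.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (illustrative, O(N^2))."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_entropy(mags):
    """Shannon entropy in bits of the normalized power spectrum (assumed definition)."""
    power = [m * m for m in mags]
    total = sum(power)
    if total == 0:
        return 0.0
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def spectral_flux(prev_mags, mags):
    """Squared distance between two sum-normalized magnitude spectra (assumed definition)."""
    def normalize(values):
        s = sum(values)
        return [v / s for v in values] if s else list(values)

    a, b = normalize(prev_mags), normalize(mags)
    return sum((y - x) ** 2 for x, y in zip(a, b))


# Example: a 16-sample frame holding exactly two cycles of a sine wave.
tone = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
tone_mags = magnitude_spectrum(tone)
peak_bin = max(range(len(tone_mags)), key=tone_mags.__getitem__)
```

A pure tone concentrates its energy in one bin, so its entropy is close to zero, whereas a flat spectrum gives the maximum entropy for its number of bins.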
Acoustic-phonetic feature based Kannada dialect identification from vowel sounds
(Springer New York LLC, 2019) Chittaragi, N.B.; Koolagudi, S.G.
In this paper, a dialect identification system is proposed for the Kannada language using vowel sounds. Dialectal cues are characterized through acoustic parameters such as formant frequencies (F1–F3) and prosodic features [energy, pitch (F0), and duration]. For this purpose, a vowel dataset is collected from native speakers of Kannada belonging to different dialectal regions. Global features representing frame-level statistics such as mean, minimum, maximum, standard deviation, and variance are extracted from vowel sounds. Local features representing temporal dynamic properties at the contour level are derived from the steady-state vowel region. Three decision tree-based ensemble algorithms, namely random forest, extreme random forest (ERF), and extreme gradient boosting, are used for classification. The performance of global and local features is evaluated individually. Further, the significance of every feature in dialect discrimination is analyzed using single-factor ANOVA (analysis of variance) tests. Global features with the ERF ensemble model have shown a better average dialect identification performance of around 76%. The contribution of every feature to dialect identification is also verified. The roles of duration, energy, pitch, and the three formant features are found to be evidential in Kannada dialect classification. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
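The global statistics and the single-factor ANOVA mentioned in the abstract above can be sketched as follows. This is an illustration only: the function names and the toy numbers are hypothetical, and the use of sample (n-1) variance is an assumption, since the abstract does not say which estimator was used.

```python
import statistics


def global_stats(contour):
    """Summarize a frame-level contour (e.g. pitch or energy) with five global statistics."""
    return {
        "mean": statistics.fmean(contour),
        "min": min(contour),
        "max": max(contour),
        "std": statistics.stdev(contour),      # sample standard deviation (assumption)
        "var": statistics.variance(contour),   # sample variance (assumption)
    }


def one_way_anova_f(groups):
    """F statistic of a single-factor ANOVA; each group holds one feature's values for one dialect."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy data: one feature measured on three made-up dialect groups.
f_value = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
stats = global_stats([1, 2, 3, 4])
```

A larger F value means the feature varies more between the groups than within them, which is the sense in which such a test indicates a feature's usefulness for discrimination.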
Automatic dialect identification system for Kannada language using single and ensemble SVM algorithms
(Springer, 2020) Chittaragi, N.B.; Koolagudi, S.G.
In this paper, an automatic dialect identification (ADI) system is proposed for the Kannada language by extracting spectral and prosodic features. A new dialect dataset is collected from native speakers of Kannada (a Dravidian language). This dataset includes five distinct dialects of Kannada representing five geographical regions of Karnataka state. The significance of spectral and prosodic variations across the five Kannada dialects is investigated. Mel-frequency cepstral coefficients (MFCCs), spectral flux, and entropy are used as spectral features, and pitch and energy are extracted as prosodic features for identification of dialects. These raw feature vectors are further processed statistically to obtain new derived feature vectors. A single-classifier multi-class support vector machine (SVM) and a multiple-classifier ensemble SVM (ESVM) are employed for classification of dialects. The explored features are evaluated on the newly collected Kannada speech corpus with five Kannada dialects and on the internationally known standard Intonation Variation in English (IViE) dataset with nine British English dialects. Experimental results demonstrate that the derived feature vectors perform better than the raw feature vectors, and that the ESVM technique performs better than a single SVM. Spectral and prosodic features individually yield dialect recognition performance of 83.12% and 44.52%, respectively. Further, the complementary nature of spectral and prosodic features is evaluated by combining both feature vectors for dialect recognition, which increases dialect recognition performance to about 86.25%. This indicates the existence of complementary dialect-specific evidence in spectral and prosodic features. The experiments conducted on the standard IViE corpus have shown a higher recognition rate of 91.38% using ESVM. The proposed ADI systems with derived features have shown better performance than state-of-the-art i-vector feature based systems on both datasets. © 2019, Springer Nature B.V.
Dialect Identification using Chroma-Spectral Shape Features with Ensemble Technique
(Academic Press, 2021) Chittaragi, N.B.; Koolagudi, S.G.
The present work proposes a text-independent dialect identification system. Generally, dialects of a language exhibit varying pronunciation styles followed in a specific geographical region. In this paper, chroma features, familiar from music-related systems, are employed for identification of dialects. In addition, eight significant spectral shape features are computed from short-term spectra and combined with the chroma features; the combination is named chroma-spectral shape features. Chroma features aggregate spectral information and attempt to encapsulate the evidential variations in timbre, correlated melody, rhythm, and intonation patterns found prominently among dialects of a few languages. The effectiveness of the proposed features and approach is evaluated on five prominent Kannada dialects spoken in Karnataka, India, and on the internationally known standard Intonation Variation in English (IViE) dataset with nine British English dialects. Discriminative models, namely a single-classifier support vector machine (SVM) and ensemble-based support vector machines (ESVM), are employed for classification. The proposed features have shown better performance than state-of-the-art i-vector features on both datasets. The highest recognition performances of 95.6% and 97.52% are achieved on the Kannada and IViE dialect datasets, respectively, using ESVM. The proposed features have also demonstrated robust performance with small (limited data) audio clips, even in noisy conditions. © 2021 Elsevier Ltd
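To illustrate how chroma features aggregate spectral information, the sketch below folds the energy of each spectrum bin into one of 12 pitch classes using the standard equal-temperament mapping with A4 = 440 Hz. The function names, the 20 Hz cutoff, and the sum normalization are assumptions made for this example; the paper's actual chroma extraction may differ.

```python
import math


def pitch_class(freq_hz, ref_a4=440.0):
    """Map a frequency to a pitch class 0..11 (0 = C, 9 = A) via the nearest MIDI note."""
    midi = 69 + 12 * math.log2(freq_hz / ref_a4)
    return round(midi) % 12


def chroma_vector(mags, sample_rate, n_fft):
    """Fold a magnitude spectrum (bins 0..n_fft/2) into a sum-normalized 12-bin chroma vector."""
    chroma = [0.0] * 12
    for k, m in enumerate(mags):
        freq = k * sample_rate / n_fft
        if freq < 20.0:  # skip DC and sub-audible bins (assumed cutoff)
            continue
        chroma[pitch_class(freq)] += m
    total = sum(chroma)
    return [c / total for c in chroma] if total else chroma


# Example: energy at 440 Hz and 880 Hz (one octave apart) lands in the same class, A.
mags = [0.0] * 401          # bins for n_fft = 800 at 8000 Hz, i.e. 10 Hz per bin
mags[44] = 1.0              # 440 Hz
mags[88] = 1.0              # 880 Hz
chroma = chroma_vector(mags, sample_rate=8000, n_fft=800)
```

Because octaves collapse onto the same class, the resulting vector describes the distribution of energy over pitch classes and is insensitive to the octave in which that energy occurs.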
Automatic diagnosis of COVID-19 related respiratory diseases from speech
(Springer, 2023) Shekhar, K.; Chittaragi, N.B.; Koolagudi, S.G.
In this work, an attempt is made to propose an intelligent, automatic system to recognize COVID-19 related illnesses from speech samples alone by using automatic speech processing techniques. We used a standard crowd-sourced dataset collected by the University of Cambridge through a web-based application and an Android/iPhone app. We worked on the cough and breath datasets individually, and also on a combination of both. We extracted two sets of features, one consisting only of standard audio features such as spectral and prosodic features and one combining excitation source features with the standard audio features, and trained shallow classifiers such as ensemble classifiers and SVMs on them. Our model has shown better performance on both the breath and cough datasets, but the best results in each case were obtained through different combinations of features and classifiers. We obtained our best result when we used only standard audio features and combined the cough and breath data. In this case, we achieved an accuracy of 84% and an Area Under Curve (AUC) score of 84%. Intelligent systems have already started to make a mark in medical diagnosis, and this type of study can help improve the health system by providing much-needed assistance to health workers. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Automatic hate speech detection in audio using machine learning algorithms
(Springer, 2024) Imbwaga, J.L.; Chittaragi, N.B.; Koolagudi, S.G.
Even though every individual is entitled to freedom of speech, some limitations exist when this freedom is used to target and harm another individual or a group of people, as it then amounts to hate speech. This study deals with the detection of hate speech from audio for the English and Kiswahili languages. The dataset used in this work was collected manually from YouTube videos, which were then converted to audio. Audio-based features, namely spectral, temporal, prosodic, and excitation source features, were extracted and used to train various machine learning classifiers. Initial experiments were conducted for the English language and later for the Kiswahili language. It is observed from the literature that research on the Kiswahili language is comparatively scarce. The scores calculated for accuracy, recall, precision, AUC, and F1 score in detecting hate speech suggest that the Random Forest classifier performed better for the English language, while the Extreme Gradient Boosting classifier performed better for the Kiswahili language. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
Explainable hate speech detection using LIME
(Springer, 2024) Imbwaga, J.L.; Chittaragi, N.B.; Koolagudi, S.G.
Free speech is essential, but it can conflict with protecting marginalized groups from harm caused by hate speech. Social media platforms have become breeding grounds for this harmful content. While studies exist to detect hate speech, there are significant research gaps. First, most studies used text data instead of other modalities such as video or audio. Second, most studies explored traditional machine learning algorithms; however, due to the increasing complexity of computational tasks, there is a need to employ more complex techniques and methodologies. Third, the majority of research studies have either been evaluated using very few evaluation metrics or not been statistically evaluated at all. Lastly, due to the opaque, black-box nature of complex classifiers, there is a need to use explainability techniques. This research aims to address these gaps by detecting hate speech in the English and Kiswahili languages using videos manually collected from YouTube. The videos were converted to text and used to train various classifiers. The performance of these classifiers was evaluated using various evaluation and statistical measurements. The experimental results suggest that the random forest classifier achieved the highest results for both languages across all evaluation measurements compared to all other classifiers used. The results for the English language were: accuracy 98%, AUC 96%, precision 99%, recall 97%, F1 98%, specificity 98% and MCC 96%, while the results for the Kiswahili language were: accuracy 90%, AUC 94%, precision 93%, recall 92%, F1 94%, specificity 87% and MCC 75%. These results suggest that the random forest classifier is robust, effective and efficient in detecting hate speech in any language. This also implies that the classifier is reliable in detecting hate speech and other related problems in social media. To understand the classifier's decision-making process, we used the Local Interpretable Model-agnostic Explanations (LIME) technique to explain the predictions made by the random forest classifier. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
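Most of the evaluation measures listed in the abstract above follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts are hypothetical and are not taken from the paper. AUC is omitted because it requires ranked classifier scores, not just the four counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # guard against empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),  # harmonic mean
        "mcc": ratio(tp * tn - fp * fn, mcc_den),                 # Matthews correlation
    }


# Made-up counts for illustration: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Since F1 is the harmonic mean of precision and recall, it always lies between the two, while MCC ranges from -1 to 1 and takes all four cells into account.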
Video forgery localization using inter-frame denoising and intra-frame segmentation
(Springer, 2025) Banerjee, D.; Chittaragi, N.B.; Koolagudi, S.G.
Video forgery detection has become necessary with the recent spurt in fake videos such as Deepfakes and doctored videos from multiple video capturing devices. In this paper, we provide a novel technique for detecting fake videos by creating an ensemble network, based on statistical and deep learning methods, that detects inter-frame forgery and intra-frame forgery in forged videos separately. Noise signature extraction for a particular image capturing sensor and an autoencoder-based convolutional neural network (CNN) model are used to localize the forged regions. We have trained the model to localize Deepfake video forgeries as well as copy-paste forgeries, with effective results on the test data. The proposed fake video detector can be applied at the back end of online video aggregating services to verify the genuineness of videos. The results achieved have shown better performance in detecting fake videos compared to existing methods. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
