Speech Processing Approaches towards Characterization and Identification of Dialects

Chittaragi, Nagaratna B.

Please use this identifier to cite or link to this item: https://idr.nitk.ac.in/jspui/handle/123456789/16839

Title:	Speech Processing Approaches towards Characterization and Identification of Dialects
Authors:	Chittaragi, Nagaratna B.
Supervisors:	Koolagudi, Shashidhar G.
Keywords:	Department of Computer Science & Engineering;Kannada dialect identification;Spectral features;Prosodic features;Excitation source features;Spectro-temporal features;Chroma features;Spectral-shaped features;Dynamic and static features;Cases;Support vector machine;Random forest;Extreme random forest;Extreme gradient boosting
Issue Date:	2020
Publisher:	National Institute of Technology Karnataka, Surathkal
Abstract:	Dialects are the phonological, lexical, and grammatical variations in the usage of a language, often marked by minor and subtle differences. These variations arise mainly from the specific speaking patterns followed by a group of speakers. In the recent past, dialect identification from speech has emerged as a prominent area of speech research, largely owing to the rapid growth in the use of interactive voice-based systems. It is therefore essential to address the speech variability caused by dialectal differences in order to achieve effective and realistic man-machine interaction. Existing research on the characterization and identification of dialects has focused mainly on acoustic, phonetic, and phonotactic approaches for languages such as English, Chinese, Arabic, Hindi, and Spanish. However, these models have not been shown to be language independent, and they may not perform equally well when applied to other languages, since there are fundamental differences between the dialects of different languages. Moreover, considerably fewer dialect processing models have been reported in the literature for Indian regional languages. In this thesis, an attempt is made to develop a few useful language-independent and language-dependent Automatic Dialect Identification (ADI) systems for the Kannada language. First, a new text-independent Kannada Dialect Speech Corpus (KDSC) is collected from native speakers belonging to five prominent dialectal regions of Karnataka. The thesis then investigates the significance of excitation source, spectral, and prosodic features of speech for dialect identification. Additionally, spectro-temporal variations across dialects are captured through biologically inspired 2D Gabor features. Further, the existence of non-conventional, dialect-specific rhythmic and melodic correlations among dialects is explored using chroma features.
Chroma features are well established in music-related applications. The robustness of the proposed features is investigated under noisy background conditions and with short (limited-data) audio clips. In addition, word- and sentence-based ADI systems are proposed using intonation and intensity variations, which represent dynamic and static prosodic behavior. Further, language-dependent dialect identification systems are proposed for Kannada using dialect information at the level of basic phonetic units. Additionally, dialect identification approaches based on Kannada-specific 'case' markers (Vibhakthi Prathyayas) are proposed. Support Vector Machines (SVM), a single-classifier approach, and ensemble algorithms built from multiple classifiers, such as random forest, extreme random forest, and extreme gradient boosting, are used to classify dialects. Experiments are carried out with individual features and with combinations of features, and the results illustrate the complementary nature of the different features for dialect processing. A performance comparison of the two categories of classifiers shows that the ensemble algorithms outperform the single-classifier approach. Further, exploiting rhythm-based aspects of dialects through chroma and spectral-shape features yields better performance than state-of-the-art i-vector features. Moreover, this feature set is more robust to noise than conventional Mel-frequency cepstral coefficients (MFCCs). The intonation and intensity features proposed for words and sentences also capture dialectal information that supports effective classification of dialects. Finally, duration, energy, pitch, three formants, and spectral features are also found to provide useful evidence for Kannada dialect classification.
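To make the idea of chroma features concrete, here is a minimal sketch of generic chroma folding: spectral energy is mapped onto the 12 equal-tempered pitch classes regardless of octave. It is not the feature extraction used in the thesis. It assumes that one spectral frame is already available as (frequency, magnitude) pairs, and the function names, the 50 to 5000 Hz band, and the A4 = 440 Hz reference are illustrative assumptions.

```python
import math

# Pitch-class names in the conventional C-based order (index 0 = C, 9 = A).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, ref_a4=440.0):
    """Map a frequency to its nearest equal-tempered pitch class (0-11)."""
    # Signed number of semitones above A4, rounded to the nearest semitone.
    semitones_from_a4 = round(12 * math.log2(freq_hz / ref_a4))
    # A is pitch class 9 when C is 0; the modulo discards the octave.
    return (semitones_from_a4 + 9) % 12


def chroma_vector(spectrum, fmin=50.0, fmax=5000.0):
    """Fold a magnitude spectrum into a normalized 12-bin chroma vector.

    spectrum: iterable of (frequency_hz, magnitude) pairs, e.g. one STFT frame
    (hypothetical input format chosen for this sketch).
    """
    chroma = [0.0] * 12
    for freq, mag in spectrum:
        if fmin <= freq <= fmax:  # ignore components outside the analysis band
            chroma[pitch_class(freq)] += mag ** 2  # accumulate energy per class
    total = sum(chroma)
    # Normalize to unit sum so frames with different loudness are comparable.
    return [c / total for c in chroma] if total > 0 else chroma


if __name__ == "__main__":
    # One hypothetical frame: two A's an octave apart, one C, and a 30 Hz
    # component that falls below fmin and is ignored.
    frame = [(220.0, 1.0), (440.0, 0.5), (261.63, 0.8), (30.0, 2.0)]
    cv = chroma_vector(frame)
    print(PITCH_CLASSES[cv.index(max(cv))])  # prints "A"
```

Because the 220 Hz and 440 Hz components fall into the same bin, the vector describes pitch-class (melodic) content independently of octave. Practical toolkits compute this from a short-time Fourier transform with smoother filterbanks rather than hard rounding.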
URI:	http://idr.nitk.ac.in/jspui/handle/123456789/16839
Appears in Collections:	1. Ph.D Theses

Files in This Item:

File	Description	Size	Format
155112CS15F09.pdf		1.9 MB	Adobe PDF
