Speech Processing Approaches towards Characterization and Identification of Dialects
Date
2020
Authors
Chittaragi, Nagaratna B.
Publisher
National Institute of Technology Karnataka, Surathkal
Abstract
Dialects are varieties of a language that differ, often subtly, in phonology,
lexicon, and grammar. These variations arise mainly from the speaking patterns
shared within a group of speakers. Dialect identification from speech has
recently emerged as a prominent research area, largely because of the rapid
growth in the use of interactive voice-based systems. Addressing the speech
variability caused by dialectal differences is therefore essential for
effective, realistic man-machine interaction. Existing research on the
characterization and identification of dialects has focused mainly on acoustic,
phonetic, and phonotactic approaches for languages such as English, Chinese,
Arabic, Hindi, and Spanish. However, these models have not been shown to be
language independent, and they may not perform equally well on other languages
because dialects of different languages differ in fundamental ways. Moreover,
considerably fewer dialect processing models have been reported in the
literature for Indian regional languages. This thesis attempts to develop a few
useful language-independent and language-dependent Automatic Dialect
Identification (ADI) systems for the Kannada language. First, a new
text-independent Kannada Dialect Speech Corpus (KDSC) is collected from
native speakers belonging to five prominent dialectal regions of Karnataka.
This thesis investigates the significance of excitation source, spectral,
and prosodic features of speech for dialect identification. Additionally,
spectro-temporal variations across dialects are captured through 2D Gabor
features, which are known to be biologically inspired. Further, non-conventional
dialect-specific rhythmic and melodic correlations among dialects are explored
using chroma features, which are well established in music-related
applications. The robustness of the proposed features is investigated under
noisy background conditions and with short (limited-data) audio clips. In
addition, word- and sentence-based ADI systems are proposed using intonation and
intensity variations, which represent the dynamic and static prosodic behaviors.
Further, language-dependent dialect identification systems are proposed for
the Kannada language using dialect information at the level of basic phonetic
units. Additionally, dialect identification approaches based on
Kannada-specific 'case' markers (Vibhakthi Prathyayas) are proposed.
Single-classifier Support Vector Machines (SVM) and multiple-classifier
ensemble algorithms are used for classification of dialects. Experiments are
carried out using individual features and combinations of features. The use of
different features illustrates their complementary nature for dialect
processing. A performance comparison of the two categories of classification
algorithms shows that ensemble algorithms perform better than single-classifier
algorithms. Further, capturing rhythm-based aspects of dialects through chroma
and spectral-shape features yields better performance than state-of-the-art
i-vector features. Moreover, this feature set is more robust to noise than
conventional MFCCs. This work also proposes intonation and intensity features
to capture dialectal information from words and sentences for effective
classification of dialects. Finally, duration, energy, pitch, three formants,
and spectral features are also found to provide useful evidence in Kannada
dialect classification.
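
For readers unfamiliar with the chroma features mentioned in the abstract, the following is a minimal sketch of the general idea: the energy in one frame's magnitude spectrum is folded into 12 pitch-class bins. It is not the thesis's implementation. The function name `chroma_from_spectrum`, the 20 Hz lower cutoff, the A4 = 440 Hz reference, and the energy normalization are all assumptions made for illustration.

```python
import math


def chroma_from_spectrum(magnitudes, sample_rate, n_fft, ref_freq=440.0):
    """Fold one frame's magnitude spectrum into 12 pitch-class bins.

    Illustrative sketch only; the parameters are assumptions, not
    values taken from the thesis.
    """
    chroma = [0.0] * 12
    # Bin k of an n_fft-point DFT is centred at k * sample_rate / n_fft Hz.
    for k, mag in enumerate(magnitudes):
        freq = k * sample_rate / n_fft
        if freq < 20.0:  # skip DC and very low bins (assumed cutoff)
            continue
        # Distance from the reference pitch in semitones, to the nearest one.
        semitones = round(12.0 * math.log2(freq / ref_freq))
        # With C as pitch class 0, the reference A is pitch class 9.
        pitch_class = (semitones + 9) % 12
        chroma[pitch_class] += mag * mag  # accumulate energy
    total = sum(chroma)
    # Normalize so the 12 bins sum to 1 (left as all zeros for silence).
    return [c / total for c in chroma] if total > 0 else chroma
```

Because octaves are folded together, energy at 440 Hz and at 880 Hz lands in the same bin, which is why chroma summarizes melodic content independently of absolute register. A practical system would compute such a vector per short-time frame and aggregate the frames before classification.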
Keywords
Department of Computer Science & Engineering, Kannada dialect identification, Spectral features, Prosodic features, Excitation source features, Spectro-temporal features, Chroma features, Spectral-shape features, Dynamic and static features, Cases, Support vector machine, Random forest, Extreme random forest, Extreme gradient boosting