Automatic dialect identification system for Kannada language using single and ensemble SVM algorithms

dc.contributor.authorChittaragi, N.B.
dc.contributor.authorKoolagudi, S.G.
dc.date.accessioned2026-02-05T09:28:36Z
dc.date.issued2020
dc.description.abstractIn this paper, an automatic dialect identification (ADI) system is proposed that extracts spectral and prosodic features for the Kannada language. A new dialect dataset is collected from native speakers of Kannada (a Dravidian language). This dataset includes five distinct dialects of Kannada, representing five geographical regions of Karnataka state. The significance of spectral and prosodic variations across the five Kannada dialects is investigated. Mel-frequency cepstral coefficients (MFCCs), spectral flux, and entropy are used as representatives of spectral features. In addition, pitch and energy features are extracted as representatives of prosodic parameters for identification of dialects. These raw feature vectors are further processed to obtain new derived feature vectors through statistical processing. A single classifier based multi-class support vector machine (SVM) and a multiple classifier based ensemble SVM (ESVM) are employed for classification of dialects. The explored features are evaluated on the newly collected Kannada speech corpus, with five Kannada dialects, and on the internationally known standard Intonation Variation in English (IViE) dataset, with nine British English dialects. Experimental results demonstrate that the derived feature vectors perform better than the raw feature vectors, and that the ESVM technique outperforms a single SVM. Used individually, spectral and prosodic features yield dialect recognition rates of 83.12% and 44.52%, respectively. Further, the complementary nature of spectral and prosodic features is evaluated by combining both feature vectors for dialect recognition; the combination raises dialect recognition performance to about 86.25%. This indicates the existence of complementary dialect-specific evidence in spectral and prosodic features. The experiments conducted on the standard IViE corpus show a higher recognition rate of 91.38% using ESVM. The proposed ADI systems with derived features outperform state-of-the-art i-vector feature based systems on both datasets. © 2019, Springer Nature B.V.
dc.identifier.citationLanguage Resources and Evaluation, 2020, 54, 2, pp. 553-585
dc.identifier.issn1574020X
dc.identifier.urihttps://doi.org/10.1007/s10579-019-09481-5
dc.identifier.urihttps://idr.nitk.ac.in/handle/123456789/23893
dc.publisherSpringer editorial@springerplus.com
dc.subjectDerived features
dc.subjectDialect identification
dc.subjectEnsemble SVM
dc.subjectIViE dialect dataset
dc.subjectKannada dialect dataset
dc.subjectSingle SVM
dc.subjectSpectral and prosodic features
dc.titleAutomatic dialect identification system for Kannada language using single and ensemble SVM algorithms
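
The abstract describes turning frame-level (raw) features into derived feature vectors through statistical processing, but does not say which statistics are used. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that each utterance is already available as a list of per-frame magnitude spectra, that spectral flux is the squared difference between consecutive frames, that entropy is the Shannon entropy of the normalized spectrum, and that the derived vector consists of the mean, standard deviation, minimum, and maximum of each track. All function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence

# Assumed choice of statistics; the abstract does not name them.
STAT_KEYS = ("mean", "std", "min", "max")


def spectral_flux(prev_mag: Sequence[float], cur_mag: Sequence[float]) -> float:
    """Squared L2 distance between two consecutive magnitude spectra."""
    return sum((c - p) ** 2 for p, c in zip(prev_mag, cur_mag))


def spectral_entropy(mag: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a spectrum normalized to sum to 1."""
    total = sum(mag)
    if total == 0:
        return 0.0
    probs = [m / total for m in mag if m > 0]
    return -sum(p * math.log2(p) for p in probs)


def derive_statistics(track: Sequence[float]) -> Dict[str, float]:
    """Summarize a variable-length frame-level track with fixed statistics."""
    return {
        "mean": statistics.fmean(track),
        "std": statistics.pstdev(track),
        "min": min(track),
        "max": max(track),
    }


def utterance_vector(frames: List[List[float]]) -> List[float]:
    """Map per-frame spectra to one fixed-length derived feature vector."""
    if len(frames) < 2:
        raise ValueError("at least two frames are needed to compute flux")
    flux = [spectral_flux(a, b) for a, b in zip(frames, frames[1:])]
    entropy = [spectral_entropy(f) for f in frames]
    vector: List[float] = []
    for track in (flux, entropy):
        stats = derive_statistics(track)
        vector.extend(stats[key] for key in STAT_KEYS)
    return vector
```

Because the result has the same length regardless of how many frames an utterance contains, vectors of this kind could be passed directly to a fixed-input classifier such as an SVM. MFCC, pitch, and energy tracks could be summarized the same way, although extracting them would require a signal-processing library that this sketch does not assume.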