Title: Dialect Identification using Chroma-Spectral Shape Features with Ensemble Technique
Authors: Chittaragi, N.B.; Koolagudi, S.G.
Date: 2026-02-05
Year: 2021
Citation: Computer Speech and Language, 2021, 70
ISSN: 0885-2308
DOI: https://doi.org/10.1016/j.csl.2021.101230
URI: https://idr.nitk.ac.in/handle/123456789/23019

Abstract: The present work proposes a text-independent dialect identification system. Generally, dialects of a language exhibit varying pronunciation styles followed in a specific geographical region. In this paper, chroma features, familiar from music-related systems, are employed for the identification of dialects. In addition, eight significant spectral shape features computed from short-term spectra are combined with the chroma features; the combination is named chroma-spectral shape features. Chroma features aggregate spectral information and attempt to capture the variations in timbre, correlated melody, rhythm, and intonation patterns found prominently among the dialects of a few languages. The effectiveness of the proposed features and approach is evaluated on five prominent Kannada dialects spoken in Karnataka, India, and on the internationally known standard Intonation Variation in English (IViE) dataset with nine British English dialects. Discriminative models, namely a single-classifier Support Vector Machine (SVM) and ensemble-based support vector machines (ESVM), are employed for classification. The proposed features perform better than state-of-the-art i-vector features on both datasets. The highest recognition performances of 95.6% and 97.52% are achieved on the Kannada and IViE dialect datasets, respectively, using ESVM. The proposed features have also demonstrated robust performance with short (limited-data) audio clips, even in noisy conditions.

© 2021 Elsevier Ltd

Keywords: Human computer interaction; Software engineering; Dialect identification; Discriminative models; Ensemble techniques; Robust performance; Short-term spectrum; Spectral information; State of the art; Text independents; Support vector machines
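
To make the idea of a chroma-spectral shape feature vector concrete, the following is a minimal, dependency-free sketch of how one short-term frame might be turned into 12 chroma values plus a few spectral shape values. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the abstract does not list the eight spectral shape features (centroid, spread, and roll-off are used here only as common stand-ins), and the SVM/ESVM classification stage is not shown.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Hann-windowed magnitude spectrum (bins 0..N/2) via a naive DFT.

    A naive DFT keeps the sketch free of dependencies; it is only
    practical for short frames.
    """
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc))
    return mags


def chroma_vector(mags, sample_rate, n_fft):
    """Fold spectral energy into 12 pitch classes (C = 0, ..., A = 9, B = 11)."""
    chroma = [0.0] * 12
    for k in range(1, len(mags)):  # skip the DC bin (log2 of 0 is undefined)
        freq = k * sample_rate / n_fft
        midi = round(69 + 12 * math.log2(freq / 440.0))  # nearest semitone
        chroma[midi % 12] += mags[k] ** 2
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma


def spectral_shape(mags, sample_rate, n_fft, rolloff_fraction=0.85):
    """Illustrative shape features: centroid, spread, roll-off (all in Hz).

    Assumption: these three are stand-ins; the source does not name its
    eight spectral shape features.
    """
    freqs = [k * sample_rate / n_fft for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return [0.0, 0.0, 0.0]
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    spread = math.sqrt(
        sum(((f - centroid) ** 2) * m for f, m in zip(freqs, mags)) / total
    )
    rolloff = freqs[-1]
    cumulative = 0.0
    for f, m in zip(freqs, mags):
        cumulative += m
        if cumulative >= rolloff_fraction * total:
            rolloff = f
            break
    return [centroid, spread, rolloff]


def chroma_spectral_features(frame, sample_rate):
    """Concatenate 12 chroma values with the shape features for one frame."""
    n_fft = len(frame)
    mags = magnitude_spectrum(frame)
    return chroma_vector(mags, sample_rate, n_fft) + spectral_shape(
        mags, sample_rate, n_fft
    )
```

In a full system, one such vector per frame would typically be computed over each utterance and passed to the classifier; how frames are pooled per utterance is not stated in the abstract and is left open here.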