An Improved Transformer Transducer Architecture for Hindi-English Code Switched Speech Recognition

dc.contributor.author: Antony, A.
dc.contributor.author: Kota, S.R.
dc.contributor.author: Lade, A.
dc.contributor.author: Spoorthy, V.
dc.contributor.author: Koolagudi, S.G.
dc.date.accessioned: 2026-02-06T06:35:28Z
dc.date.issued: 2022
dc.description.abstract: Interest in Automatic Speech Recognition (ASR) systems for Code-Switching (CS) speech has grown in recent years, driven by the extensive use of technology in many languages throughout the world. Several studies have shown that End-to-End (E2E) ASR is easier to adopt and works much better in monolingual settings. E2E systems are also widely recognised for requiring massive quantities of labelled speech data. Because large amounts of CS speech are scarce, E2E ASR requires longer computation time and does not offer promising results. This work introduces an E2E ASR system based on a transformer-transducer architecture for code-switched Hindi-English speech and addresses the scarcity of training data by leveraging widely available monolingual data. Specifically, the language-specific modules in the Transformer are pre-trained on the widely available single-language speech datasets. The proposed method achieves a Word Error Rate (WER) of 29.63% and a Transliterated Word Error Rate (T-WER) of 27.42%, which is better than the state of the art by 2.19%. © 2022 ISCA.
dc.identifier.citation: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2022, Vol. 2022-September, p. 3123-3127
dc.identifier.issn: 2308457X
dc.identifier.uri: https://doi.org/10.21437/Interspeech.2022-10763
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/29873
dc.publisher: International Speech Communication Association
dc.subject: Automatic Speech Recognition
dc.subject: Code-Switching
dc.subject: Transducer
dc.subject: Transformer
dc.title: An Improved Transformer Transducer Architecture for Hindi-English Code Switched Speech Recognition
