A Study of Machine Translation Models for Kannada-Tulu

dc.contributor.author: Hegde, A.
dc.contributor.author: Shashirekha, H.L.
dc.contributor.author: Anand Kumar, M.
dc.contributor.author: Chakravarthi, B.R.
dc.date.accessioned: 2026-02-06T06:35:06Z
dc.date.issued: 2023
dc.description.abstract: Over the past ten years, neural machine translation (NMT) has grown tremendously and is now entering a phase of maturity. Although it is the most popular approach to machine translation (MT), it performs sub-optimally on under-resourced language pairs, for which parallel corpora are scarce compared with high-resourced language pairs. Applying NMT techniques to under-resourced language pairs is attracting the attention of researchers and has produced a significant amount of research for many such pairs. In view of this growth, this paper describes a set of practical approaches for investigating MT between Kannada and Tulu. Both languages belong to the Dravidian language family and are under-resourced because they lack tools and resources, particularly parallel corpora for MT. Since no parallel corpus exists for the Kannada-Tulu language pair, this work aims to construct one. As manual construction of a parallel corpus is laborious, data augmentation is introduced to increase the size of the parallel corpus, along with suitable preprocessing techniques. Several NMT schemes, namely a recurrent neural network (RNN) baseline, a bidirectional recurrent neural network (BiRNN), and transformer-based NMT with and without subword tokenization, as well as statistical machine translation (SMT) models, are implemented for Kannada-Tulu and Tulu-Kannada MT. Empirical results show that data augmentation increases the bilingual evaluation understudy (BLEU) scores of the proposed models. Transformer-based models with subword tokenization outperformed the other models, with BLEU scores of 41.82 and 40.91 for Kannada-Tulu and Tulu-Kannada MT, respectively. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
dc.identifier.citation: Lecture Notes in Networks and Systems, 2023, Vol. 608, pp. 145-161
dc.identifier.issn: 23673370
dc.identifier.uri: https://doi.org/10.1007/978-981-19-9225-4_12
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/29632
dc.publisher: Springer Science and Business Media Deutschland GmbH
dc.subject: Kannada
dc.subject: Machine translation
dc.subject: Neural machine translation
dc.subject: Tulu
dc.subject: Under-resourced languages
dc.title: A Study of Machine Translation Models for Kannada-Tulu
