A Study of Machine Translation Models for Kannada-Tulu

Hegde, A.; Shashirekha, H.L.; Anand Kumar, M.; Chakravarthi, B.R.

dc.contributor.author	Hegde, A.
dc.contributor.author	Shashirekha, H.L.
dc.contributor.author	Anand Kumar, M.
dc.contributor.author	Chakravarthi, B.R.
dc.date.accessioned	2026-02-06T06:35:06Z
dc.date.issued	2023
dc.description.abstract	Over the past ten years, neural machine translation (NMT) has grown tremendously and is now entering a phase of maturity. Despite being the most popular approach to machine translation (MT), it performs sub-optimally on under-resourced language pairs compared with high-resourced ones because of the lack of parallel corpora. Applying NMT techniques to under-resourced language pairs is attracting researchers' attention and has produced a significant body of work for many such pairs. In view of this growth, this paper describes a set of practical approaches for investigating MT between Kannada and Tulu. Both languages belong to the Dravidian family and are under-resourced owing to a lack of tools and resources, particularly parallel corpora for MT. Since no parallel corpus exists for the Kannada-Tulu language pair, this work aims to construct one. As manual construction of a parallel corpus is laborious, data augmentation is introduced to enlarge the corpus, along with suitable preprocessing techniques. Several MT models, namely a recurrent neural network (RNN) baseline, a bidirectional recurrent neural network (BiRNN), transformer-based NMT with and without subword tokenization, and statistical machine translation (SMT), are implemented for Kannada-Tulu and Tulu-Kannada translation. Empirical results reveal that data augmentation increases the bilingual evaluation understudy (BLEU) scores of the proposed models. Transformer-based models with subword tokenization outperformed the other models, with BLEU scores of 41.82 and 40.91 for Kannada-Tulu and Tulu-Kannada MT, respectively. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
dc.identifier.citation	Lecture Notes in Networks and Systems, 2023, Vol. 608, pp. 145-161
dc.identifier.issn	2367-3370
dc.identifier.uri	https://doi.org/10.1007/978-981-19-9225-4_12
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/29632
dc.publisher	Springer Science and Business Media Deutschland GmbH
dc.subject	Kannada
dc.subject	Machine translation
dc.subject	Neural machine translation
dc.subject	Tulu
dc.subject	Under-resourced languages
dc.title	A Study of Machine Translation Models for Kannada-Tulu

Collections

Conference Papers
