Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

  • Item
    Overview of the Shared Task on Machine Translation in Dravidian Languages
    (Association for Computational Linguistics (ACL), 2022) Anand Kumar, A.M.; Hegde, A.; Banerjee, S.; Chakravarthi, B.R.; Priyadarshini, R.; Shashirekha, H.L.; Mccrae, J.P.
    This paper presents an overview of the shared task on translation of under-resourced Dravidian languages at the DravidianLangTech-2022 workshop, held jointly with ACL 2022. It describes the datasets used, the approach taken to analyse submissions, and the results. The shared task comprised five sub-tasks covering the following translation pairs: Kannada to Tamil, Kannada to Telugu, Kannada to Sanskrit, Kannada to Malayalam, and Kannada to Tulu. Training, development, and test datasets were provided to all participants, and results were evaluated against gold-standard datasets. In total, 16 research groups participated in the shared task and 12 submission runs were made for evaluation. Translations were evaluated using the Bilingual Evaluation Understudy (BLEU) score. © 2022 Association for Computational Linguistics.
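    Both abstracts use BLEU as the evaluation metric. As a rough illustration of what BLEU measures, here is a minimal, self-contained sketch of sentence-level BLEU with uniform 4-gram weights and add-one smoothing. This is a simplification of the standard metric (shared-task evaluations typically use tools such as sacreBLEU), and the function and variable names here are illustrative, not from the papers.

    ```python
    import math
    from collections import Counter

    def ngrams(tokens, n):
        """All contiguous n-grams of a token list, as tuples."""
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def bleu(reference, candidate, max_n=4):
        """Sentence-level BLEU sketch: clipped n-gram precision,
        geometric mean over n = 1..max_n, brevity penalty.
        `reference` and `candidate` are lists of tokens."""
        if not candidate:
            return 0.0
        precisions = []
        for n in range(1, max_n + 1):
            cand_counts = Counter(ngrams(candidate, n))
            ref_counts = Counter(ngrams(reference, n))
            # Clipped matches: each candidate n-gram counts at most as
            # often as it appears in the reference.
            overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
            total = max(sum(cand_counts.values()), 1)
            # Add-one smoothing (a simplified choice) so one empty
            # n-gram order does not zero out the whole score.
            precisions.append((overlap + 1) / (total + 1))
        geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
        # Brevity penalty discourages overly short candidates.
        bp = 1.0 if len(candidate) >= len(reference) \
            else math.exp(1 - len(reference) / len(candidate))
        return bp * geo_mean
    ```

    For example, `bleu(ref, ref)` on identical token lists yields 1.0, while a candidate with substituted words scores strictly between 0 and 1.
    
    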
  • Item
    A Study of Machine Translation Models for Kannada-Tulu
    (Springer Science and Business Media Deutschland GmbH, 2023) Hegde, A.; Shashirekha, H.L.; Anand Kumar, M.; Chakravarthi, B.R.
    Over the past ten years, neural machine translation (NMT) has seen tremendous growth and is now entering a phase of maturity. Despite being the most popular approach to machine translation (MT), it performs sub-optimally on under-resourced language pairs because parallel corpora are scarce compared to high-resourced pairs. Applying NMT techniques to under-resourced language pairs is attracting growing research attention and has produced a significant body of work for many such pairs. In view of this growth, this paper describes a set of practical approaches for investigating MT between Kannada and Tulu. These two languages belong to the Dravidian family and are under-resourced owing to a lack of tools and resources, in particular a parallel corpus for MT. Since no Kannada-Tulu parallel corpus for MT exists, this work constructs one; as manual construction of a parallel corpus is laborious, data augmentation is introduced to enlarge it, together with suitable preprocessing techniques. Several NMT schemes, namely a recurrent neural network (RNN) baseline, a bidirectional recurrent neural network (BiRNN), and transformer-based NMT with and without subword tokenization, as well as statistical machine translation (SMT) models, are implemented for the Kannada-Tulu and Tulu-Kannada language pairs. Empirical results reveal that data augmentation increases the bilingual evaluation understudy (BLEU) score of the proposed models. Transformer-based models with subword tokenization outperformed the other models, with BLEU scores of 41.82 and 40.91 for Kannada-Tulu and Tulu-Kannada MT, respectively. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
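    The second abstract contrasts transformer-based NMT with and without subword tokenization. The most common subword scheme is byte-pair encoding (BPE), which repeatedly merges the most frequent adjacent symbol pair in the training vocabulary. The sketch below shows BPE merge learning in its simplest form; it is a generic illustration of the technique, not the paper's implementation (NMT toolkits usually rely on libraries such as subword-nmt or SentencePiece).

    ```python
    from collections import Counter

    def get_pair_counts(vocab):
        """Count adjacent symbol pairs over a {word-as-symbol-tuple: freq} vocab."""
        pairs = Counter()
        for word, freq in vocab.items():
            for i in range(len(word) - 1):
                pairs[(word[i], word[i + 1])] += freq
        return pairs

    def merge_pair(pair, vocab):
        """Replace every occurrence of `pair` with one merged symbol."""
        merged = pair[0] + pair[1]
        new_vocab = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        return new_vocab

    def learn_bpe(word_freqs, num_merges):
        """Learn `num_merges` BPE merge operations from a {word: frequency}
        dict; words are split into characters plus an end-of-word marker."""
        vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
        merges = []
        for _ in range(num_merges):
            pairs = get_pair_counts(vocab)
            if not pairs:
                break
            best = max(pairs, key=pairs.get)  # most frequent adjacent pair
            vocab = merge_pair(best, vocab)
            merges.append(best)
        return merges
    ```

    On a toy corpus such as `{"low": 5, "lower": 2, "lowest": 2}`, the first merges combine the characters of the shared stem (`l`+`o`, then `lo`+`w`), so frequent stems become single subword units while rare suffixes stay decomposed; this is what lets an NMT model handle words unseen in training.
    
    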