Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
Search Results (3 items)
Item: Embedding linguistic features in word embedding for preposition sense disambiguation in English-Malayalam machine translation context (Springer Verlag, 2019)
Authors: Premjith, B.; Padannayil, K.P.; Anand Kumar, M.; Jyothi Ratnam, D.

Preposition sense disambiguation is highly significant in natural language processing tasks such as machine translation. Transferring the various senses of a simple preposition in a source language to a set of senses in the target language is highly complex because of the many-to-many relationships involved, particularly in English-Malayalam machine translation. To reduce this complexity in the transfer of senses, this paper uses linguistic information such as the noun class and verb class features of the noun and verb correlated with the target simple preposition. The effect of these linguistic features on the proper classification of the senses (postpositions in Malayalam) is studied with the help of several machine learning algorithms. The study showed that classification accuracy is higher when both verb and noun class features are taken into consideration. In linguistics, the major factor that decides the sense of a preposition is the noun in the prepositional phrase; the same trend was observed in the study when the training data contained only noun class features, i.e., noun class features dominate verb class features. © Springer Nature Switzerland AG 2019.

Item: CNN-GRU: Transforming image into sentence using GRU and attention mechanism (Grenze Scientific Society, 2021)
Authors: Saini, G.; Patil, N.

Recent advances in deep neural networks have attracted great attention in both Natural Language Processing (NLP) and Computer Vision (CV). They provide an efficient way of understanding semantic and syntactic structure and can handle complex tasks such as automatic image captioning. Image captioning methods are mainly based on the encoder-decoder approach.
In the present work, we developed a CNN-GRU model using a Convolutional Neural Network (CNN), a Gated Recurrent Unit (GRU), and an attention mechanism. Here, VGG16 is used as the encoder, and a GRU with an attention mechanism is used as the decoder. Our model shows significant improvement over other state-of-the-art encoder-decoder models on the widely used MSCOCO dataset. Further, the time taken to train and test our model is two-thirds that of similar models such as CNN-CNN and CNN-RNN. © Grenze Scientific Society, 2021.

Item: A Study of Machine Translation Models for Kannada-Tulu (Springer Science and Business Media Deutschland GmbH, 2023)
Authors: Hegde, A.; Shashirekha, H.L.; Anand Kumar, M.; Chakravarthi, B.R.

Over the past ten years, neural machine translation (NMT) has seen tremendous growth and is now entering a phase of maturity. Despite being the most popular solution for machine translation (MT), it performs sub-optimally on under-resourced language pairs due to the lack of parallel corpora available for them compared to high-resourced language pairs. The implementation of NMT techniques for under-resourced language pairs is receiving the attention of researchers and has resulted in a significant amount of research on many such pairs. In view of the growth of MT, this paper describes a set of practical approaches for investigating MT between Kannada and Tulu. These two languages belong to the Dravidian language family and are under-resourced owing to a lack of tools and resources, particularly a parallel corpus for MT. Since no parallel corpus exists for the Kannada-Tulu language pair, this work constructs one. As manual construction of a parallel corpus is laborious, data augmentation is introduced to enhance its size, along with suitable preprocessing techniques.
Different NMT schemes, such as a recurrent neural network (RNN) baseline, a bidirectional recurrent neural network (BiRNN), transformer-based NMT with and without subword tokenization, and statistical machine translation (SMT) models, are implemented for MT of the Kannada-Tulu and Tulu-Kannada language pairs. Empirical results reveal that data augmentation increases the bilingual evaluation understudy (BLEU) scores of the proposed models. Transformer-based models with subword tokenization outperformed the other models, with BLEU scores of 41.82 and 40.91 for Kannada-Tulu and Tulu-Kannada MT, respectively. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
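The subword tokenization mentioned in the Kannada-Tulu abstract is commonly realized with byte-pair encoding (BPE), which merges frequent symbol pairs so that rare words decompose into known subwords, easing data sparsity in under-resourced MT. The sketch below is illustrative only (the function name, toy corpus, and merge count are assumptions, not taken from the paper):

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merge rules from a toy corpus.

    `words` maps each word, given as a tuple of symbols ending in the
    word-boundary marker "</w>", to its corpus frequency. Returns the
    learned merge rules and the re-segmented vocabulary.
    """
    vocab = dict(words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Re-segment every word, fusing each occurrence of the best pair.
        merged = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = count
        vocab = merged
    return merges, vocab

# Toy corpus: "low" x5, "lower" x2, "newest" x6.
toy = {("l", "o", "w", "</w>"): 5,
       ("l", "o", "w", "e", "r", "</w>"): 2,
       ("n", "e", "w", "e", "s", "t", "</w>"): 6}
merges, vocab = bpe_merges(toy, 2)
# The first merge is ("w", "e"), the most frequent pair (8 occurrences).
```

In practice, toolkits learn thousands of merges jointly over the source and target sides of the parallel corpus before training the transformer.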

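The attention mechanism named in the CNN-GRU abstract lets the decoder weight encoder feature vectors differently at each output step. A minimal dot-product attention sketch follows; the scoring function and names are assumptions for illustration, since the abstract does not specify the alignment model used:

```python
import math

def attention_context(query, annotations):
    """Score each annotation vector against the decoder query with a dot
    product, normalize the scores with a softmax, and return the
    attention weights plus the weighted context vector.
    """
    scores = [sum(q * a for q, a in zip(query, ann)) for ann in annotations]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the annotations.
    dim = len(annotations[0])
    context = [sum(w * ann[i] for w, ann in zip(weights, annotations))
               for i in range(dim)]
    return weights, context
```

In an image-captioning decoder, the annotations would be the spatial feature vectors produced by the VGG16 encoder, and the query would come from the GRU's hidden state.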