Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

Now showing 1 - 10 of 99

A comparative analysis of machine comprehension using deep learning models in code-mixed hindi language
(Springer Verlag, 2019) Viswanathan, S.; Anand Kumar, M.; Padannayil, K.P.
The domain of artificial intelligence is revolutionizing the way humans interact with machines. Machine comprehension, one of the latest fields in natural language processing, has the capability to bring about large improvements in artificial intelligence. Machine comprehension gives systems the ability to understand a passage provided by a user and to answer questions asked about it; it is an evolved version of the traditional question answering technique. It falls under the category of natural language understanding and reveals the degree of understanding a model needs in order to locate the area of interest in a passage. The scope for implementing this technique is very high in India because of the many regional languages in use. This work focuses on applying machine comprehension to code-mixed Hindi. A detailed comparative study evaluates the performance of several deep learning approaches on the dataset, including End-to-End Memory Network, Dynamic Memory Network, Recurrent Neural Network, Long Short-Term Memory Network and Gated Recurrent Unit, and identifies the best-suited model for the dataset used. A new architecture is proposed in this work by combining two of the best-performing networks. To improve the model with respect to the various ways of answering questions from a passage, the natural language processing technique of distributed word representation was applied to the best model identified: pre-trained fastText embeddings were used for word representations. This is the first implementation of machine comprehension models in code-mixed Hindi using deep neural networks. The work analyses the performance of all five models implemented, which will be helpful for future research on machine comprehension in code-mixed Indian languages. © Springer Nature Switzerland AG 2019.
Embedding linguistic features in word embedding for preposition sense disambiguation in English–Malayalam machine translation context
(Springer Verlag, 2019) Premjith, B.; Padannayil, K.P.; Anand Kumar, M.; Jyothi Ratnam, D.
Preposition sense disambiguation is highly significant for natural language processing tasks such as machine translation. Transferring the various senses of a simple preposition in the source language to a set of senses in the target language is highly complex because of the many-to-many relationships involved, particularly in English–Malayalam machine translation. To reduce this complexity in the transfer of senses, in this paper we used linguistic information such as the noun class features and verb class features of the noun and verb correlated with the target simple preposition. The effect of these linguistic features on the correct classification of the senses (postpositions in Malayalam) is studied with several machine learning algorithms. The study showed that classification accuracy is higher when both verb and noun class features are taken into consideration. In linguistics, the major factor that decides the sense of a preposition is the noun in the prepositional phrase. The same trend was observed in the study when the training data contained only noun class features; that is, noun class features dominate verb class features. © Springer Nature Switzerland AG 2019.
MedNLU: Natural Language Understander for Medical Texts
(Springer Science and Business Media Deutschland GmbH, 2020) Barathi Ganesh, H.B.; Reshma, U.; Padannayil, K.P.; Anand Kumar, M.
Natural Language Understanding is one of the essential tasks in building clinical text-based applications. Understanding of clinical texts can be achieved through Vector Space Models and sequential modelling tasks. This paper focuses on sequential modelling, i.e. Named Entity Recognition and Part of Speech Tagging, and achieves state-of-the-art F1 scores of 93.8% on the i2b2 clinical corpus and 97.29% on the GENIA corpus. The paper also reports the performance of feature fusion, which integrates word embeddings, feature embeddings and character embeddings, for sequential modelling tasks. We also propose a framework based on a sequential modelling architecture, named MedNLU, which can perform Part of Speech Tagging, Chunking and Entity Recognition on clinical texts. The sequence modeller in MedNLU is an integrated framework of a Convolutional Neural Network, Conditional Random Fields and a Bi-directional Long Short-Term Memory network. © 2020, Springer Nature Switzerland AG.
Ontological Structure-Based Retrieval System for Tamil
(Springer Science and Business Media Deutschland GmbH, 2021) Rajendran, S.; Padannayil, K.P.; Anand Kumar, M.; Sankaralingam, C.
Ontological structure of Tamil (OST) is the outcome of extensive research in the lexical semantics of Tamil carried out over the last three decades. Rajendran’s (Semantic structure of Tamil vocabulary. Report of the UGC sponsored postdoctoral work (in manuscript). Deccan College Post-Doctoral Research Institute, Pune, 1983) post-doctoral research went through several stages before culminating in OST, tracing the progression from a Tamil thesaurus to Tamil WordNet and then to OST. OST is a lexical resource that amalgamates all the kinds of information available in a dictionary, a thesaurus and a WordNet. The Dravidian WordNets (of which Tamil WordNet is one of the four components) built under the Indo-WordNet project depended on an ontology based on the Western conceptualization of the world found in English. This ontology does not take into consideration the Indian conceptualization of the world depicted in the nikhandu tradition, and there are many lexical gaps between English WordNet and Tamil WordNet. Moreover, building a WordNet based on Hindi WordNet, which is in turn built on English WordNet, will take many years to complete and would miss the conceptualization depicted in the Indian tradition. Apart from this, the extension approach of building Tamil WordNet from Hindi WordNet cannot capture the Dravidian conceptualization. A merger approach, in which separate WordNets are built and then collapsed into one, would have been preferable. The present OST tries to overcome the lacunae found in Tamil WordNet. OST is based on the Indian and Dravidian conceptualization, and the process of building it is comparatively simple. We plan to generalize it so that all the Dravidian languages can be easily accommodated in it. © 2021, Springer Nature Switzerland AG.
Semantic Similarity and Paraphrase Identification for Malayalam Using Deep Autoencoders
(Springer Science and Business Media Deutschland GmbH, 2021) Praveena, R.; Anand Kumar, M.; Padannayil, K.P.
In this chapter, we deal with sentence-level paraphrase identification for the Malayalam language. We use a recursive autoencoder architecture for the unsupervised learning of phrase representations, which are used to extract features for paraphrase identification. Features of sentences of varying lengths are converted to a fixed-size representation using the convolution method of dynamic pooling. The Malayalam paraphrase identification system was initially designed to distinguish only paraphrases from non-paraphrases and was later extended to identify semi-equivalent paraphrases. Conventional statistical features are also taken into account alongside the semantic features, which improves system performance. The proposed system was implemented using word2vec embeddings and obtained 77.67% accuracy for the two-class system and 66.07% for the three-class system. This chapter also discusses the experiments carried out to choose the best parameters and embedding models. © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
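The abstract does not spell out the pooling step. A minimal sketch, assuming the common formulation in which a pairwise distance matrix between the phrase vectors of two sentences is divided into a fixed grid and only the minimum distance (the closest phrase pair) in each cell is kept; the function names and the grid size are illustrative and not taken from the authors' system:

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def distance_matrix(phrases_a: Sequence[Sequence[float]],
                    phrases_b: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise Euclidean distances between two sentences' phrase vectors."""
    return [[math.dist(a, b) for b in phrases_b] for a in phrases_a]


def _stretch(n: int, size: int) -> List[int]:
    """Row/column indices to read; repeats indices when n < size so n reaches size."""
    return [i * n // size for i in range(size)] if n < size else list(range(n))


def dynamic_pool(matrix: Matrix, size: int) -> Matrix:
    """Min-pool a variable-sized matrix into a fixed size x size grid."""
    rows = _stretch(len(matrix), size)
    cols = _stretch(len(matrix[0]), size)
    grid = [[matrix[r][c] for c in cols] for r in rows]
    # Split rows and columns into `size` nearly equal, non-empty chunks.
    r_bounds = [i * len(rows) // size for i in range(size + 1)]
    c_bounds = [j * len(cols) // size for j in range(size + 1)]
    return [
        [
            min(grid[r][c]
                for r in range(r_bounds[i], r_bounds[i + 1])
                for c in range(c_bounds[j], c_bounds[j + 1]))
            for j in range(size)
        ]
        for i in range(size)
    ]
```

Whatever the two sentence lengths, `dynamic_pool(distance_matrix(a, b), size)` yields a `size` x `size` grid, so it can be flattened into a fixed-length feature vector and combined with statistical features before classification.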
Categorizing Relations via Semi-supervised Learning Using a Hybrid Tolerance Rough Sets and Genetic Algorithm Approach
(Springer Science and Business Media Deutschland GmbH, 2022) Agrawal, S.; Ahmed, R.; Anand Kumar, M.; Ramanna, S.
In the last few decades, we have seen a tremendous increase in the amount of data available on the web, and there have been significant advances in constructing knowledge bases of relations extracted from text data. These relations are words in the text, often represented as pairs (Noun, Context), for example (Disease, Symptom), which can be classified into predefined categories to yield useful information. Categorization of relations using a tolerance-rough-set-based semi-supervised learning algorithm (TPL) has been successfully demonstrated in several works. However, the automatic selection of the hyperparameters of the TPL algorithm remains an unexplored problem. This paper proposes a genetic-algorithm-based approach (TPL-GA) for optimizing the hyperparameters that are fundamental to the TPL algorithm. The proposed approach was tested on two standard datasets drawn from different domains and representing two different languages: English and Hindi text. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
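The abstract does not describe the genetic operators used in TPL-GA. A generic sketch of genetic-algorithm hyperparameter search, assuming real-valued hyperparameters with box bounds, truncation selection, uniform crossover and Gaussian mutation; `toy_fitness` is a hypothetical stand-in for a validation score of TPL, which is not implemented here:

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def genetic_search(
    fitness: Callable[[List[float]], float],
    bounds: Bounds,
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> List[float]:
    """Return the best hyperparameter vector found by a simple genetic algorithm."""
    rng = random.Random(seed)
    # Initial population: uniform samples inside each hyperparameter's bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        elite = sorted(population, key=fitness, reverse=True)[: max(2, pop_size // 2)]
        children: List[List[float]] = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each gene is copied from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation, clipped back into the allowed range.
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[k] = min(hi, max(lo, child[k] + rng.gauss(0.0, 0.1 * (hi - lo))))
            children.append(child)
        population = elite + children
    return max(population, key=fitness)


def toy_fitness(params: List[float]) -> float:
    """Hypothetical stand-in for a validation score; peaks at (0.3, 0.7)."""
    return -((params[0] - 0.3) ** 2 + (params[1] - 0.7) ** 2)
```

For example, `genetic_search(toy_fitness, [(0.0, 1.0), (0.0, 1.0)])` searches two hyperparameters in the unit square. In a real setting each fitness call would train and evaluate the classifier, so caching scores for repeated individuals would be worthwhile.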
An Effective Diabetic Retinopathy Detection Using Hybrid Convolutional Neural Network Models
(Springer Science and Business Media Deutschland GmbH, 2023) Kumar, N.; Ahmed, R.; Venkatesh, B.H.; Anand Kumar, M.
In the developing world today, loss of vision is mainly caused by diabetic retinopathy. More than 103 million people are believed to be affected. It is estimated that around 40 million people have diabetes in the United States, and according to the World Health Organization (WHO), 347 million people are living with the disease globally. Diabetic retinopathy (DR) is a long-term diabetes-related eye condition. Roughly 45–50% of Americans with diabetes pass through distinct stages of the condition that can be categorized. When DR is diagnosed in time, its progression to vision impairment can be delayed or stopped; in practice, however, timely diagnosis is a daunting task, because DR seldom shows any symptom before it escalates to a stage at which it can no longer be treated effectively. The paper uses convolutional neural network models to achieve effective classification of retinal fundus images for diabetic retinopathy detection. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Human Emotion Recognition in Smart City Using Transfer Learning-Based Convolutional Neural Network (CNN) Model
(CRC Press, 2023) Sohanraj, R.; Jason Krithik Kumar, S.; Mahesh, R.; Anand Kumar, M.
[No abstract available]
Overview of Arnekt IECSIL at Fire-2018 track on information extraction for conversational systems in Indian languages
(CEUR-WS, 2018) Barathi Ganesh, H.; Padannayil, K.P.; Reshma, U.; Kale, M.; Mankame, P.; Kulkarni, G.; Kale, A.; Anand Kumar, M.
This overview paper describes the first shared task on Information Extraction for Conversational Systems in Indian Languages (IECSIL), organized by FIRE 2018. Motivated by the need for information extraction, corpora have been developed for Named Entity Recognition (Task A) and Relation Extraction (Task B) in five Indian languages (Hindi, Tamil, Malayalam, Telugu and Kannada). Task A is to identify named entities and classify each into one of many classes, and Task B is to extract the relations among the entities present in the sentences. Altogether, nearly 100 submissions from 10 different teams were evaluated. In this paper, we give an overview of the approaches and discuss the results attained by the participating teams. © 2018 CEUR-WS. All Rights Reserved.
Overview of the second shared task on Indian native language identification (INLI)
(CEUR-WS, 2018) Anand Kumar, M.; Barathi Ganesh, H.; Ajay, S.G.; Padannayil, K.P.
This overview paper describes the second shared task on Indian Native Language Identification (INLI), organized by FIRE 2018. Given a corpus of English comments from the Facebook pages of various newspapers, the objective of the task is to identify the author's native language among six Indian languages: Bengali, Hindi, Kannada, Malayalam, Tamil and Telugu. Altogether, 31 approaches from 14 different teams were evaluated. In this paper, we give an overview of the participants’ systems and report the results of the second INLI shared task. We also compare these results with those of the first INLI shared task, conducted with FIRE-2017. © 2018 CEUR-WS. All Rights Reserved.
