Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
17 results
Search Results
Item Extraction of named entities from social media text in Tamil language using N-gram embedding for disaster management (Springer Verlag, 2020) Remmiya Devi, G.R.; Anand Kumar, M.A.; Padannayil, K.P.
In the present era, data in any form is considered highly important. More specifically, text data carries richer and more concise information than any other form of data. Extraction and analysis of these data can result in various new findings through text analytics. This has led to applications such as search engines, extraction of product names, sentiment analysis, document classification and a few more. Companies focus heavily on sentiment analysis to review the positive, negative and neutral comments on their products. Summarization of text is a notable application of Natural Language Processing that reveals the gist of brief documents. Apart from these, applications based on information extraction can be developed for the welfare of society. Handling an emergency situation requires collection of vast information. Extraction of such data can be supportive during disaster management. To perform such a task, a system must learn the meaning of human languages. Easing access to text data across language barriers is the primary motive of Natural Language Processing (NLP) systems. The proposed system uses a word embedding model, specifically the skip-gram model, to implement the most fundamental task of NLP: entity extraction in social media text. Implementation of N-gram embedding methods paved the way for creating rich context knowledge for the system to handle social media text. Classification of named entities using the proposed system has been carried out using the Support Vector Machine (SVM) classifier.
© Springer Nature Switzerland AG 2020.

Item Image Manipulation Detection Using Augmentation and Convolutional Neural Networks (Springer Science and Business Media Deutschland GmbH, 2024) Maheshwari, A.; Jain, R.; Mahapatra, R.; Palakuru, S.; Anand Kumar, M.A.
Image tampering is now simpler than ever, thanks to the explosion of digital photos and the creation of easy image modification tools. As a result, major problems may arise if the situation is not handled properly. Many computer vision and deep learning strategies have been proposed over the years to address the problem. That said, people can easily recognize the photographs that were used in that research. This raises the key question of how CNNs might perform on more difficult samples. In this chapter, we build a complex CNN and use various machine learning algorithms to classify the images and compare the accuracies obtained by them. Its performance is also compared on two different datasets. Additionally, we assess the impact of various hyperparameters and a data augmentation strategy on classification performance. This leads to the conclusion that performance can be considerably impacted by dataset difficulty. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Item Generating Synthetic Text Data for Improving Class Balance in Personality Prediction (Springer Science and Business Media Deutschland GmbH, 2024) Lakhtaria, D.; L, D.H.; Chhabra, R.; Taparia, R.; Anand Kumar, M.A.
The growing popularity of social media as a means of self-expression and self-discovery has sparked a heightened curiosity in utilizing the Myers–Briggs Type Indicator (MBTI) to investigate human personalities. Despite the increasing use of word-embedding techniques, machine learning algorithms, and imbalanced data-handling techniques to predict MBTI personality types, further research is needed to explore how these approaches can enhance the accuracy of the results.
Our research aimed to use the GPT model to address the problem of class imbalance. We have implemented several machine learning models such as RCNN, LSTM, XGBoost, and Random Forest. We have also tried two word embeddings, Word2Vec and GloVe. According to our findings, the approach we used can attain a considerably high F1-score, which depends on the model selected for the prediction and classification of MBTI personality. The ability to accurately predict and classify MBTI personality through our approach has the potential to improve our comprehension of MBTI. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Item KCE DALab-APDA@FIRE2019: Author profiling and deception detection in Arabic using weighted embedding (CEUR-WS, 2019) Sharmila Devi, V.; Subramanian, S.; Ravikumar, G.; Anand Kumar, M.A.
This paper explains the work submitted to the Author Profiling and Deception Detection in Arabic Tweets shared task organized at the Forum for Information Retrieval Evaluation (FIRE) 2019. The first task, Author profiling, involves identifying the categories of authors based on their Arabic tweets. In the second task, the aim is to detect deception in Arabic for two genres, Twitter and News. Deception detection is the automatic identification of false messages in text content on social networks or in news. For each task, we have submitted three different systems. For submission 1, we have used Term Frequency and Inverse Document Frequency (TFIDF) based Support Vector Machine classification, and in submission 2, we have used the fastText classifier. For submission 3, we have proposed a low dimensional weighted document embedding (TFIDF + Word embedding) with SVM classification. We have attained second place in Deception detection and third in Author profiling.
The performance difference between the top team results and the submitted runs is only 3.34% for Author profiling and 1.16% for Deception detection. © Copyright 2019 for this paper by its authors.

Item ARS NITK at MEDIQA 2019: Analysing various methods for natural language inference, recognising question entailment and medical question answering system (Association for Computational Linguistics (ACL), 2019) Agrawal, A.; George, R.A.; Ravi, S.S.; Kamath S․, S.S.; Anand Kumar, M.A.
This paper describes the approaches we have taken for Natural Language Inference, Question Entailment Recognition and Question-Answering tasks to improve domain-specific Information Retrieval. Natural Language Inference (NLI) is a task that aims to determine whether a given hypothesis is an entailment of, a contradiction of, or neutral to the given premise. Recognizing Question Entailment (RQE) focuses on identifying entailment between two questions, while the objective of Question-Answering (QA) is to filter and improve the ranking of automatically retrieved answers. For addressing the NLI task, the UMLS Metathesaurus was used to find the synonyms of medical terms in given sentences, on which the InferSent model was trained to predict whether the given sentence is an entailment, a contradiction or neutral. We also introduce a new Extreme gradient boosting model built on PubMed embeddings to perform RQE. Further, a closed-domain Question Answering technique that uses Bi-directional LSTMs trained on the SQuAD dataset to determine relevant ranks of answers for a given question is also discussed. Experimental validation showed that the proposed models achieved promising results.
© 2019 Association for Computational Linguistics

Item Intrinsic evaluation for English–Tamil bilingual word embeddings (Springer Verlag, 2020) Jp, J.P.; Krishna Menon, V.K.; Rajendran, S.; Padannayil, K.P.; Anand Kumar, M.A.
Despite the growth of bilingual word embeddings, no work has been done so far on directly evaluating them for the English–Tamil language pair. In this paper, we present a dataset and an evaluation paradigm for the English–Tamil bilingual word vector model. This dataset contains words that cover a range of concepts that occur in natural language. The dataset is scored based on similarity rather than association or relatedness. Hence, word pairs that are associated but not literally similar have a low rating. The measures are quantified further to ensure consistency in the dataset, mimicking the cognitive phenomena. Henceforth, the dataset can be used by non-native speakers with minimal effort. We also present some inferences and insights into the semantics captured by word vectors and human cognition. © Springer Nature Singapore Pte Ltd. 2020.

Item Dynamic mode-based feature with random mapping for sentiment analysis (Springer Verlag, 2020) Sachin Kumar, S.; Anand Kumar, M.A.; Padannayil, K.P.; Poornachandran, P.
Sentiment analysis (SA), or polarity identification, is a research topic that receives considerable attention. Research in this area attempts to explore the sentiments or opinions in text data related to any event, politics, movies, product reviews, sports, etc. The present article discusses the use of dynamic modes from the dynamic mode decomposition (DMD) method with random mapping for sentiment classification. Random mapping is performed using the random kitchen sink (RKS) method. The present work aims to explore the use of dynamic modes as the feature for the sentiment classification task.
For the experiments and analysis, the dataset consists of tweets from the SAIL 2015 shared task (tweets in Tamil, Bengali, Hindi) and tweets in Malayalam. The dataset for Malayalam was prepared by us for this work. The evaluations are performed using accuracy, F1-score, recall, and precision. The evaluations show that the proposed approach provides competitive results. © Springer Nature Singapore Pte Ltd. 2020.

Item Analyzing Banking Services Applicability Using Explainable Artificial Intelligence (Association for Computing Machinery, 2022) Sriram, A.; Gorti, S.S.; Amin, E.G.; Anand Kumar, M.A.
Over the last few years, the banking sector has had a pivotal role to play in the global economy, comprising about 24% of the global GDP and employing millions of people worldwide. Banks have a wide array of products and services to offer, including ATMs, Tele-Banking, Credit Cards, Debit cards, Electronic Fund Transfers (EFT), Internet Banking, Mobile Banking, etc. Machine learning is a method of data analysis that automates analytical model building and can be an essential decision support tool for banks in providing services to certain customers and in improving customer satisfaction and experience based on collected data. In this study, we made use of several machine learning models and Artificial Neural Networks (ANN) to help banks make predictions about timely customer loan repayment and customer satisfaction. We explored different machine learning algorithms and performed SHAP analysis, which helped us draw conclusions about the significant features driving these decisions. © 2022 ACM.

Item Multi-Vehicle Tracking and Speed Estimation Model using Deep Learning (Association for Computing Machinery, 2022) Prajwal, K.; Navaneeth, P.; Tharun, K.; Anand Kumar, M.A.
Speed estimation of vehicles is one of the prime applications of speed estimation of moving objects.
The YOLOv5 model has proven to have very good accuracy in detecting moving objects in real-time. The vehicles on the road are extracted from each frame of the video by running it through a custom YOLOv5 object detector. The YOLO model splits the frame into a grid, and each grid cell detects a vehicle within itself. An instance identifier tracks the vehicle across the frames. The tracking algorithm computes deep features for every bounding box and utilizes the similarities within the deep features to identify and track the object. The pixel per meter metric has to be adjusted based on perspective, after which the speed of the vehicle can be estimated. Finally, a comparison of our model metrics with the existing state-of-the-art models is provided. © 2022 ACM.

Item Representation Learning in Continuous-Time Dynamic Signed Networks (Association for Computing Machinery, 2023) Sharma, K.; Raghavendra, M.; Lee, Y.-C.; Anand Kumar, M.A.; Kumar, S.
Signed networks allow us to model conflicting relationships and interactions, such as friend/enemy and support/oppose. These signed interactions happen in real-time. Modeling such dynamics of signed networks is crucial to understanding the evolution of polarization in the network and enabling effective prediction of the signed structure (i.e., link signs) in the future. However, existing works have modeled either (static) signed networks or dynamic (unsigned) networks but not dynamic signed networks. Since both sign and dynamics inform the graph structure in different ways, it is non-trivial to model how to combine the two features. In this work, we propose a new Graph Neural Network (GNN)-based approach to model dynamic signed networks, named SEMBA: Signed link's Evolution using Memory modules and Balanced Aggregation. Here, the idea is to incorporate the signs of temporal interactions using separate modules guided by balance theory and to evolve the embeddings from a higher-order neighborhood.
Experiments on 4 real-world datasets and 3 different tasks demonstrate that SEMBA consistently and significantly outperforms the baselines by up to 80% on the tasks of predicting signs of future links while matching the state-of-the-art performance on predicting the existence of these links in the future. We find that this improvement is due specifically to the superior performance of SEMBA on the minority negative class. Code is made available at https://github.com/claws-lab/semba. © 2023 Copyright held by the owner/author(s). ACM ISBN 979-8-4007-0124-5/23/10.
