Browsing by Author "Anand Kumar, M.A."

An overview of the shared task on machine translation in Indian languages (MTIL)-2017
(De Gruyter, 2019) Anand Kumar, M.A.; Premjith, B.; Singh, S.; Rajendran, S.; Padannayil, K.P.
In recent years, multilingual content on the internet has grown exponentially together with the evolution of the internet. Regional language users are excluded from using this multilingual content because of the language barrier. So, machine translation between languages is the only possible solution to make this content available to regional language users. Machine translation is the process of translating a text from one language to another. Machine translation has already been investigated well for English and other European languages. However, it is still at a nascent stage for Indian languages. This paper presents an overview of the Machine Translation in Indian Languages shared task conducted on September 7-8, 2017, at Amrita Vishwa Vidyapeetham, Coimbatore, India. This shared task focused mainly on the English-Tamil, English-Hindi, English-Malayalam and English-Punjabi language pairs. The shared task has the following objectives: (a) to examine the state-of-the-art machine translation systems when translating from English to Indian languages; (b) to investigate the challenges faced in translating between English and Indian languages; (c) to create an open-source parallel corpus for Indian languages, which is lacking. Evaluating machine translation output is another challenging task, especially for Indian languages. In this shared task, we evaluated the participants' outputs with the help of human annotators. As far as we know, this is the first shared task which depends completely on human evaluation. © 2019 Walter de Gruyter GmbH, Berlin/Boston.
Analyzing Banking Services Applicability Using Explainable Artificial Intelligence
(Association for Computing Machinery, 2022) Sriram, A.; Gorti, S.S.; Amin, E.G.; Anand Kumar, M.A.
Over the last few years, the banking sector has had a pivotal role to play in the global economy, comprising about 24% of the global GDP and employing millions of people worldwide. Banks have a wide array of products and services to offer, including ATMs, Tele-Banking, Credit Cards, Debit Cards, Electronic Fund Transfers (EFT), Internet Banking, and Mobile Banking. Machine learning is a method of data analysis that automates analytical model building and can be an essential decision support tool for banks in providing services to certain customers and in improving customer satisfaction and experience based on collected data. In this study, we made use of several machine learning models and Artificial Neural Networks (ANN) to help banks make predictions about timely customer loan repayment and customer satisfaction. We explored different machine learning algorithms and performed SHAP analysis, which helped draw conclusions about the significant features driving these decisions. © 2022 ACM.
ARS NITK at MEDIQA 2019: Analysing various methods for natural language inference, recognising question entailment and medical question answering system
(Association for Computational Linguistics (ACL), 2019) Agrawal, A.; George, R.A.; Ravi, S.S.; Kamath S., S.S.; Anand Kumar, M.A.
This paper includes approaches we have taken for Natural Language Inference, Question Entailment Recognition and Question-Answering tasks to improve domain-specific Information Retrieval. Natural Language Inference (NLI) is a task that aims to determine if a given hypothesis is an entailment, a contradiction, or neutral with respect to the given premise. Recognizing Question Entailment (RQE) focuses on identifying entailment between two questions, while the objective of Question-Answering (QA) is to filter and improve the ranking of automatically retrieved answers. For addressing the NLI task, the UMLS Metathesaurus was used to find the synonyms of medical terms in given sentences, on which the InferSent model was trained to predict if the given sentence is an entailment, contradictory or neutral. We also introduce a new Extreme gradient boosting model built on PubMed embeddings to perform RQE. Further, a closed-domain Question Answering technique that uses Bi-directional LSTMs trained on the SQuAD dataset to determine relevant ranks of answers for a given question is also discussed. Experimental validation showed that the proposed models achieved promising results. © 2019 Association for Computational Linguistics
Depression Severity Detection from Social Media Posts
(Springer Science and Business Media Deutschland GmbH, 2024) Recharla, N.; Bolimera, P.; Gupta, Y.; Anand Kumar, M.A.
Regardless of age, gender, or color, mental health problems affect people all over the world. People feel increasingly at ease sharing their opinions on social networking sites (SNS) practically every day in the present era of communication and technology. Reddit is a social networking site that consists of subreddits, or single-topic communities, that are created, maintained, and frequented by anonymous users. The dataset used in the paper is the eRisk2021 dataset provided for task 3, which is used for depression severity measurement. It consists of posts by Reddit users. The approach discussed in this paper involves finding user depression severity based on their Reddit history with the help of the BDI-II questionnaire. The paper provides three different approaches to finding the users' depression severity from their social media data. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
Detecting Suicide Risk Patterns using Hierarchical Attention Networks with Large Language Models
(Association for Computational Linguistics (ACL), 2024) Koushik, L.; Vishruth, M.; Anand Kumar, M.A.
Suicide has become a major public health and social concern in the world. This paper looks into a method that uses Large Language Models (LLMs) to extract the likely reason for a person's suicide attempt by analyzing their social media posts about the event. The extracted reasons behind such a mental state can provide support for suicide prevention. This submission presents our approach for the CLPsych Shared Task 2024. Our model uses Hierarchical Attention Networks (HAN) and Llama2 for finding supporting evidence about an individual's suicide risk level. © 2024 Association for Computational Linguistics.
Dynamic mode-based feature with random mapping for sentiment analysis
(Springer Verlag, 2020) Sachin Kumar, S.; Anand Kumar, M.A.; Padannayil, K.P.; Poornachandran, P.
Sentiment analysis (SA), or polarity identification, is a research topic which receives considerable attention. The work in this research attempts to explore the sentiments or opinions in text data related to any event, politics, movies, product reviews, sports, etc. The present article discusses the use of dynamic modes from the dynamic mode decomposition (DMD) method with random mapping for sentiment classification. Random mapping is performed using the random kitchen sink (RKS) method. The present work aims to explore the use of dynamic modes as the feature for the sentiment classification task. In order to conduct the experiment and analysis, the dataset used consists of tweets from the SAIL 2015 shared task (tweets in Tamil, Bengali, Hindi) and tweets in the Malayalam language. The dataset for Malayalam was prepared by us for this work. The evaluations are performed using accuracy, F1-score, recall, and precision. It is observed from the evaluations that the proposed approach provides competitive results. © Springer Nature Singapore Pte Ltd. 2020.
Extraction of named entities from social media text in Tamil language using N-gram embedding for disaster management
(Springer Verlag, 2020) Remmiya Devi, G.R.; Anand Kumar, M.A.; Padannayil, K.P.
In the present era, data in any form is considered to be of great importance. More specifically, text data carries richer and more concise information than any other form of data. Extraction and analysis of these data can result in various new findings through text analytics. This has led to applications such as search engines, extraction of product names, sentiment analysis, document classification and a few more. Companies are much focused on sentiment analysis to review the positive, negative and neutral comments on their products. Summarization of text is a notable application of Natural Language Processing that reveals the gist of documents. Apart from these, applications based on information extraction can be developed for the welfare of society. Handling an emergency situation requires the collection of vast information. Extraction of such data can be supportive during disaster management. In order to perform such a task, a system must learn the meaning of human languages. Easing the accessibility of text data across language barriers is the primary motive of Natural Language Processing (NLP) systems. The proposed system utilizes a word embedding model, specifically the skip-gram model, to implement the most fundamental task of NLP, entity extraction, in social media text. Implementation of N-gram embedding methods paved the way for the creation of rich context knowledge for the system to handle social media text. Classification of named entities using the proposed system has been carried out using the machine learning classifier Support Vector Machine (SVM). © Springer Nature Switzerland AG 2020.
Findings of the First Shared Task on Offensive Span Identification from Code-Mixed Kannada-English Comments
(Association for Computational Linguistics (ACL), 2024) Ravikiran, M.; Rajalakshmi, R.; Chakravarthi, B.; Anand Kumar, M.A.; Thavareesan, S.
Effectively managing offensive content is crucial on social media platforms to encourage positive online interactions. However, addressing offensive content in code-mixed Dravidian languages faces challenges, as current moderation methods focus on flagging entire comments rather than pinpointing specific offensive segments. This limitation stems from a lack of annotated data and accessible systems designed to identify offensive language sections. To address this, our shared task presents a dataset comprising Kannada-English code-mixed social comments, encompassing offensive comments. This paper outlines the dataset, the utilized algorithms, and the results obtained by systems participating in this shared task. © 2024 Association for Computational Linguistics.
Generating Synthetic Text Data for Improving Class Balance in Personality Prediction
(Springer Science and Business Media Deutschland GmbH, 2024) Lakhtaria, D.; L, D.H.; Chhabra, R.; Taparia, R.; Anand Kumar, M.A.
The growing popularity of social media as a means of self-expression and self-discovery has sparked a heightened curiosity in utilizing the Myers–Briggs Type Indicator (MBTI) to investigate human personalities. Despite the increasing use of word-embedding techniques, machine learning algorithms, and imbalanced data-handling techniques to predict MBTI personality types, further research is needed to explore how these approaches can enhance the accuracy of the results. Our research aimed to use the GPT model to address the problem of class imbalance. We have implemented several machine learning models such as RCNN, LSTM, XGBoost, and Random Forest. We have also tried two word embeddings, Word2Vec and GloVe. According to our findings, the approach we used can attain a considerably high F1-score, which depends on the model selected for the prediction and classification of MBTI personality. The ability to accurately predict and classify MBTI personality through our approach has the potential to improve our comprehension of MBTI. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
Image Manipulation Detection Using Augmentation and Convolutional Neural Networks
(Springer Science and Business Media Deutschland GmbH, 2024) Maheshwari, A.; Jain, R.; Mahapatra, R.; Palakuru, S.; Anand Kumar, M.A.
Image tampering is now simpler than ever, thanks to the explosion of digital photos and the creation of easy image modification tools. As a result, major problems may arise if the situation is not handled properly. Many computer vision and deep learning strategies have been put forward over the years to address the problem. Having said that, people can easily recognize the photographs that were used in that research. This raises the key question of how CNNs might perform on more difficult samples. In this chapter, we build a complex CNN network and use various machine learning algorithms to classify the images and compare the accuracies obtained by them. Its performance is also compared on two different datasets. Additionally, we assess the impact of various hyperparameters and a data augmentation strategy on classification performance. This leads to the conclusion that performance can be considerably impacted by dataset difficulty. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
Intrinsic evaluation for English–Tamil bilingual word embeddings
(Springer Verlag, 2020) Jp, J.P.; Krishna Menon, V.K.; Rajendran, S.; Padannayil, K.P.; Anand Kumar, M.A.
Despite the growth of bilingual word embeddings, no work has been done so far on directly evaluating them for the English–Tamil language pair. In this paper, we present a data resource and evaluation for the English–Tamil bilingual word vector model. In this paper, we present the dataset and the evaluation paradigm for the English–Tamil bilingual language pair. This dataset contains words that cover a range of concepts that occur in natural language. The dataset is scored based on similarity rather than association or relatedness. Hence, word pairs that are associated but not literally similar have a low rating. The measures are quantified further to ensure consistency in the dataset, mimicking the cognitive phenomena. Henceforth, the dataset can be used by non-native speakers with minimal effort. We also present some inferences and insights into the semantics captured by word vectors and human cognition. © Springer Nature Singapore Pte Ltd. 2020.
KCE DALab-APDA@FIRE2019: Author profiling and deception detection in Arabic using weighted embedding
(CEUR-WS, 2019) Sharmila Devi, V.; Subramanian, S.; Ravikumar, G.; Anand Kumar, M.A.
This paper explains the work submitted to the Author Profiling and Deception Detection in Arabic Tweets shared task organized at the Forum for Information Retrieval Evaluation (FIRE) 2019. The first task, Author profiling, involves identifying the categories of authors based on their Arabic tweets. In the second task, the aim is to detect deception in Arabic for two genres, Twitter and News. Deception detection means automatically identifying false messages in text content on social networks or in news. For each task, we have submitted three different systems. For submission 1, we have used Term Frequency and Inverse Document Frequency (TFIDF) based Support Vector Machine classification, and in submission 2, we have used the fastText classifier. For submission 3, we have proposed a low dimensional weighted document embedding (TFIDF + Word embedding) with SVM classification. We have attained second place in Deception detection and third in Author profiling. The performance difference between the top team results and the submitted runs is only 3.34% for Author profiling and 1.16% for Deception detection. © Copyright 2019 for this paper by its authors.
Multi-Vehicle Tracking and Speed Estimation Model using Deep Learning
(Association for Computing Machinery, 2022) Prajwal, K.; Navaneeth, P.; Tharun, K.; Anand Kumar, M.A.
Speed estimation of vehicles is one of the prime applications of speed estimation of moving objects. The YOLOv5 model has proven to have very good accuracy in detecting moving objects in real-time. The vehicles on the road are extracted from each frame of the video by running it through a custom YOLOv5 object detector. The YOLO model splits the frame into a grid and each grid cell detects a vehicle within itself. An instance identifier tracks the vehicle across the frames. The tracking algorithm computes deep features for every bounding box and utilizes the similarities within the deep features to identify and track the object. The pixel per meter metric has to be adjusted based on perspective, after which the speed of the vehicle can be estimated. Finally, a comparison of our model metrics with the existing state-of-the-art models is provided. © 2022 ACM.
Multimodal Meme Troll and Domain Classification Using Contrastive Learning
(Institute of Electrical and Electronics Engineers Inc., 2024) Phadatare, A.; Jayanth, P.; Anand Kumar, M.A.
This paper presents a holistic approach to meme trolling detection and domain classification, focusing on Telugu and Kannada languages. Leveraging a spectrum of methodologies ranging from basic machine learning models such as Support Vector Machines (SVM), Random Forest, Naive Bayes, to image-based models like Convolutional Neural Networks (CNN), ResNet-50, and state-of-the-art models such as CLIP, multilingual BERT, XLM-BERT, and Vision Transformers, we explore diverse modalities including image classification, extracted text classification, and combined text-caption classification. Our system integrates multiple models to achieve two primary goals: accurately detecting trolling behavior and classifying memes into thematic domains like politics, movies, and sports. By training on multilingual data and considering linguistic diversity, our approach ensures robust performance across different linguistic contexts, providing valuable insights into meme culture and trolling behavior in Telugu and Kannada-speaking communities. © 2024 IEEE.
On developing handwritten character image database for Malayalam language script
(Elsevier B.V., 2019) Manjusha, K.; Anand Kumar, M.A.; Padannayil, K.P.
The objective of this paper is to build a handwritten character image database for Malayalam language script. Standard handwritten document image databases are an essential requirement for the development and objective evaluation of different handwritten text recognition systems for any language script. Considerable research efforts for handwritten Malayalam character recognition are present in the literature. Still, no public domain handwritten image database is available for the Malayalam language. The present work focuses on building an open source handwritten character image database for Malayalam language script. The unique orthographic representation of the Malayalam characters forms the different character classes, and the current version of the database contains 85 character classes frequently used in writing Malayalam text. Handwritten data samples were collected from 77 native Malayalam writers. For extracting the character images from the handwritten data sheets, an active contour model-based image segmentation algorithm was utilized. Recognition experiments were conducted on the created character image database by employing different feature extraction techniques. Among the considered feature descriptors, scattering convolutional network-based feature descriptors attain the highest recognition accuracy of 91.05%. © 2018 Karabuk University
Overview of Shared Task on Multitask Meme Classification - Unraveling Misogynistic and Trolls in Online Memes
(Association for Computational Linguistics (ACL), 2024) Chakravarthi, B.; Rajiakodi, S.; Ponnusamy, R.; Pannerselvam, K.; Anand Kumar, M.A.; Rajalakshmi, R.; LekshmiAmmal, H.R.; Kizhakkeparambil, A.; Kumar, S.S.; Sivagnanam, B.; Rajkumar, C.
This paper offers a detailed overview of the first shared task on "Multitask Meme Classification - Unraveling Misogynistic and Trolls in Online Memes," organized as part of the LT-EDI@EACL 2024 conference. The task was set to classify misogynistic content and troll memes within online platforms, focusing specifically on memes in Tamil and Malayalam languages. A total of 52 teams registered for the competition, with four submitting systems for the Tamil meme classification task and three for the Malayalam task. The outcomes of this shared task are significant, providing insights into the current state of misogynistic content in digital memes and highlighting the effectiveness of various computational approaches in identifying such detrimental content. The top-performing model got a macro F1 score of 0.73 in Tamil and 0.87 in Malayalam. © 2024 Association for Computational Linguistics.
Representation Learning in Continuous-Time Dynamic Signed Networks
(Association for Computing Machinery, 2023) Sharma, K.; Raghavendra, M.; Lee, Y.-C.; Anand Kumar, M.A.; Kumar, S.
Signed networks allow us to model conflicting relationships and interactions, such as friend/enemy and support/oppose. These signed interactions happen in real-time. Modeling such dynamics of signed networks is crucial to understanding the evolution of polarization in the network and enabling effective prediction of the signed structure (i.e., link signs) in the future. However, existing works have modeled either (static) signed networks or dynamic (unsigned) networks but not dynamic signed networks. Since both sign and dynamics inform the graph structure in different ways, it is non-trivial to model how to combine the two features. In this work, we propose a new Graph Neural Network (GNN)-based approach to model dynamic signed networks, named SEMBA: Signed link's Evolution using Memory modules and Balanced Aggregation. Here, the idea is to incorporate the signs of temporal interactions using separate modules guided by balance theory and to evolve the embeddings from a higher-order neighborhood. Experiments on 4 real-world datasets and 3 different tasks demonstrate that SEMBA consistently and significantly outperforms the baselines by up to 80% on the tasks of predicting signs of future links while matching the state-of-the-art performance on predicting existence of these links in the future. We find that this improvement is due specifically to superior performance of SEMBA on the minority negative class. Code is made available at https://github.com/claws-lab/semba. © 2023 Copyright held by the owner/author(s). ACM ISBN 979-8-4007-0124-5/23/10.