Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Search Results

Now showing 1 - 10 of 15
  • Item
    Natural Language Inference: Detecting Contradiction and Entailment in Multilingual Text
    (Springer Science and Business Media Deutschland GmbH, 2021) Sree Harsha, S.; Krishna Swaroop, K.; Chandavarkar, B.R.
    Natural Language Inference (NLI) is the task of characterising the inferential relationship between a natural language premise and a natural language hypothesis. The premise and the hypothesis could be related in three distinct ways. The hypothesis could be a logical conclusion that follows from the given premise (entailment), the hypothesis could be false given the premise (contradiction), or the hypothesis and the premise could be unrelated (neutral). A robust and reliable system for NLI serves as a suitable evaluation measure for true natural language understanding and enables the use of such systems in several modern-day application scenarios. We propose a novel technique for the NLI task by leveraging the recently proposed Bidirectional Encoder Representations from Transformers (BERT). We utilize a robustly optimized variant of BERT, integrate a contextualized definition embedding mechanism, and incorporate global average pooling into our proposed NLI system. We use several benchmark datasets, including a dataset containing premise-hypothesis pairs from 15 different languages, to systematically evaluate the performance of our model and show that it yields superior results. © 2021, Springer Nature Switzerland AG.
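    The global average pooling step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the authors pool, so the mask-aware mean below, the function name `global_average_pool`, and the toy vectors are all assumptions made for illustration, not the paper's implementation.

```python
from typing import List, Sequence


def global_average_pool(token_vectors: Sequence[Sequence[float]],
                        attention_mask: Sequence[int]) -> List[float]:
    """Average token vectors over the positions where the mask is 1.

    Hypothetical helper: padding positions (mask 0) are skipped so that
    they do not dilute the sentence-level vector.
    """
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is required per token vector")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy stand-in for encoder output: three real tokens and one padding position.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
mask = [1, 1, 1, 0]
pooled = global_average_pool(tokens, mask)
print(pooled)  # [3.0, 4.0]
```

    In a real system the token vectors would come from the encoder's final hidden states; the pooled vector would then feed a classifier over the three NLI labels.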
  • Item
    NeuralDoc-Automating Code Translation Using Machine Learning
    (Springer Science and Business Media Deutschland GmbH, 2022) Sree Harsha, S.; Sohoni, A.C.; Chandrasekaran, K.
    Source code documentation is the process of writing concise, natural language descriptions of how the source code behaves at runtime. In this work, we propose a novel approach called NeuralDoc for automating source code documentation using machine learning techniques. We model automatic code documentation as a language translation task, where the source code serves as the input sequence, which the machine learning model translates into natural language sentences describing the functionality of the program. The machine learning model that we use is the Transformer, which leverages self-attention and multi-headed attention to effectively capture long-range dependencies and has been shown to perform well on a range of natural language processing tasks. We integrate the copy attention mechanism and incorporate BERT, a pre-training technique, into the basic Transformer architecture to create a novel approach for automating code documentation. We build an intuitive interface for users to interact with our models and deploy our system as a web application. We carry out experiments on two datasets consisting of Java and Python source programs and their documentation to demonstrate the effectiveness of our proposed method. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
  • Item
    Legal Text Analysis Using Pre-trained Transformers
    (Springer Science and Business Media Deutschland GmbH, 2022) Prajwal, M.P.; Anand Kumar, A.M.
    In this paper, we investigate the application of pre-trained transformers for text classification and similarity identification in the legal domain. We conduct several experiments applying various pre-trained transformer models to predict the descriptor of a law or case based on its text and to identify similar cases. We consider an Indian Supreme Court judicial cases dataset containing cases and statutes and the EURLEX dataset containing approximately 57,000 documents and 4000 labels. EURLEX is a collection of treaties and laws related to the European Union. We preprocess the texts in the datasets and obtain embeddings from pre-trained transformers. Then, we use these embeddings as input to an LSTM/BiLSTM layer to classify the text or predict similarity. Our results show that pre-trained transformers perform sufficiently well when the text to be classified or compared for similarity is short rather than long. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
  • Item
    Early detection of depression using BERT and DeBERTa
    (CEUR-WS, 2022) Devaguptam, S.; Kogatam, T.; Kotian, N.; Anand Kumar, A.M.
    In today’s world, social media usage has become one of the most fundamental human activities. According to Oberlo, 3.2 billion people, or 42% of the world’s population, are currently on social media. People usually post about their daily lifestyle, special occasions, views on ongoing issues, and their networks on social media platforms. People also share things on social media that they otherwise would not have shared with others. Social media helps us to stay connected, keep informed, and mobilise on social issues. Given the surge in suicide attempts, social media can act as a life saver by detecting and tracing users who are on the verge of depression and self-harm. Natural language processing methods, with the help of deep learning, are helping to solve real-world language and text problems such as sentiment analysis, translation of text into different languages, and depression detection. Many transformer-based models such as BERT (Bidirectional Encoder Representations from Transformers), which learn to attend to different features with different weights, are used to solve NLP problems. In this paper, a supervised machine learning algorithm with a transfer learning approach is used to detect self-harm tendencies in social media users as early as possible. © 2022 Copyright for this paper by its authors.
  • Item
    Machine Learning-based Automated System for Subjective Answer Evaluation
    (Institute of Electrical and Electronics Engineers Inc., 2023) Dodia, S.; Spoorthy, V.; Chandak, T.
    An examination is a useful tool for assessing students' knowledge. Evaluation of exams is a difficult and time-consuming process. The automatic examination of answer scripts makes this task easier for teachers, reducing the amount of effort and time required. The existing literature proposes a number of methods for evaluating responses to objective questions using machine learning. However, more work needs to be done on evaluating answers to descriptive questions. This study suggests a way to evaluate students' answers to descriptive questions without teachers using traditional paper and pencil. Instead, a computer acts as a teacher and grades the students' submissions. The primary objective is to report the outcomes of subjective responses using Bidirectional Encoder Representations from Transformers (BERT) together with cosine and Jaccard distance. The proposed model achieved an accuracy of 91%, an error of 9.01, a precision of 83%, and a recall of 79%. The suggested model provided the best results in comparison with state-of-the-art systems. © 2023 IEEE.
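    As a rough illustration of the two measures named in this abstract, the sketch below compares a student answer with a reference answer. It is an assumption-laden simplification: it uses whitespace tokens and word counts instead of BERT representations, and the function names and example sentences are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def jaccard_distance(a: str, b: str) -> float:
    """One minus the ratio of shared tokens to all distinct tokens."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


# Hypothetical reference and student answers.
reference = "plants make food using sunlight"
answer = "plants make food from sunlight"
cos = cosine_similarity(reference, answer)   # 4 shared tokens / (sqrt(5) * sqrt(5))
jac = jaccard_distance(reference, answer)    # 1 - 4 shared / 6 distinct tokens
print(round(cos, 3), round(jac, 3))  # 0.8 0.333
```

    A grading system could map such scores to marks with thresholds or a regression model; how the paper combines them is not stated in the abstract.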
  • Item
    Unsupervised KeyPhrase Extraction using Graph and Embedding Techniques
    (Institute of Electrical and Electronics Engineers Inc., 2023) Kumar S, J.K.; Anand Kumar, M.
    The process of extracting keyphrases from a document automatically, without any supervision, is referred to as Unsupervised Keyphrase Extraction. This method aims to produce a brief summary of the main content of the document. Embedding-based methods compute the similarity between candidate keyphrase embeddings and document embeddings. In this paper, we find that filtering candidate keyphrases using graph-based techniques enriches frequent candidates, which are then reranked using embeddings. When compared with current state-of-the-art unsupervised keyphrase extraction approaches across three KPE benchmarks, the proposed model outperformed them. © 2023 IEEE.
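    The embedding-based reranking step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the graph-based filtering is omitted, the two-dimensional vectors are made-up stand-ins for real embeddings, and `rerank` is a hypothetical helper, not the authors' code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rerank(candidates: Dict[str, Sequence[float]],
           document_vector: Sequence[float]) -> List[Tuple[str, float]]:
    """Sort candidate phrases by similarity to the document, best first."""
    scored = [(phrase, cosine(vec, document_vector))
              for phrase, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up embeddings: candidates assumed to have survived an earlier filter.
document_vector = [1.0, 0.0]
candidates = {
    "keyphrase extraction": [0.9, 0.1],
    "this paper": [0.1, 0.9],
    "graph ranking": [0.6, 0.4],
}
ranking = rerank(candidates, document_vector)
for phrase, score in ranking:
    print(f"{phrase}: {score:.3f}")
```

    With these toy vectors the candidate closest in direction to the document vector is ranked first; real embeddings would come from a sentence or phrase encoder.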
  • Item
    IIMH: Intention Identification in Multimodal Human Utterances
    (Association for Computing Machinery, 2023) Keerthan Kumar, T.G.; Dhakate, H.; Koolagudi, S.G.
    Intention identification is a challenging problem in the fields of natural language processing, speech processing, and computer vision. People often use contradictory or ambiguous words in different contexts, which can make it difficult to identify the intention behind an utterance. Intention identification has many practical applications in natural language processing, sentiment analysis, social media analysis, robotics, and human-computer interaction, where identifying intention can yield valuable insights into user behavior. In this work, we propose a model to determine whether an utterance made by a person is intentional or not intentional. To achieve this, we collected a multimodal dataset containing text, video, and speech from various TV shows, movies, and YouTube videos and labeled each utterance with its corresponding intention. Feature extraction is done at both the utterance and word levels to get useful information from all three modalities. We trained a baseline model using a support vector machine (SVM) to set a benchmark performance. We designed an architecture to detect the contradiction between positive spoken words and negative facial expressions or speech in order to identify an utterance as non-intentional. Along with the architecture, we tried different approaches for classification and got the best results with the SVM classifier using an RBF kernel, which achieved an accuracy of 78.83% and proved to be better than the baseline approach. © 2023 ACM.
  • Item
    Subjective Answer Evaluation Using Keyword Similarity and Regression Techniques
    (Institute of Electrical and Electronics Engineers Inc., 2024) Kapparad, P.
    This paper introduces a novel approach to automated grading of subjective answers using Natural Language Processing (NLP) techniques. The motivation for the project arises from the need to simplify the process of subjective answer evaluation, which is a repetitive and time-consuming task when done manually. Since no dataset is available for the topic presented, we created our own dataset consisting of evaluated student answers to 1-mark and 3-mark questions on Social Science topics. For 1-mark questions, we employed a keyword-similarity-based grading system. For the 3-mark questions, many techniques were explored, including BERT, DistilBERT, and RoBERTa, which achieved no noteworthy results. An alternative approach involving both keyword similarity and sentence-sentence similarity was created for the 3-mark questions, which slightly outperformed the previously mentioned techniques. The results for evaluation of 1-mark questions were promising, achieving 90% accuracy. However, there remains significant room for improvement in the evaluation of longer-answer questions. A key insight from our study is that the scope for improvement is directly related to increasing the quantity and quality of the dataset. This research adds to the ongoing conversation about automation of subjective answer evaluation, aiming to make grading methods more efficient and hassle-free in the future. © 2024 IEEE.
  • Item
    Phishing Classification Based on Text Content of an Email Body Using Transformers
    (Springer Science and Business Media Deutschland GmbH, 2024) Somesha, M.; Pais, A.R.
    Phishing attacks steal sensitive credentials using different techniques, tools, and some sophisticated methods. The techniques include content injection, information re-routing, social engineering, server hacking, social networking, and SMS and WhatsApp mobile applications. To counter such attacks and minimize their risks, many phishing detection and avoidance techniques have been introduced. Among the various techniques, deep learning algorithms have achieved efficient results. In the proposed work, a transformer-based technique is used to classify phishing emails. The proposed method outperformed other similar mechanisms for the classification of phishing emails. The phishing classification accuracy achieved by the proposed work is 99.51% using open-source datasets. The proposed model is also used to learn and validate the correctness of the in-house created datasets. The results obtained with the in-house datasets are equally competitive. © 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
  • Item
    Sentiment Analysis on Worldwide COVID-19 Outbreak
    (Springer Science and Business Media Deutschland GmbH, 2024) Vasudev, R.; Dahikar, P.; Jain, A.; Patil, N.
    Sentiment analysis has proved to be an effective way to easily mine public opinions on issues, products, policies, etc. One of the ways this is achieved is by extracting social media content. Data extracted from social media has proven time and again to be the most powerful source material for sentiment analysis tasks. Twitter, which is widely used by the general public to express their concerns over daily affairs, can be the strongest tool to provide data for such analysis. In this paper, we use tweets posted about the COVID-19 pandemic for a sentiment analysis study and sentiment classification using the BERT model. Due to its transformer architecture and bidirectional approach, this deep learning model can readily be preferred as the best choice for our study. As expected, the model performed very well on all the classification metrics considered and achieved an overall accuracy of 92%. © 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.