Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
5 results
Search Results
Item Predicting Vaccine Hesitancy and Vaccine Sentiment Using Topic Modeling and Evolutionary Optimization (Springer Science and Business Media Deutschland GmbH, 2021) S. Krishnan, G.S.; Kamath S., S.; Sugumaran, V.
The ongoing COVID-19 pandemic has posed serious threats to the world population, affecting over 219 countries with a staggering impact of over 162 million cases and 3.36 million casualties. With the availability of multiple vaccines across the globe, framing vaccination policies for effectively inoculating a country’s population against such diseases is currently a crucial task for public health agencies. Social network users post their views and opinions on vaccines publicly and these posts can be put to good use in identifying vaccine hesitancy. In this paper, a vaccine hesitancy identification approach is proposed, built on novel text feature modeling based on evolutionary computation and topic modeling. The proposed approach was experimentally validated on two standard tweet datasets – the flu vaccine dataset and UK COVID-19 vaccine tweets. On the first dataset, the proposed approach outperformed the state-of-the-art in terms of standard metrics. The proposed model was also evaluated on the UKCOVID dataset and the results are presented in this paper, as our work is the first to benchmark a vaccine hesitancy model on this dataset. © 2021, Springer Nature Switzerland AG.

Item LATA – Label attention transformer architectures for ICD-10 coding of unstructured clinical notes (Institute of Electrical and Electronics Engineers Inc., 2021) Mayya, V.; Kamath S., S.S.; Sugumaran, V.
Effective code assignment for patient clinical records in a hospital plays a significant role in the process of standardizing medical records, mainly for streamlining clinical care delivery, billing, and managing insurance claims.
The current practice employed is manual coding, usually carried out by trained medical coders, making the process subjective, error-prone, inexact, and time-consuming. To alleviate this cost-intensive process, intelligent coding systems built on patients’ structured electronic medical records are critical. Classification of medical diagnostic codes, like ICD-10, is widely employed to categorize patients’ clinical conditions and associated diagnoses. In this work, we present a neural model LATA, built on Label Attention Transformer Architectures for automatic assignment of ICD-10 codes. Our work is benchmarked on the CodiEsp dataset, a dataset for automatic clinical coding systems for multilingual medical documents, used in the eHealth CLEF 2020-Multilingual Information Extraction Shared Task. The experimental results reveal that the proposed LATA variants outperform their basic BERT counterparts by 33-49% in terms of standard metrics like precision, recall, F1-score and mean average precision. The label attention mechanism also enables direct extraction of textual evidence in medical documents that map to the clinical ICD-10 diagnostic codes. © 2021 IEEE.

Item Effective Information Retrieval, Question Answering and Abstractive Summarization on Large-Scale Biomedical Document Corpora (Springer Science and Business Media Deutschland GmbH, 2023) Shenoy, N.; Nayak, P.; Jain, S.; Kamath S., S.; Sugumaran, V.
During the COVID-19 pandemic, a concentrated effort was made to collate published literature on SARS-CoV-2 and other coronaviruses for the benefit of the medical community. One such initiative is the COVID-19 Open Research Dataset which contains over 400,000 published research articles. To expedite access to relevant information sources for health workers and researchers, it is vital to design effective information retrieval and information extraction systems.
In this article, an IR approach leveraging transformer-based models to enable question-answering and abstractive summarization is presented. Various keyword-based and neural-network-based models are experimented with and incorporated to reduce the search space and determine relevant sentences from the vast corpus for ranked retrieval. For abstractive summarization, candidate sentences are determined using a combination of various standard scoring metrics. Finally, the summary and the user query are utilized for supporting question answering. The proposed model is evaluated based on standard metrics on the standard CovidQA dataset for both natural language and keyword queries. The proposed approach achieved promising performance for both query classes, while outperforming various unsupervised baselines. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Item A Comprehensive Analysis of Classification Techniques for Effective Multi-class Research Article Categorization on an Imbalanced Dataset (Springer Science and Business Media Deutschland GmbH, 2025) Gowhar, S.; Kempaiah, P.; Sowmya Kamath, S.; Sugumaran, V.
Categorizing scientific articles into specific research fields is a challenging problem, affected by the volume and variety of literature published. However, existing classification systems often suffer from limitations regarding taxonomy or the models used for classification. This article explores a comprehensive analysis of approaches built on Sentence Transformer embeddings combined with Machine Learning algorithms, Neural Networks, and Transformers to classify articles into 123 predefined classes, with the dataset being heavily imbalanced. The effectiveness of Large Language Models (LLMs) for generating synthetic data is also experimented with, along with synonym augmentation, SMOTE, and 1D CNNs for text classification.
The best-performing model is a hierarchical classification model trained on MP-Net sentence embeddings that achieved an accuracy of 78%, outperforming all other models. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Item Imbalanced Multi-Class Research Article Classification using Sentence Transformers and Machine Learning Algorithms (Association for Computing Machinery, Inc, 2025) Gowhar, S.; Kempaiah, P.; Kamath, S.S.; Sugumaran, V.
Categorizing scientific articles into specific research fields is a challenging problem, considering the volume and variety of published literature. However, existing classification systems often suffer from limitations regarding taxonomy or the models used for classification. This article explores approaches built on Sentence Transformer embeddings combined with Machine Learning algorithms to classify articles into 123 predefined classes, with the dataset being heavily imbalanced in nature. The effectiveness of Large Language Models (LLMs) for generating synthetic data is also experimented with, along with synonym augmentation and SMOTE. The best-performing model, the One vs Rest classifier trained on MP-Net sentence embeddings with SMOTE, achieved an accuracy of 77%, and outperformed all the other models. © 2024 Copyright held by the owner/author(s).
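
Note on SMOTE, which the last two abstracts name as a remedy for class imbalance: the core idea is to create synthetic minority-class samples by interpolating between a sample and one of its nearest same-class neighbours. The sketch below is only an illustration of that idea under stated assumptions, not the authors' implementation. The function name `smote_oversample`, the toy 2-D vectors, and the parameter values are all hypothetical; the papers work on MP-Net sentence embeddings, which are not reproduced here.

```python
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours.

    Hypothetical helper for illustration; needs at least two minority points.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        # Squared Euclidean distance between two vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the chosen point (itself excluded).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D stand-ins for the embeddings of a rare class (assumed values).
rare_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_oversample(rare_class, n_new=4)
```

Every synthetic point lies on a line segment between two existing minority samples, so the rare class is densified without duplicating samples exactly. In practice the oversampled training set would then be passed to a classifier such as the One vs Rest model mentioned in the abstract.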
