Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
19 results
Search Results
Item: Sentiment analysis based approaches for understanding user context in Web content (2013)
Kamath S․, S.S.; Bagalkotkar, A.; Khandelwal, A.; Pandey, S.; Poornima, K.
In our day-to-day lives, we highly value the opinions of friends when making decisions such as which brand to buy or which movie to watch. With the increasing popularity of blogs, online reviews and social networking sites, the current trend is to look up reviews, expert opinions and discussions on the Web, so that one can make an informed decision. Sentiment analysis, also known as opinion mining, is the computational study of opinions, sentiments and emotions expressed in natural language for the purpose of decision making. Sentiment analysis applies natural language processing techniques and computational linguistics to extract information about sentiments expressed by authors and readers about a particular subject, thus helping users make sense of the huge volume of unstructured Web data. Applications like review classification, product review mining and trend prediction benefit from sentiment analysis based techniques. This paper presents a study of different approaches in this field, the state-of-the-art techniques and current research in sentiment analysis based approaches for understanding a user's context. © 2013 IEEE.

Item: A novel technique for efficient text document summarization as a service (2013)
Bagalkotkar, A.; Khandelwal, A.; Pandey, S.; Kamath S․, S.S.
Due to the exponential growth in the generation of Web data, the need for tools and mechanisms for automatic summarization of Web documents has become critical. Web data can be accessed from multiple sources, e.g. on different Web pages, which makes searching for relevant pieces of information a difficult task. Therefore, an automatic summarizer is vital for reducing human effort.
Text summarization is an important activity in the analysis of high-volume text documents and is currently a major research topic in Natural Language Processing. It is the process of generating a summary of an input document by extracting its representative sentences. In this paper, we present a novel technique for summarizing domain-specific text from a single Web document by applying statistical NLP techniques to the text in a reference corpus and to the Web document. The proposed summarizer generates a summary based on the calculated Sentence Weight (SW), the rank of a sentence in the document's content, the number of terms and the number of words in a sentence, and the term frequency in the input corpus. © 2013 IEEE.

Item: Query-oriented unsupervised multi-document summarization on big data (Association for Computing Machinery, 2016)
Sunaina; Kamath S․, S.S.
Real-time document summarization is a critical need nowadays, owing to the large volume of information available for reading and our inability to deal with all of it given limited time and resources. Oftentimes, information is available in multiple sources, offering multiple contexts and viewpoints on a single topic of interest. Automated multi-document summarization (MDS) techniques aim to address this problem. However, current techniques for automated MDS suffer from low precision and accuracy with reference to a given subject matter when compared to summaries prepared by humans, and they take a long time to create the summary when the input is very large. In this paper, we propose a hybrid MDS technique combining feature-based algorithms and dynamic programming for generating a summary from multiple documents based on a user-provided query. Further, in real-world scenarios, Web search serves up a large number of URLs to users, and the work of making sense of these with reference to a particular query is left to the user.
In this context, an efficient parallelized MDS technique based on Hadoop is also presented, for serving a concise summary of the contents of multiple Web pages for a given user query in less time. © 2016 ACM.

Item: Hybrid text feature modeling for disease group prediction using unstructured physician notes (Springer Science and Business Media Deutschland GmbH, 2020)
S. Krishnan, G.S.; Kamath S․, S.
Existing Clinical Decision Support Systems (CDSSs) largely depend on the availability of structured patient data and Electronic Health Records (EHRs) to aid caregivers. However, in hospitals in developing countries, structured patient data formats are not widely adopted, and medical professionals still rely on clinical notes in the form of unstructured text. Such unstructured clinical notes recorded by medical personnel can also be a potential source of rich patient-specific information which can be leveraged to build CDSSs, even for hospitals in developing countries. If such unstructured clinical text can be used, the manual and time-consuming process of EHR generation will no longer be required, with large savings in person-hours and cost. In this article, we propose a generic ICD9 disease group prediction CDSS built on unstructured physician notes modeled using hybrid word embeddings. These word embeddings are used to train a deep neural network for effectively predicting ICD9 disease groups. Experimental evaluation showed that the proposed approach outperformed the state-of-the-art disease group prediction model built on structured EHRs by 15% in terms of AUROC and 40% in terms of AUPRC, thus supporting our hypothesis and eliminating the dependency on the availability of structured patient data.
© Springer Nature Switzerland AG 2020.

Item: A novel bio-inspired hybrid metaheuristic for unsolicited bulk email detection (Springer Science and Business Media Deutschland GmbH, 2020)
Gangavarapu, T.; Jaidhar, C.D.
With the recent influx of technology, Unsolicited Bulk Emails (UBEs) have become a potential problem, leaving computer users and organizations at risk of brand, data, and financial loss. In this paper, we present a novel bio-inspired hybrid parallel optimization algorithm (Cuckoo-Firefly-GR), which combines Genetic Replacement (GR) of low-fitness individuals with a hybrid of Cuckoo Search (CS) and Firefly (FA) optimizations. Cuckoo-Firefly-GR not only employs the random walk in CS, but also uses mechanisms in FA to generate and select fitter individuals. The content- and behavior-based features of emails used in existing works, along with Doc2Vec features of the email body, are employed to extract the syntactic and semantic information in the emails. By establishing an optimal balance between intensification and diversification, and reaching global optimization using two metaheuristics, we argue that the proposed algorithm significantly improves the performance of UBE detection by selecting the most discriminative feature subspace. This study presents significant observations from extensive evaluations on UBE corpora of 3,844 emails that underline the efficiency and superiority of our proposed Cuckoo-Firefly-GR over the base optimizations (Cuckoo-GR and Firefly-GR), dense autoencoders, recurrent neural autoencoders, and several state-of-the-art methods. Furthermore, the instructive feature subset obtained using the proposed Cuckoo-Firefly-GR, when classified using a dense neural model, achieved an accuracy of 99%. © Springer Nature Switzerland AG 2020.

Item: Predicting Vaccine Hesitancy and Vaccine Sentiment Using Topic Modeling and Evolutionary Optimization (Springer Science and Business Media Deutschland GmbH, 2021)
S. Krishnan, G.S.; Kamath S․, S.; Sugumaran, V.
The ongoing COVID-19 pandemic has posed serious threats to the world population, affecting over 219 countries with a staggering impact of over 162 million cases and 3.36 million casualties. With the availability of multiple vaccines across the globe, framing vaccination policies for effectively inoculating a country’s population against such diseases is currently a crucial task for public health agencies. Social network users post their views and opinions on vaccines publicly, and these posts can be put to good use in identifying vaccine hesitancy. In this paper, a vaccine hesitancy identification approach is proposed, built on novel text feature modeling based on evolutionary computation and topic modeling. The proposed approach was experimentally validated on two standard tweet datasets: the flu vaccine dataset and UK COVID-19 vaccine tweets. On the first dataset, the proposed approach outperformed the state of the art in terms of standard metrics. The proposed model was also evaluated on the UKCOVID dataset and the results are presented in this paper, as our work is the first to benchmark a vaccine hesitancy model on this dataset. © 2021, Springer Nature Switzerland AG.

Item: Towards a Federated Learning Approach for NLP Applications (Springer Science and Business Media Deutschland GmbH, 2021)
Prabhu, O.S.; Gupta, P.K.; Shashank, P.; Chandrasekaran, K.; Divakarla, D.
Traditional machine learning involves collecting training data at a centralized location. This collected data is prone to misuse and data breaches. Federated learning is a promising solution for reducing the possibility of misusing sensitive user data in machine learning systems. In recent years, there has been an increase in the adoption of federated learning in healthcare applications. On the other hand, personal data such as text messages and emails also contain highly sensitive data, typically used in natural language processing (NLP) applications.
In this paper, we investigate the adoption of a federated learning approach in NLP domains that require sensitive data. For this purpose, we have developed a federated learning infrastructure that performs training on remote devices without the need to share data. We demonstrate the usability of this infrastructure for NLP by focusing on sentiment analysis. The results show that the federated learning approach trained a model with test accuracy comparable to that of the centralized approach. Therefore, federated learning is a viable alternative for developing NLP models while preserving the privacy of data. © 2021, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Item: CNN-GRU: Transforming image into sentence using GRU and attention mechanism (Grenze Scientific Society, 2021)
Saini, G.; Patil, N.
Recent advances in deep neural networks have attracted great attention in both Natural Language Processing (NLP) and Computer Vision (CV). They provide an efficient way of understanding semantic and syntactic structure, which makes it possible to deal with complex tasks such as automatic image captioning. Image captioning methodology is mainly based on the encoder-decoder approach. In the present work, we developed a CNN-GRU model using a Convolutional Neural Network (CNN), a Gated Recurrent Unit (GRU) and an attention mechanism. Here VGG16 is used as the encoder, and the GRU and attention mechanism are used as the decoder. Our model has shown significant improvement compared to other state-of-the-art encoder-decoder models on the well-known MSCOCO data set. Further, the time taken to train and test our model is two-thirds of that taken by other similar models such as CNN-CNN and CNN-RNN.
© Grenze Scientific Society, 2021.

Item: Ensemble Neural Models for Depressive Tendency Prediction Based on Social Media Activity of Twitter Users (Springer Science and Business Media Deutschland GmbH, 2022)
Saini, G.; Yadav, N.; Kamath S․, S.
In view of the ongoing pandemic, Clinical Depression (CD) is a serious health challenge for a large segment of the population. According to recent public surveys, more than 30 million American citizens are victims of depression each year, and depression also causes 30 thousand suicides each year. Early detection of depression can help provide much-needed medical intervention and treatment for better mental health. Toward this, the social media posts of users can be a significant source for analyzing their mental health signals, and can also serve as a measure for assessing the prevalence of clinical depression tendencies in the population. In this paper, an approach that leverages the predictive power of supervised and semi-supervised learning algorithms for detecting depressive tendencies in the population using social media activity is presented. Learning models were trained on preprocessed tweet data from the Sentiment140 dataset containing 1.6 million labeled tweets. We also designed a convolutional neural network model for the prediction task that outperformed machine learning models by a significant margin, with an accuracy of 97.1%. The performance of the proposed models is benchmarked using standard metrics like SMDI (Social Media Depression Index). Crowd-sourcing approaches were adopted for collecting real-time social behavior of users to train the proposed model and demonstrate its potential for real-world applications.
© 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Item: Multi-branch Deep Neural Model for Natural Language-Based Vehicle Retrieval (Springer Science and Business Media Deutschland GmbH, 2023)
Shankaranarayan, N.; Kamath S․, S.
Natural language interfaces (NLIs) have seen tremendous popularity in recent times. The utility of natural language descriptions for identifying vehicles in city-scale smart traffic systems is an emerging problem that has received significant research interest. NL-based vehicle identification/retrieval can significantly improve existing systems’ usability and user-friendliness. In this paper, the problem of NL-based vehicle retrieval is explored, which focuses on the retrieval/identification of a unique vehicle from a single-view video given the vehicle’s natural language description. Natural language descriptions are leveraged to identify a specific target vehicle based on its visual features and environmental features such as trajectory and neighbours. We propose a multi-branch model that learns the target vehicle’s visual features, environmental features, and direction and uses the concatenated feature vector to calculate a similarity score by comparing it with the feature vector of the given natural language description, thus identifying the vehicle of interest. The Cityflow-NL dataset was used for the purpose of training/validation, and the performance was measured using MRR (Mean Reciprocal Rank). The proposed model achieved a standardised MRR score of 0.15, which is on par with state-of-the-art models. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
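
For readers unfamiliar with the MRR metric named in the last abstract, the following is a minimal Python sketch of plain Mean Reciprocal Rank. The function name, the vehicle IDs and the rankings are hypothetical and are not taken from the paper; the "standardised" MRR reported there may be computed differently.

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of the correct item per query (0 if it is absent)."""
    total = 0.0
    for ranking, target in zip(ranked_lists, targets):
        if target in ranking:
            # Ranks are 1-based, so the top result contributes 1.0.
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(targets)


# Hypothetical example: three text queries, each ranking three candidate vehicles.
rankings = [
    ["v3", "v1", "v2"],  # correct vehicle "v1" at rank 2
    ["v2", "v3", "v1"],  # correct vehicle "v1" at rank 3
    ["v1", "v2", "v3"],  # correct vehicle "v1" at rank 1
]
targets = ["v1", "v1", "v1"]

mrr = mean_reciprocal_rank(rankings, targets)
print(round(mrr, 4))
```

A score near 1 means the correct vehicle is usually ranked first; lower scores mean it tends to appear further down the ranked list.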
