Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

Now showing 1 - 10 of 14

Gaining Actionable Insights in COVID-19 Dataset Using Word Embeddings
(Springer Science and Business Media Deutschland GmbH, 2022) Jha, R.A.; Ananthanarayana, V.S.
The field of unsupervised natural language processing (NLP) is gradually growing in prominence and popularity due to the overwhelming amount of scientific and medical data available as text, such as published journals and papers. To make use of this data, several techniques are used to extract information from these texts. In this paper, we use the COVID-19 corpus (https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge) related to the deadly coronavirus, SARS-CoV-2, to extract useful information that can be invaluable in finding a cure for the disease. We use two word-embedding models, Word2Vec and Global Vectors for Word Representation (GloVe), to efficiently encode all the information available in the corpus. We then follow some simple steps to find possible cures for the disease. We obtained useful results using these word-embedding models, and we also observed that the Word2Vec model performed better than the GloVe model on the dataset used. Another point highlighted by this work is that latent information about potential future discoveries is significantly contained in past papers and publications. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
NLP based intelligent news search engine using information extraction from e-newspapers
(Institute of Electrical and Electronics Engineers Inc., 2014) Kanakaraj, M.; Kamath S., S.
Extracting text information from a web news page is a challenging task as most of the E-News content is provided with support from backend Content Management Systems (CMSs). In this paper, we present a personalized news search engine that focuses on building a repository of news articles by applying efficient extraction of text information from web news pages from varied e-news portals. The system is based on the concept of Document Object Model (DOM) tree manipulation for extracting text and modifying the web page structure to exclude irrelevant content like ads and user comments. We also use WordNet, a thesaurus of the English language based on psycholinguistic studies, for matching the extracted content semantically to the title of the web page. TF-IDF (Term Frequency Inverse Document Frequency) is used for identifying the web page blocks carrying information relevant to the page's title. In addition to the extraction of information, functionalities to gather related information from different web newspapers and to summarize the gathered information based on user preferences have also been included. We observed that the system was able to achieve good recall and high precision for both generalized and specific queries. © 2014 IEEE.
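As a rough illustration of the TF-IDF step mentioned in the abstract above, the sketch below scores page blocks by the summed TF-IDF weight of the title's terms. It is not the authors' implementation: the function names `tokenize` and `tf_idf_scores`, the smoothed IDF formula, and the sample blocks are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on any non-alphanumeric character."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tf_idf_scores(blocks, title):
    """Return one score per block: the summed TF-IDF weight of the title's terms.

    Hypothetical helper for illustration; the smoothed IDF used here is an
    assumption, not a formula taken from the paper.
    """
    docs = [Counter(tokenize(block)) for block in blocks]
    n_docs = len(docs)
    title_terms = set(tokenize(title))
    scores = []
    for doc in docs:
        total_terms = sum(doc.values()) or 1  # avoid division by zero
        score = 0.0
        for term in title_terms:
            tf = doc[term] / total_terms
            df = sum(1 for d in docs if term in d)
            idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
            score += tf * idf
        scores.append(score)
    return scores


# Made-up page blocks: one article paragraph and two pieces of page clutter.
blocks = [
    "Flood hits coastal city as rivers overflow",
    "Subscribe now for daily deals and offers",
    "Leave a comment below",
]
scores = tf_idf_scores(blocks, "Coastal city flood")
best_block = blocks[scores.index(max(scores))]
```

In this toy input only the first block shares terms with the title, so it receives the only non-zero score; a real page would need stop-word handling and a tuned threshold.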
Sociopedia: An interactive system for event detection and trend analysis for twitter data
(Springer Science and Business Media Deutschland GmbH, 2016) Kaushik, R.; Apoorva Chandra, S.; Mallya, D.; Chaitanya, J.N.V.K.; Kamath S., S.
The emergence of social media has resulted in the generation of highly versatile and high volume data. Most web search engines return a set of links or web documents as a result of a query, without any interpretation of the results to identify relations in a social sense. In the work presented in this paper, we attempt to create a search engine for social media data streams that can interpret inherent relations within tweets, using an ontology built from the tweet dataset itself. The main aim is to analyze evolving social media trends and provide analytics regarding certain real-world events, in our case new product launches. Once the tweet dataset is pre-processed to extract relevant entities, Wiki data about these entities is also extracted. It is semantically parsed to retrieve relations between the entities and their properties. Further, we perform various experiments for event detection and trend analysis in terms of representative tweets, key entities and tweet volume, that also provide additional insight into the domain. © Springer India 2016.
A morphological approach for measuring pair-wise semantic similarity of sanskrit sentences
(Springer Verlag, 2017) Keshava, V.; Sanapala, M.; Dinesh, A.C.; Kamath S., S.S.
Capturing explicit and implicit similarity between texts in natural language is a critical task in Computational Linguistics applications. Similarity can be multi-level (word, sentence, paragraph or document level), each of which can affect the similarity computation differently. Most existing techniques are ill-suited for classical languages like Sanskrit as it is significantly richer in morphology than English. In this paper, we present a morphological analysis based approach for computing semantic similarity between short Sanskrit texts. Our technique considers the constituent words’ semantic properties and their role in individual sentences within the text, to compute similarity. As all words do not contribute equally to the semantics of a sentence, an adaptive scoring algorithm is used for ranking, which performed very well for Sanskrit sentence pairs of varied complexities. © Springer International Publishing AG 2017.
An intelligent algorithm for automatic candidate selection for web service composition
(Springer Verlag, 2018) Kedia, A.; Pandel, A.; Mohata, A.; Kamath S., S.
Web services have become an important enabling paradigm for distributed computing. Some deterrents to the continued popularity of the web service technology currently are the nonavailability of large-scale, semantically enhanced service descriptions and limited use of semantics in service life cycle tasks like discovery, selection, and composition. In this paper, we outline an intelligent semantics-based web service discovery and selection technique that uses interfaces and text description of services to capture their functional semantics. We also propose a service composition mechanism that automatically performs candidate selection using the service functional semantics, when one web service does not suffice. These techniques can aid application designers in the process of service-based application development that uses multiple web services for its intended functionality. We present experimental and theoretical evaluation of the proposed method. © Springer Nature Singapore Pte Ltd. 2018.
A supervised learning approach for ICU mortality prediction based on unstructured electrocardiogram text reports
(Springer Verlag, 2018) S. Krishnan, G.S.; Kamath S., S.
Extracting patient data documented in text-based clinical records into a structured form is a predominantly manual process, both time and cost-intensive. Moreover, structured patient records often fail to effectively capture the nuances of patient-specific observations noted in doctors’ unstructured clinical notes and diagnostic reports. Automated techniques that utilize such unstructured text reports for modeling useful clinical information for supporting predictive analytics applications can thus be highly beneficial. In this paper, we propose a neural network based method for predicting mortality risk of ICU patients using unstructured Electrocardiogram (ECG) text reports. Word2Vec word embedding models were adopted for vectorizing and modeling textual features extracted from the patients’ reports. An unsupervised data cleansing technique for identification and removal of anomalous data/special cases was designed for optimizing the patient data representation. Further, a neural network model based on Extreme Learning Machine architecture was proposed for mortality prediction. ECG text reports available in the MIMIC-III dataset were used for experimental validation. The proposed model when benchmarked against four standard ICU severity scoring methods, outperformed all by 10–13%, in terms of prediction accuracy. © 2018, Springer International Publishing AG, part of Springer Nature.
Loss Optimised Video Captioning using Deep-LSTM, Attention Mechanism and Weighted Loss Metrices
(Institute of Electrical and Electronics Engineers Inc., 2021) Yadav, N.; Naik, D.
The aim of the video captioning task is to use multiple natural-language sentences to define video content. Photographic, graphical, and auditory data are all used in the videos. Our goal is to investigate and recognize the video's visual features, as well as to create a caption so that anyone can get the video's information within a second. Although phase encoder-decoder models have made significant progress, they still need many improvements. In the present work, we enhanced the top-down architecture using Bahdanau Attention, Deep-Long Short-Term Memory (Deep-LSTM) and a weighted loss function. VGG16 is used to extract the features from the frames. To understand the actions in the video, Deep-LSTM is paired with an attention system. On the MSVD dataset, we analysed the efficiency of our model, which indicates a major improvement over the other state-of-the-art model. © 2021 IEEE.
Analysis of written interactions in open-source communities using RCNN
(Institute of Electrical and Electronics Engineers Inc., 2021) Maheshwarkar, A.; Kumar, A.; Gupta, M.
Open-source software has proved to be a key pillar in modern-day software development. The growing size of the open-source communities has significantly increased the throughput of these projects. However, larger communities tend to lead to difficulties in communication and openness for newer members. In this paper, we try to analyze the interactions on GitHub for some of the popular open-source projects. We have created a database of 2500 filtered comments classified into five classes of emotion. We have also proposed a novel RCNN based architecture to detect the sentiment of the comments and perform multiclass text classification. Furthermore, we have discussed possible model integrations with existing open-source platforms and the challenges associated with the implementation. © 2021 IEEE.
Transparency in Content and Source Moderation
(Springer, 2023) C, A.R.; D, C.S.; D V, P.; Chandavarkar, B.R.
Content moderation is defined as the process of screening and monitoring user-generated content online. To provide a safe environment for both users and brands, platforms must moderate content to ensure that it falls within pre-established guidelines of acceptable behavior specific to the platform and its audience. Many social media companies employ thousands of employees or volunteers to moderate content manually. These moderators discuss the nature of any questionable posts off-site and remove them if they are deemed inappropriate. Certain platforms also employ automated moderation of content through machine learning models. However, many of them often give users no reasons, or inaccurate ones, when their posts are taken down. This lack of transparency in moderation can cause users to believe that their posts were evaluated in a biased manner. To increase users’ trust in the unbiased nature of a platform and still allow for extensive and robust content moderation, we propose a novel algorithm in this chapter. An adaptive machine learning model is used as the initial moderation layer, and then users are allowed to moderate posts through a trust-based social network algorithm. Since machine learning models can gradually improve their performance through feedback and feedback is given in a self-policing fashion, the system enforces both accuracy and transparency for content moderation. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Network Science based Predictive Analysis on Social Media Data
(Institute of Electrical and Electronics Engineers Inc., 2023) Joshi, S.; Kamath S., S.
Traditional approaches utilizing machine learning algorithms have limitations in capturing the full depth and semantic nuances of text, hindering comprehensive analysis for tasks like opinion mining, sentiment analysis, population health analytics etc. To overcome these limitations, we propose the integration of graph analysis and Social Network Analysis (SNA) techniques to enhance the informative value of tweet analysis and facilitate the extraction of structured knowledge from textual and visual content. This work focuses on modeling user-generated content on Twitter to enable intelligent population analytics applications in the healthcare domain. Standard datasets comprising user details and their tweets are considered for the experiments, which are transformed into graph representations suitable for both structural and behavioral analytics. Additionally, a comparative study to assess the impact of varying network sizes by manipulating the number of nodes within the network is conducted. To evaluate the network properties, different centrality measures were employed and compared. © 2023 IEEE.
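The abstract above does not name the centrality measures that were compared. As one common example, the sketch below computes normalized degree centrality on a small undirected graph; the function name `degree_centrality`, the edge-list input format, and the sample user graph are assumptions made for illustration, not details from the paper.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to its degree divided by (n - 1), for an undirected graph.

    Hypothetical helper for illustration: `edges` is an iterable of (u, v)
    pairs, self-loops are ignored, and duplicate edges count once.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # skip self-loops
        neighbours[u].add(v)
        neighbours[v].add(u)
    n_nodes = len(neighbours)
    if n_nodes < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n_nodes - 1) for node, adj in neighbours.items()}


# Made-up interaction graph: an edge means two users mentioned each other.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
```

Here user `a` is connected to every other node and so has the highest score. Other measures such as betweenness or closeness would rank nodes by different criteria.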
