Browsing by Author "Kamath S., S.S."

A bottom-up approach towards achieving semantic web services
(IEEE Computer Society, 2013) Kamath S., S.S.; Ananthanarayana, V.S.
With the advent of Service-oriented Architecture, Web Services have already become the preferred way of integrating multi-platform, cross-vendor business applications. However, due to the steadily increasing number of published services on the Web, discovery of the most relevant Web services as per the stated requirements is very challenging. Due to this explosive growth of Web data and services, there is a significant need for streamlining the process of service discovery to provide better matching, composition and integration capabilities without human intervention. The proposed Semantic Web approach aims to address these issues by the inclusion of explicit semantics in service descriptions. However, it is still far from realization and the fact remains that it is complex to develop a semantic Web service from the ground up. In this paper, we propose a framework that uses a bottom-up approach for building semantic Web services from existing service descriptions on the Web, thus reducing the time and cost involved in large scale manual annotation. © 2013 IEEE.
A modified Ant Colony optimization algorithm with load balancing for job shop scheduling
(IEEE Computer Society, 2013) Chaukwale, R.; Kamath S., S.S.
The problem of efficiently scheduling jobs on several machines is an important consideration when using a Job Shop scheduling production system (JSP). JSP is known to be an NP-hard problem, and hence methods that focus on producing an exact solution can prove insufficient in finding an optimal resolution to JSP. Hence, in such cases, heuristic methods can be employed to find a good solution within reasonable time. In this paper, we study the conventional ACO algorithm and propose a Load Balancing ACO algorithm for JSP. We also present the observed results, and discuss them with reference to the conventional ACO. It is observed that the proposed algorithm gives better results when compared to conventional ACO. © 2013 IEEE.
A morphological approach for measuring pair-wise semantic similarity of Sanskrit sentences
(Springer Verlag, 2017) Keshava, V.; Sanapala, M.; Dinesh, A.C.; Kamath S., S.S.
Capturing explicit and implicit similarity between texts in natural language is a critical task in Computational Linguistics applications. Similarity can be multi-level (word, sentence, paragraph or document level), each of which can affect the similarity computation differently. Most existing techniques are ill-suited for classical languages like Sanskrit as it is significantly richer in morphology than English. In this paper, we present a morphological analysis based approach for computing semantic similarity between short Sanskrit texts. Our technique considers the constituent words' semantic properties and their role in individual sentences within the text, to compute similarity. As all words do not contribute equally to the semantics of a sentence, an adaptive scoring algorithm is used for ranking, which performed very well for Sanskrit sentence pairs of varied complexities. © Springer International Publishing AG 2017.
A novel technique for efficient text document summarization as a service
(2013) Bagalkotkar, A.; Khandelwal, A.; Pandey, S.; Kamath S., S.S.
Due to an exponential growth in the generation of web data, the need for tools and mechanisms for automatic summarization of Web documents has become very critical. Web data can be accessed from multiple sources, e.g., on different Web pages, which makes searching for relevant pieces of information a difficult task. Therefore, an automatic summarizer is vital towards reducing human effort. Text summarization is an important activity in the analysis of high-volume text documents and is currently a major research topic in Natural Language Processing. It is the process of generating the summary of an input document by extracting the representative sentences from it. In this paper, we present a novel technique for generating a summary of domain-specific text from a single Web document by using statistical NLP techniques on the text in a reference corpus and on the web document. The proposed summarizer generates a summary based on the calculated Sentence Weight (SW), the rank of a sentence in the document's content, the number of terms and the number of words in a sentence, and the term frequency in the input corpus. © 2013 IEEE.
A personalized recommender system using Machine Learning based Sentiment Analysis over social data
(Institute of Electrical and Electronics Engineers Inc., 2016) Ashok, M.; Rajanna, S.; Joshi, P.V.; Kamath S., S.S.
Social Media platforms are already an indispensable part of our daily lives. With their constant growth, they have contributed to superfluous, heterogeneous data which can be overwhelming due to its volume and velocity, thus limiting the availability of relevant and required information when a particular query is to be served. Hence, a need for a personalized, fine-grained, user preference-oriented framework for resolving this problem, and also for enhancing user experience, is increasingly felt. In this paper, we propose such a social framework, which extracts users' reviews and comments on restaurants and points of interest such as events and locations, to personalize and rank suggestions based on user preferences. Machine Learning and Sentiment Analysis based techniques are used for further optimizing search query results. This provides the user with quicker and more relevant data, thus avoiding irrelevant data and providing much needed personalization. © 2016 IEEE.
A semantic search engine for answering domain specific user queries
(2013) Kamath S., S.S.; Piraviperumal, D.; Meena, G.; Karkidholi, S.; Kumar, K.
With the exponential growth in web content and due to its sheer volume, the answers provided by traditional search engines, which match query-specific keywords to content, have resulted in markedly high recall and low precision. In order to alleviate this problem, the notion of incorporating semantics in content and in Search Engines, i.e., a Semantic Search Engine, is increasingly crucial. Several Semantic Search Engines (SSEs) have been proposed and deployed to date and each is inherently different from the other. As such, the objective of this paper is to present a discussion on semantically enhanced search engines for intelligent web content discovery. We also present the architecture of a new SSE based on a bottom-up approach that focuses on building a semantic base for Web content first and then carries out the process of querying it for attaining high precision and lower recall. © 2013 IEEE.
A spatial clustering approach for efficient landmark discovery using geo-tagged photos
(Institute of Electrical and Electronics Engineers Inc., 2016) Deeksha, S.D.; Ashrith, H.C.; Bansode, R.; Kamath S., S.S.
Geo-tagged photos enable people to share their personal experiences while visiting various vacation spots through image sharing social networks like Flickr. The geo-tag information offers a wealth of information for capturing additional information on traveler behavior, trends, opinions and interests. In this paper, we propose a landmark discovery system that aims to discover popular tourist attractions in a city by assuming that the popularity of a tourist attraction is positively dependent on the visitor statistics and also the number of tourist-uploaded photos taken on site. It is a known fact that places with lots of geo-tagged photos uploaded to Flickr are visited more often by social-media savvy tourists, who plan their trip based on others' experiences. We propose to build a system that identifies the most popular tourist places in a particular city by using geo-tagged photos collected from Flickr and recommends the same to the user. In this paper, we present the methodology of spatially clustering the geo-tagged images and present an analysis of algorithm performance in identifying landmarks and their popularity. © 2015 IEEE.
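The abstract above does not name the clustering algorithm used, so the following is only a minimal sketch of the general idea of spatially grouping geo-tagged photos and ranking the groups by photo count. It assumes a simple grid-based binning, which is not taken from the paper; the function name `grid_clusters`, the parameter `cell_deg`, and the sample coordinates are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Cell = Tuple[int, int]


def grid_clusters(
    points: List[Tuple[float, float]], cell_deg: float = 0.01
) -> List[Tuple[Cell, int]]:
    """Bin (lat, lon) points into square grid cells of side `cell_deg`
    degrees and return the cells sorted by photo count, densest first.

    Assumption: grid binning stands in for whatever spatial clustering
    method the paper actually uses.
    """
    cells = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    )
    return cells.most_common()


# Hypothetical photo coordinates: three close together, one far away.
photos = [
    (12.9141, 74.8560),
    (12.9145, 74.8562),
    (12.9149, 74.8568),
    (13.0150, 74.7950),
]
ranked = grid_clusters(photos)
```

In this sketch the densest cell plays the role of the most popular candidate landmark. A density-based method would avoid the artificial cell boundaries that a fixed grid introduces.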
An approach for multimodal medical image retrieval using latent Dirichlet allocation
(Association for Computing Machinery, 2019) Vikram, M.; Suhas, B.S.; Anantharaman, A.; Kamath S., S.S.
Modern medical practices are increasingly dependent on Medical Imaging for clinical analysis and diagnoses of patient illnesses. A significant challenge when dealing with the extensively available medical data is that it often consists of heterogeneous modalities. Existing works in the field of Content-based medical image retrieval (CBMIR) have several limitations as they focus mainly on visual or textual features for retrieval. Given the unique manifold of medical data, we seek to leverage both the visual and textual modalities to improve the image retrieval. We propose a Latent Dirichlet Allocation (LDA) based technique for encoding the visual features and show that these features effectively model the medical images. We explore early fusion and late fusion techniques to combine these visual features with the textual features. The proposed late fusion technique achieved a higher mAP than the state-of-the-art on the ImageCLEF 2009 dataset, underscoring its suitability for effective multimodal medical image retrieval. © 2019 Association for Computing Machinery.
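As an illustration of what late fusion means in general, the sketch below combines per-modality relevance scores after each modality has been scored separately. It rests on assumptions that are not from the paper: a weighted sum is used as the fusion rule, and the function name `late_fusion`, the weight `alpha`, and the example scores are hypothetical. The paper's LDA-based visual features and its actual fusion rule are not reproduced here.

```python
from typing import Dict, List


def late_fusion(
    visual: Dict[str, float],
    textual: Dict[str, float],
    alpha: float = 0.5,
) -> List[str]:
    """Rank image ids by alpha * visual score + (1 - alpha) * textual score.

    An image missing from one modality is given a score of 0.0 there
    (an assumption made for this sketch).
    """
    ids = set(visual) | set(textual)
    fused = {
        i: alpha * visual.get(i, 0.0) + (1 - alpha) * textual.get(i, 0.0)
        for i in ids
    }
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(fused, key=lambda i: (-fused[i], i))


# Hypothetical per-modality scores for three images.
visual_scores = {"img_a": 0.9, "img_b": 0.2}
textual_scores = {"img_a": 0.3, "img_b": 0.8, "img_c": 0.6}
ranking = late_fusion(visual_scores, textual_scores, alpha=0.5)
```

Early fusion, by contrast, would concatenate the visual and textual feature vectors before any scoring takes place.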
ARS NITK at MEDIQA 2019: Analysing various methods for natural language inference, recognising question entailment and medical question answering system
(Association for Computational Linguistics (ACL), 2019) Agrawal, A.; George, R.A.; Ravi, S.S.; Kamath S., S.S.; Anand Kumar, M.A.
This paper includes approaches we have taken for Natural Language Inference, Question Entailment Recognition and Question-Answering tasks to improve domain-specific Information Retrieval. Natural Language Inference (NLI) is a task that aims to determine if a given hypothesis is an entailment, contradiction or is neutral to the given premise. Recognizing Question Entailment (RQE) focuses on identifying entailment between two questions while the objective of Question-Answering (QA) is to filter and improve the ranking of automatically retrieved answers. For addressing the NLI task, the UMLS Metathesaurus was used to find the synonyms of medical terms in given sentences, on which the InferSent model was trained to predict if the given sentence is an entailment, contradictory or neutral. We also introduce a new Extreme gradient boosting model built on PubMed embeddings to perform RQE. Further, a closed-domain Question Answering technique that uses Bi-directional LSTMs trained on the SQuAD dataset to determine relevant ranks of answers for a given question is also discussed. Experimental validation showed that the proposed models achieved promising results. © 2019 Association for Computational Linguistics
Automated stock price prediction and trading framework for Nifty intraday trading
(2013) Bhat, A.A.; Kamath S., S.S.
Research on automated systems for Stock price prediction has gained much momentum in recent years owing to its potential to yield profits. In this paper, we present an automatic trading system for Nifty for deciding the buying and selling calls for intra-day trading that combines various methods to improve the quality and precision of the prediction. Historical data has been used to implement the various technical indicators and also to train the Neural Network that predicts movement for intra-day Nifty. Further, Sentiment Analysis techniques are applied to popular blog articles written by domain experts and to user comments to find sentiment orientation, so that analysis can be further improved and better prediction accuracy can be achieved. The system makes a prediction for every trading day with these methods to forecast whether the next day will be a positive or a negative day. Further, buy and sell calls for intra-day trading are also decided by the system, thus achieving full automation in stock trading. © 2013 IEEE.
Coherence-based modeling of clinical concepts inferred from heterogeneous clinical notes for ICU patient risk stratification
(Association for Computational Linguistics, 2019) Gangavarapu, T.; S. Krishnan, G.; Kamath S., S.S.
In hospitals, critical care patients are often susceptible to various complications that adversely affect their morbidity and mortality. Digitized patient data from Electronic Health Records (EHRs) can be utilized to facilitate risk stratification accurately and provide prioritized care. Existing clinical decision support systems are heavily reliant on the structured nature of the EHRs. However, the valuable patient-specific data contained in unstructured clinical notes are often manually transcribed into EHRs. The prolific use of extensive medical jargon, heterogeneity, sparsity, rawness, inconsistent abbreviations, and complex structure of the clinical notes poses significant challenges, and also results in a loss of information during the manual conversion process. In this work, we employ two coherence-based topic modeling approaches to model the free-text in the unstructured clinical nursing notes and capture its semantic textual features with the emphasis on human interpretability. Furthermore, we present FarSight, a long-term aggregation mechanism intended to detect the onset of disease with the earliest recorded symptoms and infections. We utilize the predictive capabilities of deep neural models for the clinical task of risk stratification through ICD-9 code group prediction. Our experimental validation on the MIMIC-III (v1.4) database underlined the efficacy of FarSight with coherence-based topic modeling in extracting discriminative clinical features from the unstructured nursing notes. The proposed approach achieved a superior predictive performance when benchmarked against the structured EHR data based state-of-the-art model, with an improvement of 11.50% in AUPRC and 1.16% in AUROC. © 2019 Association for Computational Linguistics.
Comparative evaluation of algorithms for effective data leakage detection
(2013) Kumar, A.; Goyal, A.; Kumar, A.; Chaudhary, N.K.; Kamath S., S.S.
Researchers have proposed several mechanisms to secure data from unauthorized use, but there is very little work in the field of detecting and managing an authorized or trustworthy agent that has caused a data leak to some third party advertently or unknowingly. In this paper, we implement methods aimed at improving the odds of detecting such leakages when a distributor's sensitive data has been leaked by trustworthy agents and also to possibly identify the agent(s) that leaked the data. We also implement some data allocation strategies that can improve the probability of identifying leakages and can also be used to assess the likelihood of a leak at a particular agent, assuming that the data was not simply guessed by the third party where the leaked data set has been found. We also propose new allocation strategies that work on the basis of a No-Wait model, i.e., an agent does not need to wait for other agents' allocation, unlike the previously proposed model that makes an agent wait for others. These methods do not rely on the alterations of the distributed data, but rather focus on minimizing the overlapping of the allocated data items to various agents, thus facilitating an exact determination of the guilty agent in a particular data leakage scenario. © 2013 IEEE.
Constructing an enriched domain taxonomy for Hindi using word embeddings
(Institute of Electrical and Electronics Engineers Inc., 2017) Keshava, V.; Pravalika, P.; Kamath S., S.S.; Geetha, V.
Domain-specific taxonomies constitute a valuable resource as they offer extensive support in information retrieval related activities like browsing, searching, recommendations and personalization. Such taxonomies can bridge the gap between the lack of domain-specific querying knowledge in potential users and the actual content. In case of multilingual content, taxonomies can play a pivotal role in boosting search performance for content across language barriers. In this paper, a domain-agnostic framework for building an evolving, domain-specific taxonomy for Hindi, given a set of well-organized data points, is proposed. The approach is intended for designing a hierarchical taxonomy enriched with synonyms and other morphological variants using WordNet and Word2vec models respectively. The hierarchical structure acts as a base which binds the taxonomy to a given domain. Such enrichment can improve taxonomy coverage within the given domain. The focus is also on building a taxonomy that can self-evolve over time, with high precision and recall, with minimal manual effort. © 2017 IEEE.
Deep Neural Network Models for Question Classification in Community Question-Answering Forums
(Institute of Electrical and Electronics Engineers Inc., 2019) Upadhya, B.A.; Udupa, S.; Kamath S., S.S.
Automatic generation of responses to questions is a challenging problem that has applications in fields like customer support, question-answering forums, etc. A prerequisite to developing such systems is a methodology that classifies questions as yes/no or opinion-based questions, so that quick and accurate responses can be provided. Performing this classification is advantageous, as yes/no questions can generally be answered using the data that is already available. In the case of an opinion-based or a yes/no question that wasn't previously answered, an external knowledge source is needed to generate the answer. We propose an LSTM-based model that performs question classification into the two aforementioned categories. Given a question as an input, the objective is to classify it as an opinion-based or a yes/no question. The proposed model was tested on the Amazon community question-answer dataset as it is reflective of the problem statement we are trying to solve. The proposed methodology achieved promising results, with a high accuracy rate of 91% in question classification. © 2019 IEEE.
DeepOA: Clinical Decision Support System for Early Detection and Severity Grading of Knee Osteoarthritis
(Institute of Electrical and Electronics Engineers Inc., 2021) Dalia, Y.; Bharath, A.; Mayya, V.; Kamath S., S.S.
Knee Osteoarthritis (OA) is a medical condition affecting the knee joint that causes pain due to cartilage wear and tear. The severity of the impairment is graded by experienced radiologists as per standardized grading systems like the Kellgren-Lawrence (KL) grading scheme. Early detection and classification of knee OA in a patient before it increases in severity can significantly aid in corrective measures and benefit humankind. In this work, we propose a DL model to automatically segment the knee region and predict onset of Knee OA with X-ray scans. A comparative study using an ensemble model consisting of a YOLOv5 object detection algorithm for knee joint segmentation is also proposed. Various classification models such as VGG16, ResNet, etc., are experimented with for the KL grade classification. Detailed experiments are conducted to understand the need for the region of interest segmentation step in KL grade classification. The proposed Clinical Decision Support System (CDSS) can help medical practitioners perform preemptive screening based on X-ray scans for detecting onset earlier and for enabling required treatment. © 2021 IEEE.
Explainable Deep Neural Models for COVID-19 Prediction from Chest X-Rays with Region of Interest Visualization
(Institute of Electrical and Electronics Engineers Inc., 2021) Nedumkunnel, I.M.; Elizabeth George, L.; Kamath S., S.S.; Rosh, N.A.; Mayya, V.
COVID-19 has been designated as a once-in-a-century pandemic, and its impact is still being felt severely in many countries, due to the extensive human and green casualties. While several vaccines are under various stages of development, effective screening procedures that help detect the disease at early stages in a non-invasive and resource-optimized manner are the need of the hour. X-ray imaging is fairly accessible in most healthcare institutions and can prove useful in diagnosing this respiratory disease. Although a chest X-ray scan is a viable method to detect the presence of this disease, the scans must be analyzed by trained experts accurately and quickly if large numbers of tests are to be processed. In this paper, a benchmarking study of different preprocessing techniques and state-of-the-art deep learning models is presented to provide comprehensive insights into both the objective and subjective evaluation of their performance. To analyze and prevent possible sources of bias, we preprocessed the dataset in two ways: first, we segmented the lungs alone, and second, we formed a bounding box around the lung and used only this area to train. Among the models chosen to benchmark, which were DenseNet201, EfficientNetB7, and VGG-16, DenseNet201 performed better for all three datasets. © 2021 IEEE.
Frame instance extraction and clustering for default knowledge building
(CEUR-WS, 2017) Shah, A.; Basile, V.; Cabrio, E.; Kamath S., S.S.
Obtaining and representing common-sense knowledge, useful in a robotics scenario for planning and making inferences about the robots' surroundings, is a challenging problem, because such knowledge is typically found in unstructured repositories such as text corpora or small handmade resources. The work described in this paper presents a methodology for automatically creating a default knowledge base about real-world objects for the robotics domain. The proposed method relies on clustering frame instances extracted from natural language text as a way of distilling default knowledge. We collect and parse a natural language corpus using the Web as a source, then perform an agglomerative clustering of frame instances according to an appropriately defined similarity measure, and finally extract prototypical frame instances from each cluster and publish them in LOD-compliant format to promote reuse and interoperability.
Improved speculative Apriori with percentiles algorithm for website restructuring based on usage patterns
(Institute of Electrical and Electronics Engineers Inc., 2016) Gahlot, G.; Kamath S., S.S.
Web structure mining techniques are popularly used in the process of improved website design/replanning based on user browsing actions. In this paper, an algorithm for improving the design map (site map of a Website) using the pertinent information available in the website's server logs is proposed, which incorporates probability for extending the well-known Apriori Algorithm. The proposed methodology harnesses the normal distribution curve used in statistical measurements to improve recommendation accuracy after parsing the server log file. This allows the discovery of more association rules as the idea is to use percentile calculations instead of the percentages and having a relative quest within the item sets to determine their existence in the domain. By enforcing the percentile calculations on the distribution curve of the collection, selective items from the small groups within can be obtained. Experimental results for the proposed Speculative Apriori with Percentiles Algorithm (SAwP) indicate that it was effective in discovering relevant itemsets and more association rules, when compared to the classical Apriori algorithm. © 2016 IEEE.
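To make the contrast between a percentage threshold and a percentile threshold concrete, the sketch below keeps the pages whose support count is at or above a chosen percentile of all support counts, instead of at or above a fixed fraction of sessions. This is not the SAwP algorithm from the paper: the nearest-rank percentile definition, the function names, and the sample sessions are all assumptions made for illustration, and only the single-item pass is shown.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def item_support(sessions: List[Set[str]]) -> Dict[str, int]:
    """Count in how many sessions each page appears."""
    counts: Counter = Counter()
    for session in sessions:
        counts.update(session)
    return dict(counts)


def nearest_rank_percentile(values: List[int], p: float) -> int:
    """Nearest-rank percentile of a non-empty list, for 0 < p <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def frequent_by_percentile(sessions: List[Set[str]], p: float) -> Set[str]:
    """Keep pages whose support is at or above the p-th percentile of
    all support counts (a relative cutoff, not a fixed percentage)."""
    support = item_support(sessions)
    cutoff = nearest_rank_percentile(list(support.values()), p)
    return {page for page, count in support.items() if count >= cutoff}


# Hypothetical browsing sessions reconstructed from a server log.
sessions = [
    {"home", "news"},
    {"home", "sports"},
    {"home", "news", "contact"},
    {"news"},
]
top_pages = frequent_by_percentile(sessions, 75)
```

Because the cutoff is derived from the distribution of the support counts themselves, it adapts to logs where all pages are rare or all are common, which a fixed minimum-support percentage does not.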
LATA – Label attention transformer architectures for ICD-10 coding of unstructured clinical notes
(Institute of Electrical and Electronics Engineers Inc., 2021) Mayya, V.; Kamath S., S.S.; Sugumaran, V.
Effective code assignment for patient clinical records in a hospital plays a significant role in the process of standardizing medical records, mainly for streamlining clinical care delivery, billing, and managing insurance claims. The current practice employed is manual coding, usually carried out by trained medical coders, making the process subjective, error-prone, inexact, and time-consuming. To alleviate this cost-intensive process, intelligent coding systems built on patients' structured electronic medical records are critical. Classification of medical diagnostic codes, like ICD-10, is widely employed to categorize patients' clinical conditions and associated diagnoses. In this work, we present a neural model LATA, built on Label Attention Transformer Architectures for automatic assignment of ICD-10 codes. Our work is benchmarked on the CodiEsp dataset, a dataset for automatic clinical coding systems for multilingual medical documents, used in the eHealth CLEF 2020-Multilingual Information Extraction Shared Task. The experimental results reveal that the proposed LATA variants outperform their basic BERT counterparts by 33-49% in terms of standard metrics like precision, recall, F1-score and mean average precision. The label attention mechanism also enables direct extraction of textual evidence in medical documents that map to the clinical ICD-10 diagnostic codes. © 2021 IEEE.
Query-oriented unsupervised multi-document summarization on big data
(Association for Computing Machinery, 2016) Sunaina; Kamath S., S.S.
Real time document summarization is a critical need nowadays, owing to the large volume of information available for our reading, and our inability to deal with this entirely due to limitations of time and resources. Oftentimes, information is available in multiple sources, offering multiple contexts and viewpoints on a single topic of interest. Automated multi-document summarization (MDS) techniques aim to address this problem. However, current techniques for automated MDS suffer from low precision and accuracy with reference to a given subject matter when compared to summaries prepared by humans, and take a long time to create the summary when the input is very large. In this paper, we propose a hybrid MDS technique combining feature based algorithms and dynamic programming for generating a summary from multiple documents based on a user-provided query. Further, in real-world scenarios, Web search serves up a large number of URLs to users, and the work of making sense of these with reference to a particular query is left to the user. In this context, an efficient parallelized MDS technique based on Hadoop is also presented, for serving a concise summary of multiple Webpage contents for a given user query in reduced time duration. © 2016 ACM.