Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
16 results
Search Results
Item: A bio-inspired, incremental clustering algorithm for semantics-based web service discovery (Inderscience Enterprises Ltd., 2015)
Kamath S., S.; Ananthanarayana, V.S.
Web service discovery is a challenging task due to the widespread availability of published services on the web. This paper proposes a service crawler-based web service discovery framework that employs information retrieval techniques to effectively retrieve available, published service descriptions. The functional semantics of these descriptions is extracted for similarity computation and tag generation using natural language processing techniques. The framework is inherently dynamic, as new service descriptions may be continually added during periodic crawler runs and existing ones may be removed if a service becomes unavailable. To deal with these issues, a dynamic, incremental clustering approach based on bird flocking behaviour is proposed. Experimental results show that semantic analysis and automatic tagging captured the services' functional semantics in a meaningful way. The algorithm effectively handled the dynamic requirements of the proposed framework by eliminating cluster recomputation overhead and achieved a speed-up factor of 61.8% when compared to hierarchical clustering.
© 2015 Inderscience Enterprises Ltd.

Item: Semantic similarity based context-aware web service discovery using NLP techniques (Rinton Press Inc., 2016)
Kamath S., S.S.; Ananthanarayana, V.S.
Due to the high availability and distributed nature of published web services on the Web, efficient discovery and retrieval of relevant services that meet user requirements can be a challenging task. In this paper, we present a semantics-based web service retrieval framework that uses natural language processing techniques to extract a service’s functional information.
The extracted information is used to compute the similarity between any given service pair, to generate additional metadata for each service, and to classify the services based on their functional similarity. The framework also adds natural language querying capabilities that support exact and approximate matching of relevant services to a given user query. We present experimental results showing that semantic analysis and automatic tagging effectively captured the inherent functional details of a service and the similarity between different services. In addition, Web service retrieval through the natural language querying interface of the proposed framework showed a significant improvement in precision and recall over simple keyword matching search.
© Rinton Press.

Item: Dravidian language classification from speech signal using spectral and prosodic features (Springer New York LLC, 2017)
Koolagudi, S.G.; Bharadwaj, A.; Vishnu Srinivasa Murthy, Y.V.; Reddy, N.; Rao, P.
An interesting aspect of the Dravidian languages is their commonality: a shared script, similar vocabulary, and a common root language. In this work, an attempt has been made to classify the four complex Dravidian languages using cepstral coefficients and prosodic features. Speech in the Dravidian languages was recorded in various environments and used as the database. It is demonstrated that cepstral coefficients alone can identify the language with a fair degree of accuracy, and that adding prosodic features to the cepstral coefficients improves language identification performance. Legendre polynomial fitting and principal component analysis (PCA) are applied to the feature vectors to reduce dimensionality, which also addresses the issue of time complexity.
In the experiments conducted, using both cepstral coefficients and prosodic features yields a language identification rate of around 87%, which is about 18% above the baseline system using Mel-frequency cepstral coefficients (MFCCs). The results indicate that temporal variations and prosody are important factors to consider in language identification tasks.
© 2017, Springer Science+Business Media, LLC.

Item: Discovering composable web services using functional semantics and service dependencies based on natural language requests (Springer New York LLC, 2019)
Kamath S., S.; Ananthanarayana, V.S.
Service discovery, selection and composition are crucial tasks in web service based application development. Most web service-driven applications are complex and are composed of more than one service, so it becomes important for application designers to identify the best service to perform the next task in the intended application’s workflow. In this paper, a framework for discovering composable service sets that meet a user’s complex requirements is proposed. The proposed approach uses natural language processing and semantics-based techniques to extract the functional semantics of the service dataset and to understand user context. For simple queries, basic services may be enough to satisfy the user request. For complex queries, however, several basic services may have to be identified, in the correct sequence, to serve all the requirements. For this, the dependencies among all the services are used to construct a service interface graph for finding suitable composable services. Experiments showed that the proposed approach was effective in finding relevant services for simple and complex queries and achieved an average accuracy rate of 75.09% in finding correct composable service templates.
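The idea of a service interface graph can be illustrated with a minimal sketch. It is not the authors' implementation: the service names, the input/output types, and the matching rule (an edge from A to B when an output of A can feed an input of B) are all assumptions made for illustration, and the search only chains each service to the one before it.

```python
from collections import deque

# Hypothetical service descriptions: name -> (input types, output types).
SERVICES = {
    "GeocodeAddress": ({"address"}, {"coordinates"}),
    "FindHotels": ({"coordinates"}, {"hotel_list"}),
    "BookHotel": ({"hotel_list", "dates"}, {"booking"}),
    "GetWeather": ({"coordinates"}, {"forecast"}),
}


def build_interface_graph(services):
    """Add an edge A -> B when some output of A matches some input of B."""
    graph = {name: [] for name in services}
    for a, (_, outs) in services.items():
        for b, (ins, _) in services.items():
            if a != b and outs & ins:
                graph[a].append(b)
    return graph


def find_composition(services, graph, have, want):
    """Breadth-first search for a shortest service chain that produces `want`.

    `have` is the set of data types the user can supply up front.
    Returns a list of service names, or None if no chain is found.
    """
    queue = deque()
    # Start from every service whose inputs are already satisfied.
    for name, (ins, outs) in services.items():
        if ins <= have:
            queue.append(([name], have | outs))
    seen = set()
    while queue:
        chain, known = queue.popleft()
        if want <= known:
            return chain
        key = (chain[-1], frozenset(known))
        if key in seen:
            continue
        seen.add(key)
        # Follow dependency edges to services that can now be invoked.
        for nxt in graph[chain[-1]]:
            ins, outs = services[nxt]
            if nxt not in chain and ins <= known:
                queue.append((chain + [nxt], known | outs))
    return None


graph = build_interface_graph(SERVICES)
plan = find_composition(SERVICES, graph, {"address", "dates"}, {"booking"})
print(plan)
```

A request that supplies an address and dates and asks for a booking resolves here to a three-service chain, while a request with no usable inputs returns `None`.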
© 2017, Springer Science+Business Media New York.

Item: Crime base: Towards building a knowledge base for crime entities and their relationships from online news papers (Elsevier Ltd, 2019)
Srinivasa, S.; Santhi Thilagam, P.
In the current internet era, information related to crime is scattered across many sources, such as news media, social networks, blogs, and video repositories. Crime reports published in online newspapers are often considered more reliable than crowdsourced data such as social media, and they contain crime information not only as unstructured text but also as images. Given the volume and availability of crime-related information in online newspapers, gathering and integrating crime entities from multiple modalities and representing them as a machine-readable knowledge base will be useful for law enforcement agencies in analyzing and preventing criminal activities. Existing work on generating crime knowledge bases does not address the extraction of all non-redundant entities from the text and image data present in multiple newspapers. Hence, this work proposes Crime Base, an entity relationship based system that extracts and integrates crime-related text and image data from online newspapers, with a focus on reducing duplication and loss of information in the knowledge base. The proposed system uses a rule-based approach to extract entities from text and image captions. The entities extracted from text are correlated using contextual as well as semantic similarity measures, and image entities are correlated using low-level and high-level image features. The proposed system also presents an integrated view of these entities and their relations in the form of a knowledge base using OWL. The system is tested on a collection of crime-related articles from popular Indian online newspapers.
© 2019 Elsevier Ltd

Item: Predicting ICD-9 code groups with fuzzy similarity based supervised multi-label classification of unstructured clinical nursing notes (Elsevier B.V., 2020)
Gangavarapu, T.; Jayasimha, A.; S. Krishnan, G.S.; Kamath S., S.
In hospitals, caregivers are trained to chronicle the subtle changes in a patient's clinical condition at regular intervals to enable decision-making. Caregivers’ text-based clinical notes are a significant source of rich patient-specific data that can facilitate effective clinical decision support, yet this treasure trove of data remains largely unexplored for supporting the prediction of clinical outcomes. The application of sophisticated data modeling and prediction algorithms with greater computational capacity has made disease prediction from raw clinical notes a relevant problem. In this paper, we propose an approach based on vector space and topic modeling to structure the raw clinical data by capturing the semantic information in the nursing notes. A fuzzy similarity based data cleansing approach was used to merge anomalous and redundant patient data. Furthermore, we utilize eight supervised multi-label classification models to facilitate disease (ICD-9 code group) prediction. We present an exhaustive comparative study to evaluate the performance of the proposed approaches using standard evaluation metrics. Experimental validation on MIMIC-III, an open database, underscored the superior performance of the proposed Term weighting of unstructured notes AGgregated using fuzzy Similarity (TAGS) model, which consistently outperformed the state-of-the-art structured data based approach by 7.79% in AUPRC and 1.24% in AUROC.
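For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how they can be computed for a single label. The labels and scores are made-up toy values, average precision is used as a common estimator of AUPRC, and nothing here reflects how the authors computed or averaged their reported figures.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation:
    the fraction of positive/negative pairs ranked correctly, ties counting half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of the precision values at each positive,
    walking the examples in order of decreasing score."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Toy example for one label (hypothetical values).
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(y_true, scores), average_precision(y_true, scores))
```

In a multi-label setting such as ICD-9 code group prediction, one option is to compute these per label and average the results; whether the reported numbers use macro or micro averaging is not stated in the abstract.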
© 2019 Elsevier B.V.

Item: Summarization of Wireless Capsule Endoscopy Video Using Deep Feature Matching and Motion Analysis (Institute of Electrical and Electronics Engineers Inc., 2021)
Sushma, B.; Aparna, P.
Conventional wireless capsule endoscopy (WCE) video summary generation techniques represent an image by extracting handcrafted features, which are not necessarily sufficient to capture the semantic similarity of endoscopic images. Using supervised methods to extract deep features from an image requires an enormous amount of accurately labelled data for training. To solve this, we use an unsupervised learning method to extract features using a convolutional autoencoder. Furthermore, WCE images are classified into similar and dissimilar pairs using a fixed threshold derived from a large number of experiments. Finally, a keyframe extraction method based on motion analysis is used to derive a structured summary of the WCE video. The proposed method achieves an average F-measure of 91.1% with a compression ratio of 83.12%. The results indicate that the proposed method is more efficient than existing WCE video summarization techniques.
© 2013 IEEE.

Item: Clustering and bootstrapping based framework for news knowledge base completion (Slovak Academy of Sciences, 2021)
Srinivasa, K.; Santhi Thilagam, P.S.
Extracting facts, namely entities and relations, from unstructured sources is an essential step in any knowledge base construction. At the same time, it is also necessary to ensure the completeness of the knowledge base by incrementally extracting new facts from various sources. To date, knowledge base completion has been studied as a problem of knowledge refinement, where missing facts are inferred by reasoning about the information already present in the knowledge base. However, facts missed while extracting information from multilingual sources are ignored.
Hence, this work proposes a generic framework for knowledge base completion that enriches a knowledge base of crime-related facts extracted from English-language online news articles with facts extracted from news articles in Hindi, a low-resourced Indian language. Using the framework, information can be extracted from news articles in any low-resourced language without language-specific tools such as POS taggers, using only an appropriate machine translation tool. To achieve this, a clustering algorithm is proposed that explores the redundancy among the bilingual collection of news articles by representing the clusters with knowledge base facts, unlike the existing Bag of Words representation. Within each cluster, the facts extracted from English-language articles are bootstrapped to extract facts from comparable Hindi-language articles. Bootstrapping within the cluster helps to identify sentences in a low-resourced language that carry new information related to the facts extracted from a high-resourced language like English. Empirical results show that the proposed clustering algorithm produced more accurate and high-quality clusters for monolingual and cross-lingual facts, respectively. Experiments also showed that the proposed framework achieves a high recall rate in extracting new facts from Hindi news articles.
© 2021 Slovak Academy of Sciences. All rights reserved.

Item: Ensemble deep neural network based quality of service prediction for cloud service recommendation (Elsevier B.V., 2021)
Sahu, P.; Raghavan, S.; Chandrasekaran, K.
Applications of cloud services are increasing day by day, and so is the difficulty of choosing the best-suited service for a customer. Quality of Service (QoS) parameters can be used for quality assurance and evaluation; further, a service can be recommended based on the values of these QoS parameters. Recommendation systems are getting much attention lately.
They play a crucial role in almost all the major commercial platforms, and many improvements are being made to make the recommendations more precise and closer to the user's requirements. Conventional machine learning algorithms and statistical analysis methods are currently not very efficient at learning the complex correlations between data elements. Lately, deep learning models have proven to be practical and precise in areas such as natural language processing, image processing, data mining, and data interpretation. However, there are not many examples of complete deep learning applications for cloud service recommendation systems, though some works partially use deep learning. We propose the Ensemble of Deep Neural Networks (EDNN) method, a hybrid that fuses neighborhood-based and neural network model-based methods. The outputs of the two models are combined using a separate neural network model. Our approach for predicting QoS values is simple and different from previous works, and the results show that it marginally outperforms other classical methods.
© 2021 Elsevier B.V.

Item: A transformer-based architecture for fake news classification (Springer, 2021)
Mehta, D.; Dwivedi, A.; Patra, A.; Anand Kumar, M.
In today’s post-truth world, the proliferation of propaganda and falsified news poses a deadly risk of misinforming the public on a variety of issues, either through traditional media or on social media. The information people acquire through these articles and posts tends to shape their world view and provides reasoning for the choices they make in their day-to-day lives. Thus, fake news can be a malicious force with massive real-world consequences. In this paper, we focus on classifying fake news using models based on a natural language processing framework, Bidirectional Encoder Representations from Transformers, also known as BERT.
We fine-tune BERT on domain-specific datasets and also make use of human justification and metadata for added performance in our models. We determine that the deep-contextualizing nature of BERT is effective for this task and obtain a significant improvement in binary classification, and a minimal yet important improvement in six-label classification, in comparison with previously explored models.
© 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature.
