2. Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/1/7
26 results
Item: NPRank: Nexus based Predicate Ranking of Linked Data (2019)
Authors: Sakthi, Murugan, R.; Ananthanarayana, V.S.
In the typical use case of browsing Linked Data in DBpedia, a user finds an average of 180 facts attached to each entity. These facts are ordered alphabetically by predicate, but a logical ordering would serve the user better. In this article, we present NPRank, a Nexus based predicate ranking of Linked Data facts. The key idea of NPRank is that the importance of a predicate is directly proportional to its familiarity within its group, called a Nexus. NPRank is a language- and endpoint-independent model that allows seamless integration and querying of data from multiple endpoints. The Nexus score generated to rank predicates also assists in fragmenting large data and in surfacing more hidden data from SPARQL endpoints. Our experiments, which rank the Linked Data facts corresponding to the most visited pages of Wikipedia across 275 active SPARQL endpoints, achieve better performance than state-of-the-art methods. © 2019 IEEE.

Item: Multi-level per node combiner (MLPNC) to minimize MapReduce job latency on virtualized environment (2018)
Authors: Jeyaraj, R.; Ananthanarayana, V.S.
Big data has made businesses and research more data-driven. Hadoop MapReduce is one of the most cost-effective ways to process huge amounts of data, and it is also offered as a cloud service on clusters of Virtual Machines (VMs). In a Cloud Data Center (CDC), Hadoop VMs are co-located with other general-purpose VMs across racks. Such multi-tenancy leads to varying local network bandwidth availability for the Hadoop VMs, which directly impacts MapReduce job latency, because the shuffle phase of the MapReduce execution sequence alone contributes 26%-70% of overall job latency due to the large number of intermediate records. A Hadoop virtual cluster therefore needs a guaranteed bandwidth to minimize job latency, but this increases the bandwidth usage cost. In this paper, we propose the "Multi-Level Per Node Combiner" (MLPNC), which curtails the number of intermediate records in the shuffle phase and thereby reduces overall job latency; it minimizes the bandwidth usage cost as well. We evaluate MLPNC on a wordcount job against the default combiner and the Per Node Combiner (PNC), and discuss the results in terms of the number of shuffled records, shuffle latency, average merge latency, average reduce latency, average reduce task start time, and overall job latency. Finally, we argue in favor of MLPNC, as it achieves up to a 33% reduction in the number of intermediate records and up to a 32% reduction in average job latency compared to PNC. © 2018 ACM.
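The node-level combining behind MLPNC is easy to see in miniature. The Python sketch below imitates two levels of combining for a wordcount job: the classic per-map-task combine, followed by a merge across all map tasks co-located on one virtual node, so that only one partial count per word leaves the node. This is an illustration of the general technique under our own simplifications, not the authors' Hadoop implementation; the names map_task and node_combine are ours.

    from collections import Counter
    from typing import Iterable

    def map_task(chunk: str) -> Counter:
        # Level 1: the classic per-map-task combine, collapsing
        # (word, 1) pairs into (word, count) within one task.
        return Counter(chunk.split())

    def node_combine(task_outputs: Iterable[Counter]) -> Counter:
        # Level 2: merge the outputs of all map tasks co-located on
        # one node, so one partial count per word leaves the node.
        node_total: Counter = Counter()
        for partial in task_outputs:
            node_total.update(partial)
        return node_total

    # Two map tasks running on the same virtual node.
    t1 = map_task("big data big cluster")
    t2 = map_task("big data small cluster")
    # Per-task combining alone would shuffle 7 (word, count) records;
    # node-level combining shuffles only 4.
    print(node_combine([t1, t2]))

Fewer shuffled records means less data crossing the contended virtual network, which is exactly where the papers locate the latency and cost savings.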
Item: MapReduce scheduler to minimize the size of intermediate data in shuffle phase (2019)
Authors: Jeyaraj, R.; Ananthanarayana, V.S.; Paul, A.
Hadoop MapReduce has been one of the most cost-effective ways to process huge volumes of data this decade. Although it is open source, setting up Hadoop on-premise is not affordable for small-scale businesses and research entities. Consuming Hadoop MapReduce as a cloud service is therefore growing at an increasing pace, as it is scalable on demand and based on a pay-per-use model. In such a multi-tenant environment, virtual bandwidth is an expensive commodity, and co-located virtual machines race one another to use it. A study shows that 26%-70% of MapReduce job latency is due to the shuffle phase of the MapReduce execution sequence. The primary expectation of a typical cloud user is to minimize the service usage cost. Allocating less bandwidth to the service costs less but increases job latency and, consequently, makespan. We address this trade-off by minimizing the amount of intermediate data generated in the shuffle phase at the application level. To achieve this, we propose a Time Sharing MapReduce Job Scheduler that minimizes the amount of intermediate data, thereby cutting the service cost. As a by-product, MapReduce job latency and makespan are also improved. Results show that our proposed model reduced the size of intermediate data by up to 62.1% compared to classical schedulers with combiners. © 2019 IEEE.

Item: Cloud based service registry for location based mobile web services system (2013)
Authors: D'Souza, M.; Ananthanarayana, V.S.
Location based services (LBS) are gaining popularity due to the growing number of smartphone users. The architectural design of an LBS system plays a major role in delivering location based services in ubiquitous environments. Service oriented architecture (SOA), which uses services as its basic constructs, is the prevailing approach to designing and developing loosely coupled distributed applications, even in heterogeneous environments. Cloud computing, in turn, provides a highly reliable and scalable infrastructure environment for resource-intensive applications. This paper gives an overview of an SOA based LBS system and explains how to move the service registry to the cloud to get the best of both SOA and cloud infrastructure. © 2013 IEEE.

Item: Change propagation based incremental data handling in a Web service discovery framework (2015)
Authors: Sowmya, Kamath S.; Ananthanarayana, V.S.
Due to the explosive growth in the availability of Web services on the open Web and the heterogeneity of the sources in which they appear, discovering relevant Web services for a given task remains challenging. To deal with these problems, a bottom-up approach was proposed that finds published service descriptions to automatically build a service repository for a Web service discovery framework. This framework employs a Service Crawler to find published service descriptions on the Web. Since the crawler periodically makes repeated runs to find new service descriptions and to check the continued availability of already crawled services, the framework is inherently dynamic. It is therefore critical to keep track of entities such as visited URLs and already added and successfully processed service descriptions, to avoid rework when the service data set changes. To cope with this problem, we developed a change propagation technique based on an event-based state machine that incorporates an incremental processing strategy into this Web-scale framework. © 2014 IEEE.
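To make the change-propagation idea concrete, here is a minimal Python sketch of an event-based state machine over crawled service URLs. The specific states and events (DISCOVERED, fetch_ok, and so on) are our guesses at a plausible service lifecycle, not the paper's actual design; the point is that each crawler event moves only the affected entry, so repeated runs avoid rework.

    from enum import Enum, auto

    class State(Enum):
        DISCOVERED = auto()   # URL seen, not yet fetched
        FETCHED = auto()      # description downloaded
        PROCESSED = auto()    # parsed and added to the repository
        UNAVAILABLE = auto()  # fetch failed or service withdrawn

    # Allowed transitions, keyed by (current state, event).
    TRANSITIONS = {
        (State.DISCOVERED, "fetch_ok"):   State.FETCHED,
        (State.DISCOVERED, "fetch_fail"): State.UNAVAILABLE,
        (State.FETCHED,    "parse_ok"):   State.PROCESSED,
        (State.PROCESSED,  "changed"):    State.DISCOVERED,  # re-crawl
        (State.PROCESSED,  "gone"):       State.UNAVAILABLE,
    }

    registry = {}  # visited URL -> current state

    def on_event(url, event):
        # Only the entry named by the event changes; every other
        # service keeps its state across crawler runs.
        current = registry.get(url, State.DISCOVERED)
        registry[url] = TRANSITIONS.get((current, event), current)

    on_event("http://example.org/stock.wsdl", "fetch_ok")
    on_event("http://example.org/stock.wsdl", "parse_ok")
    print(registry["http://example.org/stock.wsdl"])  # State.PROCESSED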
Item: Dynamic Performance Aware Reduce Task Scheduling in MapReduce on Virtualized Environment (2018)
Authors: Jeyaraj, R.; Ananthanarayana, V.S.
Hadoop MapReduce as a cloud service is widely used by research and commercial communities alike. It is typically hosted on a virtualized environment in a Cloud Data Center, with the cluster of MapReduce virtual machines placed across racks to achieve fault tolerance. However, this placement introduces dynamic, heterogeneous performance across virtual machines, caused by hardware heterogeneity and interference from co-located virtual machines, so the same task can exhibit varying latency. In addition, curbing the number of intermediate records and placing reduce tasks on the right virtual node are important for minimizing MapReduce job latency further. In this paper, we introduce the Multi-Level Per Node Combiner to minimize the number of intermediate records, and a Dynamic Ranking based MapReduce Job Scheduler that places reduce tasks on the right virtual machine by exploiting the dynamic performance of virtual machines, in order to minimize MapReduce job latency. For our experiments and evaluation, we launched 29 virtual machines hosted on eight different physical machines and ran a wordcount job on the PUMA dataset. Our proposed methodology improves overall job latency by up to 33% for the wordcount job. © 2018 IEEE.

Item: Document classification with a weighted frequency pattern tree algorithm (2016)
Authors: Dsouza, F.H.; Ananthanarayana, V.S.
Document classification is the task of automatically categorizing collections of electronic documents into annotated classes based on their contents, an important problem in data mining. Due to the exponential growth of documents on the Internet and the emergent need to organize them, developing an efficient document classification method to automatically manipulate Web documents is of great importance and has received ever-increasing attention in recent years. However, existing approaches to text classification treat a document primarily as a bag of words: all information about the document is gathered from the presence of individual words, not from the order or context in which those words appear in a sentence. In this paper, we investigate adopting the FP-tree, a data structure used in itemset mining, to represent training documents in text classification while preserving sentence information. We compare our method with conventional document classification algorithms on several corpora. The experimental results indicate that our proposed algorithm yields much better performance than the conventional algorithms, especially on corpora with largely disjoint classification categories. © 2016 IEEE.
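The FP-tree representation used for document classification above can be sketched compactly. In the Python below, each sentence of a training document is inserted as an ordered path of words, so shared sentence prefixes share nodes and node counts act as weights. This is our simplified reading of the idea: classic FP-trees reorder items by global frequency, whereas this sketch keeps sentence order, since the paper emphasizes preserving it.

    class FPNode:
        def __init__(self, word):
            self.word = word
            self.count = 0      # weight: how often this path prefix occurs
            self.children = {}  # word -> FPNode

    def build_tree(sentences):
        # Insert each sentence as an ordered path of words, so shared
        # sentence prefixes share nodes and accumulate counts.
        root = FPNode(None)
        for sentence in sentences:
            node = root
            for word in sentence.lower().split():
                node = node.children.setdefault(word, FPNode(word))
                node.count += 1
        return root

    # One tree per class; a new document can then be scored by how
    # well its sentences match weighted paths in each class tree.
    tree = build_tree(["the cat sat", "the cat ran", "a dog ran"])
    print(tree.children["the"].children["cat"].count)  # 2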
Item: Distributed Public Computing and Storage using Mobile Devices (2019)
Authors: Prem, Kumar, M.; Bhat, R.R.; Alavandar, S.R.; Ananthanarayana, V.S.
The processing power and storage capacities of mobile phones show an increasing trend. Combined with their large numbers and ubiquitous nature, they present new possibilities in the field of public resource computing, also called volunteer computing. An effective volunteer computing solution can be achieved by utilizing the idle CPU cycles and free storage space of these phones. Existing solutions like BOINC cater mainly to large organizations and have complex procedures for submitting datasets and code for computation. Here we propose a novel distributed computing platform that enables users to harness public computing power with ease: a user uploads a dataset, the Java code to run on it, and the merge code that combines the results. We devised a distribution and scheduling algorithm that leverages the computational heterogeneity of the devices, the complexity of the task involved, and the size of the uploaded dataset. The platform also provides decentralized public storage to which users can upload any file securely. It applies threshold cryptography to uploaded files to create encrypted shares, an approach that reduces the redundancy required to maintain availability. We ran a DNA sequence similarity algorithm on our system using a number of Android phones of different makes. Our results show that this approach is a viable, cost-efficient alternative to traditional distributed computing resources for non-time-bound computations on large datasets. © 2018 IEEE.

Item: Data trustworthiness in wireless sensor networks (2016)
Authors: Karthik, N.; Ananthanarayana, V.S.
A Wireless Sensor Network (WSN) comprises tiny wireless sensor nodes installed in the terrain for continuous observation of physical or environmental conditions. Assessing data trustworthiness is a key pre-processing step in WSNs, because harsh environments produce faulty data and data transfer over a WSN is insecure. The trustworthiness of the data generated by sensor nodes plays an important role in critical decision making. In this work, we propose a Data Trust Management Scheme (DTMS) that addresses the issue by assigning a trust score to data items. The proposed DTMS detects data faults with the help of temporal and spatial correlations. Provenance data is used to evaluate the trust score of a data item via similarity of value and provenance. The data trust score is then used for decision making. The proposed DTMS is evaluated through simulations. Results show that it detects untrustworthy data and scores data items in a way that is useful for critical decisions. © 2016 IEEE.

Item: Data trust model for event detection in wireless sensor networks using data correlation techniques (2017)
Authors: Karthik, N.; Ananthanarayana, V.S.
A wireless sensor network (WSN) is a conglomeration of scattered, self-organized sensor nodes that cooperatively monitor physical and environmental conditions. These sensor nodes are equipped with limited resources such as memory, processing capability, battery power, and a transceiver for monitoring, processing, and communicating the observed phenomena, from which critical decisions are made. Evaluating the trustworthiness of data is a primary preprocessing step for event detection in WSNs. Trustworthy data, free from faults, inaccuracy, and inconsistency, is used to identify interesting events and to support critical decision making in a WSN. In this paper, we present our current work on a data trust model that focuses on data fault detection, data reconstruction, and data quality estimation for reliable event detection in WSNs. The aim of this paper is to propose a novel data trust model for the harsh WSN environment that identifies events and strange environmental data behavior. The proposed framework combines different data processing methods through data correlation techniques to mitigate the data security risks of pervasive environments. © 2017 IEEE.
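Both WSN items above derive a trust score for sensor data from temporal and spatial correlations. The Python sketch below shows one plausible shape of such a score: a reading is checked against the node's own recent history (temporal) and against co-located neighbors (spatial), and the two agreements are averaged. The z-score-style tests and the equal weighting are our assumptions for illustration, not the scoring actually derived in these papers.

    import statistics

    def trust_score(reading, history, neighbors, tol=2.0):
        """Score in [0, 1]; 1 means fully consistent with past and peers."""
        # Temporal correlation: distance from the node's recent history.
        mu = statistics.mean(history)
        sigma = statistics.pstdev(history) or 1.0  # guard: zero spread
        temporal = max(0.0, 1.0 - abs(reading - mu) / (tol * sigma))
        # Spatial correlation: distance from neighbors' current readings.
        nmu = statistics.mean(neighbors)
        nsigma = statistics.pstdev(neighbors) or 1.0
        spatial = max(0.0, 1.0 - abs(reading - nmu) / (tol * nsigma))
        # Equal weighting is our assumption, not the papers' derivation.
        return 0.5 * temporal + 0.5 * spatial

    # A reading consistent with both its past and its peers scores high;
    # an isolated spike disagreeing with both scores near zero.
    print(trust_score(25.1, [24.8, 25.0, 25.3], [24.9, 25.2, 25.0]))  # ~0.8
    print(trust_score(40.0, [24.8, 25.0, 25.3], [24.9, 25.2, 25.0]))  # 0.0

Low-scoring readings can then be flagged as faults or candidate events before any critical decision is taken, which is the pre-processing role both papers assign to the trust score.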
Page 1 of 3