Conference Papers

Performance analysis of graph based iterative algorithms on MapReduce framework
(Institute of Electrical and Electronics Engineers Inc., 2014) Debbarma, A.; Annappa, B.; Mude, R.G.
In recent years, there has been an enormous growth in the amount of digital data being produced. Numerous attempts are being made to process this large amount of data in a fast and effective manner. Hadoop MapReduce is one such software framework that has gained popularity in the last few years for distributed computation of Big Data. It provides a scalable, economical and easier way to process massive amounts of data in parallel on large computing clusters while preserving fault tolerance in a transparent manner. However, Hadoop always stores intermediate results on the local disk when running iterative jobs. As a result, Hadoop usually suffers from long execution runtimes for iterative jobs, as it typically pays a high I/O cost, wasting CPU cycles and network bandwidth. This paper analyses the problems of existing Hadoop and compares its performance against iMapReduce and HaLoop for graph-based iterative algorithms. HaLoop offers better performance because it stores intermediate results in a cache and reuses that data in the next iteration. To reuse cached invariant data (inter-iteration locality), it schedules tasks that occur in different iterations onto the same node. © 2014 IEEE.

Capturing Node Resource Status and Classifying Workload for Map Reduce Resource Aware Scheduler
(Springer Verlag, 2015) Mude, R.G.; Betta, A.; Debbarma, A.
There has been an enormous growth in the amount of digital data, and numerous software frameworks have been built to process it. Hadoop MapReduce is one such popular software framework, which processes large data on commodity hardware. The job scheduler is a key component of Hadoop that assigns tasks to nodes. The existing MapReduce scheduler assigns tasks to nodes without considering node heterogeneity, workload type, or the amount of available resources. This overburdens a node with one type of job and reduces overall throughput. In this paper, we propose a new scheduler which captures the node resource status after every heartbeat, classifies jobs into two types, CPU bound and IO bound, and assigns each task to the node with lower CPU/IO utilization. The experimental results show an improvement of 15-20% on a heterogeneous cluster and around 10% on a homogeneous cluster with respect to the Hadoop native scheduler. © Springer India 2015.
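
The scheduling idea summarized in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names NodeStatus, on_heartbeat, classify_job and pick_node are hypothetical, and the classification rule (comparing estimated CPU time with estimated I/O time) is an assumption, since the abstract does not say how jobs are classified.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Resource snapshot reported by a node (hypothetical structure)."""
    name: str
    cpu_util: float  # fraction of CPU in use, 0.0 to 1.0
    io_util: float   # fraction of I/O capacity in use, 0.0 to 1.0


def on_heartbeat(status_table, status):
    """Record the latest status for a node, replacing any older snapshot."""
    status_table[status.name] = status


def classify_job(cpu_seconds, io_seconds):
    """Label a job 'cpu' or 'io'. Assumed rule: whichever cost dominates."""
    return "cpu" if cpu_seconds >= io_seconds else "io"


def pick_node(job_type, status_table):
    """Choose the node with the lowest utilization of the resource the job stresses."""
    if job_type == "cpu":
        return min(status_table.values(), key=lambda n: n.cpu_util)
    return min(status_table.values(), key=lambda n: n.io_util)


# Usage: two nodes report status, then one task of each type is placed.
table = {}
on_heartbeat(table, NodeStatus("node-a", cpu_util=0.9, io_util=0.1))
on_heartbeat(table, NodeStatus("node-b", cpu_util=0.2, io_util=0.8))
cpu_target = pick_node(classify_job(cpu_seconds=10, io_seconds=2), table)
io_target = pick_node(classify_job(cpu_seconds=1, io_seconds=5), table)
```

In this sketch the CPU-bound task goes to node-b, which has the lower CPU utilization, and the IO-bound task goes to node-a, which has the lower I/O utilization. A real scheduler would also have to account for task slots, data locality and stale heartbeats, which this sketch leaves out.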