MapReduce scheduler to minimize the size of intermediate data in shuffle phase

Jeyaraj, R.; Ananthanarayana, V.S.; Paul, A.

Please use this identifier to cite or link to this item: https://idr.nitk.ac.in/jspui/handle/123456789/8512

Title:	MapReduce scheduler to minimize the size of intermediate data in shuffle phase
Authors:	Jeyaraj, R.; Ananthanarayana, V.S.; Paul, A.
Issue Date:	2019
Citation:	Proceedings - 18th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2019, 2019, pp. 30-34
Abstract:	Hadoop MapReduce has been one of the cost-effective ways to process huge volumes of data in this decade. Although it is open source, setting up Hadoop on-premise is not affordable for small-scale businesses and research entities. Therefore, consuming Hadoop MapReduce as a service from the cloud is becoming increasingly common, as it scales on demand and follows a pay-per-use model. In such a multi-tenant environment, virtual bandwidth is an expensive commodity, and co-located virtual machines compete with each other for it. A study shows that 26%-70% of MapReduce job latency is due to the shuffle phase of the MapReduce execution sequence. The primary expectation of a typical cloud user is to minimize the service usage cost. Allocating less bandwidth to the service costs less but increases job latency and, consequently, makespan. We address this trade-off by minimizing, at the application level, the amount of intermediate data generated in the shuffle phase. To achieve this, we propose the Time Sharing MapReduce Job Scheduler, which minimizes the amount of intermediate data and thereby cuts the service cost. As a by-product, MapReduce job latency and makespan also improve. Results show that the proposed model reduces the size of intermediate data by up to 62.1% compared to the classical schedulers with combiners. © 2019 IEEE.
URI:	http://idr.nitk.ac.in/jspui/handle/123456789/8512
Appears in Collections:	2. Conference Papers

