Title: Multi-level per node combiner (MLPNC) to minimize mapreduce job latency on virtualized environment
Authors: Jeyaraj, R.; Ananthanarayana, V.S.
Issue Date: 2018
Citation: Proceedings of the ACM Symposium on Applied Computing, 2018, pp. 167-174
Abstract: Big data has made businesses and research more data-driven. Hadoop MapReduce is a cost-effective way to process huge amounts of data and is also offered as a cloud service on clusters of Virtual Machines (VMs). In a Cloud Data Center (CDC), Hadoop VMs are co-located with other general-purpose VMs across racks. Such multi-tenancy leads to varying local network bandwidth availability for Hadoop VMs, which directly impacts MapReduce job latency, because the shuffle phase of the MapReduce execution sequence alone contributes 26%-70% of overall job latency owing to the large number of intermediate records. A Hadoop virtual cluster therefore needs maximum bandwidth to be ensured in order to minimize job latency, but this also increases the bandwidth usage cost. In this paper, we propose the "Multi-Level Per Node Combiner" (MLPNC), which curtails the number of intermediate records in the shuffle phase, resulting in a reduction in overall job latency. It also minimizes bandwidth usage cost. We evaluate MLPNC on a wordcount job against the default combiner and the Per Node Combiner (PNC). We discuss the results in terms of the number of shuffled records, shuffle latency, average merge latency, average reduce latency, average reduce task start time, and overall job latency. Finally, we argue in favor of MLPNC, as it achieves up to a 33% reduction in the number of intermediate records and up to a 32% reduction in average job latency compared with PNC. © 2018 ACM.
Appears in Collections: 2. Conference Papers
