Fine-grained data-locality aware MapReduce job scheduler in a virtualized environment

Jeyaraj, R.; Ananthanarayana, V.S.; Paul, A.

dc.contributor.author	Jeyaraj, R.
dc.contributor.author	Ananthanarayana, V.S.
dc.contributor.author	Paul, A.
dc.date.accessioned	2026-02-05T09:28:12Z
dc.date.issued	2020
dc.description.abstract	Big data has overwhelmed industry and research sectors. Reliable decision making remains a challenging task that requires cost-effective big data processing tools. Hadoop MapReduce is used to store and process huge volumes of data in a distributed environment. However, because of the large capital investment and the expertise needed to set up an on-premise Hadoop cluster, big data users seek cloud-based MapReduce services over the Internet. MapReduce on a cluster of virtual machines (VMs) is mostly offered as a service on a pay-per-use basis. The VMs in a MapReduce virtual cluster reside on different physical machines and are co-located with other, non-MapReduce VMs. As a result, they share I/O resources such as disk and network bandwidth, which leads to congestion because most MapReduce jobs are disk and network intensive. In particular, the shuffle phase of the MapReduce execution sequence consumes a large amount of network bandwidth in a multi-tenant environment. This increases job latency and bandwidth consumption cost. Therefore, it is essential to minimize the amount of intermediate data in the shuffle phase rather than to supply more network bandwidth, which increases service cost. With this objective, we extended the multi-level per-node combiner for a batch of MapReduce jobs to improve makespan. We observed that makespan improves by up to 32.4% when the amount of intermediate data in the shuffle phase is minimized, compared to classical schedulers with default combiners. © 2020, Springer-Verlag GmbH Germany, part of Springer Nature.
dc.identifier.citation	Journal of Ambient Intelligence and Humanized Computing, 2020, 11, 10, pp. 4261-4272
dc.identifier.issn	18685137
dc.identifier.uri	https://doi.org/10.1007/s12652-020-01707-7
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/23713
dc.publisher	Springer Science and Business Media Deutschland GmbH
dc.subject	Big data
dc.subject	Cost effectiveness
dc.subject	Decision making
dc.subject	Investments
dc.subject	Network security
dc.subject	Scheduling
dc.subject	Virtual machine
dc.subject	Virtual reality
dc.subject	Bandwidth consumption
dc.subject	Bandwidth minimization
dc.subject	Combiner
dc.subject	Distributed environments
dc.subject	Execution sequences
dc.subject	Job scheduling
dc.subject	Minimizing the number of
dc.subject	Virtualized environment
dc.subject	Bandwidth
dc.title	Fine-grained data-locality aware MapReduce job scheduler in a virtualized environment

Collections

Journal Articles