Improving MapReduce scheduler for heterogeneous workloads in a heterogeneous environment

Jeyaraj, R.; Ananthanarayana, V.S.; Paul, A.

dc.contributor.author	Jeyaraj, R.
dc.contributor.author	Ananthanarayana, V.S.
dc.contributor.author	Paul, A.
dc.date.accessioned	2026-02-05T09:28:40Z
dc.date.issued	2020
dc.description.abstract	Big data is pushing businesses and research sectors to become more data-driven. Hadoop MapReduce is one of the cost-effective ways to process large-scale datasets and is offered as a service over the Internet. Even though cloud service providers promise an infinite amount of resources available on demand, some of the virtual resources hired for MapReduce are inevitably left unutilized, and makespan suffers, because of the various heterogeneities that arise when MapReduce is offered as a service. As MapReduce v2 allows users to define the size of containers for the map and reduce tasks, jobs in a batch become heterogeneous and behave differently. Also, virtual machines of different capacities in the MapReduce virtual cluster accommodate a varying number of map/reduce tasks. These factors strongly affect resource utilization in the virtual cluster and the makespan for a batch of MapReduce jobs. Default MapReduce job schedulers do not consider these heterogeneities in a cloud environment. Moreover, virtual machines in a MapReduce virtual cluster process an equal number of blocks regardless of their capacity, which affects the makespan. Therefore, we devised a heuristic-based MapReduce job scheduler that exploits heterogeneity at both the virtual machine level and the MapReduce workload level to improve resource utilization and makespan. We proposed two methods to achieve this: (i) roulette wheel scheme based data block placement in heterogeneous virtual machines, and (ii) a constrained 2-dimensional bin packing to place heterogeneous map/reduce tasks. We compared the heuristic-based MapReduce job scheduler against the classical fair scheduler in MapReduce v2. Experimental results showed that our proposed scheduler improved makespan and resource utilization by 45.6% and 47.9%, respectively, over the classical fair scheduler. © 2019 John Wiley & Sons, Ltd.
dc.identifier.citation	Concurrency and Computation: Practice and Experience, 2020, 32, 7, pp. -
dc.identifier.issn	15320626
dc.identifier.uri	https://doi.org/10.1002/cpe.5558
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/23945
dc.publisher	John Wiley and Sons Ltd
dc.subject	Cloud computing
dc.subject	Cost effectiveness
dc.subject	Large dataset
dc.subject	Network security
dc.subject	Scheduling
dc.subject	Virtual machine
dc.subject	Bin packing
dc.subject	Cloud environments
dc.subject	Cloud service providers
dc.subject	Heterogeneous environments
dc.subject	Heterogeneous workloads
dc.subject	Large-scale datasets
dc.subject	Map/reduce
dc.subject	Resource utilizations
dc.subject	Job shop scheduling
dc.title	Improving MapReduce scheduler for heterogeneous workloads in a heterogeneous environment
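
The abstract names two techniques without giving their details: roulette wheel based block placement and a constrained 2-dimensional bin packing of map/reduce tasks. The sketch below is a minimal illustration of the general ideas only, not the authors' scheduler. It assumes that a virtual machine's selection weight is a single capacity number, that the two packing dimensions are container memory and vcores, and that a first-fit decreasing heuristic stands in for the paper's constrained packing. The names place_blocks and pack_tasks, and all values in the usage note, are hypothetical.

```python
import random
from collections import Counter


def place_blocks(num_blocks, vm_capacity, seed=None):
    """Roulette wheel selection: each block goes to a VM chosen with
    probability proportional to that VM's capacity weight (assumed metric)."""
    rng = random.Random(seed)
    names = list(vm_capacity)
    weights = [vm_capacity[name] for name in names]
    picks = rng.choices(names, weights=weights, k=num_blocks)
    return Counter(picks)  # VM name -> number of blocks assigned


def pack_tasks(tasks, vms):
    """First-fit decreasing over two dimensions (memory_mb, vcores).

    tasks: task id -> (memory_mb, vcores) requested by its container
    vms:   VM name -> (memory_mb, vcores) available on that VM
    Returns (placement, unplaced). This heuristic is a stand-in, not the
    constrained packing described in the paper.
    """
    free = {name: list(capacity) for name, capacity in vms.items()}
    placement = {}
    unplaced = []
    # Consider the largest containers first (memory, then vcores).
    ordered = sorted(tasks.items(), key=lambda item: item[1], reverse=True)
    for task_id, (mem, cores) in ordered:
        for name, (free_mem, free_cores) in free.items():
            if mem <= free_mem and cores <= free_cores:
                free[name][0] -= mem
                free[name][1] -= cores
                placement[task_id] = name
                break
        else:
            # No VM has room in both dimensions.
            unplaced.append(task_id)
    return placement, unplaced
```

With hypothetical inputs, place_blocks(100, {"small": 1, "large": 3}) tends to send roughly three times as many blocks to "large" as to "small", and pack_tasks({"m1": (2048, 1), "r1": (4096, 2)}, {"small": (2048, 1), "large": (6144, 3)}) places "r1" on "large" and "m1" on "small".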

Collections

Journal Articles
