Conference Papers
Permanent URI for this collectionhttps://idr.nitk.ac.in/handle/123456789/28506
Browse
2 results
Search Results
Item Resource aware scheduling in Hadoop for heterogeneous workloads based on load estimation(2013) Kapil, B.S.; Kamath S․, S.S.Currently, most cloud based applications require large scale data processing capability. Data to be processed is growing at a rate much faster than available computing power. Hadoop is used to enable distributed processing on large clusters of commodity hardware. In large clusters, the workloads may be heterogeneous in nature, that is, I/O bound, CPU bound or network intensive jobs that demand different types of resources requirement so as to run simultaneously on large cluster. Hadoops job scheduling is based on FIFO where, parallelization based on types of job has not been taken into account for scheduling. In this paper, we propose a new scheduling algorithm for Hadoop based distributed system, based on the classification of workloads to assign a specific category to a particular cluster according to current load of the cluster. The proposed scheduler increases the performance of both CPU and I/O resources in a cluster under heterogeneous workloads, by approximately 12% when compared to Hadoops FIFO scheduler. © 2013 IEEE.Item Workload characteristics and resource aware Hadoop scheduler(Institute of Electrical and Electronics Engineers Inc., 2015) Divya, M.; Annappa, B.Hadoop MapReduce is one of the largely used platforms for large scale data processing. Hadoop cluster has machines with different resources, including memory size, CPU capability and disk space. This introduces challenging research issue of improving Hadoop's performance through proper resource provisioning. The work presented in this paper focuses on optimizing job scheduling in Hadoop. Workload Characteristic and Resource Aware (WCRA) Hadoop scheduler is proposed, that classifies the jobs into CPU bound and Disk I/O bound. Based on the performance, nodes in the cluster are classified as CPU busy and Disk I/O busy. The amount of primary memory available in the node is ensured to be more than 25% before scheduling the job. Performance parameters of Map tasks such as the time required for parsing the data, map, sort and merge the result, and of Reduce task, such as the time to merge, parse and reduce is considered to categorize the job as CPU bound or Disk I/O bound. Tasks are assigned the priority based on their minimum Estimated Completion Time. The jobs are scheduled on a compute node in such a way that jobs already running on it will not be affected. Experimental results has given 30 % improvement in performance compared to Hadoop's FIFO, Fair and Capacity scheduler. © 2015 IEEE.
