Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
6 results
Modified MapReduce framework for enhancing performance of graph based algorithms by fast convergence in distributed environment (Institute of Electrical and Electronics Engineers Inc., 2014)
Singhal, H.; Guddeti, G.R.M.
The amount of data being produced today is huge and, more importantly, growing exponentially. Traditional data storage and processing techniques are ineffective at handling such volumes of data [10]. Many real-life applications require iterative computation; in particular, most machine learning and data mining algorithms iterate over large datasets such as web link structures and social network graphs. MapReduce is a software framework for easily writing applications that process large amounts of data (multi-terabyte) in parallel on large clusters (thousands of nodes) of commodity hardware. However, because MapReduce is batch-oriented, its benefits cannot be fully realized in iterative computations. The proposed work focuses on optimizing three factors to improve the performance of iterative algorithms in a MapReduce environment. In this paper, we address key issues related to task execution, the unnecessary creation of new tasks in each iteration, and excessive data shuffling in each iteration. Preliminary experiments show promising results compared with the basic MapReduce framework. A comparative study with existing MapReduce-based solutions such as HaLoop also shows better performance with respect to algorithm run time and the amount of data traffic over the Hadoop cluster. © 2014 IEEE.

Capturing Node Resource Status and Classifying Workload for Map Reduce Resource Aware Scheduler (Springer Verlag service@springer.de, 2015)
Mude, R.G.; Betta, A.; Debbarma, A.
There has been enormous growth in the amount of digital data, and numerous software frameworks have been developed to process it.
Hadoop MapReduce is one such popular framework, which processes large data on commodity hardware. The job scheduler is a key component of Hadoop that assigns tasks to nodes. The existing MapReduce scheduler assigns tasks to nodes without considering node heterogeneity, workload type, or the amount of available resources. This leads to a node being overburdened by one type of job and reduces overall throughput. In this paper, we propose a new scheduler that captures the node resource status after every heartbeat, classifies jobs into two types, CPU bound and IO bound, and assigns each task to the node with lower CPU or IO utilization. Experimental results show an improvement of 15-20% on a heterogeneous cluster and around 10% on a homogeneous cluster with respect to the Hadoop native scheduler. © Springer India 2015.

Analysis of MapReduce scheduling and its improvements in cloud environment (Institute of Electrical and Electronics Engineers Inc., 2015)
D'Souza, S.; Chandrasekaran, K.
MapReduce has become a prominent parallel processing model for analysing large-scale data. MapReduce applications are increasingly deployed in the cloud alongside other applications that share the same physical resources. In this scenario, efficient scheduling of MapReduce applications is of utmost importance. Besides performance, MapReduce scheduling in cloud environments must also consider other parameters, such as energy efficiency and meeting SLA goals. In this work, we classify MapReduce scheduling into cluster-based scheduling and objective-based scheduling. We then summarize and analyse the different classes of schedulers, highlighting the strengths and limitations of each scheduling approach. Adaptive scheduling techniques provide dynamic resource management and meet performance goals. Energy-efficient scheduling techniques use various approaches to cut data centre costs.
Finally, we discuss current challenges and future work. © 2015 IEEE.

Genome Data Analysis Using MapReduce Paradigm (Institute of Electrical and Electronics Engineers Inc., 2015)
Pahadia, M.; Srivastava, A.; Srivastava, D.; Patil, N.
Counting the number of occurrences of a substring in a string is a problem in many applications. This paper proposes a fast and efficient solution for the field of bioinformatics. A k-mer is a substring of length k of a biological sequence. K-mer counting is defined as counting the number of occurrences of all possible k-mers in a biological sequence. K-mer counting is used in applications ranging from error correction of sequencing reads to genome assembly, disease prediction and feature extraction. Current k-mer counting tools are costly in both time and space. We provide a solution that uses MapReduce and Hadoop to reduce computation time. After applying the algorithms to real genome datasets, we concluded that the algorithm using Hadoop and the MapReduce paradigm runs more efficiently and reduces computation time significantly. © 2015 IEEE.

Mining closed colossal frequent patterns from high-dimensional dataset: Serial versus parallel framework (Springer Verlag service@springer.de, 2018)
Sureshan, S.; Penumacha, A.; Jain, S.; Vanahalli, M.; Patil, N.
Mining colossal patterns is an emerging field with many applications, especially in bioinformatics and genetics. Gene sequences contain inherent information, and mining colossal patterns in such sequences can further aid their study and improve prediction accuracy. An increase in average transaction length reduces the efficiency and effectiveness of existing closed frequent pattern mining algorithms. Traditional algorithms spend most of their running time mining huge numbers of small and mid-sized patterns that do not contain valuable information.
Recent research has focused on mining large-cardinality patterns, called colossal patterns, which carry valuable information. A novel parallel algorithm is proposed to extract closed colossal frequent patterns from high-dimensional datasets. The algorithm is implemented on the Hadoop framework to exploit its inherent distributed parallelism through the MapReduce programming model. Experimental results show that the proposed parallel algorithm on Hadoop is efficient in terms of execution time compared with existing algorithms. © Springer Nature Singapore Pte Ltd. 2018.

Dynamic Performance Aware Reduce Task Scheduling in MapReduce on Virtualized Environment (Institute of Electrical and Electronics Engineers Inc., 2018)
Jeyaraj, R.; Ananthanarayana, V.S.
Hadoop MapReduce offered as a cloud service is widely used by research and commercial communities. It is typically hosted on a virtualized environment in a cloud data center, where the cluster of virtual machines for MapReduce is placed across racks to achieve fault tolerance. However, this setup introduces dynamic, heterogeneous performance among virtual machines due to hardware heterogeneity and interference from co-located virtual machines, which causes varying latency for the same task. In addition, curbing the number of intermediate records and placing reduce tasks on the right virtual node are important to further minimize MapReduce job latency. In this paper, we introduce a Multi-Level Per Node Combiner to minimize the number of intermediate records and a Dynamic Ranking based MapReduce Job Scheduler that places reduce tasks on the right virtual machine, minimizing job latency by exploiting the dynamic performance of virtual machines. For evaluation, we launched 29 virtual machines hosted on eight different physical machines to run a wordcount job on the PUMA dataset.
Our proposed methodology improves overall job latency by up to 33% for the wordcount job. © 2018 IEEE.
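The k-mer counting problem described in the Pahadia et al. abstract maps naturally onto the map, shuffle and reduce phases. The following is a minimal single-process Python sketch of that mapping, not the authors' Hadoop implementation. The function names are hypothetical, each read is processed independently (k-mers spanning two reads are not formed), and reverse-complement (canonical) k-mers are not merged.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_kmers(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (k-mer, 1) for every length-k window of one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for kmer, value in pairs:
        groups[kmer].append(value)
    return groups


def reduce_counts(kmer: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the partial counts for one k-mer."""
    return kmer, sum(values)


def count_kmers(reads: List[str], k: int) -> Dict[str, int]:
    """Run map, shuffle and reduce over all reads in a single process."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_counts(kmer, values) for kmer, values in grouped.items())


if __name__ == "__main__":
    print(count_kmers(["ACGTAC", "GTACG"], k=3))
    # {'ACG': 2, 'CGT': 1, 'GTA': 2, 'TAC': 2}
```

On a real Hadoop cluster the map and reduce functions would run as distributed tasks and the framework would perform the shuffle; the sketch only shows how the counting decomposes into those phases.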
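The resource-aware scheduling idea in the Mude et al. abstract (record each node's resource status at every heartbeat, classify a job as CPU bound or IO bound, then place its task on the node with lower utilization of that resource) can be sketched as follows. This is an illustrative Python sketch, not the paper's scheduler and not Hadoop's scheduler API: `NodeStatus`, `HeartbeatTracker`, the utilization fields and the compute-versus-I/O-wait classification rule are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class JobType(Enum):
    CPU_BOUND = "cpu"
    IO_BOUND = "io"


@dataclass
class NodeStatus:
    """Resource snapshot a node reports with each heartbeat (hypothetical fields)."""
    name: str
    cpu_util: float  # fraction of CPU in use, 0.0 to 1.0
    io_util: float   # fraction of disk/network I/O capacity in use, 0.0 to 1.0
    free_slots: int  # task slots currently available on the node


def classify_job(cpu_seconds: float, io_wait_seconds: float) -> JobType:
    """Hypothetical rule: a job is CPU bound if a sample task spent at least
    as much time computing as waiting on I/O, otherwise IO bound."""
    if cpu_seconds >= io_wait_seconds:
        return JobType.CPU_BOUND
    return JobType.IO_BOUND


def pick_node(job_type: JobType, nodes: List[NodeStatus]) -> Optional[NodeStatus]:
    """Among nodes with a free slot, pick the one with the lowest utilization
    of the resource this job type stresses; return None if all are full."""
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return None
    if job_type is JobType.CPU_BOUND:
        return min(candidates, key=lambda n: n.cpu_util)
    return min(candidates, key=lambda n: n.io_util)


class HeartbeatTracker:
    """Keeps the latest reported status of every node."""

    def __init__(self) -> None:
        self.latest: Dict[str, NodeStatus] = {}

    def on_heartbeat(self, status: NodeStatus) -> None:
        # Each heartbeat overwrites the previous snapshot for that node.
        self.latest[status.name] = status

    def assign(self, cpu_seconds: float, io_wait_seconds: float) -> Optional[str]:
        # Slot counts are not decremented here; the sketch assumes the next
        # heartbeat reports the updated free_slots.
        job_type = classify_job(cpu_seconds, io_wait_seconds)
        node = pick_node(job_type, list(self.latest.values()))
        return node.name if node is not None else None
```

For example, if node-a reports 90% CPU and 20% I/O utilization and node-b reports 30% CPU and 80% I/O, a CPU-bound job is sent to node-b and an IO-bound job to node-a. A production scheduler would also need slot accounting, data locality and fairness, which this sketch ignores.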
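The Jeyaraj and Ananthanarayana abstract motivates its Multi-Level Per Node Combiner by the need to curb intermediate records. The effect of combining map output on each node before the shuffle can be illustrated with wordcount. The sketch below is a simplified single-level per-node combiner in Python, written for illustration only; it is not the paper's multi-level design, and the node and split layout is made up.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[str, int]


def map_wordcount(split: str) -> List[Record]:
    """Map task: emit (word, 1) for every whitespace-separated word in a split."""
    return [(word, 1) for word in split.split()]


def per_node_combine(map_outputs: List[List[Record]]) -> List[Record]:
    """Merge the outputs of all map tasks on one node into a single
    (word, partial_count) record per distinct word."""
    combined: Counter = Counter()
    for output in map_outputs:
        for word, count in output:
            combined[word] += count
    return list(combined.items())


def run_job(nodes: Dict[str, List[str]], combine: bool) -> Tuple[Dict[str, int], int]:
    """Run wordcount over {node: [input splits]} and return the final counts
    plus the number of intermediate records sent through the shuffle."""
    shuffled: List[Record] = []
    for splits in nodes.values():
        outputs = [map_wordcount(s) for s in splits]
        if combine:
            shuffled.extend(per_node_combine(outputs))
        else:
            for output in outputs:
                shuffled.extend(output)
    # Reduce phase: sum partial counts per word.
    totals: Counter = Counter()
    for word, count in shuffled:
        totals[word] += count
    return dict(totals), len(shuffled)
```

With `{"vm-1": ["a b a", "b a"], "vm-2": ["a c"]}`, both runs produce the same word counts, but the combined run shuffles 4 intermediate records instead of 7. Summation is associative and commutative, which is what makes it safe to apply the reduce logic early on each node.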
