
Scalable Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach

Kumar, Sandeep and Padakandla, Sindhu and Chandrashekar, L and Parihar, Priyank and Gopinath, K and Bhatnagar, Shalabh (2017) Scalable Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach. In: 10th IEEE International Conference on Cloud Computing (CLOUD), JUN 25-30, 2017, Honolulu, HI, pp. 375-382.

Official URL: http://dx.doi.org/10.1109/CLOUD.2017.55

Abstract

Hadoop MapReduce is a popular framework for distributed storage and processing of large datasets and is used for big data analytics. Its many configuration parameters play an important role in determining performance, i.e., the execution time of a given big data processing job. The default values of these parameters do not result in good performance, so it is important to tune them. However, tuning is inherently difficult for two reasons: first, the parameter search space is large, and second, there are cross-parameter interactions. Hence, there is a need for a dimensionality-free method that can automatically tune the configuration parameters while taking the cross-parameter dependencies into account. In this paper, we propose a novel Hadoop parameter tuning methodology based on a noisy gradient algorithm known as simultaneous perturbation stochastic approximation (SPSA). The SPSA algorithm tunes the selected parameters by directly observing the performance of the Hadoop MapReduce system. The approach is independent of the parameter dimension and requires only two observations per iteration while tuning. We demonstrate the effectiveness of our methodology on popular Hadoop benchmarks, namely Grep, Bigram, Inverted Index, Word Co-occurrence, and Terasort. When tested on a 25-node Hadoop cluster, our method shows a 45-66% decrease in the execution time of Hadoop jobs on average, compared to prior methods. Further, our experiments indicate that the parameters tuned by our method are resilient to changes in the number of cluster nodes, which makes our method suitable for optimizing Hadoop when it is provided as a service on the cloud.
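
The "two observations per iteration" property mentioned in the abstract can be illustrated with a minimal SPSA sketch in Python. This is not the authors' implementation. The objective `true_runtime`, its optimum `TARGET`, the noise level, the [0, 1] parameter scaling, and the gain constants `a` and `c` are all hypothetical stand-ins chosen for illustration, and the decay exponents 0.602 and 0.101 are commonly recommended SPSA defaults rather than values taken from the paper. In a real setting, `observe` would run a Hadoop job with the given configuration and return its measured execution time.

```python
import random

# Hypothetical optimum of the stand-in objective; not taken from the paper.
TARGET = [0.3, 0.5, 0.7]


def true_runtime(theta):
    """Noise-free stand-in for job execution time, with one cross-parameter term."""
    diffs = [t - s for t, s in zip(theta, TARGET)]
    return sum(d * d for d in diffs) + 0.5 * diffs[0] * diffs[1]


def spsa_minimize(observe, theta0, iterations=200, a=0.1, c=0.1,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize a noisy objective with SPSA, using two observations per iteration."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(1, iterations + 1):
        a_k = a / k ** alpha  # decaying step size
        c_k = c / k ** gamma  # decaying perturbation size
        # Perturb every parameter at once with independent random +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        y_plus = observe([t + c_k * d for t, d in zip(theta, delta)])
        y_minus = observe([t - c_k * d for t, d in zip(theta, delta)])
        # The same two observations yield a gradient estimate for all coordinates.
        scale = (y_plus - y_minus) / (2.0 * c_k)
        # Gradient step, then clip each parameter back into [0, 1].
        theta = [min(1.0, max(0.0, t - a_k * scale / d))
                 for t, d in zip(theta, delta)]
    return theta


def run_demo(iterations=200, seed=0):
    """Tune the stand-in objective and count how many observations were made."""
    noise = random.Random(seed + 1)
    calls = [0]

    def observe(theta):
        calls[0] += 1
        return true_runtime(theta) + noise.gauss(0.0, 0.01)

    start = [0.9, 0.9, 0.9]
    tuned = spsa_minimize(observe, start, iterations=iterations, seed=seed)
    return start, tuned, calls[0]


if __name__ == "__main__":
    start, tuned, calls = run_demo()
    print("observations:", calls)
    print("runtime before:", true_runtime(start))
    print("runtime after:", true_runtime(tuned))
```

The number of calls to `observe` is twice the iteration count regardless of how many parameters `theta` holds, which is the sense in which the cost per iteration does not grow with the parameter dimension.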

Item Type: Conference Proceedings
Series: IEEE International Conference on Cloud Computing
Publisher: IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA
Additional Information: Copyright for this article belongs to IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 18 Apr 2018 18:22
Last Modified: 18 Apr 2018 18:22
URI: http://eprints.iisc.ac.in/id/eprint/59635
