Krishnan, MN and Prakash, N and Lalitha, V and Sasidharan, B and Kumar, PV and Narayanamurthy, S and Kumar, R and Nandi, S (2014) Evaluation of codes with inherent double replication for Hadoop. In: 6th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2014, 17-18 June 2014, Philadelphia, United States.
Abstract
In this paper, we evaluate the efficacy, in a Hadoop setting, of two coding schemes, both possessing an inherent double replication of data. The two coding schemes belong to the classes of regenerating and locally regenerating codes respectively, and these two classes are representative of recent advances in designing codes for the efficient storage of data in a distributed setting. In comparison with triple replication, double replication permits a significant reduction in storage overhead, while delivering good MapReduce performance under moderate workloads. The two coding solutions under evaluation here add only moderately to the storage overhead of double replication, while simultaneously offering reliability levels similar to that of triple replication. One might expect from the property of inherent data duplication that the performance of these codes in executing a MapReduce job would be comparable to that of double replication. However, a second feature of these codes comes into play here: under both coding schemes analyzed, multiple blocks from the same coded stripe must be stored on the same node. This concentration of data belonging to a single stripe negatively impacts MapReduce execution times. However, much of this effect can be undone by simply adding a larger number of processors per node. Further improvements are possible if one tailors the Map task scheduler to the codes under consideration. We present both experimental and simulation results that validate these observations.
Item Type: | Conference Paper |
---|---|
Publication: | 6th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2014 |
Additional Information: | cited By 9; Conference of 6th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2014 ; Conference Date: 17 June 2014 Through 18 June 2014; Conference Code:156233 |
Keywords: | Digital storage; Facsimile; File organization, Coding scheme; Data duplication; Map task; Map-reduce; Regenerating codes; Reliability level; Storage overhead; Work loads, Codes (symbols) |
Department/Centre: | Division of Interdisciplinary Sciences > Computational and Data Sciences |
Date Deposited: | 26 Aug 2020 04:49 |
Last Modified: | 26 Aug 2020 04:49 |
URI: | http://eprints.iisc.ac.in/id/eprint/64840 |