Clay codes: Moulding MDS codes to yield an MSR code

Vajha, M and Ramkumar, V and Puranik, B and Kini, G and Lobo, E and Sasidharan, B and Kumar, PV and Barg, A and Ye, M and Narayanamurthy, S and Hussain, S and Nandi, S (2018) Clay codes: Moulding MDS codes to yield an MSR code. In: 16th USENIX Conference on File and Storage Technologies, 12 - 15 February 2018, Oakland, pp. 139-153.

Full text not available from this repository.
Official URL: https://www.usenix.org/system/files/conference/fas...

Abstract

As a data center grows in scale, the number of node failures increases sharply. To ensure availability of data, failure-tolerance schemes such as Reed-Solomon (RS) or, more generally, Maximum Distance Separable (MDS) erasure codes are used. However, while MDS codes offer minimum storage overhead for a given amount of failure tolerance, they do not meet other practical needs of today’s data centers. Although modern codes such as Minimum Storage Regenerating (MSR) codes are designed to meet these practical needs, they are available only in highly constrained theoretical constructions that are not sufficiently mature for practical implementation. We present Clay codes, which extract the best from both worlds. Clay (short for Coupled-Layer) codes are MSR codes that offer a simplified construction for decoding/repair by using pairwise coupling across multiple stacked layers of any single MDS code. In addition, Clay codes provide the first practical implementation of an MSR code that offers (a) low storage overhead, (b) simultaneous optimality in terms of three key parameters: repair bandwidth, sub-packetization level and disk I/O, (c) uniform repair performance of data and parity nodes, and (d) support for both single and multiple-node repairs, while permitting faster and more efficient repair. While all MSR codes are vector codes, none of the distributed storage systems support vector codes. We have modified Ceph to support any vector code, and our contribution is now a part of Ceph’s master codebase. We have implemented Clay codes and integrated them as a plugin to Ceph. Six example Clay codes were evaluated on a cluster of Amazon EC2 instances, with code parameters carefully chosen to match known erasure-code deployments in practice. One example code, with storage overhead 1.25x, is shown to reduce repair network traffic by a factor of 2.9 in comparison with RS codes, and similar reductions are obtained for both repair time and disk read.
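
The storage-overhead and repair-traffic figures in the abstract can be put in context with a short calculation. The sketch below is illustrative only: the parameter set (n, k, d) = (20, 16, 19) is an assumption chosen because it gives a 1.25x overhead, and is not stated in this record. It uses the standard MSR repair-bandwidth expression, in which each of d helper nodes sends a 1/(d - k + 1) fraction of its contents, and compares it with conventional RS repair, which reads k full nodes. The function names are hypothetical.

```python
from fractions import Fraction


def storage_overhead(n: int, k: int) -> Fraction:
    """Raw storage used per unit of user data for an (n, k) MDS code."""
    return Fraction(n, k)


def rs_repair_download(k: int) -> Fraction:
    """Data downloaded to repair one node with conventional RS decoding,
    measured in units of one node's contents: k full nodes are read."""
    return Fraction(k)


def msr_repair_download(k: int, d: int) -> Fraction:
    """Data downloaded to repair one node of an MSR code with d helpers,
    in units of one node's contents: d helpers each send 1/(d - k + 1)."""
    if d < k:
        raise ValueError("an MSR repair needs at least k helper nodes")
    return Fraction(d, d - k + 1)


# Assumed, illustrative parameters (not taken from this record).
n, k, d = 20, 16, 19

overhead = storage_overhead(n, k)
rs = rs_repair_download(k)
msr = msr_repair_download(k, d)

print(f"storage overhead: {float(overhead):.2f}x")
print(f"RS repair download:  {float(rs):.2f} node-sizes")
print(f"MSR repair download: {float(msr):.2f} node-sizes")
print(f"idealized reduction: {float(rs / msr):.2f}x")
```

Under these assumed parameters the idealized reduction is 64/19, about 3.4x. The factor of 2.9 quoted in the abstract comes from the authors' evaluation on a real cluster, so it need not coincide with this idealized ratio.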

Item Type: Conference Paper
Publication: Proceedings of the 16th USENIX Conference on File and Storage Technologies, FAST 2018
Publisher: USENIX Association
Additional Information: The copyright for this article belongs to USENIX Association
Keywords: C (programming language); Digital storage; Fault tolerant computer systems; Forward error correction; Multiprocessing systems; Optimal systems; Repair; Code parameters; Distributed storage system; Failure tolerance; Maximum distance separable erasure codes; Multiple nodes; Network traffic; Pairwise couplings; Storage overhead; Codes (symbols)
Department/Centre: Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited: 13 Sep 2022 06:39
Last Modified: 13 Sep 2022 06:39
URI: https://eprints.iisc.ac.in/id/eprint/76037
