
Single-Forking of Coded Subtasks for Straggler Mitigation

Badita, A and Parag, P and Aggarwal, V (2021) Single-Forking of Coded Subtasks for Straggler Mitigation. In: IEEE/ACM Transactions on Networking, 29 (6). pp. 2413-2424.

Official URL: https://doi.org/10.1109/TNET.2021.3075377

Abstract

Given the unpredictable nature of the nodes in distributed computing systems, some tasks can be significantly delayed. Such delayed tasks are called stragglers. Straggler mitigation can be achieved by redundant computation. In the maximum distance separable (MDS) redundancy method, a task is divided into k subtasks, which are encoded into n coded subtasks such that the task is completed when any k out of the n coded subtasks are completed. Two important metrics of interest are the task completion time and the server utilization, which is the aggregate work completed by all servers in this duration. We consider a proactive straggler mitigation strategy where n0 out of n coded subtasks are started at time 0, while the remaining n − n0 coded subtasks are launched when ℓ0 ≤ min{n0, k} of the initial ones finish. The coded subtasks are halted when k of them finish. For this flexible forking strategy with multiple parameters, we analyze the mean of the two performance metrics when the random service completion times at the servers are independent and identically distributed (i.i.d.) according to a shifted exponential distribution. From this study, we find a tradeoff between the metrics, which provides insights into the parameter choices. Experiments on Intel DevCloud illustrate that the shifted exponential distribution adequately captures the random coded subtask completion times, and that our derived insights continue to hold. © 2021 IEEE.
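
The single-forking strategy described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' code or analysis. It assumes that each coded subtask takes a constant shift plus an exponential time, that a forked subtask's clock starts at the forking instant, and that server utilization is measured as total busy time until the k-th completion. The function names and the parameters `shift`, `rate`, and `l0` are hypothetical, and any numerical values passed in are illustrative only.

```python
import random


def simulate_single_fork(n, k, n0, l0, shift, rate, rng):
    """Simulate one task; return (completion_time, server_utilization).

    Assumed model (not taken from the paper's text): each coded subtask
    needs shift + Exp(rate) time units, i.i.d. across servers.
    """
    if not (1 <= k <= n and 1 <= n0 <= n and 1 <= l0 <= min(n0, k)):
        raise ValueError("need 1 <= k <= n, 1 <= n0 <= n, 1 <= l0 <= min(n0, k)")

    # Stage 1: n0 coded subtasks start at time 0.
    initial = sorted(shift + rng.expovariate(rate) for _ in range(n0))

    # Forking point: the instant the l0-th initial subtask finishes.
    fork_time = initial[l0 - 1]

    # Stage 2: the remaining n - n0 coded subtasks start at fork_time.
    forked = [fork_time + shift + rng.expovariate(rate) for _ in range(n - n0)]

    # The task completes when any k coded subtasks have finished.
    completion = sorted(initial + forked)[k - 1]

    # Busy time per server, truncated at the completion instant
    # (unfinished subtasks are halted there).
    busy = sum(min(f, completion) for f in initial)
    busy += sum(min(f, completion) - fork_time for f in forked)
    return completion, busy


def mean_metrics(n, k, n0, l0, shift, rate, runs=10000, seed=0):
    """Estimate the mean completion time and mean utilization by simulation."""
    rng = random.Random(seed)
    total_t = total_u = 0.0
    for _ in range(runs):
        t, u = simulate_single_fork(n, k, n0, l0, shift, rate, rng)
        total_t += t
        total_u += u
    return total_t / runs, total_u / runs
```

Calling `mean_metrics` for several values of `n0` and `l0` with the other parameters fixed gives a rough empirical view of how the two metrics move against each other under these assumptions.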

Item Type: Journal Article
Publication: IEEE/ACM Transactions on Networking
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc.
Keywords: Completion time; Distributed computing systems; Forking point; K-out-of-n; Maximum distance; Mitigation strategy; Redundant computation; Straggler mitigation; Subtask; Task completion time
Department/Centre: Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited: 21 Feb 2023 04:42
Last Modified: 21 Feb 2023 04:42
URI: https://eprints.iisc.ac.in/id/eprint/80498
