
CUDA-for-clusters: a system for efficient execution of CUDA kernels on multi-core clusters

Prabhakar, Raghu and Govindarajan, R and Thazhuthaveetil, Matthew J (2012) CUDA-for-clusters: a system for efficient execution of CUDA kernels on multi-core clusters. In: Proceedings of the 18th International Conference Euro-Par 2012, August 27-31, 2012, Rhodes Island, Greece.

europar2012.pdf - Accepted Version

Official URL: http://dx.doi.org/10.1007/978-3-642-32820-6_42


Rapid advancements in multi-core processor architectures, coupled with low-cost, low-latency, high-bandwidth interconnects, have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but they are yet to be widely adopted for performance-critical code, mainly because of their relatively immature software frameworks and the effort involved in rewriting existing code in a new language. In this paper, we motivate and describe our initial study exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates the execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, together with the growing popularity, support and stability of the CUDA software stack, makes CUDA a good candidate programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime and a lightweight software distributed shared memory to manage parallel execution. In initial results, several standard CUDA benchmark programs achieve speedups of up to 7.5X on a cluster with 8 nodes, opening up an interesting direction for further research.
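
The abstract mentions a work distribution runtime but gives no detail on how it operates. As a rough illustration only, the C++ sketch below shows one way the thread blocks of a kernel grid could be divided into contiguous ranges, one per node, with each node running its blocks as ordinary loops. The names `BlockRange`, `blocks_for_node` and `vec_add_blocks` are hypothetical, the even contiguous split is an assumption, and nothing here is taken from the CFC implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: half-open range [begin, end) of thread-block indices
// assigned to one node.
struct BlockRange {
    std::size_t begin;
    std::size_t end;
};

// Assumed policy: split num_blocks into contiguous ranges whose sizes differ
// by at most one block. The first (num_blocks % num_nodes) nodes get one extra.
BlockRange blocks_for_node(std::size_t num_blocks, std::size_t num_nodes,
                           std::size_t node) {
    assert(num_nodes > 0 && node < num_nodes);
    const std::size_t base = num_blocks / num_nodes;
    const std::size_t extra = num_blocks % num_nodes;
    const std::size_t begin = node * base + std::min(node, extra);
    const std::size_t len = base + (node < extra ? 1 : 0);
    return {begin, begin + len};
}

// CPU stand-in for a CUDA vector-add kernel: the block and thread indices
// become loop variables, and a node executes only the blocks in its range.
void vec_add_blocks(BlockRange range, std::size_t block_dim,
                    const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& c) {
    for (std::size_t block = range.begin; block < range.end; ++block) {
        for (std::size_t thread = 0; thread < block_dim; ++thread) {
            const std::size_t i = block * block_dim + thread;
            if (i < c.size()) {  // same bounds guard a CUDA kernel would use
                c[i] = a[i] + b[i];
            }
        }
    }
}
```

In this sketch every node would call `blocks_for_node` with its own index and then run `vec_add_blocks` on the result. Keeping the output arrays consistent across nodes is the job the abstract assigns to the software distributed shared memory, which is not modeled here.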

Item Type: Conference Paper
Additional Information: Copyright of this article belongs to Springer.
Keywords: CUDA; Multi-Cores; Distributed Programming; Distributed Systems; Clusters; Software Distributed Shared Memory
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Depositing User: Francis Jayakanth
Date Deposited: 27 Nov 2013 07:54
Last Modified: 27 Nov 2013 07:54
URI: http://eprints.iisc.ac.in/id/eprint/47827
