Prabhakar, Raghu and Govindarajan, R and Thazhuthaveetil, Matthew J (2012) CUDA-for-clusters: a system for efficient execution of CUDA kernels on multi-core clusters. In: Proceedings of the 18th International Conference Euro-Par 2012, August 27-31, 2012, Rhodes Island, Greece.
Abstract
Rapid advancements in multi-core processor architectures coupled with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but are yet to be adopted widely to run performance-critical code mainly due to the relatively immature software framework and the effort involved in re-writing existing code in the new language. In this paper, we motivate and describe our initial study in exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, the growing popularity, support and stability of the CUDA software stack collectively make CUDA a good candidate to be considered as a programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime and a light-weight software distributed shared memory to manage parallel executions. Initial results on running several standard CUDA benchmark programs achieve impressive speedups of up to 7.5X on a cluster with 8 nodes, thereby opening up an interesting direction of research for further investigation.
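As an illustration of the work distribution idea mentioned in the abstract, the following is a minimal sketch of one way the thread blocks of a CUDA kernel grid could be divided among cluster nodes. It is not taken from the paper and does not claim to reproduce CFC's runtime: the function name `partition_blocks` and the contiguous, near-equal split are assumptions made purely for illustration.

```python
def partition_blocks(num_blocks: int, num_nodes: int) -> list[range]:
    """Split block indices 0..num_blocks-1 into contiguous, near-equal ranges.

    Hypothetical helper: assumes a static, contiguous assignment of
    thread blocks to nodes, which may differ from what CFC actually does.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")

    # Every node gets `base` blocks; the first `extra` nodes get one more.
    base, extra = divmod(num_blocks, num_nodes)

    ranges = []
    start = 0
    for node in range(num_nodes):
        count = base + (1 if node < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges
```

For example, `partition_blocks(16, 8)` assigns two consecutive block indices to each of 8 nodes. A contiguous split is chosen here only because it is the simplest scheme to state; a real runtime would also have to consider which memory regions each block touches.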
Item Type: | Conference Paper |
---|---|
Publisher: | Springer |
Additional Information: | Copyright of this article belongs to Springer. |
Keywords: | CUDA; Multi-Cores; Distributed Programming; Distributed Systems; Clusters; Software Distributed Shared Memory |
Department/Centre: | Division of Electrical Sciences > Computer Science & Automation |
Date Deposited: | 27 Nov 2013 07:54 |
Last Modified: | 27 Nov 2013 07:54 |
URI: | http://eprints.iisc.ac.in/id/eprint/47827 |