
Improving GPGPU Concurrency with Elastic Kernels

Pai, Sreepathi and Thazhuthaveetil, Matthew J and Govindarajan, R (2013) Improving GPGPU Concurrency with Elastic Kernels. In: Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, 16-20 March 2013, Houston, Texas, USA, pp. 407-418.

PDF: acm_sig_not_48-4_407_2013.pdf - Published Version (896kB)
Restricted to registered users only.
Official URL: http://dx.doi.org/10.1145/2451116.2451160

Abstract

Each new generation of GPUs vastly increases the resources available to GPGPU programs. GPU programming models (like CUDA) were designed to scale to use these resources. However, we find that CUDA programs actually do not scale to utilize all available resources, with over 30% of resources going unused on average across the Parboil2 benchmarks used in our work. Current GPUs therefore allow concurrent execution of kernels to improve utilization. In this work, we study concurrent execution of GPU kernels using multiprogram workloads on current NVIDIA Fermi GPUs. On two-program workloads from the Parboil2 benchmark suite, we find that concurrent execution is often no better than serialized execution. We identify the lack of control over resource allocation to kernels as a major serialization bottleneck. We propose transformations that convert CUDA kernels into elastic kernels, which permit fine-grained control over their resource usage. We then propose several elastic-kernel-aware concurrency policies that offer significantly better performance and concurrency than the current CUDA policy. We evaluate our proposals on real hardware using multiprogrammed workloads constructed from benchmarks in the Parboil2 suite. On average, our proposals increase system throughput (STP) by 1.21x and improve the average normalized turnaround time (ANTT) by 3.73x for two-program workloads when compared to the current CUDA concurrency implementation.
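The elastic-kernel transformation the abstract refers to can be illustrated with a minimal CUDA sketch, assuming a simple vector-add kernel. This is not the paper's code: the names vec_add_elastic, logicalGridDim, logicalBlocks and physicalBlocks are illustrative choices. The idea shown is that the kernel's logical grid is decoupled from the physical launch configuration, so the number of thread blocks it actually occupies becomes a tunable parameter.

// Illustrative sketch only (not taken from the paper): an "elastic" kernel
// loops over logical block IDs instead of assuming one physical block per
// logical block, so the physical grid size can be chosen independently.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vec_add_elastic(const float *a, const float *b, float *c,
                                int n, int logicalGridDim)
{
    // Each physical block walks the logical block IDs assigned to it.
    for (int lb = blockIdx.x; lb < logicalGridDim; lb += gridDim.x) {
        int i = lb * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }
}

int main()
{
    const int n = 1 << 20, threads = 256;
    const int logicalBlocks = (n + threads - 1) / threads;
    // A concurrency policy could pick any physical grid size up to
    // logicalBlocks; here it is capped arbitrarily, e.g. to leave GPU
    // resources free for a co-scheduled kernel.
    const int physicalBlocks = 64;

    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vec_add_elastic<<<physicalBlocks, threads>>>(a, b, c, n, logicalBlocks);
    cudaDeviceSynchronize();
    printf("c[0] = %f\n", c[0]);

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

Because the physical grid size is now a free parameter rather than fixed by the problem size, a scheduler can shrink one kernel's footprint to co-locate another kernel's blocks on the same GPU, which is the kind of fine-grained resource control the abstract describes.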

Item Type: Conference Proceedings
Publication: ACM SIGPLAN Notices
Publisher: Association for Computing Machinery (ACM)
Additional Information: Copyright of this article belongs to the Association for Computing Machinery (ACM).
Keywords: GPGPU; CUDA; Concurrent Kernels
Department/Centre: Division of Interdisciplinary Sciences > Supercomputer Education & Research Centre
Date Deposited: 09 Sep 2013 10:31
Last Modified: 09 Sep 2013 10:31
URI: http://eprints.iisc.ac.in/id/eprint/47078
