ePrints@IIScePrints@IISc Home | About | Browse | Latest Additions | Advanced Search | Contact | Help

The Pluto plus Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests

Bondhugula, Uday and Acharya, Aravind and Cohen, Albert (2016) The Pluto plus Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests. In: ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS, 38 (3).

[img] PDF
Pro_Nat_Aca_Sci_Uni_Sta_Ame_38-3_12_2016.pdf - Published Version
Restricted to Registered users only

Download (1MB) | Request a copy
Official URL: http://dx.doi.org/10.1145/2896389

Abstract

Affine transformations have proven to be powerful for loop restructuring due to their ability to model a very wide range of transformations. A single multidimensional affine function can represent a long and complex sequence of simpler transformations. Existing affine transformation frameworks such as the Pluto algorithm, which include a cost function for modern multicore architectures for which coarse-grained parallelism and locality are crucial, consider only a subspace of transformations to avoid a combinatorial explosion in finding transformations. The ensuing practical trade-offs lead to the exclusion of certain useful transformations: in particular, transformation compositions involving loop reversals and loop skewing by negative factors. In addition, there is currently no proof that the algorithm successfully finds a tree of permutable loop bands for all affine loop nests. In this article, we propose an approach to address these two issues (1) by modeling a much larger space of practically useful affine transformations in conjunction with the existing cost function of Pluto, and (2) by extending the Pluto algorithm in a way that allows a proof for its soundness and completeness for all affine loop nests. We perform an experimental evaluation of both, the effect on compilation time, and performance of generated codes. The evaluation shows that our new framework, Pluto+, provides no degradation in performance for any benchmark from Polybench. For the Lattice Boltzmann Method (LBM) simulations with periodic boundary conditions, it provides a mean speedup of 1.33x over Pluto. We also show that Pluto+ does not increase compilation time significantly. Experimental results on Polybench show that Pluto+ increases overall polyhedral source-to-source optimization time by only 15%. In cases in which it improves execution time significantly, it increased polyhedral optimization time by only 2.04x.

Item Type: Journal Article
Additional Information: Copy right for this article belongs to the ASSOC COMPUTING MACHINERY, 2 PENN PLAZA, STE 701, NEW YORK, NY 10121-0701 USA
Keywords: Algorithms; Design; Experimentation; Performance; Automatic parallelization; locality optimization; polyhedral model; loop transformations; affine transformations; tiling
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Depositing User: Id for Latest eprints
Date Deposited: 15 Jun 2016 07:35
Last Modified: 15 Jun 2016 07:35
URI: http://eprints.iisc.ac.in/id/eprint/53961

Actions (login required)

View Item View Item