ePrints@IISc

A practical tile size selection model for affine loop nests

Narasimhan, K and Acharya, A and Baid, A and Bondhugula, U (2021) A practical tile size selection model for affine loop nests. In: 35th ACM International Conference on Supercomputing, ICS 2021, 14-17 Jun 2021, pp. 27-39.

Full text not available from this repository.
Official URL: https://doi.org/10.1145/3447818.3462213

Abstract

Loop tiling for locality is an important transformation for general-purpose and domain-specific compilation because it allows programs to exploit deep memory hierarchies. Most code generation tools that can tile loop nests automatically rely on auto-tuning to find good tile sizes. Tile size selection models proposed in the literature either fall back on complex non-linear optimization formulations or handle only a narrow class of inputs. A fast and generic tile size selection model is therefore needed if it is to be adopted into compiler infrastructures such as GCC, LLVM, or MLIR. In this paper, we propose a new, fast, and lightweight tile size selection model that considers temporal and spatial reuse along the dimensions of a loop nest. For an n-dimensional loop nest, we determine the tile sizes by computing the zeros of a single-variable polynomial of degree at most n. Our model also accounts for the vectorizability of the innermost dimension. We demonstrate the generality of our approach with benchmarks from several domains: linear algebra kernels, digital signal processing (DSP), and image processing. We implement our tile size selection model in PolyMage (a domain-specific language and compiler for image processing pipelines) and Pluto (a state-of-the-art polyhedral auto-parallelizer). Implementing the model in PolyMage allows us to extend it to the DSP and linear algebra domains and to incorporate idiom recognition phases, so that optimized vendor-specific library implementations can be used whenever profitable. Our experiments show a significant geomean speedup of 2.2× over Matlab on benchmarks from the DSP domain. For PolyBench, we obtain a geomean speedup of 1.04× (maximum 1.3×) over Pluto. © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
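The abstract states that tile sizes for an n-dimensional loop nest come from the zeros of a single-variable polynomial of degree at most n, with the innermost dimension adjusted for vectorization. The paper's actual cost model is not reproduced in this record, so the following is only a hypothetical illustration of the general idea, not the authors' method. It assumes a single tile size t shared by all three dimensions of a matrix-multiplication nest (n = 3), a working set of three t×t tiles of 8-byte doubles, and a 32 KiB cache. Under these assumptions the capacity constraint 24t² = 32768 gives the polynomial p(t) = 24t² - 32768, whose positive zero bounds t. That bound is then rounded down to a multiple of an assumed vector width. The function names `eval_poly`, `positive_root`, and `tile_size` are invented for this sketch.

```python
# Hypothetical sketch: pick a tile size from the positive zero of a
# single-variable capacity polynomial. NOT the paper's model; the footprint
# polynomial, cache size, and vector width are illustrative assumptions.


def eval_poly(coeffs, t):
    """Evaluate c0 + c1*t + ... + cn*t**n (coeffs[i] multiplies t**i) via Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def positive_root(coeffs, tol=1e-9):
    """Return the unique positive zero of p by bisection.

    Assumes coeffs[0] < 0 (the negated capacity) and all other coefficients
    are >= 0 with at least one > 0 (the tile footprint), so p is negative at
    t = 0 and strictly increasing for t > 0. The returned value always
    satisfies p(t) < 0, i.e. the assumed footprint fits in the cache.
    """
    assert coeffs[0] < 0
    assert all(c >= 0 for c in coeffs[1:]) and any(c > 0 for c in coeffs[1:])
    lo, hi = 0.0, 1.0
    while eval_poly(coeffs, hi) < 0:  # grow the bracket until p(hi) >= 0
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2.0
        if eval_poly(coeffs, mid) < 0:
            lo = mid
        else:
            hi = mid
    return lo


def tile_size(coeffs, vector_width):
    """Round the positive zero down to a multiple of vector_width.

    Never returns less than one vector width, even if that exceeds the bound.
    """
    t = positive_root(coeffs)
    return max(vector_width, int(t) // vector_width * vector_width)


if __name__ == "__main__":
    # Assumed: 3 tiles of t*t doubles (8 bytes) in a 32 KiB cache -> 24 t^2 - 32768.
    print(positive_root([-32768.0, 0.0, 24.0]))  # about 36.95
    print(tile_size([-32768.0, 0.0, 24.0], 8))   # 32 with an assumed vector width of 8
```

With these assumed numbers the positive zero is about 36.95 and the sketch returns 32. Bisection was chosen over a general polynomial root finder because the sign assumptions guarantee a single positive zero. A real model would derive a different footprint polynomial from each loop nest's reuse pattern.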

Item Type: Conference Paper
Publication: Proceedings of the International Conference on Supercomputing
Publisher: Association for Computing Machinery
Additional Information: The copyright for this article belongs to the Association for Computing Machinery
Keywords: Automatic programming; Benchmarking; Cache memory; Digital signal processing; Image processing; Linear algebra; MATLAB; Nonlinear programming; Pipeline processing systems; Problem oriented languages; Program compilers; Affine loop nests; Code generation tools; Digital signal processing (DSP); Domain specific languages; Image processing pipeline; Non-linear optimization problems; State of the art; Temporal and spatial
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 26 Aug 2021 10:24
Last Modified: 26 Aug 2021 10:24
URI: http://eprints.iisc.ac.in/id/eprint/69321
