Tiwari, M and Vadhiyar, S (2022) Communication Overlapping Pipelined Conjugate Gradients for Distributed Memory Systems and Heterogeneous Architectures. In: 27th International Conference on Parallel and Distributed Computing, Euro-Par 2021, 30 - 31 August 2021, Virtual, Online, pp. 535-539.
Abstract
The Preconditioned Conjugate Gradient (PCG) method is a widely used method for solving sparse linear systems of equations. Pipelined PCG (PIPECG) attempts to eliminate the dependencies between computations in the PCG algorithm and to overlap non-dependent computations by reorganizing the traditional PCG code and using non-blocking allreduces. We have developed a novel pipelined PCG algorithm called PIPECG-OATI (One Allreduce per Two Iterations), which reduces the number of non-blocking allreduces to one per two iterations and provides a large overlap of global communication and computation at higher core counts in distributed memory CPU systems. PIPECG-OATI gives up to 3× speedup over PCG and 1.73× speedup over PIPECG at large core counts. For GPU-accelerated heterogeneous architectures, we have developed three methods for efficient execution of the PIPECG algorithm. These methods achieve task and data parallelism. Our methods give considerable performance improvements over the PCG CPU and GPU implementations of the Paralution and PETSc libraries.
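As a point of reference for the dependencies the abstract mentions, the following is a minimal sketch of textbook PCG. It is not the authors' PIPECG or PIPECG-OATI code. The Jacobi (diagonal) preconditioner, the 1-D Laplacian test operator `laplacian_matvec`, and all function names are assumptions chosen for illustration. The code runs in a single process; the comments mark the dot products that would each require a global reduction (allreduce) in a distributed memory run, which is the synchronization that pipelined variants try to hide behind computation.

```python
# Textbook preconditioned conjugate gradient (illustrative sketch only).
# Single process, pure Python. In a distributed run, every dot() call is
# a local partial sum followed by an allreduce across ranks.

def dot(u, v):
    # Global reduction point in a distributed implementation.
    return sum(a * b for a, b in zip(u, v))

def laplacian_matvec(x):
    # Hypothetical test operator: y = A x with A = tridiag(-1, 2, -1).
    n = len(x)
    y = [2.0 * xi for xi in x]
    for i in range(n - 1):
        y[i] -= x[i + 1]
        y[i + 1] -= x[i]
    return y

def pcg(matvec, diag, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD A, with M = diag(A) as the preconditioner.

    Returns (x, iterations_used).
    """
    x = [0.0] * len(b)
    r = list(b)                                  # r = b - A x with x = 0
    u = [ri / di for ri, di in zip(r, diag)]     # u = M^{-1} r
    p = list(u)
    gamma = dot(r, u)                            # reduction
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    for k in range(max_iter):
        # Convergence check; real codes usually fuse this norm into the
        # same reduction as gamma instead of paying for a separate one.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        s = matvec(p)
        alpha = gamma / dot(s, p)                # reduction: blocks the updates below
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        u = [ri / di for ri, di in zip(r, diag)]
        gamma_new = dot(r, u)                    # reduction: blocks the next direction
        beta = gamma_new / gamma
        p = [ui + beta * pi for ui, pi in zip(u, p)]
        gamma = gamma_new
    return x, max_iter
```

Calling `pcg(laplacian_matvec, [2.0] * n, [1.0] * n)` solves a small Poisson-type system. Each iteration waits on two reductions before the next matrix-vector product can start, which is the dependency chain that PIPECG-style reorganizations aim to break by issuing the reductions as non-blocking operations.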
Item Type: | Conference Paper |
---|---|
Publication: | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Publisher: | Springer Science and Business Media Deutschland GmbH |
Additional Information: | The copyright of this article belongs to Springer Science and Business Media Deutschland GmbH. |
Keywords: | Conjugate gradient method; Memory architecture; Pipelines; All-reduce; Distributed memory systems; Heterogeneous architectures; Linear systems of equations; Memory system architectures; Non-blocking; Overlapping communication and computations; Preconditioned conjugate gradient; Preconditioned conjugate gradient algorithms; Preconditioned conjugate gradient method; Linear systems |
Department/Centre: | Division of Interdisciplinary Sciences > Computational and Data Sciences |
Date Deposited: | 13 Jul 2022 06:45 |
Last Modified: | 19 May 2023 10:06 |
URI: | https://eprints.iisc.ac.in/id/eprint/74758 |