
Pipelined Preconditioned Conjugate Gradient Methods for real and complex linear systems for distributed memory architectures

Tiwari, M and Vadhiyar, S (2022) Pipelined Preconditioned Conjugate Gradient Methods for real and complex linear systems for distributed memory architectures. In: Journal of Parallel and Distributed Computing, 163. pp. 147-155.

Official URL: https://doi.org/10.1016/j.jpdc.2022.01.008


Preconditioned Conjugate Gradient (PCG) is a popular method for solving large, sparse linear systems of equations. At scale, the performance of PCG is limited by the costly global synchronization steps that arise from dot products on distributed memory systems. Pipelined PCG (PIPECG) removes these costly global synchronization steps from PCG by executing only a single non-blocking allreduce per iteration and overlapping it with independent computations. In our previous work, we developed a novel pipelined PCG algorithm for real linear systems called PIPECG-OATI (One Allreduce per Two Iterations), which executes a single non-blocking allreduce per two iterations and provides a large overlap of global communication with independent computations at higher core counts. Our method achieves this overlap by combining iterations and by introducing new recurrence and non-recurrence computations. We also implement optimizations in the PIPECG-OATI method to use cache memory efficiently. In this work, we present the PIPECG-OATI-c method for linear systems with complex Hermitian positive definite and complex symmetric matrices. We compare our method with various pipelined CG methods on a variety of problems and demonstrate that our method always gives the lowest run times. We performed experiments with our method using 20M and 30M unknowns on up to 16K cores and obtained up to 2.48X performance improvement over PCG and 2.14X performance improvement over PIPECG methods. We also experimented with up to 1 billion unknowns on 16K cores, the largest problem size explored for the CG problem to our knowledge, and obtained about 25% improvement over PCG. © 2022 Elsevier Inc.
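The single-reduction idea behind PIPECG can be illustrated with a small serial sketch. The code below follows the commonly published PIPECG recurrences for real symmetric positive definite systems; it is not the PIPECG-OATI or PIPECG-OATI-c method of this paper, whose recurrences are not given in the abstract. The test matrix (`tridiag_matvec`), the Jacobi preconditioner (`jacobi`), and all function names are assumptions chosen for illustration. In a distributed code, the three dot products at the top of each iteration would typically be fused into one non-blocking allreduce (for example MPI's `MPI_Iallreduce`) that is completed only after the preconditioner application and matrix-vector product; here everything runs locally.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Local dot product; in a distributed run this needs a global reduction."""
    return sum(x * y for x, y in zip(a, b))


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def tridiag_matvec(v: Vector) -> Vector:
    """Hypothetical SPD test operator: 4 on the diagonal, -1 off the diagonal."""
    n = len(v)
    out = []
    for i in range(n):
        s = 4.0 * v[i]
        if i > 0:
            s -= v[i - 1]
        if i < n - 1:
            s -= v[i + 1]
        out.append(s)
    return out


def jacobi(v: Vector) -> Vector:
    """Jacobi preconditioner for the operator above (diagonal is 4)."""
    return [vi / 4.0 for vi in v]


def pipecg(matvec: Callable[[Vector], Vector],
           precond: Callable[[Vector], Vector],
           b: Vector, tol: float = 1e-10,
           max_iter: int = 200) -> Tuple[Vector, int]:
    """Serial sketch of pipelined PCG with one reduction point per iteration."""
    n = len(b)
    x = [0.0] * n
    r = list(b)            # r0 = b - A*x0 with x0 = 0
    u = precond(r)         # u = M^{-1} r
    w = matvec(u)          # w = A u
    z = q = s = p = [0.0] * n
    gamma_old = alpha_old = 1.0
    for i in range(max_iter):
        # The only reduction point: all dot products are formed together.
        gamma, delta, rr = dot(r, u), dot(w, u), dot(r, r)
        # Independent work that a distributed code would overlap with
        # the in-flight allreduce.
        m = precond(w)
        nv = matvec(m)
        if rr ** 0.5 <= tol:
            return x, i
        if i > 0:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)
        else:
            beta = 0.0
            alpha = gamma / delta
        # Recurrence updates replace the extra matvec/preconditioner calls.
        z = axpy(beta, z, nv)
        q = axpy(beta, q, m)
        s = axpy(beta, s, w)
        p = axpy(beta, p, u)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, s, r)
        u = axpy(-alpha, q, u)
        w = axpy(-alpha, z, w)
        gamma_old, alpha_old = gamma, alpha
    return x, max_iter


if __name__ == "__main__":
    rhs = [1.0] * 50
    solution, iterations = pipecg(tridiag_matvec, jacobi, rhs)
    print("iterations:", iterations)
```

The design point to notice is that the values needed for `alpha` and `beta` come from vectors already available at the start of the iteration, so the reduction does not have to finish before the preconditioner and matrix-vector product begin. The price is several extra vector recurrences per iteration.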

Item Type: Journal Article
Publication: Journal of Parallel and Distributed Computing
Publisher: Academic Press Inc.
Additional Information: The copyright for this article belongs to Academic Press Inc.
Keywords: Cache memory; Linear systems; Matrix algebra; Memory architecture; Pipelines; All-reduce; Complex Hermitian positive definite system; Complex-symmetric systems; Global synchronization; Hermitians; Overlapping communication and computations; Performance; Pipelining; Positive definite; Preconditioned conjugate gradient; Conjugate gradient method
Department/Centre: Division of Interdisciplinary Sciences > Computational and Data Sciences
Date Deposited: 16 Mar 2022 06:05
Last Modified: 16 Mar 2022 06:05
URI: http://eprints.iisc.ac.in/id/eprint/71424
