
Towards Accelerated Genome Informatics on Parallel HPC Platforms: The ReneGENE-GI Perspective

Natarajan, S and Krishna Kumar, N and Pal, D and Nandy, SK (2020) Towards Accelerated Genome Informatics on Parallel HPC Platforms: The ReneGENE-GI Perspective. In: Journal of Signal Processing Systems, 92 (10). pp. 1197-1213.

Official URL: https://doi.org/10.1007/s11265-019-01452-x

Abstract

Genome Informatics (GI) involves accurate computational investigations of strongly correlated subsystems that demand interdisciplinary approaches to problem solving. With the volume of genomic sequencing data growing at an alarming rate, High Performance Computing (HPC) solutions offer the right platform to address the computational needs. GI requires algorithm-architecture co-design of parallel and accelerated biocomputing, involving reconfigurable hardware such as FPGAs and graphics accelerators (GPUs), to bridge the gap between growing data volumes and compute capabilities. Such platforms offer high degrees of parallelism and scalability while accelerating the multi-stage GI computational pipeline. Even with such computing power, it is the choice of algorithms and implementations across the entire GI pipeline that decides the precision of biocomputing in revealing biologically relevant information. In this paper, we present ReneGENE-GI, an innovatively engineered GI pipeline, and detail the performance analysis of ReneGENE-GI’s Comparative Genomics Module (CGM), the compute-intensive stage of the pipeline. This module comes in two flavours, designed to run on GPUs and FPGAs respectively, hosted on HPC platforms. The pipeline uses a very efficient reference indexing algorithm based on the dynamic Monotonic Minimal Perfect Hashing Function (MMPH), which allows absolute indexing of the reference genome and thus avoids heuristics. Alignment time for our FPGA version is about one-tenth of the time taken by our single-GPU implementation, which itself is 2.62x faster than CUSHAW2-GPU (the GPU CUDA implementation of CUSHAW). With the single-GPU implementation demonstrating a speedup of more than 150x over standard heuristic aligners in the market such as BFAST, the FPGA version of our CGM is several orders of magnitude faster than its competitors, offering precision over heuristics.
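
To make the indexing idea in the abstract concrete, here is a minimal sketch of what a monotone minimal perfect hash over the k-mers of a reference provides: each distinct k-mer maps to a unique rank in 0..n-1, and ranks follow lexicographic order. This is not the dynamic MMPH construction used in ReneGENE-GI, which the abstract does not describe. The function names (`build_index`, `mmph`, `lookup`), the toy reference string, and the sorted-list implementation are all assumptions made for illustration. A practical MMPH would avoid storing the keys themselves.

```python
from bisect import bisect_left


def build_index(reference: str, k: int):
    """Collect every k-mer of the reference with its start positions.

    Returns (keys, occ): keys is the sorted list of distinct k-mers,
    occ[r] is the list of start positions of the k-mer with rank r.
    """
    positions = {}
    for i in range(len(reference) - k + 1):
        positions.setdefault(reference[i:i + k], []).append(i)
    keys = sorted(positions)  # lexicographic order of distinct k-mers
    return keys, [positions[key] for key in keys]


def mmph(keys, kmer):
    """Toy monotone minimal perfect hash: the rank of kmer among keys.

    Perfect and minimal: the n keys map one-to-one onto 0..n-1.
    Monotone: a lexicographically smaller key gets a smaller rank.
    Returns None for a k-mer that is not a key (possible here only
    because this sketch keeps the keys in memory).
    """
    r = bisect_left(keys, kmer)
    return r if r < len(keys) and keys[r] == kmer else None


def lookup(keys, occ, kmer):
    """Exact positions of kmer in the reference, with no heuristics."""
    r = mmph(keys, kmer)
    return [] if r is None else occ[r]


# Hypothetical toy reference, chosen only for illustration.
keys, occ = build_index("ACGTACGA", 3)
print(keys)                      # distinct 3-mers in sorted order
print(lookup(keys, occ, "ACG"))  # every exact occurrence of "ACG"
```

Because every k-mer present in the reference has exactly one slot, a lookup either returns all of its exact positions or reports that it is absent, which is the sense in which such an index is absolute rather than heuristic.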

Item Type: Journal Article
Publication: Journal of Signal Processing Systems
Publisher: Springer
Additional Information: The copyright for this article belongs to Springer.
Keywords: Field programmable gate arrays (FPGA); Genes; Graphics processing unit; Indexing (of information); Informatics; Pipelines; Program processors; Reconfigurable architectures; Reconfigurable hardware; Algorithm architectures; Computational investigation; Genome informatics; High performance computing (HPC); Minimal perfect hashing; Sequencing; Short-read mappings; Computer hardware
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 13 Feb 2023 10:30
Last Modified: 13 Feb 2023 10:30
URI: https://eprints.iisc.ac.in/id/eprint/80216
