ePrints@IISc

ReneGENE-Novo: Co-designed algorithm-architecture for accelerated preprocessing and assembly of genomic short reads

Natarajan, S and KrishnaKumar, N and Anuchan, HV and Pal, D and Nandy, SK (2018) ReneGENE-Novo: Co-designed algorithm-architecture for accelerated preprocessing and assembly of genomic short reads. In: 14th International Symposium on Applied Reconfigurable Computing, ARC 2018, 2 - 4 May 2018, Santorini, pp. 564-577.

PDF: app_rec_com_arc_too_app_564-577_2018.pdf (Published Version, 886 kB)
Restricted to registered users only
Official URL: https://doi.org/10.1007/978-3-319-78890-6_45

Abstract

Sufficiently long genome strings, which permit adequate overlaps, are key to producing a quality genome assembly with minimal error rates and high coverage. Next Generation Sequencing (NGS) platforms produce large volumes (terabytes) of short raw genomic strings, or reads (150–600 genomic alphabets, or bases), with minimal error rates. If we are able to increase the lengths of raw short reads computationally before assembly, then the full potential of NGS short reads and de novo assembly can be harvested. The large data redundancy offered by billions of such raw reads, compounded by target genome lengths of billions of bases, requires a complex big data engineering solution. This paper presents a co-designed algorithm-architecture model for ReneGENE de novo assembly (part of a larger ReneGENE-GI Genome Informatics pipeline). This module takes randomly presented short reads from NGS platforms and extends them iteratively to an appropriate length by identifying overlaps among them, aiding high-coverage assembly with minimal error rates. The task is parallelized across multiple processes to allow parallel read assembly with scalable performance. Supported by parallel algorithms, multi-dimensional data structures and fine-grain synchronization, the module realises irregular computing for de novo assembly. A single-FPGA realization of this model with 128 de novo compute elements shows a 48.69x performance improvement over an 8-core implementation on a standard workstation based on Intel Core i7-4770 processors.
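The idea of extending short reads iteratively by finding overlaps among them can be illustrated with a minimal sequential sketch. This is not the ReneGENE-Novo algorithm or its FPGA design. It assumes a simple greedy rule (always append the unused read with the longest exact suffix-prefix overlap), error-free reads on a single strand, and a hypothetical minimum-overlap threshold `min_overlap`. The function names `suffix_prefix_overlap` and `extend_read` are likewise hypothetical.

```python
def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    The overlap is capped at len(b) - 1 so that a match always contributes at
    least one new base; overlaps shorter than `min_overlap` count as 0.
    """
    max_len = min(len(a), len(b) - 1)
    for length in range(max_len, min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def extend_read(seed_index: int, reads: list[str], min_overlap: int = 3,
                max_rounds: int = 1000) -> str:
    """Greedily extend reads[seed_index] to the right (assumed greedy rule).

    In each round, the unused read with the longest suffix-prefix overlap
    with the current contig is merged. Extension stops when no unused read
    overlaps by at least `min_overlap` bases, or after `max_rounds` rounds.
    """
    contig = reads[seed_index]
    used = {seed_index}
    for _ in range(max_rounds):
        best_len, best_idx = 0, -1
        for i, read in enumerate(reads):
            if i in used:
                continue
            ov = suffix_prefix_overlap(contig, read, min_overlap)
            if ov > best_len:  # ties keep the earliest read in the list
                best_len, best_idx = ov, i
        if best_idx < 0:
            break  # no unused read overlaps enough: extension terminates
        contig += reads[best_idx][best_len:]  # append only the new bases
        used.add(best_idx)
    return contig


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "ACGGTT", "GGTTCA"]
    print(extend_read(0, reads))  # ACGTACGGTTCA
```

Here each six-base read overlaps the growing contig by four bases, so the seed grows from 6 to 12 bases. In this sketch each seed is extended independently, so different seeds could in principle be handed to separate worker processes. The sketch does not model the fine-grain synchronization needed when workers share reads, and it scans every read in each round; a practical implementation would index the reads instead.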

Item Type: Conference Paper
Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Additional Information: The copyright for this article belongs to the Springer International Publishing AG, part of Springer Nature.
Keywords: Big data; Errors; Iterative methods; Reconfigurable architectures; Algorithm architectures; De novo assemblies; Genome assembly; Genome informatics; Multidimensional data; Multiple process; Next-generation sequencing; Performance scalability; Genes
Department/Centre: Division of Interdisciplinary Sciences > Computational and Data Sciences
Date Deposited: 27 Aug 2022 09:01
Last Modified: 27 Aug 2022 09:01
URI: https://eprints.iisc.ac.in/id/eprint/76078
