
Gap-Sensitive Colinear Chaining Algorithms for Acyclic Pangenome Graphs

Chandra, G and Jain, C (2023) Gap-Sensitive Colinear Chaining Algorithms for Acyclic Pangenome Graphs. In: Journal of Computational Biology.

Jou_com_bio_30_11_2023.pdf - Published Version
Restricted to Registered users only

Official URL: https://doi.org/10.1089/cmb.2023.0186

Abstract

A pangenome graph can serve as a better reference for genomic studies because it allows a compact representation of multiple genomes within a species. Aligning sequences to a graph is critical for pangenome-based resequencing. The seed-chain-extend heuristic works by finding short exact matches between a sequence and a graph. In this heuristic, colinear chaining helps identify a good cluster of exact matches that can be combined to form an alignment. Colinear chaining algorithms have been extensively studied for aligning two sequences with various gap costs, including linear, concave, and convex cost functions. However, extending these algorithms for sequence-to-graph alignment presents significant challenges. Recently, Mäkinen et al. introduced a sparse dynamic programming framework that exploits the small path cover property of acyclic pangenome graphs, enabling efficient chaining. However, this framework does not consider gap costs, limiting its practical effectiveness. We address this limitation by developing novel problem formulations and provably good chaining algorithms that support a variety of gap cost functions. These functions are carefully designed to enable fast chaining algorithms whose time requirements are parameterized in terms of the size of the minimum path cover. Through an empirical evaluation, we demonstrate the superior performance of our algorithm compared with existing aligners. When mapping simulated long reads to a pangenome graph comprising 95 human haplotypes, we achieved 98.7% precision while leaving <2% of reads unmapped. © Mary Ann Liebert, Inc.
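
For readers unfamiliar with colinear chaining, the following is a minimal sketch of the classical sequence-to-sequence case that the abstract mentions as prior work. It is not the graph algorithm of the article. It assumes anchors are exact matches given as (query_start, ref_start, length), scores a chain as the total anchor length minus a linear gap penalty, and uses a simple quadratic-time dynamic program rather than sparse dynamic programming. The function name `chain_anchors`, the parameter `gap_weight`, and the particular gap cost (the absolute difference between the query gap and the reference gap) are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical anchor layout: (query_start, ref_start, length) of an exact match.
Anchor = Tuple[int, int, int]


def chain_anchors(anchors: List[Anchor],
                  gap_weight: float = 1.0) -> Tuple[float, List[Anchor]]:
    """Return (score, chain) for the best-scoring colinear chain.

    Assumed scoring: sum of anchor lengths minus
    gap_weight * |query_gap - ref_gap| for each consecutive pair.
    Runs in O(n^2) time; real tools use faster sparse methods.
    """
    if not anchors:
        return 0.0, []

    # Sorting by query start guarantees every valid predecessor of
    # anchor j appears at an index smaller than j.
    order = sorted(anchors)
    n = len(order)
    best = [float(a[2]) for a in order]  # best chain score ending at each anchor
    prev = [-1] * n                      # back-pointers for traceback

    for j in range(n):
        qj, rj, lj = order[j]
        for i in range(j):
            qi, ri, li = order[i]
            gap_q = qj - (qi + li)
            gap_r = rj - (ri + li)
            if gap_q < 0 or gap_r < 0:
                continue  # anchors overlap or are not colinear
            score = best[i] + lj - gap_weight * abs(gap_q - gap_r)
            if score > best[j]:
                best[j] = score
                prev[j] = i

    # Trace back from the highest-scoring chain end.
    end = max(range(n), key=best.__getitem__)
    best_score = best[end]
    chain: List[Anchor] = []
    while end != -1:
        chain.append(order[end])
        end = prev[end]
    chain.reverse()
    return best_score, chain
```

Calling `chain_anchors` on a small anchor set returns the chain of mutually colinear anchors and discards anchors that would require moving backwards in the reference. Chaining on a graph, as addressed in the article, additionally has to account for reachability between graph positions, which this sketch does not attempt.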

Item Type: Journal Article
Publication: Journal of Computational Biology
Publisher: Mary Ann Liebert Inc.
Additional Information: The copyright for this article belongs to Mary Ann Liebert Inc.
Department/Centre: Division of Interdisciplinary Sciences > Computational and Data Sciences
Date Deposited: 24 Apr 2024 07:34
Last Modified: 24 Apr 2024 07:34
URI: https://eprints.iisc.ac.in/id/eprint/84345
