
Sequence to Graph Alignment Using Gap-Sensitive Co-linear Chaining

Chandra, G and Jain, C (2023) Sequence to Graph Alignment Using Gap-Sensitive Co-linear Chaining. In: 27th International Conference on Research in Computational Molecular Biology, RECOMB 2023, 16 - 19 April 2023, Istanbul, pp. 58-73.

RECOMB_2023.pdf - Published Version

Official URL: https://doi.org/10.1007/978-3-031-29119-7_4


Co-linear chaining is a widely used technique in sequence alignment tools that follow the seed-filter-extend methodology. It is a mathematically rigorous approach to combining short exact matches. For co-linear chaining between two sequences, efficient subquadratic-time chaining algorithms are well known for linear, concave and convex gap cost functions [Eppstein et al. JACM’92]. However, extending chaining algorithms to directed acyclic graphs (DAGs) has been challenging. Recently, a new sparse dynamic programming framework was introduced that exploits the small path cover of pangenome reference DAGs and enables efficient chaining [Mäkinen et al. TALG’19, RECOMB’18]. However, the underlying problem formulation did not consider gap cost, which makes chaining less effective in practice. To address this, we develop novel problem formulations and optimal chaining algorithms that support a variety of gap cost functions. We demonstrate empirically the ability of our provably-good chaining implementation to align long reads more precisely in comparison to existing aligners. For mapping simulated long reads from human genome to a pangenome DAG of 95 human haplotypes, we achieve precision while leaving reads unmapped.
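
To illustrate the basic idea behind gap-sensitive co-linear chaining, the following is a minimal sketch for the simpler sequence-to-sequence case. It is not the DAG algorithm of the paper, and it uses a plain quadratic-time dynamic program rather than the subquadratic or sparse methods cited above. The anchor representation `(query_start, ref_start, length)`, the names `gap_cost` and `chain`, and the particular linear gap cost (the absolute difference between the query gap and the reference gap) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical anchor format: an exact match of `length` characters
# starting at `query_start` in the read and `ref_start` in the reference.
Anchor = Tuple[int, int, int]


def gap_cost(a: Anchor, b: Anchor) -> int:
    """Illustrative linear gap cost (an assumption, not the paper's definition):
    the absolute difference between the gap on the query and on the reference."""
    gap_q = b[0] - (a[0] + a[2])
    gap_r = b[1] - (a[1] + a[2])
    return abs(gap_q - gap_r)


def chain(anchors: List[Anchor]) -> Tuple[int, List[Anchor]]:
    """Return (score, chain) for the best-scoring co-linear chain.

    Score = total anchor length minus the gap costs between consecutive
    anchors. Anchors are assumed to have positive length, and consecutive
    anchors in a chain must not overlap in either sequence.
    """
    if not anchors:
        return 0, []
    anchors = sorted(anchors)            # by query_start, then ref_start
    n = len(anchors)
    best = [a[2] for a in anchors]       # best chain score ending at anchor j
    prev = [-1] * n                      # back-pointers for traceback
    for j in range(n):
        b = anchors[j]
        for i in range(j):
            a = anchors[i]
            # a may precede b only if it ends before b starts in both sequences
            if a[0] + a[2] <= b[0] and a[1] + a[2] <= b[1]:
                cand = best[i] + b[2] - gap_cost(a, b)
                if cand > best[j]:
                    best[j] = cand
                    prev[j] = i
    end = max(range(n), key=best.__getitem__)
    path = []
    k = end
    while k != -1:
        path.append(anchors[k])
        k = prev[k]
    path.reverse()
    return best[end], path


if __name__ == "__main__":
    demo = [(0, 0, 5), (6, 6, 4), (7, 30, 6), (12, 12, 3)]
    print(chain(demo))
```

In this toy input, the anchor `(7, 30, 6)` is the longest single match, but joining it to `(0, 0, 5)` would require a large mismatch between the query gap and the reference gap, so the gap cost keeps it out of the chain. A formulation without gap cost has no such penalty, which is the weakness the abstract points to.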

Item Type: Conference Paper
Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Science and Business Media Deutschland GmbH
Additional Information: The copyright for this article belongs to the Authors.
Keywords: Directed graphs; Dynamic programming; Acyclic graphs; Cost-function; Minimum path cover; Pangenome; Path cover; Problem formulation; Rigorous approach; Sequence alignments; Sparse dynamic programming; Variation graph; Cost functions
Department/Centre: Division of Interdisciplinary Sciences > Computational and Data Sciences
Date Deposited: 19 May 2023 07:24
Last Modified: 19 May 2023 07:24
URI: https://eprints.iisc.ac.in/id/eprint/81575
