Chandra, G and Jain, C (2023) Sequence to Graph Alignment Using Gap-Sensitive Co-linear Chaining. In: 27th International Conference on Research in Computational Molecular Biology, RECOMB 2023, 16 - 19 April 2023, Istanbul, pp. 58-73.
Abstract
Co-linear chaining is a widely used technique in sequence alignment tools that follow the seed-filter-extend methodology. It is a mathematically rigorous approach to combining short exact matches. For co-linear chaining between two sequences, efficient subquadratic-time chaining algorithms are well known for linear, concave and convex gap cost functions [Eppstein et al. JACM’92]. However, extending chaining algorithms to directed acyclic graphs (DAGs) has been challenging. Recently, a new sparse dynamic programming framework was introduced that exploits the small path cover of pangenome reference DAGs and enables efficient chaining [Mäkinen et al. TALG’19, RECOMB’18]. However, the underlying problem formulation did not consider gap cost, which makes chaining less effective in practice. To address this, we develop novel problem formulations and optimal chaining algorithms that support a variety of gap cost functions. We demonstrate empirically the ability of our provably-good chaining implementation to align long reads more precisely than existing aligners. For mapping simulated long reads from the human genome to a pangenome DAG of 95 human haplotypes, we achieve precision while leaving reads unmapped.
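To make the chaining problem concrete, here is a minimal sketch of the basic quadratic-time dynamic program for co-linear chaining between two sequences (not the subquadratic algorithms of Eppstein et al., and not the paper's path-cover algorithms for DAGs). It assumes anchors are exact matches given as (reference start, query start, length), that chained anchors must not overlap in either sequence, and a hypothetical gap cost equal to the absolute difference between the reference gap and the query gap; the names `Anchor`, `precedes`, `gap_cost` and `chain` are illustrative only. On a DAG, the coordinate comparison in `precedes` would instead have to be a reachability test between anchor positions in the graph.

```python
from typing import List, NamedTuple, Tuple


class Anchor(NamedTuple):
    """Exact match (assumed input): ref[r : r + length] == query[q : q + length]."""
    r: int       # start position in the reference
    q: int       # start position in the query
    length: int  # match length (assumed >= 1)


def precedes(a: Anchor, b: Anchor) -> bool:
    """True if a ends before b starts in both sequences (no overlap allowed)."""
    return a.r + a.length <= b.r and a.q + a.length <= b.q


def gap_cost(a: Anchor, b: Anchor) -> int:
    """Hypothetical linear gap cost: |reference gap - query gap| between a and b."""
    ref_gap = b.r - (a.r + a.length)
    qry_gap = b.q - (a.q + a.length)
    return abs(ref_gap - qry_gap)


def chain(anchors: List[Anchor]) -> Tuple[int, List[Anchor]]:
    """O(n^2) chaining: maximize total anchor length minus gap costs."""
    if not anchors:
        return 0, []
    # Sorting by reference start guarantees every valid predecessor of an
    # anchor appears earlier in the list (precedes() implies a smaller r).
    order = sorted(anchors)
    n = len(order)
    score = [a.length for a in order]  # best chain score ending at each anchor
    prev = [-1] * n                    # back-pointers for traceback
    for j in range(n):
        for i in range(j):
            if precedes(order[i], order[j]):
                cand = score[i] + order[j].length - gap_cost(order[i], order[j])
                if cand > score[j]:
                    score[j], prev[j] = cand, i
    # Trace back from the highest-scoring chain end.
    k = max(range(n), key=score.__getitem__)
    best = score[k]
    path = []
    while k != -1:
        path.append(order[k])
        k = prev[k]
    return best, path[::-1]


if __name__ == "__main__":
    seeds = [Anchor(0, 0, 5), Anchor(10, 10, 5), Anchor(30, 12, 4), Anchor(20, 21, 6)]
    print(chain(seeds))
```

In the usage example, the anchor at reference position 30 is excluded because it would overlap earlier anchors in the query, and the remaining three anchors form a chain scoring 5 + 5 + 6 minus a gap cost of 1. The abstract's point is that gap-aware formulations like this one are what the earlier DAG chaining framework lacked.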
| Field | Value |
|---|---|
| Item Type | Conference Paper |
| Publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Additional Information | The copyright for this article belongs to the Authors. |
| Keywords | Directed graphs; Dynamic programming; Acyclic graphs; Cost-function; Minimum path cover; Pangenome; Path cover; Problem formulation; Rigorous approach; Sequence alignments; Sparse dynamic programming; Variation graph; Cost functions |
| Department/Centre | Division of Interdisciplinary Sciences > Computational and Data Sciences |
| Date Deposited | 19 May 2023 07:24 |
| Last Modified | 19 May 2023 07:24 |
| URI | https://eprints.iisc.ac.in/id/eprint/81575 |