Rajput, J and Chandra, G and Jain, C (2023) Co-Linear Chaining on Pangenome Graphs. In: 23rd International Workshop on Algorithms in Bioinformatics, WABI 2023, 4-6 September 2023, Bangalore, India.
Abstract
Pangenome reference graphs are useful in genomics because they compactly represent the genetic diversity within a species, a capability that linear references lack. However, efficiently aligning sequences to these graphs, which can have complex topology and cycles, is challenging. Seed-chain-extend alignment algorithms use co-linear chaining as a standard technique to identify a good cluster of exact seed matches that can be combined to form an alignment. Recent works show how the co-linear chaining problem can be solved efficiently for acyclic pangenome graphs by exploiting their small width (Mäkinen et al., TALG'19) and how incorporating gap cost in the scoring function improves alignment accuracy (Chandra and Jain, RECOMB'23). However, it remains open how to effectively generalize these techniques to general pangenome graphs, which contain cycles. Here we present the first practical formulation and an exact algorithm for co-linear chaining on cyclic pangenome graphs. We rigorously prove the correctness and computational complexity of the proposed algorithm. We evaluate the empirical performance of our algorithm by aligning simulated long reads from the human genome to a cyclic pangenome graph constructed from 95 publicly available haplotype-resolved human genome assemblies. While the existing heuristic-based algorithms are faster, the proposed algorithm provides a significant advantage in terms of accuracy. © Jyotshna Rajput, Ghanshyam Chandra, and Chirag Jain; licensed under Creative Commons License CC-BY 4.0.
Item Type: | Conference Paper |
---|---|
Publication: | Leibniz International Proceedings in Informatics, LIPIcs |
Publisher: | Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing |
Additional Information: | The copyright for this article belongs to the Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing. |
Keywords: | Bioinformatics; Clustering algorithms; Genes; Genome; Graph theory; Heuristic algorithms; Alignment algorithms; Complex topology; Genetics diversities; Genome sequencing; Genomics; Human genomes; Path cover; Reference graphs; Sequence alignments; Variation graph; Graphic methods |
Department/Centre: | Division of Interdisciplinary Sciences > Computational and Data Sciences |
Date Deposited: | 17 Dec 2023 09:22 |
Last Modified: | 17 Dec 2023 09:22 |
URI: | https://eprints.iisc.ac.in/id/eprint/83454 |