SPINE: Putting Backbone into String Indexing

Neelapala, Naresh and Mittal, Romil and Haritsa, Jayant R (2004) SPINE: Putting Backbone into String Indexing. In: 20th International Conference on Data Engineering, 2004, 30 March-2 April, Massachusetts, USA, 325-336.

Abstract

The indexing technique commonly used for long strings, such as genomes, is the suffix tree, which is based on a vertical (intra-path) compaction of the underlying trie structure. We investigate an alternative approach to index building, based on horizontal (inter-path) compaction of the trie. In particular, we present SPINE, a carefully engineered horizontally-compacted trie index. SPINE consists of a backbone formed by a linear chain of nodes representing the underlying string, with the nodes connected by a rich set of edges that facilitate fast forward and backward traversals over the backbone during index construction and query search. A special feature of SPINE is that it collapses the trie into a linear structure, representing the logical extreme of horizontal compaction. We describe algorithms for SPINE construction and for searching this index to find the occurrences of query patterns. Our experimental results on a variety of real genomic and proteomic strings show that SPINE requires significantly less space than standard implementations of suffix trees. Further, SPINE takes less time than suffix trees for both construction and search, especially when the index is disk-resident. Finally, the linearity of its structure makes it more amenable to integration with database engines.
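
As a point of reference for the "underlying trie structure" mentioned above, the following is a minimal sketch of a plain, uncompacted suffix trie with an occurrence search. It is not the SPINE index and not a suffix tree: it applies neither vertical nor horizontal compaction, and its quadratic size is exactly what both techniques aim to reduce. The names TrieNode, build_suffix_trie, and find_occurrences are hypothetical, chosen only for this illustration.

```python
from typing import Dict, List


class TrieNode:
    """One node of an uncompacted suffix trie (illustrative only)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Start positions of every suffix whose path passes through this node.
        self.starts: List[int] = []


def build_suffix_trie(text: str) -> TrieNode:
    """Insert every suffix of `text`, one character per edge.

    This naive construction takes O(n^2) time and space; it is the
    baseline structure, not an engineered index.
    """
    root = TrieNode()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, TrieNode())
            node.starts.append(start)
    return root


def find_occurrences(root: TrieNode, pattern: str) -> List[int]:
    """Return the sorted start positions of a non-empty `pattern`."""
    node = root
    for ch in pattern:
        nxt = node.children.get(ch)
        if nxt is None:
            return []  # pattern does not occur in the indexed string
        node = nxt
    return sorted(node.starts)


if __name__ == "__main__":
    trie = build_suffix_trie("banana")
    print(find_occurrences(trie, "ana"))  # [1, 3]
```

In this sketch, a query walks one edge per pattern character from the root, so search time depends on the pattern length plus the number of reported positions; the cost of the approach lies in the size of the trie itself.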

Item Type: Conference Paper
Publisher: IEEE
Additional Information: ©1990 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 14 Dec 2005
Last Modified: 19 Sep 2010 04:22
URI: http://eprints.iisc.ac.in/id/eprint/4493
