Bedathur, Srikanta J and Haritsa, Jayant R (2004) Engineering a Fast Online Persistent Suffix Tree Construction. In: 20th International Conference on Data Engineering, 2004, 30 March-2 April, Massachusetts, USA, pp. 720-731.
PDF: Engineering.pdf (613 kB)
Abstract
Online persistent suffix tree construction has been considered impractical due to its excessive I/O costs. However, prior studies have not taken into account the effects of the buffer management policy and of the internal node structure of the suffix tree on the I/O behavior of construction and of subsequent retrievals over the tree. We study these two issues in detail in the context of large genomic DNA and protein sequences. In particular, we make the following contributions: (i) a novel, low-overhead buffering policy called TOP-Q, which improves the on-disk behavior of suffix tree construction and subsequent retrievals, and (ii) empirical evidence that the space-efficient linked-list representation of suffix tree nodes performs significantly worse than the array representation. These results demonstrate that a careful choice of implementation strategies can make online persistent suffix tree construction considerably more scalable than currently perceived, in terms of the length of sequences that can be indexed with a fixed memory budget.
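To make contribution (ii) more concrete, here is a minimal Python sketch, not the authors' implementation, of the two node layouts for a four-symbol DNA alphabet. The class names `ArrayNode` and `ListNode`, the `find_child` method, and the "records touched" count are hypothetical; the count is only a rough proxy for I/O, assuming each node record may reside on a different disk page. It illustrates one plausible reason why child lookups are costlier with the linked-list layout.

```python
from typing import List, Optional, Tuple

ALPHABET = "ACGT"  # assumed DNA alphabet for illustration
INDEX = {c: i for i, c in enumerate(ALPHABET)}


class ArrayNode:
    """Hypothetical node: one child slot per alphabet symbol (fixed size)."""
    __slots__ = ("children",)

    def __init__(self) -> None:
        self.children: List[Optional["ArrayNode"]] = [None] * len(ALPHABET)

    def add_child(self, c: str, node: "ArrayNode") -> None:
        self.children[INDEX[c]] = node

    def find_child(self, c: str) -> Tuple[Optional["ArrayNode"], int]:
        # A single indexed access: exactly one node record is touched.
        return self.children[INDEX[c]], 1


class ListNode:
    """Hypothetical node: only existing children, chained as siblings."""
    __slots__ = ("label", "first_child", "next_sibling")

    def __init__(self, label: str = "") -> None:
        self.label = label  # first symbol of the edge entering this node
        self.first_child: Optional["ListNode"] = None
        self.next_sibling: Optional["ListNode"] = None

    def add_child(self, node: "ListNode") -> None:
        # Prepend to the sibling list; no slots are reserved for absent symbols.
        node.next_sibling = self.first_child
        self.first_child = node

    def find_child(self, c: str) -> Tuple[Optional["ListNode"], int]:
        # Walk the siblings; each one inspected is another record touched,
        # which for an on-disk tree may mean another page access.
        touched = 0
        node = self.first_child
        while node is not None:
            touched += 1
            if node.label == c:
                return node, touched
            node = node.next_sibling
        return None, touched


if __name__ == "__main__":
    arr_root, list_root = ArrayNode(), ListNode()
    for c in ALPHABET:  # insert children A, C, G, T in that order
        arr_root.add_child(c, ArrayNode())
        list_root.add_child(ListNode(c))
    # 'A' was inserted first, so it ends up last in the sibling list.
    print(arr_root.find_child("A")[1], list_root.find_child("A")[1])  # prints "1 4"
```

The array layout reserves a slot for every symbol even when most are empty, which is why the linked-list layout is the more space-efficient of the two; the sketch only shows how its lookups can touch more node records in exchange.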
Item Type: | Conference Paper |
---|---|
Publisher: | IEEE |
Additional Information: | ©1990 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. |
Department/Centre: | Division of Interdisciplinary Sciences > Supercomputer Education & Research Centre |
Date Deposited: | 17 Feb 2008 |
Last Modified: | 19 Sep 2010 04:22 |
URI: | http://eprints.iisc.ac.in/id/eprint/4495 |