Kumaran, A and Haritsa, Jayant R (2004) LexEQUAL: Supporting Multiscript Matching in Database Systems. [Book Chapter]
Abstract
To effectively support today’s global economy, database systems need to store and manipulate text data in multiple languages simultaneously. Current database systems do support the storage and management of multilingual data, but are not capable of querying or matching text data across different scripts. As a first step towards addressing this lacuna, we propose here a new query operator called LexEQUAL, which supports multiscript matching of proper names. The operator is implemented by first transforming matches in multiscript text space into matches in the equivalent phoneme space, and then using standard approximate matching techniques to compare these phoneme strings. The algorithm incorporates tunable parameters that impact the phonetic match quality and thereby determine the match performance in the multiscript space. We evaluate the performance of the LexEQUAL operator on a real multiscript names dataset and demonstrate that it is possible to simultaneously achieve good recall and precision by appropriate parameter settings. We also show that the operator run-time can be made extremely efficient by utilizing a combination of q-gram and database indexing techniques. Thus, we show that the LexEQUAL operator can complement the standard lexicographic operators, representing a first step towards achieving complete multilingual functionality in database systems.
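The two-step idea in the abstract (map each name to a phoneme string, then compare the phoneme strings approximately under a tunable threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lookup table `TOY_PHONEMES`, the phoneme spellings in it, the function names `to_phonemes` and `lex_equal`, and the default threshold value are all hypothetical stand-ins chosen for illustration. A real system would use a per-language text-to-phoneme converter, and plain Levenshtein distance is assumed here as the "standard approximate matching technique".

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical toy lookup standing in for real text-to-phoneme converters.
# The phoneme strings are illustrative only, not actual IPA transcriptions.
TOY_PHONEMES = {
    "Nehru": "neru",
    "नेहरू": "nehru",
    "Gandhi": "gandi",
}


def to_phonemes(name):
    """Map a name in any supported script to its (toy) phoneme string."""
    return TOY_PHONEMES[name]


def lex_equal(left, right, threshold=0.25):
    """Return True if the phoneme strings are within the allowed edit distance.

    `threshold` is an assumed tunable parameter: the fraction of the shorter
    phoneme string's length that may differ.
    """
    p, q = to_phonemes(left), to_phonemes(right)
    allowed = threshold * min(len(p), len(q))
    return edit_distance(p, q) <= allowed


if __name__ == "__main__":
    print(lex_equal("Nehru", "नेहरू"))   # same name, two scripts
    print(lex_equal("Nehru", "Gandhi"))  # different names
```

Raising `threshold` in this sketch admits more phonetic variation (higher recall, lower precision), while lowering it does the opposite, which is the kind of trade-off the abstract attributes to the operator's tunable parameters.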
Item Type: | Book Chapter |
---|---|
Publication: | 9th International Conference on Extending Database Technology, EDBT 2004, Heraklion, Crete, Greece (Lecture Notes in Computer Science) |
Publisher: | Springer Verlag |
Additional Information: | Copyright of this article belongs to Springer Verlag |
Department/Centre: | Division of Electrical Sciences > Computer Science & Automation |
Date Deposited: | 23 Jun 2007 |
Last Modified: | 19 Sep 2010 04:35 |
URI: | http://eprints.iisc.ac.in/id/eprint/10068 |