
LexEQUAL: Supporting Multiscript Matching in Database Systems

Kumaran, A and Haritsa, Jayant R (2004) LexEQUAL: Supporting Multiscript Matching in Database Systems. [Book Chapter]

Full text: PDF (LexEQUAL_BC_Feb23rd.pdf, 613 kB), restricted to registered users only.

Abstract

To effectively support today’s global economy, database systems need to store and manipulate text data in multiple languages simultaneously. Current database systems do support the storage and management of multilingual data, but are not capable of querying or matching text data across different scripts. As a first step towards addressing this lacuna, we propose here a new query operator called LexEQUAL, which supports multiscript matching of proper names. The operator is implemented by first transforming matches in multiscript text space into matches in the equivalent phoneme space, and then using standard approximate matching techniques to compare these phoneme strings. The algorithm incorporates tunable parameters that impact the phonetic match quality and thereby determine the match performance in the multiscript space. We evaluate the performance of the LexEQUAL operator on a real multiscript names dataset and demonstrate that it is possible to simultaneously achieve good recall and precision by appropriate parameter settings. We also show that the operator run-time can be made extremely efficient by utilizing a combination of q-gram and database indexing techniques. Thus, we show that the LexEQUAL operator can complement the standard lexicographic operators, representing a first step towards achieving complete multilingual functionality in database systems.
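The abstract describes the operator in two stages: map each name to a phoneme string, then compare phoneme strings with approximate matching, using q-grams to keep run-time low. The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. Text-to-phoneme conversion is not shown: the phoneme tuples are hand-written, hypothetical inputs. The parameters `threshold` and `cluster_cost` and the phoneme groups in `CLUSTERS` are hypothetical stand-ins for the operator's "tunable parameters", not the paper's definitions. The pruning step is the standard q-gram count filter for edit distance.

```python
from collections import Counter
from typing import Dict, Tuple

Phonemes = Tuple[str, ...]

# HYPOTHETICAL phoneme clusters: phonemes in the same cluster sound alike,
# so substituting one for another costs less than a full edit.
CLUSTERS: Dict[str, int] = {"t": 0, "d": 0, "k": 1, "g": 1, "s": 2, "z": 2, "v": 3, "w": 3}


def sub_cost(a: str, b: str, cluster_cost: float) -> float:
    """Substitution cost: 0 if equal, cluster_cost if same cluster, else 1."""
    if a == b:
        return 0.0
    if a in CLUSTERS and CLUSTERS.get(a) == CLUSTERS.get(b):
        return cluster_cost
    return 1.0


def phonetic_distance(p: Phonemes, r: Phonemes, cluster_cost: float) -> float:
    """Weighted Levenshtein distance over phoneme tuples (insert/delete cost 1)."""
    prev = [float(j) for j in range(len(r) + 1)]
    for i in range(1, len(p) + 1):
        cur = [float(i)] + [0.0] * len(r)
        for j in range(1, len(r) + 1):
            cur[j] = min(prev[j] + 1.0,       # delete p[i-1]
                         cur[j - 1] + 1.0,    # insert r[j-1]
                         prev[j - 1] + sub_cost(p[i - 1], r[j - 1], cluster_cost))
        prev = cur
    return prev[len(r)]


def qgrams(p: Phonemes, q: int) -> Counter:
    """Multiset of q-grams of p, padded with q-1 sentinels at each end."""
    padded = ("#",) * (q - 1) + p + ("$",) * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def passes_count_filter(p: Phonemes, r: Phonemes, max_ops: int, q: int) -> bool:
    """Necessary condition for unit edit distance <= max_ops: each edit
    destroys at most q of the max(|p|,|r|) + q - 1 padded q-grams."""
    common = sum((qgrams(p, q) & qgrams(r, q)).values())
    return common >= max(len(p), len(r)) + q - 1 - max_ops * q


def lex_equal(p: Phonemes, r: Phonemes, threshold: float = 0.25,
              cluster_cost: float = 0.5, q: int = 2) -> bool:
    """Match if the weighted phoneme distance is within threshold * longer length."""
    budget = threshold * max(len(p), len(r))
    # Every edit costs at least min(1, cluster_cost), which bounds how many
    # unit edits a script within budget can contain.
    min_cost = min(1.0, cluster_cost)
    max_ops = int(budget / min_cost + 1e-9) if min_cost > 0 else max(len(p), len(r))
    if not passes_count_filter(p, r, max_ops, q):
        return False  # cheaply pruned: the distance cannot be within budget
    return phonetic_distance(p, r, cluster_cost) <= budget + 1e-9
```

For example, with the hypothetical clusters above, `lex_equal(("k","a","t","a"), ("g","a","d","a"))` is `True` because both substitutions stay within a cluster (total cost 1.0, budget 1.0), while setting `cluster_cost=1.0` rejects the pair at the q-gram filter without running the distance computation. In a database setting the q-gram multisets would presumably be precomputed and indexed so that the filter runs as an indexed join; the abstract confirms only that q-gram and database indexing techniques are combined, not how.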

Item Type: Book Chapter
Publication: 9th International Conference on Extending Database Technology (EDBT 2004), Heraklion, Crete, Greece (Lecture Notes in Computer Science)
Publisher: Springer Verlag
Additional Information: Copyright of this article belongs to Springer Verlag
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 23 Jun 2007
Last Modified: 19 Sep 2010 04:35
URI: http://eprints.iisc.ac.in/id/eprint/10068
