
Pruned Universal Symbol Sequences for LZW based Language identification

Basavaraja, SV and Sreenivas, TV (2008) Pruned Universal Symbol Sequences for LZW based Language identification. In: 6th Odyssey IEEE Work. Speech and Language Recognition, January 21-24, 2008, Stellenbosch, South Africa.

Abstract

We present an improved language modeling technique for the Lempel-Ziv-Welch (LZW) based language identification (LID) scheme. The previous approach builds each language's pattern table by running the LZW algorithm over the training data. Because LZW parses its input sequentially, several language-specific patterns never enter the pattern table. To overcome this, we build a universal pattern table that contains all patterns of different lengths. For each language, the corresponding language-specific pattern table is constructed by retaining only those patterns of the universal table whose frequency of appearance in the training data is above a threshold. This approach reduces the classification score (compression ratio [LZW-CR] or weighted discriminant score [LZW-WDS]) for non-native languages and increases LID performance considerably.
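The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`lzw_patterns`, `universal_counts`, `prune`), the maximum pattern length, the threshold value, and the toy character string standing in for a real symbol stream are all assumptions made for illustration. The sketch shows one pattern that a sequential LZW pass never records, although it occurs repeatedly and therefore survives frequency-based pruning of an exhaustive table.

```python
from collections import Counter


def lzw_patterns(symbols):
    """Multi-symbol patterns that a standard sequential LZW pass adds
    to its dictionary (single symbols are assumed to be pre-seeded)."""
    table = {(s,) for s in symbols}
    learned = set()
    current = ()
    for s in symbols:
        candidate = current + (s,)
        if candidate in table:
            current = candidate          # keep extending the match
        else:
            table.add(candidate)         # new pattern enters the table
            learned.add(candidate)
            current = (s,)               # restart from the current symbol
    return learned


def universal_counts(symbols, max_len):
    """Count every contiguous pattern of length 2..max_len.
    max_len is a hypothetical cap, not a value from the paper."""
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(symbols) - n + 1):
            counts[tuple(symbols[i:i + n])] += 1
    return counts


def prune(counts, threshold):
    """Keep patterns whose frequency is strictly above the threshold."""
    return {p for p, c in counts.items() if c > threshold}


# Toy training data for a single hypothetical language.
symbols = list("abababab")
lzw_table = lzw_patterns(symbols)
pruned_table = prune(universal_counts(symbols, max_len=3), threshold=2)

# Patterns that are frequent in the data but absent from the LZW table.
print(sorted(pruned_table - lzw_table))
```

In this toy input the pattern `b a b` occurs three times, yet the LZW pass never adds it because its parse boundaries never line up with that substring. How the pruned table is then used to compute LZW-CR or LZW-WDS is not described in the abstract and is left out of the sketch.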

Item Type: Conference Paper
Keywords: Language modeling; PRLM; Pattern table; LZW-CR; LZW-WDS
Department/Centre: Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited: 17 Oct 2011 06:57
Last Modified: 17 Oct 2011 06:57
URI: http://eprints.iisc.ac.in/id/eprint/40641
