
Pruned universal symbol sequences for LZW based language identification

Basavaraja, SV and Sreenivas, TV (2008) Pruned universal symbol sequences for LZW based language identification. In: Speaker and Language Recognition Workshop, Odyssey 2008, 21-24 January 2008, Stellenbosch, South Africa.

Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....


We present an improved language modeling technique for the Lempel-Ziv-Welch (LZW) based language identification (LID) scheme. The previous LZW-based approach to LID builds the language pattern table with the LZW algorithm itself. Because LZW is sequential, several language-specific patterns are missing from the pattern table. To overcome this, we build a universal pattern table, which contains all patterns of different lengths. For each language, its corresponding language-specific pattern table is constructed by retaining the patterns of the universal table whose frequency of appearance in the training data is above a threshold. This approach reduces the classification score (compression ratio, LZW-CR, or weighted discriminant score, LZW-WDS) for non-native languages and increases LID performance considerably. © Odyssey 2008: Speaker and Language Recognition Workshop. All rights reserved.
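The contrast the abstract draws can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of raw counts as the "frequency of appearance", the `max_len` cap on pattern length, and the strict `>` comparison against the threshold are all assumptions made for illustration.

```python
from collections import Counter


def lzw_pattern_table(symbols):
    """Patterns stored by one sequential LZW pass over a symbol sequence."""
    table = {(s,) for s in symbols}  # dictionary seeded with single symbols
    current = ()
    for s in symbols:
        candidate = current + (s,)
        if candidate in table:
            current = candidate      # keep extending the current match
        else:
            table.add(candidate)     # store the new pattern, restart at s
            current = (s,)
    return table


def universal_pattern_counts(symbols, max_len):
    """Count every contiguous pattern of length 1..max_len (assumed cap)."""
    counts = Counter()
    for length in range(1, max_len + 1):
        for i in range(len(symbols) - length + 1):
            counts[tuple(symbols[i:i + length])] += 1
    return counts


def pruned_pattern_table(symbols, max_len, threshold):
    """Keep universal patterns whose training count exceeds the threshold."""
    counts = universal_pattern_counts(symbols, max_len)
    return {pattern for pattern, count in counts.items() if count > threshold}


if __name__ == "__main__":
    training = list("abababab")  # toy stand-in for a tokenized symbol stream
    lzw = lzw_pattern_table(training)
    pruned = pruned_pattern_table(training, max_len=3, threshold=2)
    print(sorted(pruned - lzw))  # patterns the sequential pass never stored
```

On the toy sequence `abababab`, the sequential LZW pass never stores the pattern `("b", "a", "b")`, although it occurs three times; exhaustive counting followed by pruning retains it.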

Item Type: Conference Paper
Publication: Odyssey 2008: Speaker and Language Recognition Workshop
Publisher: International Speech Communication Association
Additional Information: cited By 0; Conference of Speaker and Language Recognition Workshop, Odyssey 2008 ; Conference Date: 21 January 2008 Through 24 January 2008; Conference Code:151517
Keywords: Computational linguistics; Natural language processing systems; Speech recognition, Frequency of appearance; Language identification; Language model; Non-native language; Pattern table; PRLM; Universal patterns; Weighted discriminants, Modeling languages
Department/Centre: Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited: 14 Oct 2020 11:16
Last Modified: 14 Oct 2020 11:16
URI: http://eprints.iisc.ac.in/id/eprint/65409
