Anoop, CS and Ramakrishnan, AG (2023) Suitability of syllable-based modeling units for end-to-end speech recognition in Sanskrit and other Indian languages. In: Expert Systems with Applications, 220.
Abstract
Most Indian languages are spoken in units of syllables. However, speech recognition systems developed so far for Indian languages have generally used characters or phonemes as modeling units. This work evaluates the performance of syllable-based modeling units in end-to-end speech recognition for several Indian languages. The text is represented in 3 different forms: native script, Sanskrit library phonetics (SLP1) encoding, and syllables, and tokenized with sub-word units like character, byte-pair encoding (BPE), and unigram language modeling (ULM). The performances of these tokens in monolingual training and cross-lingual transfer learning are compared. Syllable-based BPE/ULM subword units give promising results in the monolingual setup if the dataset is sufficiently diverse to represent the syllable distribution in the language. For the Vāksañcayaḥ dataset in Sanskrit, syllable-BPE tokens achieve state-of-the-art results. The capability of syllable-BPE units to complement SLP1-character models through a pretraining–finetuning setup is also evaluated. For Sanskrit, syllable-BPE achieves better word error rates (WER) than the pretraining–finetuning approaches. For Tamil and Telugu, both result in comparable WERs. SLP1-character units are largely better than syllable-based units for cross-lingual transfer learning. © 2023 Elsevier Ltd
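As an illustration of what "syllable-BPE" tokens could look like, the following is a minimal sketch of byte-pair encoding in which the base symbols are syllables rather than characters. It is not the authors' implementation: the function names (`learn_bpe_merges`, `merge_pair`, `apply_merges`) are hypothetical, the input is assumed to be already segmented into syllables, and the romanized toy corpus is invented for the example.

```python
from collections import Counter


def merge_pair(word, pair):
    """Replace every adjacent occurrence of `pair` in `word` with one merged unit."""
    out = []
    i = 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(corpus, num_merges):
    """Learn up to `num_merges` BPE merges.

    `corpus` is a list of words, each given as a list of syllable strings
    (assumption: syllabification has been done beforehand).
    """
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent syllable pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken lexicographically.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged_words = Counter()
        for word, freq in words.items():
            merged_words[merge_pair(word, best)] += freq
        words = merged_words
    return merges


def apply_merges(word, merges):
    """Tokenize one syllabified word by applying the learned merges in order."""
    word = tuple(word)
    for pair in merges:
        word = merge_pair(word, pair)
    return list(word)


# Toy, romanized, pre-syllabified corpus (invented for illustration).
corpus = [["ra", "ma"], ["ra", "ma", "ya"], ["si", "ta"]]
merges = learn_bpe_merges(corpus, num_merges=1)
tokens = apply_merges(["ra", "ma", "ya"], merges)
```

Because merges never split a syllable, every resulting token is a whole syllable or a run of syllables, which is the property that distinguishes this setup from character-level BPE.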
| Item Type: | Journal Article |
|---|---|
| Publication: | Expert Systems with Applications |
| Publisher: | Elsevier Ltd |
| Additional Information: | The copyright for this article belongs to Elsevier Ltd. |
| Keywords: | Character recognition; Computational linguistics; Encoding (symbols); Learning systems; Modeling languages; Signal encoding; Automatic speech recognition; Based modelling; Byte-pair encoding; End to end; Indian languages; Low-resource; Performance; Sanskrit; Subword units; Syllable-based ASR; Speech recognition |
| Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
| Date Deposited: | 16 Mar 2023 06:11 |
| Last Modified: | 16 Mar 2023 06:11 |
| URI: | https://eprints.iisc.ac.in/id/eprint/80964 |