Suitability of syllable-based modeling units for end-to-end speech recognition in Sanskrit and other Indian languages

Anoop, CS and Ramakrishnan, AG (2023) Suitability of syllable-based modeling units for end-to-end speech recognition in Sanskrit and other Indian languages. In: Expert Systems with Applications, 220.

Official URL: https://doi.org/10.1016/j.eswa.2023.119722

Abstract

Most Indian languages are spoken in units of syllables. However, speech recognition systems developed so far for Indian languages have generally used characters or phonemes as modeling units. This work evaluates the performance of syllable-based modeling units in end-to-end speech recognition for several Indian languages. The text is represented in three forms: native script, Sanskrit library phonetics (SLP1) encoding, and syllables. Each form is tokenized into subword units using character, byte-pair encoding (BPE), or unigram language modeling (ULM) tokenization. The performance of these tokens is compared in monolingual training and in cross-lingual transfer learning. Syllable-based BPE/ULM subword units give promising results in the monolingual setup if the dataset is sufficiently diverse to represent the syllable distribution in the language. For the Vāksañcayaḥ dataset in Sanskrit, syllable-BPE tokens achieve state-of-the-art results. The capability of syllable-BPE units to complement SLP1-character models through a pretraining–finetuning setup is also evaluated. For Sanskrit, syllable-BPE achieves better word error rates (WER) than the pretraining–finetuning approaches. For Tamil and Telugu, the two approaches give comparable WERs. SLP1-character units are largely better than syllable-based units for cross-lingual transfer learning. © 2023 Elsevier Ltd
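
As an illustration of what a syllable representation of SLP1 text could look like, the following is a minimal sketch and not the authors' procedure. It assumes a simple orthographic (akshara-style) rule: a consonant cluster attaches to the vowel that follows it, anusvara (M) and visarga (H) attach to the preceding vowel, and word-final consonants join the last syllable. The function name `syllabify` and the rule itself are assumptions made for this example; the record does not describe the syllabification used in the paper.

```python
# SLP1 maps each Sanskrit phoneme to one ASCII character, so a
# character-by-character scan is enough for this sketch.
SLP1_VOWELS = set("aAiIuUfFxXeEoO")
SLP1_MODIFIERS = set("MH")  # anusvara and visarga


def syllabify(word):
    """Split one SLP1-encoded word into akshara-like syllables.

    Assumed rule (hypothetical, for illustration only): consonants
    accumulate as an onset until a vowel closes the syllable.
    """
    units = []
    onset = ""
    for ch in word:
        if ch in SLP1_VOWELS:
            # A vowel closes the current syllable.
            units.append(onset + ch)
            onset = ""
        elif ch in SLP1_MODIFIERS and units and not onset:
            # Anusvara/visarga directly after a vowel joins that syllable.
            units[-1] += ch
        else:
            # Any other character is treated as part of the next onset.
            onset += ch
    if onset:
        # Word-final consonants join the last syllable, or stand alone
        # if the word contains no vowel at all.
        if units:
            units[-1] += onset
        else:
            units.append(onset)
    return units


if __name__ == "__main__":
    for w in ["saMskftam", "rAmaH", "Darma"]:
        print(w, "->", syllabify(w))
```

Under these assumptions, `saMskftam` splits into `saM`, `skf`, `tam`. A syllable sequence of this kind could then be passed to a BPE or ULM tokenizer in place of the character sequence, which is the general idea behind the syllable-BPE and syllable-ULM units named in the abstract.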

Item Type:	Journal Article
Publication:	Expert Systems with Applications
Publisher:	Elsevier Ltd
Additional Information:	The copyright for this article belongs to Elsevier Ltd.
Keywords:	Character recognition; Computational linguistics; Encoding (symbols); Learning systems; Modeling languages; Signal encoding; Automatic speech recognition; Based modelling; Byte-pair encoding; End to end; Indian languages; Low-resource; Performance; Sanskrit; Subword units; Syllable-based ASR; Speech recognition
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	16 Mar 2023 06:11
Last Modified:	16 Mar 2023 06:11
URI:	https://eprints.iisc.ac.in/id/eprint/80964
