
Investigation of Different G2P Schemes for Speech Recognition in Sanskrit

Anoop, CS and Ramakrishnan, AG (2021) Investigation of Different G2P Schemes for Speech Recognition in Sanskrit. In: 28th International Conference on Neural Information Processing, ICONIP 2021, 8 - 12 Dec 2021, Virtual, Online, pp. 536-547.

Full text not available from this repository.
Official URL: https://doi.org/10.1007/978-3-030-92270-2_46


In this work, we explore the impact of different grapheme-to-phoneme (G2P) conversion schemes on automatic speech recognition (ASR) in Sanskrit. The performance of four G2P conversion schemes is evaluated on the Sanskrit ASR task using a speech corpus of around 15.5 h. We also benchmark traditional and neural-network-based Kaldi ASR systems on our corpus using these G2P schemes. The modified Sanskrit Library Phonetic (SLP1-M) encoding scheme performs best with all Kaldi models except the recent end-to-end (E2E) models trained with the flat-start lattice-free maximum mutual information (LF-MMI) objective. We achieve the best results with factorized time-delay neural networks (TDNN-F) trained on the LF-MMI objective when SLP1-M is employed. In this case, SLP1-M achieves a word error rate (WER) of 8.4% on the test set, a relative improvement of 7.7% over SLP1. The best E2E models have a WER of 13.3% with the basic SLP1 scheme. G2P schemes employing schwa deletion (as in Hindi, which uses the same Devanagari script as Sanskrit) degrade the performance of GMM-HMM models considerably. © 2021, Springer Nature Switzerland AG.
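The comparison above turns on how Devanagari spellings are mapped to phone sequences and on whether the inherent schwa is kept (as Sanskrit pronunciation requires) or deleted (as in Hindi). The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses the plain SLP1 transliteration (one ASCII character per phoneme), not the paper's SLP1-M, whose modifications are not described here; it covers only common Devanagari letters; and its `delete_final_schwa` option drops only a word-final inherent schwa, a crude stand-in for real Hindi schwa-deletion rules. The names `g2p` and `lexicon_line` are hypothetical, and the output layout (word followed by whitespace-separated phones) mirrors a Kaldi-style `lexicon.txt` entry.

```python
# Hypothetical Devanagari -> plain SLP1 G2P sketch (NOT the paper's SLP1-M scheme).
from typing import List, Set

VIRAMA = "\u094D"  # suppresses the inherent vowel of the preceding consonant

# Consonant letters -> SLP1 symbols (partial table, Sanskrit letters only).
CONSONANTS = {
    "\u0915": "k", "\u0916": "K", "\u0917": "g", "\u0918": "G", "\u0919": "N",
    "\u091A": "c", "\u091B": "C", "\u091C": "j", "\u091D": "J", "\u091E": "Y",
    "\u091F": "w", "\u0920": "W", "\u0921": "q", "\u0922": "Q", "\u0923": "R",
    "\u0924": "t", "\u0925": "T", "\u0926": "d", "\u0927": "D", "\u0928": "n",
    "\u092A": "p", "\u092B": "P", "\u092C": "b", "\u092D": "B", "\u092E": "m",
    "\u092F": "y", "\u0930": "r", "\u0932": "l", "\u0935": "v",
    "\u0936": "S", "\u0937": "z", "\u0938": "s", "\u0939": "h",
}
# Independent vowel letters.
VOWELS = {
    "\u0905": "a", "\u0906": "A", "\u0907": "i", "\u0908": "I", "\u0909": "u",
    "\u090A": "U", "\u090B": "f", "\u0960": "F", "\u090C": "x", "\u090F": "e",
    "\u0910": "E", "\u0913": "o", "\u0914": "O",
}
# Dependent vowel signs (matras), which replace the inherent "a".
MATRAS = {
    "\u093E": "A", "\u093F": "i", "\u0940": "I", "\u0941": "u", "\u0942": "U",
    "\u0943": "f", "\u0944": "F", "\u0962": "x", "\u0947": "e", "\u0948": "E",
    "\u094B": "o", "\u094C": "O",
}
# Anusvara, visarga, chandrabindu.
MARKS = {"\u0902": "M", "\u0903": "H", "\u0901": "~"}


def g2p(word: str, delete_final_schwa: bool = False) -> List[str]:
    """Map one Devanagari word to a list of SLP1 phone symbols.

    By default every inherent schwa is kept, as in Sanskrit. With
    delete_final_schwa=True, only a word-final inherent "a" is dropped,
    a simplified imitation of Hindi-style schwa deletion.
    """
    phones: List[str] = []
    inherent: Set[int] = set()  # positions in `phones` holding an unwritten "a"
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            if nxt in MATRAS:  # explicit vowel sign replaces the schwa
                phones.append(MATRAS[nxt])
                i += 2
            elif nxt == VIRAMA:  # bare consonant, no vowel follows
                i += 2
            else:  # inherent schwa is pronounced
                inherent.add(len(phones))
                phones.append("a")
                i += 1
        elif ch in VOWELS:
            phones.append(VOWELS[ch])
            i += 1
        elif ch in MARKS:
            phones.append(MARKS[ch])
            i += 1
        else:
            i += 1  # skip anything outside this partial table (nukta, danda, ...)
    if delete_final_schwa and phones and (len(phones) - 1) in inherent:
        phones.pop()
    return phones


def lexicon_line(word: str, delete_final_schwa: bool = False) -> str:
    """Format a word and its phones as one whitespace-separated lexicon entry."""
    return " ".join([word] + g2p(word, delete_final_schwa))
```

For example, `lexicon_line("राम")` yields `राम r A m a`, while `lexicon_line("राम", delete_final_schwa=True)` yields `राम r A m`, the Hindi-like pronunciation. Because SLP1 assigns one ASCII character per phoneme, each symbol can serve directly as a phone label; the case distinctions (e.g. `t`/`T`, `a`/`A`) must therefore be preserved in the phone set.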

Item Type: Conference Paper
Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Science and Business Media Deutschland GmbH
Additional Information: The copyright for this article belongs to Springer Science and Business Media Deutschland GmbH.
Keywords: Neural networks; Automatic speech recognition; E2E; G2P; Kaldi; Lattice-free; Lattice-free maximum mutual information; Maximum mutual information; Performance; Sanskrit; Word error rate; Speech recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 12 Jan 2022 05:50
Last Modified: 12 Jan 2022 05:50
URI: http://eprints.iisc.ac.in/id/eprint/70943
