Semi-supervised Acoustic and Language Modeling for Hindi ASR

Rath, SP and Bandarupalli, TS and Shah, N and Onoe, N and Ganapathy, S (2022) Semi-supervised Acoustic and Language Modeling for Hindi ASR. In: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, 18 - 22 September 2022, Incheon, pp. 3528-3532.

Official URL: https://doi.org/10.21437/Interspeech.2022-10336

Abstract

This paper describes our team's submission to the Hindi Gram Vaani ASR challenge, which involves building an ASR system for spontaneous telephonic recordings. The challenge is unique because of the small amount of labelled data available for model development. In addition, acoustic variability arising from the spontaneity of natural conversations, the rich diversity of Hindi across India, and the varied backgrounds present in the corpus makes the task considerably more challenging. We participated in two of the three tracks: the first track provides only 100 hours of labelled speech, while the second provides an additional 1000 hours of unlabelled speech alongside the 100 hours of labelled speech. For both tracks, we developed a Kaldi-based hybrid system comprising a character-based TDNN-F acoustic model, N-gram first-pass decoding, RNN-LM rescoring, and system combination. For the second track, we additionally trained an E2E conformer-based system on representations obtained from a contrastive predictive coding (CPC) model. The results obtained for both tracks are significantly better than the baseline results published by the challenge organizers on the development set, which consists of 5 hours of audio.

Item Type:	Conference Paper
Publication:	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher:	International Speech Communication Association
Additional Information:	The copyright for this article belongs to International Speech Communication Association.
Keywords:	Modeling languages; Speech communication; Acoustic and language models; Acoustic variability; Conformer; Contrastive predictive coding; E2E ASR; Hybrid ASR; Labeled data; Model development; Predictive coding; Semi-supervised; Speech recognition
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	10 Nov 2022 06:13
Last Modified:	10 Nov 2022 06:13
URI:	https://eprints.iisc.ac.in/id/eprint/77852
