Rath, SP and Bandarupalli, TS and Shah, N and Onoe, N and Ganapathy, S (2022) Semi-supervised Acoustic and Language Modeling for Hindi ASR. In: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, 18 - 22 September 2022, Incheon, pp. 3528-3532.
Abstract
This paper describes our team's submission to the Hindi Gram Vaani ASR challenge, which involves building an ASR system for spontaneous telephonic recordings. The challenge is distinctive because only a small amount of labelled data is available for model development. In addition, acoustic variability arising from the spontaneity of natural conversation, the rich diversity of Hindi across India, and the varied backgrounds present in the corpus makes the task considerably more challenging. We participated in two of the three tracks: the first track involves 100 hours of labelled speech only, and the second track involves an additional 1000 hours of unlabelled speech along with 100 hours of labelled speech. For both tracks, a Kaldi-based hybrid model was developed, comprising a TDNN-F character-based acoustic model, N-gram first-pass decoding, RNN-LM re-scoring, and system combination. For the second track, an E2E conformer-based system was also trained on representations obtained from a contrastive predictive coding (CPC) model. On the development set, which consists of 5 hours of audio, the results obtained for both tracks are significantly better than the baseline results published by the challenge organizers.
Item Type: | Conference Paper |
---|---|
Publication: | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publisher: | International Speech Communication Association |
Additional Information: | The copyright for this article belongs to International Speech Communication Association. |
Keywords: | Modeling languages; Speech communication; Acoustic and language models; Acoustic variability; Conformer; Contrastive predictive coding; E2E ASR; Hybrid ASR; Labeled data; Model development; Predictive coding; Semi-supervised; Speech recognition |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 10 Nov 2022 06:13 |
Last Modified: | 10 Nov 2022 06:13 |
URI: | https://eprints.iisc.ac.in/id/eprint/77852 |