Active learning methods for low resource end-to-end speech recognition

Malhotra, K and Bansal, S and Ganapathy, S (2019) Active learning methods for low resource end-to-end speech recognition. In: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, 15 - 19 September 2019, Graz, pp. 2215-2219.

Official URL: https://doi.org/10.21437/Interspeech.2019-2316

Abstract

Recently developed end-to-end (E2E) automatic speech recognition (ASR) systems demand an abundance of transcribed speech data, yet there are several scenarios where labeling speech data is cumbersome and expensive. For a fixed annotation cost, active learning allows the ASR model to be trained efficiently. In this work, we advance the most common approach to active learning, which relies on the uncertainty sampling technique. In particular, we explore the use of the path probability of the decoded sequence as a confidence measure and select the samples with the least confidence for active learning. To reduce the sampling bias in active learning, we propose a regularized uncertainty sampling approach that incorporates an i-vector diversity measure. Thus, active learning in the proposed framework uses a joint score of uncertainty and i-vector diversity. The benefits of the proposed approach are illustrated for an E2E ASR task performed on the CSJ and Librispeech datasets. In these experiments, we show that the proposed approach yields considerable improvements over the baseline model using random sampling.
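
The selection idea described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' formulation: the abstract does not give the exact scoring rule, so the length normalization of the path probability, the cosine-based diversity term, the greedy selection loop, the `weight` parameter, and the function name `select_for_annotation` are all assumptions made here for illustration. The i-vectors are assumed to be precomputed and non-zero.

```python
import numpy as np


def select_for_annotation(log_path_probs, lengths, ivectors, budget, weight=0.5):
    """Greedily pick utterances by a joint uncertainty + diversity score.

    Hypothetical helper: the scoring rule below is an assumption, not the
    formula used in the paper.
    """
    log_path_probs = np.asarray(log_path_probs, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    ivectors = np.asarray(ivectors, dtype=float)

    # Confidence: length-normalized path probability of the decoded sequence
    # (the normalization is an assumption). Uncertainty is its complement.
    confidence = np.exp(log_path_probs / lengths)
    uncertainty = 1.0 - confidence

    # Unit-normalize i-vectors so a dot product equals cosine similarity.
    unit = ivectors / np.linalg.norm(ivectors, axis=1, keepdims=True)

    selected = []
    remaining = np.ones(len(unit), dtype=bool)
    # Highest cosine similarity to any already-selected utterance;
    # -1 means "nothing selected yet", i.e. maximal diversity.
    max_sim = np.full(len(unit), -1.0)

    for _ in range(min(budget, len(unit))):
        diversity = (1.0 - max_sim) / 2.0  # mapped into [0, 1]
        score = (1.0 - weight) * uncertainty + weight * diversity
        score[~remaining] = -np.inf  # never pick the same utterance twice
        best = int(np.argmax(score))
        selected.append(best)
        remaining[best] = False
        max_sim = np.maximum(max_sim, unit @ unit[best])

    return selected


# Toy pool: utterances 0 and 1 are both uncertain but have near-identical
# i-vectors; utterance 3 is moderately uncertain and far from both.
toy_log_probs = np.log([0.2, 0.25, 0.9, 0.6])
toy_lengths = [1, 1, 1, 1]
toy_ivectors = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [-1.0, 0.0]]
picked = select_for_annotation(toy_log_probs, toy_lengths, toy_ivectors, budget=2)
```

Setting `weight=0.0` in this sketch reduces it to plain least-confidence sampling, while larger values penalize candidates whose i-vectors resemble those already chosen, which is one simple way to counter the sampling bias the abstract mentions.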

Item Type:	Conference Paper
Publication:	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher:	International Speech Communication Association
Additional Information:	The copyright for this article belongs to International Speech Communication Association.
Keywords:	Active learning; Diversity measures; End-to-end models; Speech recognition; Uncertainty sampling
Department/Centre:	Division of Electrical Sciences > Computer Science & Automation; Division of Interdisciplinary Sciences > Computational and Data Sciences; Others
Date Deposited:	05 Dec 2022 09:54
Last Modified:	05 Dec 2022 09:54
URI:	https://eprints.iisc.ac.in/id/eprint/78253
