Data-pooling and multi-task learning for enhanced performance of speech recognition systems in multiple low resourced languages

Madhavaraj, A and Ramakrishnan, AG (2019) Data-pooling and multi-task learning for enhanced performance of speech recognition systems in multiple low resourced languages. In: 25th National Conference on Communications, NCC 2019, 20 - 23 February 2019, Bangalore.

Official URL: https://doi.org/10.1109/NCC.2019.8732237

Abstract

We present two approaches to improve the performance of automatic speech recognition (ASR) systems for Gujarati, Tamil and Telugu. In the first approach, data-pooling with phone mapping (DP-PM), a deep neural network (DNN) is trained to predict the senones of the target language; the feature vectors and alignments of the other (source) languages are then used to map the phones of each source language to those of the target language. The lexicons of the source languages are modified using this phone mapping, and an ASR system for the target language is trained on both the target data and the modified source data. The DP-PM approach gives relative improvements in word error rate (WER) of 5.1 for Gujarati, 3.1 for Tamil and 3.4 for Telugu over the corresponding baseline figures. In the second approach, multi-task DNN (MT-DNN) modeling, we use feature vectors from all the languages to train a DNN with three output layers, each predicting the senones of one of the languages. The objective functions of the output layers are modified so that, during training, a feature vector updates only those DNN layers responsible for predicting the senones of the language it belongs to. The MT-DNN approach achieves relative improvements in WER of 5.7, 3.3 and 5.2 for Gujarati, Tamil and Telugu, respectively.
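The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the phone mapping is derived, so the rule used here (assign each source phone to the target phone that collects the most posterior mass) is an assumption. The function names, the `senone_to_phone` lookup and the toy numbers are all hypothetical.

```python
import math
from collections import defaultdict


def map_source_phones(frame_posteriors, source_alignment, senone_to_phone):
    """Map each source phone to one target phone (assumed voting rule).

    frame_posteriors: per-frame senone posteriors that the target-language
                      DNN produces on source-language frames
    source_alignment: source-phone label of each frame
    senone_to_phone:  target phone that owns each senone index
    """
    votes = defaultdict(lambda: defaultdict(float))
    for posteriors, source_phone in zip(frame_posteriors, source_alignment):
        for senone, p in enumerate(posteriors):
            votes[source_phone][senone_to_phone[senone]] += p
    # Pick the target phone with the largest accumulated posterior mass.
    return {src: max(tgt, key=tgt.get) for src, tgt in votes.items()}


def masked_multitask_loss(head_log_probs, senone_label, language):
    """Cross-entropy for one frame of a network with one output head per language.

    Only the head of the frame's own language contributes, so the other
    heads receive zero gradient from this frame.
    """
    loss = 0.0
    for head_language, log_probs in head_log_probs.items():
        if head_language == language:
            loss -= log_probs[senone_label]
    return loss


# Toy data: three source frames, three target senones, two target phones.
posteriors = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
alignment = ["k", "k", "s"]
senone_to_phone = ["ka", "ka", "sa"]
mapping = map_source_phones(posteriors, alignment, senone_to_phone)

# Two heads with different senone counts; the frame belongs to Gujarati.
heads = {
    "gujarati": [math.log(0.5), math.log(0.5)],
    "tamil": [math.log(0.25)] * 4,
}
loss = masked_multitask_loss(heads, senone_label=1, language="gujarati")
```

In this toy case `mapping` is `{"k": "ka", "s": "sa"}`, and `loss` equals `-log(0.5)` because the Tamil head is skipped. In a real system, the mapping would be used to rewrite the source-language lexicons before pooling the data, and the masked loss would be backpropagated through the shared hidden layers and the matching output layer only.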

Item Type: Conference Paper
Publication: 25th National Conference on Communications, NCC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc.
Keywords: Alignment; Deep neural networks; Forecasting; Mapping; Modeling languages; Telephone sets; Cross-lingual; Data pooling; Gujarati; Multilingual trainings; Multitask learning; Parameter sharing; Senone posteriors; Tamil; Telugu; Speech recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 29 Nov 2022 05:43
Last Modified: 29 Nov 2022 05:43
URI: https://eprints.iisc.ac.in/id/eprint/78063
