
Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages

Udupa, S and Bandekar, J and Deekshitha, G and Kumar, S and Ghosh, PK and Badiger, S and Singh, A and Murthy, S and Pai, P and Raghavan, S and Nanavati, R (2023) Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages. In: UNSPECIFIED.

Official URL: https://doi.org/10.1109/ASRU57964.2023.10389624


This work proposes several methods for improving the performance of dialectal automatic speech recognition (ASR). A novel encoder architecture suited to multi-dialect ASR training is introduced. Further, we propose multi-task fine-tuning of self-supervised learning (SSL) models using CTC and dialect identification objectives. Additionally, the use of different language models (LMs) to improve dialectal ASR is investigated. Around 800 hours of Bengali and Bhojpuri data, released as part of the MADASR ASRU challenge, are used to train these models. The proposed multi-encoder ASR achieves relative WER reductions of 7.5% and 9% in Bhojpuri and Bengali, respectively. Fine-tuning SSL yields a further 1-2% WER reduction in these languages. Moreover, we observe advantages in using dialect-specific LM decoding based on the predicted dialect. © 2023 IEEE.
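
The abstract reports its gains as relative reductions in word error rate (WER). As a minimal sketch of how such a figure is usually computed, the snippet below derives a relative reduction from a baseline WER and an improved WER. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer over baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, not taken from the paper: lowering a baseline WER of
# 20.0% to 18.5% is a 1.5-point absolute drop and a 7.5% relative reduction.
baseline, improved = 20.0, 18.5
print(f"absolute: {baseline - improved:.1f} points")
print(f"relative: {relative_wer_reduction(baseline, improved):.1f}%")
```

The distinction matters when comparing systems: the same absolute drop corresponds to a larger relative reduction when the baseline WER is lower.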

Item Type: Conference Paper
Publication: 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc.
Keywords: Signal encoding; Supervised learning; Automatic speech recognition; Bengali and Bhojpuri automatic speech recognition; Bengalis; Dialectal automatic speech recognition; Encoder architecture; Multi-dialect dataset; Multi-encoder architecture; Performance; Self-supervised learning; Speech recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 16 May 2024 10:15
Last Modified: 16 May 2024 10:15
URI: https://eprints.iisc.ac.in/id/eprint/84555
