Accented Speech Recognition With Accent-specific Codebooks

Prabhu, D and Jyothi, P and Ganapathy, S and Unni, V (2023) Accented Speech Recognition With Accent-specific Codebooks. In: UNSPECIFIED, pp. 7175-7188.

Abstract

Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems. Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR. In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks. These learnable codebooks capture accent-specific information and are integrated within the ASR encoder layers. The model is trained on accented English speech, while the test data also contains accents that were not seen during training. On the Mozilla Common Voice multi-accented dataset, we show that our proposed approach yields significant performance gains not only on the seen English accents (up to 37% relative improvement in word error rate, WER) but also on the unseen accents (up to 5% relative improvement in WER). Further, we illustrate benefits for a zero-shot transfer setup on the L2Arctic dataset. We also compare the performance with other approaches based on accent adversarial training. © 2023 Association for Computational Linguistics.
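
The core idea in the abstract, encoder frames attending over a trainable accent-specific codebook, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `codebook_cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the residual connection, the accent labels, and all dimensions are assumptions made for illustration, and the random arrays stand in for parameters that would be learned during training.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def codebook_cross_attention(frames, codebook, w_q, w_k, w_v):
    """Hypothetical cross-attention from encoder frames to one accent codebook.

    frames:   (T, d) encoder hidden states, used as queries
    codebook: (P, d) codebook entries for one accent, used as keys and values
    Returns the updated frames (T, d) and the attention weights (T, P).
    """
    q = frames @ w_q                          # (T, d) queries
    k = codebook @ w_k                        # (P, d) keys
    v = codebook @ w_v                        # (P, d) values
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (T, P) scaled dot products
    weights = softmax(scores, axis=-1)        # each frame's distribution over entries
    # Residual connection (an assumption): add the codebook summary to each frame.
    return frames + weights @ v, weights


rng = np.random.default_rng(0)
T, P, d = 6, 4, 8  # frames, codebook entries per accent, model width (all assumed)

# One codebook per seen accent; random values stand in for learned parameters.
codebooks = {name: rng.normal(size=(P, d)) for name in ("accent_a", "accent_b")}
frames = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

out, weights = codebook_cross_attention(frames, codebooks["accent_a"], w_q, w_k, w_v)
```

Here each frame receives a weighted mix of the codebook entries for the chosen accent. How a codebook is selected or combined for accents unseen in training is not described in the abstract and is left out of this sketch.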

Item Type: Conference Paper
Publication: EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Additional Information: The copyright for this article belongs to Association for Computational Linguistics (ACL).
Keywords: Computational linguistics; Zero-shot learning; Accented speech; Automatic speech recognition; Automatic speech recognition system; Codebooks; End to end; Mozilla; Performance; Specific information; State of the art; Test data; Speech recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 17 May 2024 04:05
Last Modified: 17 May 2024 04:05
URI: https://eprints.iisc.ac.in/id/eprint/84543
