Purushothaman, A and Sreeram, A and Kumar, R and Ganapathy, S (2020) Deep learning based dereverberation of temporal envelopes for robust speech recognition. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 25-29 October 2020, Shanghai, China, pp. 1688-1692.
Abstract
Automatic speech recognition in reverberant conditions is a challenging task because the long-term envelopes of reverberant speech are temporally smeared. In this paper, we propose a neural model that enhances sub-band temporal envelopes to dereverberate speech. The temporal envelopes are derived using the autoregressive modeling framework of frequency domain linear prediction (FDLP). The proposed neural enhancement model performs envelope-gain-based enhancement of the temporal envelopes and consists of a series of convolutional and recurrent neural network layers. The enhanced sub-band envelopes are used to generate features for automatic speech recognition (ASR). The ASR experiments are performed on the REVERB challenge dataset as well as the CHiME-3 dataset. In these experiments, the proposed neural enhancement approach provides significant improvements over a baseline ASR system with beamformed audio (average relative word error rate improvements of 21% on the development set and about 11% on the evaluation set of the REVERB challenge dataset). © 2020 ISCA
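The abstract names frequency domain linear prediction (FDLP) as the source of the sub-band temporal envelopes. As a rough illustration only, the sketch below shows the general FDLP idea: take the DCT of a signal segment, split the DCT coefficients into sub-bands, fit an autoregressive (linear prediction) model to each sub-band, and read the model's power response as a temporal envelope. It assumes NumPy and SciPy; the uniform rectangular band split, the model order, the segment length and the function names `lpc` and `fdlp_envelopes` are hypothetical choices, not the paper's configuration, and the neural envelope-gain model is not reproduced.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def lpc(x, order):
    """Autocorrelation-method linear prediction.

    Returns (a, g), where A(z) = a[0] + a[1] z^-1 + ... + a[order] z^-order
    with a[0] == 1, and g is the prediction-error power.
    """
    n = len(x)
    # Biased autocorrelation estimates at lags 0..order
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Solve the normal equations R c = r[1:], R symmetric Toeplitz in r[0..order-1]
    c = solve_toeplitz(r[:order], r[1 : order + 1])
    a = np.concatenate(([1.0], -c))
    g = r[0] - np.dot(c, r[1 : order + 1])
    return a, g


def fdlp_envelopes(signal, num_bands=8, order=20):
    """Hypothetical sketch: FDLP sub-band temporal envelopes of one segment."""
    n = len(signal)
    # DCT of the time-domain segment; linear prediction on this sequence
    # models the temporal envelope instead of the spectral envelope.
    coeffs = dct(signal, type=2, norm="ortho")
    # Simplification: uniform, rectangular split of the DCT coefficients
    edges = np.linspace(0, n, num_bands + 1).astype(int)
    envelopes = np.zeros((num_bands, n))
    for b in range(num_bands):
        band = coeffs[edges[b] : edges[b + 1]]
        a, g = lpc(band, order)
        # Evaluate |A|^2 at n points on [0, pi); in FDLP this axis maps to time
        A = np.fft.rfft(a, 2 * n)[:n]
        envelopes[b] = g / np.abs(A) ** 2
    return envelopes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8000)  # stand-in for a 0.5 s segment at 16 kHz
    env = fdlp_envelopes(x)
    print(env.shape)  # (8, 8000)
```

Each row of the result is one sub-band envelope sampled at the segment's time indices. In a dereverberation setup like the one described, envelopes of this kind are what an enhancement model would operate on before ASR features are computed.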
Item Type: | Conference Paper |
---|---|
Publication: | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publisher: | International Speech Communication Association |
Additional Information: | cited By 0; Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020; Conference Date: 25 October 2020 through 29 October 2020; Conference Code: 165507 |
Keywords: | Frequency domain analysis; Multilayer neural networks; Network layers; Recurrent neural networks; Reverberation; Speech; Speech communication, Auto regressive models; Automatic speech recognition; Frequency domains; Linear prediction; Neural modeling; Reverberant condition; Robust speech recognition; Temporal envelopes, Speech recognition |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 11 Jan 2021 11:06 |
Last Modified: | 11 Jan 2021 11:06 |
URI: | http://eprints.iisc.ac.in/id/eprint/67643 |