Deep learning based dereverberation of temporal envelopes for robust speech recognition

Purushothaman, A and Sreeram, A and Kumar, R and Ganapathy, S (2020) Deep learning based dereverberation of temporal envelopes for robust speech recognition. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 25-29 October 2020, Shanghai, China, pp. 1688-1692.


Official URL: https://dx.doi.org/10.21437/Interspeech.2020-2283

Abstract

Automatic speech recognition in reverberant conditions is a challenging task as the long-term envelopes of the reverberant speech are temporally smeared. In this paper, we propose a neural model for enhancement of sub-band temporal envelopes for dereverberation of speech. The temporal envelopes are derived using the autoregressive modeling framework of frequency domain linear prediction (FDLP). The neural enhancement model proposed in this paper performs an envelope gain based enhancement of temporal envelopes and consists of a series of convolutional and recurrent neural network layers. The enhanced sub-band envelopes are used to generate features for automatic speech recognition (ASR). The ASR experiments are performed on the REVERB challenge dataset as well as the CHiME-3 dataset. In these experiments, the proposed neural enhancement approach provides significant improvements over a baseline ASR system with beamformed audio (average relative improvements in word error rate of 21% on the development set and about 11% on the evaluation set for the REVERB challenge dataset). © 2020 ISCA
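As an illustration of the FDLP idea mentioned in the abstract, the following is a minimal full-band sketch and not the authors' implementation. It assumes the textbook formulation: take a DCT of the signal, fit an autoregressive model to the DCT coefficients, and read the model's power response along the time axis as a smooth temporal envelope. The function names, the model order of 4, and the synthetic test signal are all assumptions made for this example; the sub-band decomposition and the neural enhancement stage described in the paper are not shown.

```python
import math


def dct_ii(x):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    n_samples = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def autocorrelation(seq, max_lag):
    """Biased autocorrelation of seq for lags 0..max_lag."""
    return [
        sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations.

    Returns (a, err), where a[0] == 1 and A(z) = sum_k a[k] z^-k is the
    prediction-error filter, and err is the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order):
    """All-pole estimate of the temporal envelope of x (hypothetical helper)."""
    n_samples = len(x)
    coeffs = dct_ii(x)
    a, err = levinson_durbin(autocorrelation(coeffs, order), order)
    env = []
    for n in range(n_samples):
        # Time index n plays the role that frequency plays in ordinary LPC.
        theta = math.pi * (n + 0.5) / n_samples
        re = sum(a[k] * math.cos(k * theta) for k in range(order + 1))
        im = -sum(a[k] * math.sin(k * theta) for k in range(order + 1))
        env.append(err / (re * re + im * im))
    return env


# Synthetic input (an assumption for the demo): a short burst centred at n = 40.
signal = [math.exp(-0.5 * ((n - 40) / 3.0) ** 2) for n in range(128)]
envelope = fdlp_envelope(signal, order=4)
peak = max(range(len(envelope)), key=envelope.__getitem__)
```

In this toy setting the estimated envelope is strictly positive and peaks close to the location of the burst. Reverberation would smear such a burst over a longer time span, which is the distortion the proposed enhancement model is trained to undo.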

Item Type:	Conference Paper
Publication:	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher:	International Speech Communication Association
Additional Information:	Cited by 0; Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020; Conference Date: 25 October 2020 through 29 October 2020; Conference Code: 165507
Keywords:	Frequency domain analysis; Multilayer neural networks; Network layers; Recurrent neural networks; Reverberation; Speech; Speech communication; Auto regressive models; Automatic speech recognition; Frequency domains; Linear prediction; Neural modeling; Reverberant condition; Robust speech recognition; Temporal envelopes; Speech recognition
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	11 Jan 2021 11:06
Last Modified:	11 Jan 2021 11:06
URI:	http://eprints.iisc.ac.in/id/eprint/67643
