
Modulation Filter Learning Using Deep Variational Networks for Robust Speech Recognition

Agrawal, Purvi and Ganapathy, Sriram (2019) Modulation Filter Learning Using Deep Variational Networks for Robust Speech Recognition. In: IEEE Journal of Selected Topics in Signal Processing, 13 (2, SI). pp. 244-253.

Official URL: https://doi.org/10.1109/JSTSP.2019.2913965


The performance of a typical speech recognition system degrades in the presence of extrinsic sources such as noise and of recording artifacts such as reverberation. Modulation filtering attempts to remove the spectro-temporal modulations of the speech signal that are more susceptible to noise while preserving the modulations that are key for speech recognition. While traditional approaches use hand-crafted modulation filters, we propose a novel method for modulation filter learning using deep variational models. Specifically, we pose the filter learning problem in a deep unsupervised generative modeling framework in which the convolutional filters of the variational autoencoder capture the important speech modulations. The two-dimensional modulation filters, learned using the deep variational networks in the joint spectro-temporal domain, are used to process the spectrogram features for the speech recognition task. Several speech recognition experiments are performed on a set of tasks consisting of additive noise with channel artifacts (Aurora-4), reverberation (REVERB Challenge), and additive noise with reverberation (CHiME-3). In these experiments, the proposed modulation filter learning framework shows significant improvements over the baseline features as well as over various other noise-robust front-ends (average relative improvements of 7.5% and 20% over the baseline features on the Aurora-4 and CHiME-3 databases, respectively). Furthermore, the proposed method is shown to be of considerable benefit for semi-supervised automatic speech recognition applications. For example, on the Aurora-4 database we observe an average relative improvement of 25% over the baseline system when using 30% labeled training data.
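
To make the idea of two-dimensional modulation filtering concrete, the following is a minimal sketch of applying a single 2D filter to a spectrogram in the joint spectro-temporal domain. It is not the method of the paper: the kernel here is a hand-crafted Gabor-like filter, whereas the paper learns its filters with a variational autoencoder. The function names `gabor_kernel` and `filter_spectrogram`, the `rate` and `scale` parameters, and the toy spectrogram are all assumptions introduced for illustration.

```python
import math


def gabor_kernel(size=5, rate=0.25, scale=0.25):
    """Build a hypothetical hand-crafted 2D modulation kernel.

    rate  : modulation frequency along the time axis (cycles per frame)
    scale : modulation frequency along the frequency axis (cycles per band)
    The kernel is a cosine carrier under a Gaussian window. It stands in
    for a learned filter and is NOT taken from the paper.
    """
    half = size // 2
    kernel = []
    for f in range(-half, half + 1):
        row = []
        for t in range(-half, half + 1):
            window = math.exp(-(f * f + t * t) / (2.0 * half * half))
            carrier = math.cos(2.0 * math.pi * (rate * t + scale * f))
            row.append(window * carrier)
        kernel.append(row)
    return kernel


def filter_spectrogram(spec, kernel):
    """Apply a 2D filter to a spectrogram (rows: frequency bands, columns: frames).

    Computes a same-size 2D cross-correlation with zero padding at the
    edges, which is the operation a convolutional layer performs.
    """
    n_f, n_t = len(spec), len(spec[0])
    k_f, k_t = len(kernel), len(kernel[0])
    half_f, half_t = k_f // 2, k_t // 2
    out = [[0.0] * n_t for _ in range(n_f)]
    for i in range(n_f):
        for j in range(n_t):
            acc = 0.0
            for a in range(k_f):
                for b in range(k_t):
                    ii, jj = i + a - half_f, j + b - half_t
                    # Positions outside the spectrogram count as zero.
                    if 0 <= ii < n_f and 0 <= jj < n_t:
                        acc += kernel[a][b] * spec[ii][jj]
            out[i][j] = acc
    return out


if __name__ == "__main__":
    # Toy "spectrogram": 8 frequency bands by 12 frames of synthetic values.
    toy_spec = [[math.sin(0.5 * j) + 0.1 * i for j in range(12)] for i in range(8)]
    filtered = filter_spectrogram(toy_spec, gabor_kernel())
    print(len(filtered), len(filtered[0]))
```

In a learned setting, the kernel values would come from the trained convolutional layer rather than from a fixed formula, and several such filters would typically be applied in parallel to produce multiple filtered feature streams.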

Item Type: Journal Article
Additional Information: Copyright for this article belongs to IEEE.
Keywords: Unsupervised filter learning; deep variational autoencoder; modulation filtering; noise robust speech recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 23 Jul 2019 06:37
Last Modified: 23 Jul 2019 06:37
URI: http://eprints.iisc.ac.in/id/eprint/62880
