Modulation Filter Learning Using Deep Variational Networks for Robust Speech Recognition

Agrawal, Purvi and Ganapathy, Sriram (2019) Modulation Filter Learning Using Deep Variational Networks for Robust Speech Recognition. In: IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 13 (2, SI). pp. 244-253.

Official URL: https://doi.org/10.1109/JSTSP.2019.2913965

Abstract

The performance of a typical speech recognition system degrades in the presence of extrinsic sources such as noise and of recording artifacts such as reverberation. Modulation filtering attempts to remove the spectro-temporal modulations of the speech signal that are more susceptible to noise while preserving the modulations that are key for speech recognition. While traditional approaches use hand-crafted modulation filters, we propose a novel method for modulation filter learning using deep variational models. Specifically, we pose the filter learning problem in a deep unsupervised generative modeling framework in which the convolutional filters of the variational autoencoder capture the important speech modulations. The two-dimensional modulation filters, learned using the deep variational networks in the joint spectro-temporal domain, are used to process the spectrogram features for the speech recognition task. Several speech recognition experiments are performed on a set of tasks consisting of additive noise with channel artifacts (Aurora-4), reverberation (REVERB Challenge), and additive noise with reverberation (CHiME-3). In these experiments, the proposed modulation filter learning framework shows significant improvements over the baseline features as well as over various other noise-robust front-ends (average relative improvements of 7.5% and 20% over the baseline features on the Aurora-4 and CHiME-3 databases, respectively). Furthermore, the proposed method is also shown to be of considerable benefit for semi-supervised automatic speech recognition applications. For example, on the Aurora-4 database we observe an average relative improvement of 25% over the baseline system using 30% labeled training data.
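
As a rough illustration of the filtering step described in the abstract, the sketch below applies a single two-dimensional kernel to a spectrogram in the joint spectro-temporal domain. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name modulation_filter, the (bands x frames) layout, the zero padding, and the hand-set smoothing kernel are all hypothetical. In the paper the kernels are learned by a convolutional variational autoencoder; here a fixed kernel stands in for a learned one.

```python
import numpy as np


def modulation_filter(spectrogram, kernel):
    """Convolve a (bands x frames) spectrogram with a 2-D kernel.

    Hypothetical helper: zero-pads the input so the output keeps the
    input shape. Odd kernel dimensions are assumed.
    """
    kh, kw = kernel.shape
    padded = np.pad(
        spectrogram,
        ((kh // 2, kh // 2), (kw // 2, kw // 2)),
        mode="constant",
    )
    # Flip both axes so the sliding product is a true convolution.
    flipped = kernel[::-1, ::-1]
    out = np.empty(spectrogram.shape, dtype=float)
    for i in range(spectrogram.shape[0]):
        for j in range(spectrogram.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out


# Placeholder input: 40 spectral bands x 100 frames of random values,
# standing in for a real log-mel spectrogram.
rng = np.random.default_rng(0)
spectrogram = rng.standard_normal((40, 100))

# Assumed stand-in for a learned filter: a 3 x 3 averaging kernel, which
# keeps slow spectro-temporal modulations and attenuates fast ones.
kernel = np.full((3, 3), 1.0 / 9.0)

filtered = modulation_filter(spectrogram, kernel)
```

The filtered output has the same shape as the input, so it can replace the raw spectrogram as the feature stream for the recognizer. A bank of several kernels would simply produce one such filtered spectrogram per kernel.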

Item Type:	Journal Article
Publication:	IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING
Publisher:	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Additional Information:	Copyright for this article belongs to IEEE.
Keywords:	Unsupervised filter learning; deep variational autoencoder; modulation filtering; noise robust speech recognition
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	23 Jul 2019 06:37
Last Modified:	23 Jul 2019 06:37
URI:	http://eprints.iisc.ac.in/id/eprint/62880
