ePrints@IISc

Deep Variational Filter Learning Models for Speech Recognition

Agrawal, P and Ganapathy, S (2019) Deep Variational Filter Learning Models for Speech Recognition. In: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019, 12 May 2019-17 May 2019, Brighton, pp. 5731-5735.

PDF: ICASSP_2019.pdf - Published Version (13MB)
Restricted to Registered users only
Official URL: https://doi.org/10.1109/ICASSP.2019.8682520


We present a novel approach to derive robust speech representations for automatic speech recognition (ASR) systems. The proposed method uses an unsupervised, data-driven modulation filter learning approach that preserves the key modulations of the speech signal in the spectro-temporal domain. This is achieved with a deep generative modeling framework that learns modulation filters using a convolutional variational autoencoder (CVAE). A skip-connection-based CVAE enables the learning of multiple non-redundant modulation filters in the time and frequency modulation domains using temporal and spectral trajectories of input spectrograms. The learnt filters are used to process the spectrogram features for ASR training. The ASR experiments are performed on the Aurora-4 (additive noise with channel artifact) and CHiME-3 (additive noise with reverberation) databases. The results show significant improvements for the proposed CVAE model over the baseline features as well as other robust front-ends (average relative improvements of 9% in word error rate over baseline features on the Aurora-4 database and 23% on the CHiME-3 database). In addition, the proposed features are highly beneficial for semi-supervised training of ASR when reduced amounts of labeled training data are available (average relative improvements of 29% over baseline features on the Aurora-4 database with 30% of the labeled training data).
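The abstract's final filtering step — applying learnt 1-D modulation filters along the temporal and spectral trajectories of a spectrogram — can be sketched as follows. This is an illustrative simplification, not the authors' code: the `smooth` and `diff` filters below are placeholder examples standing in for the filters the CVAE would actually learn, and the function name and shapes are assumptions.

```python
import numpy as np

def apply_modulation_filters(spec, temporal_filters, spectral_filters):
    """Filter a (freq_bins x time_frames) log-spectrogram with 1-D
    modulation filters: temporal filters convolve each frequency
    trajectory over time; spectral filters convolve each time frame
    over frequency. Returns one filtered spectrogram per filter."""
    outputs = []
    for h in temporal_filters:
        # 'same'-mode convolution along the time axis (axis=1)
        outputs.append(np.apply_along_axis(
            lambda row: np.convolve(row, h, mode="same"), 1, spec))
    for h in spectral_filters:
        # 'same'-mode convolution along the frequency axis (axis=0)
        outputs.append(np.apply_along_axis(
            lambda col: np.convolve(col, h, mode="same"), 0, spec))
    return np.stack(outputs)  # (n_filters, freq_bins, time_frames)

# Toy example: a random "spectrogram" and two illustrative filters
rng = np.random.default_rng(0)
spec = rng.standard_normal((40, 100))      # 40 mel bins, 100 frames
smooth = np.ones(5) / 5.0                  # low-pass: keeps slow temporal modulations
diff = np.array([1.0, 0.0, -1.0])          # difference filter across frequency
feats = apply_modulation_filters(spec, [smooth], [diff])
print(feats.shape)  # (2, 40, 100)
```

Each filtered spectrogram would then serve as an input stream (or stacked channel) of features for the ASR acoustic model; in the paper these filters come from the CVAE rather than being hand-designed as above.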

Item Type: Conference Paper
Publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc.
Keywords: Acoustic noise; Additive noise; Additives; Audio signal processing; Convolution; Database systems; Deep learning; Reverberation; Spectrographs; Speech; Speech communication; Speech recognition, Auto encoders; Automatic speech recognition system; Labeled training data; Modulation filtering; Modulation filters; Robust speech recognition; Semi-supervised trainings; Unsupervised filter learning, Modulation
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 30 Nov 2022 09:49
Last Modified: 30 Nov 2022 09:49
URI: https://eprints.iisc.ac.in/id/eprint/78377
