Comparison of Unsupervised Modulation Filter Learning Methods for ASR

Agrawal, Purvi and Ganapathy, Sriram (2018) Comparison of Unsupervised Modulation Filter Learning Methods for ASR. In: 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), 2-6 September 2018, Hyderabad International Convention Centre (HICC), Hyderabad, pp. 2908-2912.

Official URL: https://dx.doi.org/10.21437/Interspeech.2018-1972

Abstract

The widespread deployment of automatic speech recognition (ASR) systems in consumer-centric applications such as voice interaction and voice search demands noise robustness in such systems. One approach to this problem is to achieve the desired robustness in the speech representations used in ASR. Motivated by studies on robust human speech recognition, we analyse unsupervised data-driven temporal modulation filter learning for robust feature extraction. In this paper, we compare various unsupervised models for data-driven filter learning, namely the convolutional autoencoder (CAE), the generative adversarial network (GAN) and the convolutional restricted Boltzmann machine (CRBM). The unsupervised models are designed to learn a set of filters from long temporal trajectories of speech sub-band energy. The filters learnt from these models are used for modulation filtering of the input spectrogram before ASR training. The ASR experiments are performed on the Wall Street Journal (WSJ) Aurora-4 database with clean and multi-condition training setups. The experimental results obtained from the modulation-filtered representations show considerable robustness to noise, channel distortions and reverberant conditions compared to other feature extraction methods. Among the three approaches compared in this paper, the GAN approach provides the most consistent improvements in ASR accuracy across the different training scenarios.
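
The modulation filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the spectrogram is a NumPy array of shape (bands, frames), that each filter is a 1-D temporal kernel no longer than the number of frames, and that filtering is a plain convolution along time in every sub-band. The function name `modulation_filter` and the example kernels are hypothetical; in the paper the filters would come from the CAE, GAN or CRBM models.

```python
import numpy as np

def modulation_filter(spectrogram, filters):
    """Filter every sub-band trajectory along time with every 1-D filter.

    spectrogram: array (num_bands, num_frames) of sub-band energies.
    filters:     array (num_filters, filter_len) of temporal kernels
                 (hypothetical stand-ins for learned modulation filters).
    Returns an array of shape (num_filters, num_bands, num_frames).
    """
    spectrogram = np.asarray(spectrogram, dtype=float)
    filters = np.asarray(filters, dtype=float)
    num_bands, num_frames = spectrogram.shape
    if filters.shape[1] > num_frames:
        raise ValueError("filter length must not exceed the number of frames")

    out = np.empty((filters.shape[0], num_bands, num_frames))
    for k, kernel in enumerate(filters):
        for b in range(num_bands):
            # mode="same" keeps the output aligned with the input frames
            out[k, b] = np.convolve(spectrogram[b], kernel, mode="same")
    return out

# Example with made-up data: 4 sub-bands, 50 frames, 2 kernels of length 5.
rng = np.random.default_rng(0)
spec = rng.standard_normal((4, 50))
kernels = np.array([
    [0.0, 0.0, 1.0, 0.0, 0.0],   # identity (delta) kernel
    [0.2, 0.2, 0.2, 0.2, 0.2],   # moving average (low-pass in modulation)
])
filtered = modulation_filter(spec, kernels)
```

Each output slice `filtered[k]` is one filtered copy of the spectrogram, which could then be passed on as a feature stream to ASR training.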

Item Type:	Conference Paper
Series:	Interspeech
Publisher:	ISCA-INT SPEECH COMMUNICATION ASSOC
Additional Information:	19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Hyderabad, INDIA, SEP 02-06, 2018
Keywords:	Unsupervised learning; data-driven modulation filtering; generative adversarial network; convolutional autoencoder; robust speech recognition
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	13 Aug 2020 07:58
Last Modified:	13 Aug 2020 07:58
URI:	http://eprints.iisc.ac.in/id/eprint/62927
