Ansari, TK and Kumar, R and Singh, S and Ganapathy, S (2018) Deep learning methods for unsupervised acoustic modeling-Leap submission to ZeroSpeech challenge 2017. In: IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, 16 - 20 December 2017, Okinawa, pp. 754-761.
Abstract
In this paper, we present our system submission to the ZeroSpeech 2017 Challenge. Track 1 of this challenge is intended to develop language-independent speech representations that provide the least pairwise ABX distance computed for within-speaker and across-speaker pairs of spoken words. We investigate two approaches based on deep learning methods for unsupervised modeling. In the first approach, a deep neural network (DNN) is trained on the posteriors of mixture component indices obtained from training a Gaussian mixture model (GMM)-UBM. In the second approach, we develop a similar hidden Markov model (HMM) based DNN model to learn the unsupervised acoustic units provided by HMM state alignments. In addition, we develop a deep autoencoder which learns language-independent embeddings of speech to train the HMM-DNN model. Neither approach uses any labeled training data or requires any supervision. We perform several experiments using the ZeroSpeech 2017 corpus with the minimal pair ABX error measure. In these experiments, we find that the two proposed approaches significantly improve over the baseline system using MFCC features (average relative improvements of 30-40%). Furthermore, the system combination of the two proposed approaches improves the performance over the best individual system.
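The first approach relies on posteriors of GMM mixture components as unsupervised training targets. The following is a minimal sketch of how such posteriors might be computed for a single feature frame. It is not the authors' implementation: the diagonal covariances, the toy two-component model, the 2-dimensional frame, and the function names `log_gaussian_diag` and `component_posteriors` are all assumptions made for illustration, and the abstract does not say whether soft posteriors or hard component indices served as DNN targets.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def component_posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    log_joint = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp keeps the normalization numerically stable.
    peak = max(log_joint)
    log_norm = peak + math.log(sum(math.exp(lj - peak) for lj in log_joint))
    return [math.exp(lj - log_norm) for lj in log_joint]


# Toy two-component GMM over 2-dimensional features (illustrative values only).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

frame = [0.2, -0.1]
posteriors = component_posteriors(frame, weights, means, variances)

# A hard target would be the index of the most likely component;
# the full posterior vector could instead serve as a soft target.
target_index = max(range(len(posteriors)), key=posteriors.__getitem__)
```

In a complete pipeline, the GMM parameters would be estimated from unlabeled speech features, and the per-frame posteriors (or the selected index) would then supervise a DNN classifier, so no transcriptions are needed at any stage.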
Item Type: | Conference Paper |
---|---|
Publication: | 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Additional Information: | The copyright for this article belongs to the IEEE. |
Keywords: | Communication channels (information theory); Deep neural networks; Gaussian distribution; Hidden Markov models; Object recognition; Trellis codes; Unsupervised learning, Autoencoders; Gaussian Mixture Model; Gaussian mixture model (GMMs); Hidden markov models (HMMs); Individual systems; Labeled training data; Language independents; System combination, Speech recognition |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 14 Aug 2022 05:11 |
Last Modified: | 14 Aug 2022 05:11 |
URI: | https://eprints.iisc.ac.in/id/eprint/75684 |