ePrints@IISc

Unsupervised HMM posteriograms for language independent acoustic modeling in zero resource conditions

Ansari, TK and Kumar, R and Singh, S and Ganapathy, S and Devi, S (2017) Unsupervised HMM posteriograms for language independent acoustic modeling in zero resource conditions. In: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, 16 - 20 December 2017, Okinawa, pp. 762-768.

PDF: IEEE_ASRU 2017_2018_762-768_2017.pdf - Published Version (restricted to registered users; 220kB)
Official URL: https://doi.org/10.1109/ASRU.2017.8269014

Abstract

The task of language independent acoustic unit modeling in unlabeled raw speech (the zero-resource setting) has gained significant interest over recent years. The main challenge is to extract acoustic representations that elicit good similarity between the same words or linguistic tokens spoken by different speakers, and to derive these representations in a language independent manner. In this paper, we explore the use of Hidden Markov Model (HMM) based posteriograms for unsupervised acoustic unit modeling. The states of the HMM (which represent the language independent acoustic units) are initialized using a Gaussian mixture model (GMM) - Universal Background Model (UBM). The trained HMM is subsequently used to generate temporally contiguous state alignments, which are then modeled in a hybrid deep neural network (DNN) model. For testing, we use the frame level HMM state posteriors obtained from the DNN as features for the ZeroSpeech challenge task. The minimal pair ABX error rate is measured for both within-speaker and across-speaker pairs. With several experiments on multiple languages in the ZeroSpeech corpus, we show that the proposed HMM based posterior features provide significant improvements over the baseline system using MFCC features (average relative improvements of 25% for within-speaker pairs and 40% for across-speaker pairs). Furthermore, experiments in which the target language is not seen in training illustrate that the proposed modeling approach is capable of learning global language independent representations. © 2017 IEEE
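The pipeline sketched in the abstract starts from a GMM-UBM whose components serve as initial acoustic units, and the final features are frame-level unit posteriors (a posteriorgram). A minimal, self-contained sketch of that first stage is shown below; the diagonal-covariance EM fit, the synthetic 13-dimensional "MFCC" frames, and all function names are illustrative assumptions, not the paper's implementation (which additionally trains an HMM and a DNN on the resulting alignments):

```python
import numpy as np

def fit_gmm(X, k, iters=20, seed=0):
    """Fit a diagonal-covariance GMM with EM -- a stand-in for the
    GMM-UBM that initializes the language independent acoustic units."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, k, replace=False)]       # init from random frames
    vars_ = np.ones((k, d))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: per-frame log-likelihood under each Gaussian component
        ll = (-0.5 * (((X[:, None, :] - means) ** 2) / vars_
                      + np.log(2 * np.pi * vars_)).sum(-1)
              + np.log(weights))
        ll -= ll.max(1, keepdims=True)               # numerical stability
        post = np.exp(ll)
        post /= post.sum(1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        nk = post.sum(0) + 1e-8
        means = (post.T @ X) / nk[:, None]
        vars_ = (post.T @ (X ** 2)) / nk[:, None] - means ** 2 + 1e-6
        weights = nk / n
    return means, vars_, weights

def posteriorgram(X, means, vars_, weights):
    """Frame-level unit posteriors; each row sums to 1."""
    ll = (-0.5 * (((X[:, None, :] - means) ** 2) / vars_
                  + np.log(2 * np.pi * vars_)).sum(-1)
          + np.log(weights))
    ll -= ll.max(1, keepdims=True)
    p = np.exp(ll)
    return p / p.sum(1, keepdims=True)

# Synthetic stand-in for 13-dim MFCC frames from three broad sound classes
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(c, 0.3, size=(100, 13))
                    for c in (-2.0, 0.0, 2.0)])
m, v, w = fit_gmm(X, k=3)
P = posteriorgram(X, m, v, w)    # (300 frames, 3 units), rows sum to 1
```

In the paper's full system these soft unit posteriors would instead come from the DNN trained on HMM state alignments; the point of the sketch is only the shape of the representation: one probability vector over acoustic units per frame, which is what the ABX distances are computed over.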

Item Type: Conference Paper
Publication: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to the Institute of Electrical and Electronics Engineers Inc.
Keywords: Deep neural networks; Gaussian distribution; Hidden Markov models; Modeling languages; Trellis codes, Baseline systems; Gaussian Mixture Model; Language independents; Multiple languages; Posterior features; Resource conditions; Target language; Universal background model, Speech recognition
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 17 Jul 2022 06:29
Last Modified: 17 Jul 2022 06:29
URI: https://eprints.iisc.ac.in/id/eprint/74513
