
Supervised I-vector modeling for language and accent recognition

Ramoji, Shreyas and Ganapathy, Sriram (2020) Supervised I-vector modeling for language and accent recognition. In: COMPUTER SPEECH AND LANGUAGE, 60.

Official URL: https://dx.doi.org/10.1016/j.csl.2019.101030

Abstract

The conventional i-vector approach to speaker and language recognition is an unsupervised learning paradigm in which a variable-length speech utterance is converted into a fixed-dimensional feature vector (termed an i-vector). The i-vector approach belongs to the broader family of factor analysis models, where the utterance-level adapted means of a Gaussian Mixture Model - Universal Background Model (GMM-UBM) are assumed to lie in a low-rank subspace. The latent variables in the low-rank model are assumed to have a standard Gaussian prior distribution. In this paper, we rework the theory of i-vector modeling in a supervised framework, where the class labels (such as language or accent) of the speech recordings are introduced directly into the i-vector model using a mixture Gaussian prior in which each mixture component is associated with a class label. We provide the mathematical formulation for the minimum mean squared error (MMSE) estimate of the supervised i-vector (s-vector) model. A detailed analysis of the s-vector model is given and contrasted with the traditional i-vector framework. The proposed model is evaluated on language recognition tasks using the NIST Language Recognition Evaluation (LRE) 2017 dataset and on an accent recognition task using the Mozilla Common Voice dataset. In these experiments, the s-vector model provides significant improvements over the conventional i-vector model (relative improvements of up to 24% for the LRE task in terms of the primary detection cost metric).
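
The contrast between the two priors can be illustrated with a toy example. The sketch below is not the paper's formulation: it assumes a single observed vector `x = T y + e` with diagonal Gaussian noise rather than GMM-UBM sufficient statistics, and it assumes identity covariance for every mixture component of the prior. All function and variable names are hypothetical. With a standard Gaussian prior, the estimate of the latent vector is the usual posterior mean; with a mixture Gaussian prior, the MMSE estimate is the responsibility-weighted sum of the per-class posterior means.

```python
import numpy as np


def posterior_mean_standard_prior(x, T, noise_var):
    """Posterior mean of y for x = T y + e, with y ~ N(0, I) and
    e ~ N(0, diag(noise_var)). Toy stand-in for an i-vector estimate."""
    Tw = T.T / noise_var                       # T^T Sigma^{-1}, shape (R, D)
    precision = np.eye(T.shape[1]) + Tw @ T    # posterior precision, (R, R)
    return np.linalg.solve(precision, Tw @ x)


def mmse_mixture_prior(x, T, noise_var, weights, means):
    """MMSE estimate of y when y ~ sum_c weights[c] * N(means[c], I).
    Returns the estimate and the posterior class responsibilities."""
    Tw = T.T / noise_var
    precision = np.eye(T.shape[1]) + Tw @ T
    # Linear term of each per-class posterior: T^T Sigma^{-1} x + mu_c
    B = (Tw @ x)[None, :] + means              # shape (C, R)
    comp_means = np.linalg.solve(precision, B.T).T  # per-class posterior means
    # Log responsibility of each class, up to a class-independent constant
    log_resp = (np.log(weights)
                - 0.5 * np.sum(means ** 2, axis=1)
                + 0.5 * np.sum(B * comp_means, axis=1))
    log_resp -= log_resp.max()                 # numerical stability
    resp = np.exp(log_resp)
    resp /= resp.sum()
    return resp @ comp_means, resp


# Usage: two hypothetical classes with well-separated prior means.
rng = np.random.default_rng(0)
T = rng.standard_normal((20, 3))               # low-rank loading matrix
noise_var = np.full(20, 0.1)
class_means = np.array([[3.0, 0.0, 0.0], [-3.0, 0.0, 0.0]])
x = T @ class_means[0]                         # noiseless sample from class 0
s_vec, resp = mmse_mixture_prior(x, T, noise_var, np.array([0.5, 0.5]), class_means)
i_vec = posterior_mean_standard_prior(x, T, noise_var)
```

In this toy setting, a single zero-mean component reduces the mixture estimate to the standard-prior posterior mean, which mirrors how the supervised model generalizes the unsupervised one.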

Item Type: Journal Article
Publication: COMPUTER SPEECH AND LANGUAGE
Publisher: ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
Additional Information: Copyright of this article belongs to ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
Keywords: Unsupervised i-vector; S-vector; Minimum-mean square error (MMSE) estimate; Language recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 07 Jan 2020 12:00
Last Modified: 07 Jan 2020 12:00
URI: http://eprints.iisc.ac.in/id/eprint/64077
