
SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH

Krishna, V and Ganapathy, S (2022) SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH. In: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022, 23 - 27 May 2022, Virtual, Online at Singapore, pp. 3268-3272.

Official URL: https://doi.org/10.1109/ICASSP43922.2022.9747259

Abstract

The automatic discovery of acoustic sub-word units from raw speech, without any text or labels, is a growing field of research. The key challenge is to derive representations of speech that can be categorized into a small number of phoneme-like units which are speaker invariant and can broadly capture the content variability of speech. In this work, we propose a novel neural network paradigm that uses a deep clustering loss along with the autoregressive contrastive predictive coding (CPC) loss. Both loss functions, the CPC loss and the clustering loss, are self-supervised. The clustering loss is computed using phoneme-like labels generated with an iterative k-means algorithm. The inclusion of this loss ensures that the model representations can be categorized into a small number of automatic speech units. We experiment with several sub-tasks described as part of the Zerospeech 2021 challenge to illustrate the effectiveness of the framework. In these experiments, we show that the proposed representation learning approach improves significantly over the previous self-supervision-based models as well as the wav2vec family of models on a range of word-level similarity tasks and language modeling tasks. © 2022 IEEE
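
Illustrative sketch (not taken from the paper): the abstract describes a training objective that adds a clustering loss, computed from k-means pseudo-labels, to a CPC loss. The toy Python below shows one way such a combined objective could be written. Every name in it (`kmeans_labels`, `cpc_loss`, `clustering_loss`, `combined_loss`, `weight`) is hypothetical, the use of negative squared distance to centroids as class logits is an assumption made for brevity, and the sketch runs on plain lists rather than on a neural network, so it should not be read as the authors' implementation.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(frames, k, iters=10, seed=0):
    """Plain iterative k-means; returns (pseudo-labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(frames, k)]
    labels = [0] * len(frames)
    for _ in range(iters):
        # Assignment step: each frame takes the index of its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(f, centroids[j]))
                  for f in frames]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def neg_log_softmax(scores, index):
    """Numerically stable -log softmax(scores)[index]."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[index]


def cpc_loss(prediction, positive, negatives):
    """InfoNCE-style contrastive loss: the true future frame (positive)
    should score higher against the prediction than the negatives."""
    scores = [dot(prediction, positive)]
    scores += [dot(prediction, n) for n in negatives]
    return neg_log_softmax(scores, 0)


def clustering_loss(frame, centroids, label):
    """Cross-entropy against a k-means pseudo-label.
    Assumption: logits are negative squared distances to the centroids."""
    logits = [-sq_dist(frame, c) for c in centroids]
    return neg_log_softmax(logits, label)


def combined_loss(prediction, positive, negatives, frame, centroids, label,
                  weight=1.0):
    """Sum of the two self-supervised terms; `weight` is a hypothetical knob."""
    return (cpc_loss(prediction, positive, negatives)
            + weight * clustering_loss(frame, centroids, label))


# Toy usage: four 2-D "frames" forming two well-separated groups.
frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels, centroids = kmeans_labels(frames, k=2)
loss = combined_loss(
    prediction=[1.0, 0.0], positive=[1.0, 0.0], negatives=[[0.0, 1.0]],
    frame=frames[0], centroids=centroids, label=labels[0],
)
print(labels, loss)
```

In a real system both terms would be computed on learned network outputs and minimized by gradient descent, with the k-means labels refreshed periodically; the sketch only shows how the two losses are formed and summed.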

Item Type: Conference Paper
Publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to the Institute of Electrical and Electronics Engineers Inc.
Keywords: Contrastive Predictive Coding; Deep clustering; Representation learning; Self-supervised learning; ZeroSpeech challenge
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 21 Jun 2022 10:33
Last Modified: 21 Jun 2022 10:33
URI: https://eprints.iisc.ac.in/id/eprint/73938
