
Whisper to neutral mapping using cosine similarity maximization in i-vector space for speaker verification

Naini, AR and Achuth Rao, MV and Ghosh, PK (2019) Whisper to neutral mapping using cosine similarity maximization in i-vector space for speaker verification. In: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, 15 - 19 September 2019, Graz, pp. 4340-4344.

Official URL: https://doi.org/10.21437/Interspeech.2019-2280

Abstract

In this work, we propose a novel feature mapping (FM) from whispered to neutral speech features, using a cosine similarity based objective function, for speaker verification (SV) with whispered speech. Typically, the performance of an SV system enrolled with neutral speech degrades significantly when it is tested with whispered speech, owing to the differences between the spectral characteristics of neutral and whispered speech. We hypothesize that an FM from whispered Mel frequency cepstral coefficients (MFCC) to neutral MFCC that maximizes the cosine similarity between neutral and whispered i-vectors yields better performance than the baseline method, which typically performs a direct FM between MFCC features by minimizing the mean squared error (MSE). We also explore an affine transform between MFCC features using the proposed objective function. Whispered SV experiments with 1882 speakers reveal that the equal error rate (EER) of the proposed method is lower than that of the best baseline by ∼24% (relative). We show that the proposed FM system maintains the neutral SV performance while improving the EER of whispered SV, unlike the baseline methods. We also show that the bias in the learned affine transform corresponds to the glottal flow information, which is absent in whispered speech.
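
The objective described in the abstract can be illustrated with a minimal sketch. It assumes an affine map y = A x + b applied to each whispered MFCC frame and a cosine-similarity loss against a neutral utterance-level vector. The function names, and in particular `toy_embedding` (a simple mean over frames standing in for a real i-vector extractor), are hypothetical and not taken from the paper; the paper's actual training procedure is not reproduced here.

```python
import numpy as np


def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between two 1-D vectors (eps avoids division by zero)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))


def affine_map(mfcc, A, b):
    """Apply y_t = A x_t + b to every frame; mfcc has shape (frames, dims)."""
    return mfcc @ A.T + b


def toy_embedding(mfcc):
    """HYPOTHETICAL stand-in for an i-vector extractor: the mean over frames.

    A real system would use a trained i-vector extractor instead.
    """
    return mfcc.mean(axis=0)


def cosine_loss(whisper_mfcc, neutral_vec, A, b):
    """1 - cosine similarity between the mapped-whisper and neutral vectors.

    Minimizing this value is equivalent to maximizing the cosine similarity.
    """
    mapped = affine_map(whisper_mfcc, A, b)
    return 1.0 - cosine_similarity(toy_embedding(mapped), neutral_vec)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    whisper = rng.normal(size=(50, 13))  # synthetic frames, illustration only
    neutral = toy_embedding(rng.normal(size=(50, 13)))
    A, b = np.eye(13), np.zeros(13)      # identity map as a starting point
    print("loss with identity map:", cosine_loss(whisper, neutral, A, b))
```

In this sketch the loss lies between 0 (vectors aligned) and 2 (vectors opposed). By contrast, the MSE baseline mentioned in the abstract compares MFCC frames directly rather than utterance-level vectors.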

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: The copyright for this article belongs to International Speech Communication Association.
Keywords: Cosine similarity; Feature mapping; Speaker verification; Whispered speech
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 05 Dec 2022 09:46
Last Modified: 05 Dec 2022 09:46
URI: https://eprints.iisc.ac.in/id/eprint/78251
