Naini, AR and Achuth Rao, MV and Ghosh, PK (2019) Whisper to neutral mapping using cosine similarity maximization in i-vector space for speaker verification. In: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, 15 - 19 September 2019, Graz, pp. 4340-4344.
Abstract
In this work, we propose a novel feature mapping (FM) from whispered to neutral speech features using a cosine-similarity-based objective function for speaker verification (SV) using whispered speech. Typically, the performance of an SV system enrolled with neutral speech degrades significantly when tested using whispered speech, due to the differences between the spectral characteristics of neutral and whispered speech. We hypothesize that FM from whispered Mel frequency cepstral coefficients (MFCC) to neutral MFCC by maximizing cosine similarity between neutral and whisper i-vectors yields better performance than the baseline method, which typically performs a direct FM between MFCC features by minimizing mean squared error (MSE). We also explore an affine transform between MFCC features using the proposed objective function. Whisper SV experiments with 1882 speakers reveal that the equal error rate (EER) using the proposed method is lower than that using the best baseline by ∼24% (relative). We show that the proposed FM system maintains the neutral SV performance while improving the EER of whispered SV, unlike baseline methods. We also show that the bias in the learned affine transform corresponds to the glottal flow information, which is absent in whispered speech.
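The contrast between the two objectives named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic data, and in particular `toy_embedding` (a mean-pool followed by a fixed projection, standing in for a real i-vector extractor) are assumptions made only for illustration. The sketch applies a frame-wise affine transform to "whisper" features and evaluates both an MSE loss on the features and a negative cosine-similarity loss on the embeddings.

```python
import numpy as np


def affine_map(whisper_mfcc, A, b):
    # Frame-wise affine transform y = A x + b on a (frames, dims) array.
    return whisper_mfcc @ A.T + b


def toy_embedding(mfcc, P):
    # Hypothetical stand-in for an i-vector extractor (NOT a real one):
    # mean-pool the frames, then apply a fixed linear projection P.
    return P @ mfcc.mean(axis=0)


def cosine_similarity(u, v):
    # Cosine of the angle between two vectors; invariant to their scale.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def mse(x, y):
    # Mean squared error between two equally shaped arrays.
    return float(np.mean((x - y) ** 2))


rng = np.random.default_rng(0)
dims, emb_dims, frames = 13, 8, 50

# Synthetic features for illustration only; not real MFCCs.
neutral = rng.normal(size=(frames, dims))
whisper = 0.5 * neutral - 1.0
P = rng.normal(size=(emb_dims, dims))

# Untrained affine transform (identity matrix, zero bias) as a starting point.
A = np.eye(dims)
b = np.zeros(dims)
mapped = affine_map(whisper, A, b)

# Baseline-style objective: MSE directly between feature frames.
baseline_loss = mse(mapped, neutral)

# Proposed-style objective: maximize cosine similarity between embeddings,
# written here as a loss to minimize (its negative).
proposed_loss = -cosine_similarity(toy_embedding(mapped, P),
                                   toy_embedding(neutral, P))

print(baseline_loss, proposed_loss)
```

In an actual training setup, `A` and `b` would be optimized by gradient descent on the chosen loss; that step is omitted here to keep the sketch short.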
Item Type: | Conference Paper |
---|---|
Publication: | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publisher: | International Speech Communication Association |
Additional Information: | The copyright for this article belongs to International Speech Communication Association. |
Keywords: | Cosine similarity; Feature mapping; Speaker verification; Whispered speech |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 05 Dec 2022 09:46 |
Last Modified: | 05 Dec 2022 09:46 |
URI: | https://eprints.iisc.ac.in/id/eprint/78251 |