Naini, AR and Rao Mv, A and Ghosh, PK (2019) Formant-gaps Features for Speaker Verification Using Whispered Speech. In: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019, 12 May 2019-17 May 2019, Brighton, pp. 6231-6235.
Abstract
In this work, we propose a new feature based on formants for the whispered speaker verification (SV) task, where neutral data is used for enrollment and whispered recordings are used for test. Such a mismatch between enrollment and test often degrades the performance of whispered SV systems due to the difference in acoustic characteristics of whispered and neutral speech. We hypothesize that the proposed formant and formant gap (FoG) features are more invariant to the modes of speech in capturing speaker-specific information compared to traditional baseline features for SV, including mel frequency cepstral coefficients (MFCC) and auditory-inspired amplitude modulation features (AAMF). Whispered SV experiments with 714 speakers comprising 29232 neutral and 22932 whispered recordings reveal that the equal error rate (EER) using the proposed features is lower than that using the best baseline features by ~3.79 (absolute). It was also observed that at least four whispered recordings during enrollment are required for the baseline features to perform on par with the proposed features. However, it was found that the best performing baseline features yield an EER for the neutral SV task which is ~1.88 higher than that using the proposed features.
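The abstract reports its results as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how that metric is commonly computed from verification scores, the Python below sweeps a decision threshold over the observed scores. The function name `equal_error_rate` and the example scores are hypothetical illustrations, not the authors' evaluation code or data.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER (as a fraction) by sweeping thresholds over observed scores.

    Assumption: higher scores mean the system is more confident that the
    test recording matches the enrolled speaker.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores for illustration only (not taken from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, impostor))
```

With a finite set of trials the two error rates may never be exactly equal, so this sketch reports their average at the closest threshold; toolkits often interpolate along the error curve instead.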
Item Type: | Conference Paper |
---|---|
Publication: | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Additional Information: | The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc. |
Keywords: | Audio recordings; Speech; Speech communication; Speech recognition, Acoustic characteristic; Equal error rate; Feature-based; formants; Mel-frequency cepstral coefficients; Speaker specific informations; Speaker verification; Whispered speech, Audio signal processing |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 30 Nov 2022 08:49 |
Last Modified: | 30 Nov 2022 08:49 |
URI: | https://eprints.iisc.ac.in/id/eprint/78384 |