
Neural PLDA modeling for end-to-end speaker verification

Ramoji, S and Krishnan, P and Ganapathy, S (2020) Neural PLDA modeling for end-to-end speaker verification. In: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, 25-29 October 2020, Shanghai, China, pp. 4333-4337.

Official URL: https://dx.doi.org/10.21437/Interspeech.2020-2699


While deep learning models have made significant advances in supervised classification problems, their application to out-of-set verification tasks such as speaker recognition has been limited to deriving feature embeddings. State-of-the-art x-vector PLDA speaker verification systems use a generative model based on probabilistic linear discriminant analysis (PLDA) to compute the verification score. We recently proposed a neural network approach for backend modeling in speaker verification called neural PLDA (NPLDA), in which the likelihood ratio score of the generative PLDA model is posed as a discriminative similarity function and the learnable parameters of the score function are optimized using a verification cost. In this paper, we extend that work to jointly optimize the embedding neural network (x-vector network) and the NPLDA network in an end-to-end (E2E) fashion. The proposed E2E model is optimized directly from the acoustic features with a verification cost function, and during testing it directly outputs the likelihood ratio score. In experiments on the NIST speaker recognition evaluation (SRE) 2018 and 2019 datasets, we show that the proposed E2E model improves significantly over the x-vector PLDA baseline speaker verification system. © 2020 ISCA
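As an illustration of the idea in the abstract, the sketch below treats a PLDA-style pairwise score as a quadratic function of an enrollment and a test embedding, and evaluates a smooth verification cost on a set of trial scores. It is a minimal sketch under stated assumptions, not the authors' implementation. The matrix names `P` and `Q`, the sigmoid-smoothed miss and false-alarm rates, and the parameters `alpha`, `beta`, and `threshold` are all assumptions chosen for illustration and do not appear in the abstract.

```python
import numpy as np


def plda_pair_score(enroll, test, P, Q):
    """Quadratic pairwise score: e'Qe + t'Qt + 2 e'Pt.

    P and Q are hypothetical learnable matrices; in a discriminative
    backend they would be updated by gradient descent on a verification cost.
    """
    return float(enroll @ Q @ enroll + test @ Q @ test + 2.0 * enroll @ P @ test)


def soft_detection_cost(scores, labels, threshold, alpha=10.0, beta=1.0):
    """Smooth stand-in for a detection cost (assumed form, for illustration).

    A sigmoid replaces the hard accept/reject decision so that the cost
    is differentiable with respect to the scores.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    accept = 1.0 / (1.0 + np.exp(-alpha * (scores - threshold)))
    p_miss = np.mean(1.0 - accept[labels])  # target trials softly rejected
    p_fa = np.mean(accept[~labels])         # non-target trials softly accepted
    return float(p_miss + beta * p_fa)


# Toy usage with 2-dimensional embeddings (values are made up).
P = np.eye(2)
Q = np.zeros((2, 2))
same = plda_pair_score(np.array([1.0, 0.0]), np.array([1.0, 0.0]), P, Q)
diff = plda_pair_score(np.array([1.0, 0.0]), np.array([-1.0, 0.0]), P, Q)
cost = soft_detection_cost([same, diff], [True, False], threshold=0.0)
```

In this toy setup the matched pair scores higher than the mismatched pair, so the soft cost is close to zero. Swapping the labels drives it toward its maximum of `1 + beta`.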

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020; Conference Date: 25 October 2020 through 29 October 2020; Conference Code: 165507
Keywords: Cost functions; Deep learning; Discriminant analysis; Embeddings; Neural networks; Speech communication; Joint optimization; Probabilistic linear discriminant analysis; Similarity functions; Speaker recognition; Speaker recognition evaluations; Speaker verification; Speaker verification system; Supervised classification; Speech recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 12 Jan 2021 05:45
Last Modified: 12 Jan 2021 05:45
URI: http://eprints.iisc.ac.in/id/eprint/67634
