
PLDA inspired Siamese networks for speaker verification

Ramoji, Shreyas and Krishnan, Prashant and Ganapathy, Sriram (2022) PLDA inspired Siamese networks for speaker verification. In: Computer Speech & Language, 76. ISSN 0885-2308

PDF: com_spe_lan_76_2022.pdf - Published Version (2MB)
Restricted to registered users only
Official URL: https://doi.org/10.1016/j.csl.2022.101383


The deep learning methodologies in state-of-the-art speaker recognition systems are predominantly limited to the extraction of recording-level embeddings. This is usually followed by generative modeling of the embeddings to output the verification score. In this paper, we explore a fully neural approach in which the neural model outputs the verification score directly, given the acoustic feature inputs. This model, termed a Siamese neural network (SiamNN), combines the embedding extraction and back-end modeling into a single processing pipeline. The back-end modeling is achieved using a neural approach to PLDA modeling, called neural probabilistic linear discriminant analysis (NPLDA). In the NPLDA model, the verification score is computed as a discriminative similarity function. Building the system as a single neural SiamNN model allows all of its modules to be optimized jointly using a verification cost. Several speaker recognition experiments are performed on the SITW, VOiCES, and NIST SRE datasets, where the proposed SiamNN model is shown to significantly improve over the state-of-the-art x-vector PLDA baseline system (relative improvements of up to 35% in the primary cost metric). We also provide a detailed analysis of the influence of hyper-parameters, the choice of loss functions, and data sampling strategies for training the model. In particular, we highlight that optimization based on the proposed soft detection cost function improves over the other loss functions considered.
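The pipeline summarized in the abstract (shared embedding extractor, NPLDA-style score, soft detection cost) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation, and several pieces are assumptions: the toy `embed` function (mean pooling followed by a linear projection `w`) is a hypothetical stand-in for the x-vector extractor; the score is assumed to take the quadratic form of a two-covariance PLDA log-likelihood ratio, e^T Q e + t^T Q t + 2 e^T P t, with the matrices `P` and `Q` treated as discriminatively learned parameters; and the soft cost replaces the step functions in the miss and false-alarm rates with sigmoids of slope `alpha` around a threshold `theta`, weighted by `beta` as in a normalized detection cost. All parameter names are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def embed(frames: Sequence[Sequence[float]], w: Matrix) -> Vector:
    """Hypothetical toy extractor: mean-pool the frames, then project with w.
    Siamese weight sharing: the same w is applied to enrollment and test."""
    n = len(frames)
    mean = [sum(f[k] for f in frames) / n for k in range(len(frames[0]))]
    return matvec(w, mean)


def nplda_score(e: Vector, t: Vector, p: Matrix, q: Matrix) -> float:
    """Assumed PLDA-style quadratic score: e^T Q e + t^T Q t + 2 e^T P t."""
    return dot(e, matvec(q, e)) + dot(t, matvec(q, t)) + 2.0 * dot(e, matvec(p, t))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (no overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dcf_beta(p_target: float, c_miss: float = 1.0, c_fa: float = 1.0) -> float:
    """Weight on the false-alarm rate in a normalized detection cost."""
    return c_fa * (1.0 - p_target) / (c_miss * p_target)


def soft_detection_cost(scores: Sequence[float], labels: Sequence[int],
                        theta: float, alpha: float, beta: float) -> float:
    """Sigmoid-smoothed P_miss + beta * P_fa (assumed form).
    labels: 1 = target trial, 0 = non-target; both classes must be present."""
    tgt = [s for s, y in zip(scores, labels) if y == 1]
    non = [s for s, y in zip(scores, labels) if y == 0]
    p_miss = sum(1.0 - sigmoid(alpha * (s - theta)) for s in tgt) / len(tgt)
    p_fa = sum(sigmoid(alpha * (s - theta)) for s in non) / len(non)
    return p_miss + beta * p_fa


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]    # toy projection (identity)
    p = [[0.5, 0.0], [0.0, 0.5]]    # toy cross-term matrix
    q = [[-0.1, 0.0], [0.0, -0.1]]  # toy self-term matrix
    enroll = embed([[1.0, 0.0], [1.0, 0.2]], w)
    test = embed([[0.9, 0.1]], w)
    print("score:", nplda_score(enroll, test, p, q))
    print("soft cost:", soft_detection_cost([2.0, -1.0], [1, 0], 0.0, 5.0, dcf_beta(0.01)))
```

Because every step above is a smooth function of `w`, `P`, `Q` and the scores, the whole chain could in principle be trained end to end on the soft cost with an automatic-differentiation framework; this pure-Python version only evaluates it.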

Item Type: Journal Article
Publication: Computer Speech & Language
Publisher: Academic Press
Additional Information: The copyright of this article belongs to Academic Press
Keywords: Neural PLDA; Siamese networks; Speaker verification
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 20 May 2022 04:42
Last Modified: 20 May 2022 04:42
URI: https://eprints.iisc.ac.in/id/eprint/72305