Ramoji, S and Mohan, A and Mysore, B and Bhatia, A and Singh, P and Vardhan, H and Ganapathy, S (2019) The Leap Speaker Recognition System for NIST SRE 2018 Challenge. In: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019, 12 May 2019-17 May 2019, Brighton, pp. 5771-5775.
Abstract
The NIST Speaker Recognition Evaluation (SRE) 2018 challenge comprises an open evaluation of the text-independent speaker verification task. This paper summarizes the LEAP speaker verification systems submitted to the NIST SRE 2018. For all the speaker verification approaches, the front-end feature extraction involved the use of neural embeddings from a time delay neural network (TDNN) trained on a speaker discrimination task. These features, called x-vectors, are used in multiple ways for the speaker verification task. In the first approach, the x-vectors, after pre-processing and dimensionality reduction, are used with probabilistic linear discriminant analysis (PLDA) scoring. The second approach applies a speaker diarization scheme to the test segments containing multiple talkers before speaker verification scoring based on PLDA. The third system uses a local pairwise LDA model for pre-processing the x-vectors, which are then scored using a Gaussian back-end. With experiments on the SRE 2018 database, we show that most of the systems achieved noticeable improvements over the NIST baseline in terms of the primary cost metric. Using a system fusion of the various approaches, we obtain significant improvements over the NIST official baseline (average relative improvements of 19.7% and 20.1% for the development and evaluation sets, respectively).
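The "average relative improvement" figures quoted in the abstract can be read as a percentage reduction in a cost metric relative to the baseline, averaged over several comparisons. The following is a minimal sketch of that arithmetic only. The function names, the choice of what is averaged over, and the example costs are assumptions made for illustration; none of them are taken from the paper.

```python
def relative_improvement(baseline_cost: float, system_cost: float) -> float:
    """Percentage reduction in cost relative to the baseline (lower cost is better)."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return 100.0 * (baseline_cost - system_cost) / baseline_cost


def average_relative_improvement(pairs):
    """Mean of the relative improvements over (baseline_cost, system_cost) pairs."""
    gains = [relative_improvement(b, s) for b, s in pairs]
    return sum(gains) / len(gains)


# Hypothetical costs, chosen only to illustrate the arithmetic (not from the paper).
example_pairs = [(0.50, 0.40), (0.80, 0.60)]
print(round(average_relative_improvement(example_pairs), 1))
```

A positive value means the system's cost is lower than the baseline's; a negative value would indicate a regression.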
Item Type: | Conference Paper |
---|---|
Publication: | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Additional Information: | The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc. |
Keywords: | Audio signal processing; Character recognition; Discriminant analysis; Neural networks; Speech communication; Vectors, Dimensionality reduction; Gaussians; PLDA scoring; Speaker diarization; Speaker verification, Speech recognition |
Department/Centre: | Division of Electrical Sciences > Computer Science & Automation; Division of Interdisciplinary Sciences > Computational and Data Sciences; Others |
Date Deposited: | 30 Nov 2022 09:08 |
Last Modified: | 30 Nov 2022 09:08 |
URI: | https://eprints.iisc.ac.in/id/eprint/78385 |