
Pitch-synchronous discrete cosine transform features for speaker identification and verification

Meghanani, A and Ramakrishnan, AG (2020) Pitch-synchronous discrete cosine transform features for speaker identification and verification. In: ICPRAM 2020 - Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods 2020, 22 February 2020 through 24 February 2020, Valletta, Malta, pp. 395-401.

Official URL: https://www.scopus.com/record/display.uri?eid=2-s2...

Abstract

We propose a feature called pitch-synchronous discrete cosine transform (PS-DCT), derived from the voiced part of speech, for speaker identification (SID) and verification (SV) tasks. PS-DCT features are derived from the "time-domain, quasi-stationary waveform shape" of the voiced sounds. We test our PS-DCT feature on the TIMIT, Mandarin and YOHO datasets. On TIMIT with 168 speakers and Mandarin with 855 speakers, we obtain SID accuracies of 99.4% and 96.1%, respectively, using a Gaussian mixture model-based classifier. In the i-vector-based SV framework, fusing the "PS-DCT based system" with the "MFCC-based system" at the score level reduces the equal error rate (EER) for both the YOHO and Mandarin datasets. In the case of limited test data and session variabilities, we obtain a significant reduction in EER, up to 5.8% (for test data of duration < 3 sec). Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
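
As an illustration of the score-level fusion and equal error rate (EER) mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that higher scores indicate the same speaker, that the two systems' scores are already on comparable scales, and that fusion is a simple weighted sum. The function names `equal_error_rate` and `fuse_scores`, the `weight` parameter, and the toy scores are all hypothetical.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Approximate the EER (as a fraction) for higher-is-better scores.

    genuine:  scores of trials where the claimed speaker is correct.
    impostor: scores of trials where the claimed speaker is wrong.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: impostor trials scoring at or above it.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted-sum fusion of two systems' scores for the same trials.

    Assumption: both score sets are already normalised to a common scale.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b


if __name__ == "__main__":
    # Toy scores only; these are not results from the paper.
    gen_a, imp_a = [1.0, 2.0, 3.0, 4.0], [0.0, 1.5, -1.0, -2.0]
    gen_b, imp_b = [2.0, 2.5, 3.0, 3.5], [0.5, 0.0, -0.5, 1.0]
    fused_gen = fuse_scores(gen_a, gen_b)
    fused_imp = fuse_scores(imp_a, imp_b)
    print("EER system A:", equal_error_rate(gen_a, imp_a))
    print("EER fused:   ", equal_error_rate(fused_gen, fused_imp))
```

In practice the fusion weight would be tuned on held-out development trials, and score normalisation would precede the weighted sum; both steps are outside the scope of this sketch.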

Item Type: Conference Paper
Publication: ICPRAM 2020 - Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods
Publisher: SciTePress
Additional Information: Cited by 0; Conference of 9th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2020; Conference Date: 22 February 2020 through 24 February 2020; Conference Code: 158680
Keywords: Continuous speech recognition; Gaussian distribution; Loudspeakers; Time domain analysis; Equal error rate; Gaussian Mixture Model; MFCC; Pitch synchronous; Quasi-stationary; Speaker identification; Speaker verification; Waveform shape; Discrete cosine transforms
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 29 Sep 2020 11:11
Last Modified: 29 Sep 2020 11:11
URI: http://eprints.iisc.ac.in/id/eprint/65176
