Speaker verification based on the fusion of speech acoustics and inverted articulatory signals

Li, Ming and Kim, Jangwon and Lammert, Adam and Ghosh, Prasanta Kumar and Ramanarayanan, Vikram and Narayanan, Shrikanth (2015) Speaker verification based on the fusion of speech acoustics and inverted articulatory signals. In: COMPUTER SPEECH AND LANGUAGE, 36. pp. 196-211.

Official URL: http://dx.doi.org/10.1016/j.csl.2015.05.003

Abstract

We propose a practical feature-level and score-level fusion approach that combines acoustic and estimated articulatory information for both text-independent and text-dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with conventional acoustic features. On text-independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the performance dramatically. However, since directly measuring articulatory data is not feasible in many real-world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature-level and score-level fusion methods and find that the overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we include inverted articulatory trajectories in text-dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong-password trials and improve the performance after score-level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database for the text-independent and text-dependent tasks, respectively. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks. (C) 2015 Elsevier Ltd. All rights reserved.
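The terms used in the abstract (feature-level fusion, score-level fusion, relative equal error rate reduction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the fusion rule, so the weighted sum and its `weight` parameter are assumptions, and all function names are hypothetical. The EER routine is a simple threshold sweep over observed scores, which only approximates the operating point on small trial lists.

```python
from typing import List, Sequence


def fuse_features(mfcc: Sequence[Sequence[float]],
                  artic: Sequence[Sequence[float]]) -> List[List[float]]:
    """Feature-level fusion: concatenate the two vectors frame by frame."""
    if len(mfcc) != len(artic):
        raise ValueError("both streams must have the same number of frames")
    return [list(a) + list(b) for a, b in zip(mfcc, artic)]


def fuse_scores(acoustic: Sequence[float],
                articulatory: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Score-level fusion as a weighted sum (assumed rule, not from the paper)."""
    if len(acoustic) != len(articulatory):
        raise ValueError("both systems must score the same trials")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(acoustic, articulatory)]


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate EER: sweep thresholds, pick where FAR and FRR are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: target trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: non-target trials scoring at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


def relative_reduction(baseline_eer: float, fused_eer: float) -> float:
    """Relative EER reduction, e.g. 0.15 corresponds to 15%."""
    return (baseline_eer - fused_eer) / baseline_eer
```

Under these assumptions, a baseline EER of 10% that drops to 8% after fusion corresponds to `relative_reduction(0.10, 0.08)`, a 20% relative reduction, which would meet the "more than 15%" threshold reported in the abstract.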

Item Type:	Journal Article
Publication:	COMPUTER SPEECH AND LANGUAGE
Publisher:	ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
Additional Information:	Copyright for this article belongs to ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD, 24-28 OVAL RD, LONDON NW1 7DX, ENGLAND
Keywords:	Text independent speaker verification; Text dependent speaker verification; Speech production; Articulatory features; Acoustic-to-articulatory inversion
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	22 Jan 2016 05:18
Last Modified:	22 Jan 2016 05:18
URI:	http://eprints.iisc.ac.in/id/eprint/53129
