A Comparative Study of Articulatory Features From Facial Video and Acoustic-To-Articulatory Inversion for Phonetic Discrimination

Narwekar, Abhishek and Ghosh, Prasanta Kumar (2016) A Comparative Study of Articulatory Features From Facial Video and Acoustic-To-Articulatory Inversion for Phonetic Discrimination. In: 11th International Conference on Signal Processing and Communications (SPCOM), JUN 12-15, 2016, Indian Inst Sci, Bangalore, INDIA.


Official URL: http://dx.doi.org/10.1109/SPCOM.2016.7746670

Abstract

Several past studies have shown that features based on the kinematics of speech articulators improve phonetic recognition accuracy when combined with acoustic features. It is also known that audio-visual speech recognition outperforms audio-only recognition, which indicates that the information from the visible articulators is complementary to that provided by the acoustic features. Features of the visible articulators can typically be extracted directly from a facial video. In contrast, the movements of the speech articulators are recorded using electromagnetic articulography (EMA), which requires highly specialized equipment. EMA data are therefore not directly available in practice, and articulatory features are usually estimated from speech using acoustic-to-articulatory inversion. In this work, we compare the information that the visible and the estimated articulators provide about different phonetic classes, both with and without acoustic features. The information provided by the visible, articulatory, acoustic and combined features is quantified using mutual information (MI). For this study, we have created a large phonetically rich audio-visual (PRAV) dataset comprising 9000 TIMIT sentences spoken by four subjects. Experiments on the PRAV corpus reveal that the articulatory features estimated by inversion are more informative than the visible features but less informative than the acoustic features. This suggests that the advantage of visible articulatory features in recognition could be achieved by estimating articulatory features from the acoustic signal itself.
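The abstract does not say how MI was estimated. As a rough illustration only, the sketch below assumes that each frame's feature vector is quantized into uniform bins and that MI with the phonetic class is computed with a plug-in (histogram) estimator, I(X;C) = sum over (x,c) of p(x,c) log2[p(x,c) / (p(x) p(c))], in bits. The function names `discretize` and `mutual_information`, the bin count, the value range and the toy data are all hypothetical and are not taken from the paper. "Combined" features are modelled here simply as per-frame concatenation of feature vectors.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def discretize(frames: Sequence[Sequence[float]], bins: int,
               lo: float, hi: float) -> list[tuple[int, ...]]:
    """Hypothetical helper: map each feature vector to a tuple of
    uniform-bin indices (one index per dimension), clamped to [0, bins-1]."""
    width = (hi - lo) / bins
    symbols = []
    for frame in frames:
        idx = tuple(min(bins - 1, max(0, math.floor((x - lo) / width)))
                    for x in frame)
        symbols.append(idx)
    return symbols


def mutual_information(symbols: Sequence[Hashable],
                       labels: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; C) in bits from paired discrete observations."""
    if len(symbols) != len(labels):
        raise ValueError("symbols and labels must have the same length")
    n = len(labels)
    joint = Counter(zip(symbols, labels))  # counts of (x, c) pairs
    px = Counter(symbols)                  # marginal counts of x
    pc = Counter(labels)                   # marginal counts of c
    mi = 0.0
    for (x, c), nxc in joint.items():
        # p(x,c) * log2(p(x,c) / (p(x) p(c))), written with raw counts
        mi += (nxc / n) * math.log2(nxc * n / (px[x] * pc[c]))
    return mi


if __name__ == "__main__":
    # Toy, made-up data: one scalar "acoustic" and one "visual" feature per frame.
    labels = ["a", "a", "b", "b"]
    acoustic = [[0.1], [0.2], [0.8], [0.9]]
    visual = [[0.9], [0.1], [0.9], [0.1]]
    combined = [a + v for a, v in zip(acoustic, visual)]  # per-frame concatenation

    for name, feats in [("acoustic", acoustic), ("visual", visual),
                        ("combined", combined)]:
        print(name, mutual_information(discretize(feats, 2, 0.0, 1.0), labels))
    # → acoustic 1.0
    # → visual 0.0
    # → combined 1.0
```

With many feature dimensions the joint histogram becomes sparse and a plug-in estimate is biased upward (in the toy "combined" case every frame gets its own bin, so the estimate saturates at H(C)); a careful comparison of feature sets would likely need a bias-corrected or nearest-neighbour MI estimator instead.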

Item Type:	Conference Proceedings
Additional Information:	Copyright for this article belongs to the IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	31 Jan 2017 05:32
Last Modified:	31 Jan 2017 05:32
URI:	http://eprints.iisc.ac.in/id/eprint/56153
