
Information theoretic optimal vocal tract region selection from real time magnetic resonance images for broad phonetic class recognition

Prasad, Abhay and Ghosh, Prasanta Kumar (2016) Information theoretic optimal vocal tract region selection from real time magnetic resonance images for broad phonetic class recognition. In: COMPUTER SPEECH AND LANGUAGE, 39. pp. 108-128.

Official URL: http://dx.doi.org/10.1016/j.csl.2016.03.003

Abstract

We propose an information-theoretic algorithm for selecting regions from real-time magnetic resonance imaging (rtMRI) video frames for a broad phonetic class recognition task. Representations derived from these optimal regions are used as the articulatory features for recognition. A set of connected, arbitrarily shaped regions is selected such that the articulatory features computed from these regions provide maximal information about the broad phonetic classes. We also propose a tree-structured greedy region splitting algorithm that further segments these regions so that articulatory features from the split regions increase the information about the phonetic classes. We find that some of the proposed articulatory features correlate well with the articulatory gestures from the Articulatory Phonology theory of speech production. A broad phonetic class recognition experiment using four rtMRI subjects reveals that the recognition accuracy with optimal split regions is, on average, higher than that obtained using only acoustic features. Combining acoustic and articulatory features further reduces the error rate by 8.25% (relative). © 2016 Elsevier Ltd. All rights reserved.
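
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea of growing a connected region that maximizes mutual information with class labels. It is not the authors' implementation. The feature (mean pixel intensity over the region), the histogram-based mutual information estimate, the 4-connected neighbourhood, the stopping rule, and the names `mutual_information` and `grow_region` are all assumptions made for illustration.

```python
import numpy as np


def mutual_information(feature, labels, n_bins=8):
    """Histogram estimate of I(feature; labels) in bits (assumed estimator)."""
    # Quantize the continuous feature into equal-width bins.
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    binned = np.digitize(feature, edges[1:-1])  # bin indices 0 .. n_bins-1
    classes, class_idx = np.unique(labels, return_inverse=True)
    # Joint distribution over (feature bin, class).
    joint = np.zeros((n_bins, len(classes)))
    np.add.at(joint, (binned, class_idx), 1.0)
    joint /= joint.sum()
    p_bin = joint.sum(axis=1, keepdims=True)
    p_cls = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_bin @ p_cls)[nz])))


def grow_region(frames, labels, n_bins=8):
    """Greedily grow one connected region of pixels.

    frames: array of shape (T, H, W), one image per time step.
    labels: array of shape (T,), one class label per frame.
    Returns a boolean (H, W) mask and the estimated MI of its mean intensity.
    """
    _, height, width = frames.shape
    # Seed with the single pixel that is most informative on its own.
    pixel_mi = np.array([
        [mutual_information(frames[:, r, c], labels, n_bins)
         for c in range(width)]
        for r in range(height)
    ])
    seed = np.unravel_index(np.argmax(pixel_mi), pixel_mi.shape)
    mask = np.zeros((height, width), dtype=bool)
    mask[seed] = True
    best = float(pixel_mi[seed])

    while True:
        # Candidates are unselected 4-connected neighbours of the region,
        # so the region stays connected but may take an arbitrary shape.
        candidates = set()
        for r, c in zip(*np.nonzero(mask)):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = int(r + dr), int(c + dc)
                if 0 <= rr < height and 0 <= cc < width and not mask[rr, cc]:
                    candidates.add((rr, cc))
        if not candidates:
            break
        # Score each candidate by the MI of the enlarged region's feature.
        scored = []
        for rr, cc in candidates:
            trial = mask.copy()
            trial[rr, cc] = True
            feature = frames[:, trial].mean(axis=1)
            scored.append((mutual_information(feature, labels, n_bins), (rr, cc)))
        mi, pixel = max(scored)
        if mi <= best:  # stop once no neighbour adds information
            break
        mask[pixel] = True
        best = mi
    return mask, best
```

Called as `mask, mi = grow_region(frames, labels)` on a `(T, H, W)` frame array with per-frame class labels, the sketch returns one region. The per-pixel scan and repeated re-estimation make it slow for full-size images, and a histogram estimate is biased for small sample counts, so it is meant to illustrate the selection criterion rather than to be used as is.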

Item Type: Journal Article
Publication: COMPUTER SPEECH AND LANGUAGE
Publisher: ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
Additional Information: Copyright for this article belongs to ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD, 24-28 OVAL RD, LONDON NW1 7DX, ENGLAND
Keywords: Mutual information; Phonetic recognition; Speech production; Region splitting
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 20 Jul 2016 08:49
Last Modified: 20 Jul 2016 08:49
URI: http://eprints.iisc.ac.in/id/eprint/54251
