Automatic visual augmentation for concatenation based synthesized articulatory videos from real-time MRI data for spoken language training

Yarra, Chandana SChiranjeevi and Aggarwal, Ritu and Mittal, Sanjeev Kumar and Kausthubha, NK and Raseena, KT and Singh, Astha and Ghosh, Prasanta Kumar (2018) Automatic visual augmentation for concatenation based synthesized articulatory videos from real-time MRI data for spoken language training. In: 19th Annual Conference of the International Speech Communication Association, 2-6 September 2018, Hyderabad International Convention Centre (HICC), Hyderabad, pp. 3127-3131.

PDF
int_sep_3127-3131_2018.pdf - Published Version
Restricted to Registered users only

Official URL: https://dx.doi.org/10.21437/Interspeech.2018-1570

Abstract

Concatenation-based articulatory video synthesis has previously been proposed for spoken language training as a way to overcome the limited availability of recorded articulatory data. Such synthesis uses real-time magnetic resonance imaging (rt-MRI) video image-frames (IFs) that contain articulatory movements. These IFs require visual augmentation to be easier to understand. In this work, we propose an augmentation method that uses the pixel intensities in the regions enclosed by the articulatory boundaries obtained from air-tissue boundaries (ATBs). Since the pixel intensities reflect the muscle movements in the articulators, the augmented IFs could show realistic articulatory movements when colored accordingly. However, manual ATB annotation is time-consuming; hence, we propose to synthesize ATBs from the ATBs of a few selected frames that have been used in synthesizing the articulatory videos. We augment a set of synthesized articulatory videos for 50 words obtained from the MRI-TIMIT database. A subjective evaluation of the augmented videos by twenty-one subjects suggests that they are visually more appealing than the respective synthesized rt-MRI videos, with a rating of 3.75 out of 5, where a score of 5 indicates excellent and a score of 1 indicates poor augmented video quality.

Item Type: Conference Proceedings
Series: Interspeech
Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC
Additional Information: 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Hyderabad, INDIA, SEP 02-06, 2018
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 13 Aug 2020 06:17
Last Modified: 13 Aug 2020 06:17
URI: http://eprints.iisc.ac.in/id/eprint/62932
