ePrints@IISc


Desai, Urvish and Yarra, Chiranjeevi and Ghosh, Prasanta Kumar (2018) Concatenative Articulatory Video Synthesis Using Real-Time MRI Data for Spoken Language Training. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 15-20, 2018, Calgary, Canada, pp. 4999-5003.

PDF: Ieee_Int_Con_Aco_Spe_Sig_Pro_4999_2018.pdf - Published Version (497kB)
Restricted to registered users only
Official URL: http://dx.doi.org/10.1109/ICASSP.2018.8462401


Spoken language training benefits from showing second-language learners a video of native speakers' articulatory movements. Typically, the articulatory video is prepared together with audio that is recorded simultaneously with the articulatory data. Articulatory video recording requires specialized equipment and is therefore expensive and time-consuming. In this work, we propose a concatenative synthesis approach to obtain articulatory videos for audio that may not have a simultaneous articulatory recording. In the training stage of the proposed approach, we build a repository of phoneme-specific articulatory image sequences from the available articulatory video. During testing, image sequences are selected from this repository so as to ensure smooth transitions across phonetic events. The selected image sequences are then stitched together to synthesize the articulatory video for the test audio. Articulatory videos are synthesized for 50 words that are randomly selected from the MRI-TIMIT database and not seen in the training data. A subjective evaluation of the quality of the synthesized videos by twelve subjects suggests that they are close to the original ones, with a rating of 3.78 out of 5, where a score of 5 indicates no difference and a score of 1 indicates a great difference in quality between the original and the synthesized videos.
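The abstract does not say how image sequences are chosen or how they are fitted to the test audio. The sketch below shows one common way to realise the select-then-stitch idea, namely classic unit selection. It should not be read as the authors' method. Several details are assumptions: frames are flattened pixel lists, the join cost is the squared distance between the last frame of one clip and the first frame of the next, a Viterbi search picks the clip sequence with the lowest total join cost, and each chosen clip is time-warped by nearest-neighbour resampling to a phoneme duration that would come from a forced alignment of the test audio. All function names (`join_cost`, `select_clips`, `resample`, `synthesize`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representations: one frame is a flattened image, one clip is the
# image sequence for a single phoneme occurrence in the training video.
Frame = List[float]
Clip = List[Frame]


def join_cost(prev_clip: Clip, next_clip: Clip) -> float:
    """Assumed smoothness cost: squared distance between the boundary frames."""
    a, b = prev_clip[-1], next_clip[0]
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_clips(phonemes: Sequence[str], repo: Dict[str, List[Clip]]) -> List[Clip]:
    """Viterbi search over repository candidates minimising total join cost."""
    if not phonemes:
        return []
    # cost[i][j]: cheapest total cost of a path ending in candidate j of phoneme i
    cost: List[List[float]] = [[0.0] * len(repo[phonemes[0]])]
    back: List[List[int]] = [[-1] * len(repo[phonemes[0]])]
    for i in range(1, len(phonemes)):
        prev_cands, cur_cands = repo[phonemes[i - 1]], repo[phonemes[i]]
        row_cost: List[float] = []
        row_back: List[int] = []
        for cand in cur_cands:
            totals = [cost[i - 1][k] + join_cost(prev_cands[k], cand)
                      for k in range(len(prev_cands))]
            best_k = min(range(len(totals)), key=totals.__getitem__)
            row_cost.append(totals[best_k])
            row_back.append(best_k)
        cost.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest candidate for the final phoneme.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path: List[Clip] = []
    for i in range(len(phonemes) - 1, -1, -1):
        path.append(repo[phonemes[i]][j])
        j = back[i][j]
    return path[::-1]


def resample(clip: Clip, n_frames: int) -> Clip:
    """Nearest-neighbour time warping of a clip to exactly n_frames frames."""
    m = len(clip)
    return [clip[min(m - 1, (t * m) // n_frames)] for t in range(n_frames)]


def synthesize(alignment: Sequence[Tuple[str, int]], repo: Dict[str, List[Clip]]) -> Clip:
    """Stitch selected clips, each warped to its (phoneme, frame count) duration."""
    clips = select_clips([p for p, _ in alignment], repo)
    video: Clip = []
    for clip, (_, n) in zip(clips, alignment):
        video.extend(resample(clip, n))
    return video


# Toy repository with one-pixel frames and two candidate clips per phoneme.
toy_repo = {
    "k": [[[0.0], [1.0]], [[5.0], [6.0]]],
    "ae": [[[1.0], [2.0]], [[6.0], [9.0]]],
    "t": [[[2.0], [3.0]], [[10.0], [11.0]]],
}
print(synthesize([("k", 2), ("ae", 3), ("t", 1)], toy_repo))
# [[0.0], [1.0], [1.0], [1.0], [2.0], [2.0]]
```

With the toy data, the search picks the first candidate for each phoneme because those boundaries match exactly. Viterbi is used here because it finds the globally cheapest sequence in time linear in the number of phonemes, whereas choosing clips greedily one at a time could lock in a poor transition early on.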

Item Type: Conference Proceedings
Publisher: IEEE
Additional Information: Copyright for this article belongs to IEEE.
Keywords: Articulatory video synthesis; spoken language training; concatenative synthesis; real-time MRI videos
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 25 Oct 2018 14:30
Last Modified: 25 Oct 2018 14:30
URI: http://eprints.iisc.ac.in/id/eprint/60960
