
PRAV: A phonetically rich audio visual corpus

Ghosh, P and Narwekar, A (2017) PRAV: A phonetically rich audio visual corpus. In: 18th Annual Conference of the International Speech Communication Association, 20 - 24 August 2017, Stockholm, pp. 3747-3751.

Official URL: http://doi.org/10.21437/Interspeech.2017-242

Abstract

This paper describes the acquisition of PRAV, a phonetically rich audio-visual corpus. The PRAV Corpus contains audio and visual recordings of 2368 sentences from the TIMIT corpus, each spoken by four subjects, making it the largest audio-visual corpus in the literature in terms of the number of sentences per subject. Visual features, comprising the coordinates of points along the contour of each subject's lips, have been extracted for the entire PRAV Corpus using the Active Appearance Models (AAM) algorithm and have been made available along with the audio and video recordings. Because the subjects are Indian, PRAV is an ideal resource for audio-visual speech studies with non-native English speakers. Moreover, the paper describes how the large number of sentences per subject makes the PRAV Corpus a significant dataset, highlighting its utility in exploring a number of potential research problems, including visual speech synthesis and perception studies.

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: The copyright for this article belongs to International Speech Communication Association.
Keywords: Audio recordings; Speech; Speech analysis; Speech synthesis; Video recording; Active appearance models; Audio-visual; Audio-visual corpora; Audio-visual speech; Non-native; Potential researches; Visual feature; Visual speech synthesis; Speech communication
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 27 Jul 2022 10:06
Last Modified: 27 Jul 2022 10:06
URI: https://eprints.iisc.ac.in/id/eprint/74717
