ePrints@IISc

A Comparative Study of Estimating Articulatory Movements from Phoneme Sequences and Acoustic Features

Singh, A and Illa, A and Ghosh, PK (2020) A Comparative Study of Estimating Articulatory Movements from Phoneme Sequences and Acoustic Features. In: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020), 4-8 May 2020, Barcelona, Spain, pp. 7334-7338.

ica_iee_int_con_aco_spe_sig_pro_pro_7334-7338_2020.pdf - Published Version (restricted to registered users only)
Official URL: https://dx.doi.org/10.1109/ICASSP40776.2020.905385...


Unlike phoneme sequences, the movements of the speech articulators (lips, tongue, jaw, velum) and the resultant acoustic signal are known to encode not only the linguistic message but also para-linguistic information. While several works exist on estimating articulatory movements from acoustic signals, little is known about the extent to which articulatory movements can be predicted from linguistic information alone, i.e., the phoneme sequence. In this work, we estimate articulatory movements from three different input representations: R1) the acoustic signal, R2) the phoneme sequence, and R3) the phoneme sequence with timing information. An attention network is used for estimating articulatory movements in the case of R2, while a BLSTM network is used for R1 and R3. Experiments with ten subjects' acoustic-articulatory data reveal that the estimation techniques achieve average correlation coefficients of 0.85, 0.81, and 0.81 for R1, R2, and R3, respectively. This indicates that the attention network, although it uses only the phoneme sequence (R2) without any timing information, achieves estimation performance similar to that obtained with the rich acoustic signal (R1), suggesting that articulatory motion is primarily driven by the linguistic message. The correlation coefficient further improves to 0.88 when R1 and R3 are used together for estimating articulatory movements. © 2020 IEEE.
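The evaluation metric reported in the abstract is the correlation coefficient between estimated and measured articulatory trajectories, averaged over articulatory channels. A minimal sketch of how such a score can be computed with numpy is shown below; the function name, array shapes, and the synthetic trajectories are illustrative assumptions, not the paper's actual data or code.

```python
import numpy as np

def avg_articulatory_correlation(estimated, measured):
    """Average Pearson correlation coefficient across articulatory channels.

    estimated, measured: arrays of shape (num_frames, num_channels),
    e.g. trajectories of EMA sensors on the lips, tongue, and jaw.
    """
    corrs = []
    for ch in range(estimated.shape[1]):
        # Pearson correlation between the estimated and measured
        # trajectory of one articulatory channel.
        r = np.corrcoef(estimated[:, ch], measured[:, ch])[0, 1]
        corrs.append(r)
    return float(np.mean(corrs))

# Toy check with synthetic trajectories (hypothetical, not the paper's data):
rng = np.random.default_rng(0)
truth = rng.standard_normal((200, 6))                     # 6 channels
estimate = truth + 0.1 * rng.standard_normal((200, 6))    # noisy estimate
score = avg_articulatory_correlation(estimate, truth)
```

A near-perfect estimate yields a score close to 1.0; the averages of 0.81-0.88 reported in the abstract are typical of state-of-the-art acoustic-to-articulatory inversion.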

Item Type: Conference Paper
Publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright of this article belongs to Institute of Electrical and Electronics Engineers Inc.
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 27 Aug 2020 08:54
Last Modified: 27 Aug 2020 08:54
URI: http://eprints.iisc.ac.in/id/eprint/66377
