
Streaming model for Acoustic to Articulatory Inversion with transformer networks

Udupa, S and Illa, A and Ghosh, PK (2022) Streaming model for Acoustic to Articulatory Inversion with transformer networks. In: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, 18 - 22 September 2022, Incheon, pp. 625-629.

INTERSPEECH_2022.pdf - Published Version
Restricted to Registered users only
Official URL: https://doi.org/10.21437/Interspeech.2022-10159

Abstract

Estimating speech articulatory movements from speech acoustics is known as Acoustic to Articulatory Inversion (AAI). Recently, transformer-based AAI models have been shown to achieve state-of-the-art performance. However, in transformer networks, attention is applied over the whole utterance, so the full utterance must be available before inference; this leads to high latency and is impractical for streaming AAI. To enable streaming during inference, evaluation could be performed on non-overlapping chunks instead of the full utterance. However, a mismatch between the attention receptive field during training and evaluation could cause a drop in AAI performance. To address this, in this work we perform experiments with different attention masks and use context from previous predictions during training. Experimental results show that using random start mask attention with context from the previous predictions of the transformer decoder performs better than the baseline.
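
As a rough illustration of the chunk-restricted attention the abstract describes, the sketch below builds a boolean mask in which each frame may attend only to frames in its own non-overlapping chunk. The abstract does not define the "random start mask", so the shifted-boundary variant here is an assumption, and the names chunk_attention_mask, random_start_mask, chunk_size, and start_offset are hypothetical rather than taken from the paper.

```python
import random


def chunk_attention_mask(num_frames, chunk_size, start_offset=0):
    """Return a num_frames x num_frames boolean mask.

    mask[i][j] is True when frame i may attend to frame j, i.e. when both
    frames fall in the same non-overlapping chunk. start_offset (a
    hypothetical parameter) shifts the chunk boundaries so that the first
    boundary falls at that frame index.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Shift the frame indices so that a chunk boundary lands at start_offset.
    shift = (-start_offset) % chunk_size
    chunk_ids = [(i + shift) // chunk_size for i in range(num_frames)]
    return [[a == b for b in chunk_ids] for a in chunk_ids]


def random_start_mask(num_frames, chunk_size, rng=random):
    """Assumed reading of a random start mask: draw the boundary offset at random."""
    offset = rng.randrange(chunk_size)
    return chunk_attention_mask(num_frames, chunk_size, offset)


if __name__ == "__main__":
    # Print one randomly shifted mask: '1' marks an allowed attention link.
    for row in random_start_mask(num_frames=8, chunk_size=3):
        print("".join("1" if allowed else "." for allowed in row))
```

Under this reading, drawing a new offset for each training utterance exposes the model to varied chunk boundaries, while streaming inference would use fixed chunks (start_offset=0). Passing context from previous predictions to the decoder, which the abstract also mentions, is not covered by this sketch.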

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: The copyright for this article belongs to International Speech Communication Association.
Keywords: Speech communication, Acoustics to articulatory inversion; Articulatory inversion; Inversion models; Performance; Receptive fields; Speech acoustics; State-of-art performance; Streaming model; Streaming prediction; Transformer network, Forecasting
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 10 Nov 2022 06:16
Last Modified: 10 Nov 2022 06:16
URI: https://eprints.iisc.ac.in/id/eprint/77854
