Illa, A and Ghosh, PK (2020) Closed-set speaker conditioned acoustic-to-articulatory inversion using bi-directional long short term memory network. In: Journal of the Acoustical Society of America, 147 (2). EL171-EL176.
Abstract
Estimating articulatory movements from speech acoustic representations is known as acoustic-to-articulatory inversion (AAI). In this work, a speaker conditioned AAI (SC AAI) is proposed using a bi-directional LSTM neural network, where training is performed by pooling acoustic-articulatory data from multiple speakers along with their corresponding speaker identity information. For this work, 7.24 h of multi-speaker acoustic-articulatory data are collected from 20 speakers speaking 460 English sentences. Experiments with 20 speakers indicate that the SC AAI model performs better than the speaker-dependent (SD) AAI model, improving the correlation coefficient between the original and estimated articulatory movements by 0.036 (absolute).
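The two ideas named in the abstract, conditioning on speaker identity and scoring by correlation coefficient, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how speaker identity is encoded, so the one-hot code appended to each acoustic frame is an assumption, and the names `one_hot`, `condition_frames`, and `pearson` are hypothetical. The bi-directional LSTM itself is omitted.

```python
import math
from typing import List, Sequence


def one_hot(speaker_index: int, num_speakers: int) -> List[float]:
    """One-hot speaker identity vector (assumed encoding, not from the paper)."""
    if not 0 <= speaker_index < num_speakers:
        raise ValueError("speaker_index out of range")
    return [1.0 if i == speaker_index else 0.0 for i in range(num_speakers)]


def condition_frames(
    frames: Sequence[Sequence[float]], speaker_index: int, num_speakers: int
) -> List[List[float]]:
    """Append the same speaker code to every acoustic frame of one utterance."""
    code = one_hot(speaker_index, num_speakers)
    return [list(frame) + code for frame in frames]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length trajectories."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("trajectories must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return cov / math.sqrt(var_x * var_y)


# Toy data: two acoustic frames of three features each, speaker 1 of 20.
acoustic = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
conditioned = condition_frames(acoustic, speaker_index=1, num_speakers=20)

# Toy articulatory trajectory and a hypothetical estimate of it.
original = [0.0, 1.0, 2.0, 3.0]
estimated = [1.0, 3.0, 5.0, 7.0]
score = pearson(original, estimated)
```

Pooled training in this sketch would mean building `conditioned` inputs for every utterance of every speaker and fitting a single model on all of them; the correlation would then be computed per articulatory trajectory and averaged.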
Item Type: | Journal Article |
---|---|
Publication: | Journal of the Acoustical Society of America |
Publisher: | Acoustical Society of America |
Additional Information: | The copyright for this article belongs to the Authors. |
Keywords: | Motion estimation; Articulatory data; Articulatory inversion; Bi-directional; Correlation coefficient; English sentences; Identity information; Short term memory; Speech acoustics; Long short-term memory; article; human; human experiment; long short term memory network; speech |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 24 Jan 2023 11:39 |
Last Modified: | 24 Jan 2023 11:39 |
URI: | https://eprints.iisc.ac.in/id/eprint/79448 |