Mannem, R and Hima Jyothi, R and Illa, A and Ghosh, PK (2020) Speech rate task-specific representation learning from acoustic-articulatory data. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 25-29 October 2020, Shanghai, China, pp. 2892-2896.
Abstract
In this work, speech rate is estimated using task-specific representations learned from acoustic-articulatory data, in contrast to generic representations, which may not be optimal for speech rate estimation. 1-D convolutional filters are used to learn speech-rate-specific acoustic representations from the raw speech. A convolutional dense neural network (CDNN) is used to estimate the speech rate from the learned representations. In practice, articulatory data is not directly available; thus, we use Acoustic-to-Articulatory Inversion (AAI) to derive the articulatory representations from acoustics. However, these pseudo-articulatory representations are also generic and not optimized for any task. To learn speech-rate-specific pseudo-articulatory representations, we propose joint training of a BLSTM-based AAI model and the CDNN using a weighted loss function that combines the losses for speech rate estimation and articulatory prediction. The proposed model yields an improvement in speech rate estimation of ~18.5% in terms of Pearson correlation coefficient (CC) compared to the baseline CDNN model with generic articulatory representations as inputs. To utilize complementary information from articulatory features, we further perform experiments by concatenating task-specific acoustic and pseudo-articulatory representations, which yields an improvement in CC of ~2.5% compared to the baseline CDNN model. © 2020 ISCA
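The weighted loss and the evaluation metric named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so the convex combination below, the weight `alpha`, the use of mean squared error for both terms, and all function names are assumptions made for illustration only, not the authors' implementation.

```python
from math import sqrt


def mse(pred, target):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def weighted_joint_loss(rate_pred, rate_true, artic_pred, artic_true, alpha=0.5):
    """Hypothetical joint loss: alpha weights the speech rate term and
    (1 - alpha) weights the articulatory prediction term.
    The actual weighting scheme in the paper may differ."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rate_loss = mse(rate_pred, rate_true)
    artic_loss = mse(artic_pred, artic_true)
    return alpha * rate_loss + (1.0 - alpha) * artic_loss


def pearson_cc(x, y):
    """Pearson correlation coefficient (CC) between two sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("CC is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

In this sketch, setting `alpha` to 1 would train only for speech rate estimation, while setting it to 0 would reduce the objective to plain articulatory prediction; intermediate values trade the two off.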
Item Type: | Conference Paper |
---|---|
Publication: | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publisher: | International Speech Communication Association |
Additional Information: | Cited by 0; Conference of 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020; Conference Date: 25 October 2020 through 29 October 2020; Conference Code: 165507 |
Keywords: | Convolution; Convolutional neural networks; Correlation methods; Speech; Articulatory data; Articulatory features; Articulatory inversion; Generic representation; Model yields; Pearson correlation coefficients; Speech rates; Weighted loss function; Speech communication |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 12 Jan 2021 05:41 |
Last Modified: | 12 Jan 2021 05:41 |
URI: | http://eprints.iisc.ac.in/id/eprint/67640 |