
Speech rate estimation using representations learned from speech with convolutional neural network

Mannem, R and Jyothi, H and Illa, A and Ghosh, PK (2020) Speech rate estimation using representations learned from speech with convolutional neural network. In: SPCOM 2020 - International Conference on Signal Processing and Communications, 19 July - 24 July 2020, Bangalore.

Official URL: https://doi.org/10.1109/SPCOM50965.2020.9179502

Abstract

With advances in machine learning techniques, several speech-related applications deploy end-to-end models to learn relevant features from the raw speech signal. In this work, we focus on the speech rate estimation task, using an end-to-end model to learn representations from raw speech in a data-driven manner. We propose an end-to-end model that comprises a 1-d convolutional layer to extract representations from raw speech and a convolutional dense neural network (CDNN) to predict speech rate from these representations. The primary aim of the work is to understand the nature of the representations learned by the end-to-end model for the speech rate estimation task. Experiments are performed using the TIMIT corpus, in seen and unseen subject conditions. Experimental results reveal that the frequency responses of the learned 1-d CNN filters are low-pass in nature, and that the center frequencies of the majority of the filters lie below 1000 Hz. Comparing the performance of the proposed end-to-end system with the baseline MFCC-based approach, we find that the performance of the features learned with the CNN is on par with MFCC.
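The filter analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the learned weights are not available here, so the filters below are synthetic stand-ins built by a hypothetical helper `make_toy_filter` (and, unlike the learned filters, these are band-pass tones). The kernel length and FFT size are arbitrary, and the "center frequency" is approximated as the frequency of the peak magnitude, which may differ from the definition used in the paper. The 16 kHz sampling rate matches TIMIT.

```python
import numpy as np

SAMPLE_RATE = 16000  # TIMIT audio is sampled at 16 kHz


def magnitude_response(kernel, sample_rate=SAMPLE_RATE, n_fft=1024):
    """Return (frequencies in Hz, magnitude) for one 1-D filter kernel."""
    spectrum = np.fft.rfft(kernel, n=n_fft)  # zero-pads the kernel to n_fft
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return freqs, np.abs(spectrum)


def peak_frequency(kernel, sample_rate=SAMPLE_RATE, n_fft=1024):
    """Assumed proxy for center frequency: location of the magnitude peak."""
    freqs, magnitude = magnitude_response(kernel, sample_rate, n_fft)
    return float(freqs[np.argmax(magnitude)])


def make_toy_filter(center_hz, taps=129, sample_rate=SAMPLE_RATE):
    """Hypothetical stand-in for a learned kernel: a Hann-windowed cosine."""
    t = (np.arange(taps) - taps // 2) / sample_rate
    return np.hanning(taps) * np.cos(2 * np.pi * center_hz * t)


# Toy filter bank; with a trained model, these would be the conv layer weights.
toy_bank = [make_toy_filter(f) for f in (300.0, 600.0, 3000.0)]
peaks = [peak_frequency(k) for k in toy_bank]

# Share of filters whose peak lies below 1000 Hz.
fraction_below_1k = float(np.mean([p < 1000.0 for p in peaks]))
print(peaks, fraction_below_1k)
```

With a trained model, each row of the first layer's weight tensor would take the place of a toy kernel, and the same two functions would give per-filter responses and the share of filters peaking below 1000 Hz.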

Item Type: Conference Proceedings
Publication: SPCOM 2020 - International Conference on Signal Processing and Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to the Institute of Electrical and Electronics Engineers Inc.
Keywords: Convolution; Convolutional neural networks; Frequency response; Learning systems; Multilayer neural networks; Signal processing; Center frequency; Data driven; End-to-end models; End-to-end systems; Machine learning techniques; Relevant features; Speech rates; Speech signals; Speech
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 06 Feb 2023 06:30
Last Modified: 06 Feb 2023 06:30
URI: https://eprints.iisc.ac.in/id/eprint/79851
