
Pitch prediction from Mel-frequency cepstral coefficients using sparse spectrum recovery

Rao, M V Achuth and Ghosh, Prasanta Kumar (2017) Pitch prediction from Mel-frequency cepstral coefficients using sparse spectrum recovery. In: 23rd National Conference on Communications, NCC 2017, 02-04 March 2017, Chennai, India, pp. 1-6.

Official URL: https://doi.org/10.1109/NCC.2017.8077130


This work proposes a technique for predicting pitch from Mel-frequency cepstral coefficient (MFCC) vectors. Previous pitch prediction methods are based on statistical models such as Gaussian mixture models and hidden Markov models. In this paper, we propose a three-step method to estimate pitch from MFCC vectors. First, the Mel-filterbank energies (MFBEs) are estimated from the MFCC vectors. Second, we propose a novel method to estimate the spectrum from the MFBEs that exploits the sparse nature of the voiced speech spectrum. Finally, the pitch is estimated from the recovered spectrum. We also explore how different levels of truncation of the discrete cosine transform (DCT) coefficients in the MFCC computation affect the pitch prediction error. We use a deep neural network (DNN) based predictor as the baseline for predicting pitch from MFCC vectors. Experiments on the CMU-ARCTIC and KEELE databases show that the proposed three-step method generalizes better across databases and genders, lowering the average RMSE of the predicted pitch relative to the DNN by ∼8 Hz with 13-dimensional MFCC vectors and by ∼5 Hz with 26-dimensional MFCC vectors. We also find that the sparsity constraint performs better in recovering the spectrum at lower pitch values.
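The first step and the DCT truncation effect mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common convention that MFCCs are the leading coefficients of an orthonormal DCT-II of the log Mel-filterbank energies, and it inverts them by zero-padding followed by the inverse DCT. The abstract does not specify the authors' estimator, so this is not their method. The function names, the choice of 26 filters, and the random test vector are all hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (each row is one basis vector)."""
    k = np.arange(n)[:, None]   # coefficient index
    m = np.arange(n)[None, :]   # filterbank channel index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    mat[0, :] /= np.sqrt(2.0)   # rescale the DC row so the matrix is orthonormal
    return mat

def mfcc_from_log_mfbe(log_mfbe, n_keep):
    """DCT of the log filterbank energies, truncated to the first n_keep coefficients."""
    return (dct_matrix(len(log_mfbe)) @ log_mfbe)[:n_keep]

def log_mfbe_from_mfcc(mfcc, n_filters):
    """Zero-pad the (possibly truncated) MFCC vector and apply the inverse DCT."""
    padded = np.zeros(n_filters)
    padded[:len(mfcc)] = mfcc
    # The matrix is orthonormal, so its transpose is its inverse.
    return dct_matrix(n_filters).T @ padded

# Assumption: 26 filters and synthetic values standing in for real log MFBEs.
rng = np.random.default_rng(0)
log_mfbe = rng.normal(size=26)

for n_keep in (13, 26):
    approx = log_mfbe_from_mfcc(mfcc_from_log_mfbe(log_mfbe, n_keep), 26)
    rmse = np.sqrt(np.mean((approx - log_mfbe) ** 2))
    print(f"kept {n_keep} coefficients, reconstruction RMSE = {rmse:.3g}")
```

Under these assumptions, keeping all 26 coefficients recovers the log energies up to floating-point error, while keeping 13 discards the fine structure across filterbank channels. This is one way to see why the level of truncation can matter for any later spectrum recovery and pitch estimation step.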

Item Type: Conference Paper
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The Copyright of this article belongs to the Institute of Electrical and Electronics Engineers Inc.
Keywords: Deep neural networks; Discrete cosine transforms; Forecasting; Hidden Markov models; Image coding; Markov processes; Recovery; Trellis codes; Vectors; Discrete cosine transformation; Gaussian Mixture Model; Mel frequency cepstral co-efficient; Mel-frequency cepstral coefficients; Prediction errors; Prediction methods; Sparse spectrums; Sparsity constraints; Speech recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 13 Jun 2022 06:48
Last Modified: 13 Jun 2022 06:48
URI: https://eprints.iisc.ac.in/id/eprint/73313
