A robust Voiced/Unvoiced phoneme classification from whispered speech using the 'color' of whispered phonemes and Deep Neural Network

Nisha Meenakshi, G and Ghosh, PK (2017) A robust Voiced/Unvoiced phoneme classification from whispered speech using the 'color' of whispered phonemes and Deep Neural Network. In: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 20 - 24 August 2017, Stockholm, pp. 503-507.

Official URL: https://doi.org/10.21437/Interspeech.2017-1388

Abstract

In this work, we propose a robust method to perform frame-level classification of voiced (V) and unvoiced (UV) phonemes from whispered speech, a challenging task due to its voiceless and noise-like nature. We hypothesize that a whispered speech spectrum can be represented as a linear combination of a set of colored noise spectra. A five-dimensional (5D) feature is computed by employing non-negative matrix factorization with a fixed basis dictionary, constructed using the spectra of five colored noises. A Deep Neural Network (DNN) is used as the classifier. We consider two baseline features: 1) Mel Frequency Cepstral Coefficients (MFCC), and 2) features computed from a data-driven dictionary. Experiments reveal that the features from the colored noise dictionary perform better (on average) than those from the data-driven dictionary, with a relative improvement in the average V/UV accuracy of 10.30% within, and 10.41% across, data from seven subjects. We also find that the MFCCs and 5D features carry complementary information regarding the nature of voicing decisions in whispered speech. Hence, across all subjects, we obtain a balanced frame-level V/UV classification performance when MFCC and 5D features are combined, compared to a skewed performance when they are considered separately.
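
The feature extraction step described in the abstract (non-negative matrix factorization against a fixed dictionary of five colored noise spectra) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say which five noise colors, which cost function, or which update rule were used, so the sketch assumes idealized power-law spectra f^β with β in {0, -1, -2, 1, 2} (roughly white, pink, brown, blue, violet), a Euclidean cost, and standard multiplicative updates applied to the activations only. The function names `colored_noise_dictionary` and `nmf_activations` and the toy input are hypothetical.

```python
import numpy as np


def colored_noise_dictionary(n_bins, exponents=(0.0, -1.0, -2.0, 1.0, 2.0)):
    """Return an (n_bins x 5) matrix whose columns are idealized spectra f**beta.

    The exponents are an assumption made for illustration; the abstract only
    states that five colored noise spectra form the dictionary.
    """
    # Start above zero so that negative exponents stay finite.
    freqs = np.linspace(1.0 / n_bins, 1.0, n_bins)
    W = np.stack([freqs ** beta for beta in exponents], axis=1)
    # Normalize each column to unit sum so activations are comparable.
    return W / W.sum(axis=0, keepdims=True)


def nmf_activations(V, W, n_iter=200, eps=1e-12):
    """Estimate non-negative H such that V is approximately W @ H, with W fixed.

    Uses multiplicative updates for the Euclidean cost (an assumed choice).
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V  # constant across iterations because W is fixed
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)
    return H


# Toy input: two synthetic "frames" built from known non-negative weights.
n_bins = 257
W = colored_noise_dictionary(n_bins)
true_H = np.array([[0.2, 1.0],
                   [0.0, 0.5],
                   [1.5, 0.0],
                   [0.0, 0.0],
                   [0.3, 0.7]])
V = W @ true_H                # magnitude spectra, one column per frame
H = nmf_activations(V, W)
features = H.T                # one 5D feature vector per frame
```

In a pipeline like the one the abstract outlines, each row of `features` would be passed to the classifier, either alone or concatenated with the MFCCs of the same frame.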

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: The copyright for this article belongs to the International Speech Communication Association.
Keywords: Classification (of information); Factorization; Matrix algebra; Speech; Speech communication; Speech recognition; White noise, Classification performance; Linear combinations; Mel-frequency cepstral coefficients; Nonnegative matrix factorization; Phoneme classification; Voiced and Unvoiced whispered phonemes; Voicing decision; Whispered speech, Deep neural networks
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 22 Jul 2022 10:51
Last Modified: 22 Jul 2022 10:51
URI: https://eprints.iisc.ac.in/id/eprint/74710
