
Automatic Gender Classification Using the Mel Frequency Cepstrum of Neutral and Whispered Speech: a Comparative Study

Meenakshi, Nisha G and Ghosh, Prasanta Kumar (2015) Automatic Gender Classification Using the Mel Frequency Cepstrum of Neutral and Whispered Speech: a Comparative Study. In: 21st National Conference on Communications (NCC), FEB 27-MAR 01, 2015, Indian Inst Technol, Bombay, INDIA.

Official URL: http://dx.doi.org/10.1109/NCC.2015.7084886

Abstract

Whispered speech resembles unvoiced speech because, unlike neutral speech, it lacks vocal fold vibration. Since information about a speaker's gender typically lies in the pitch resulting from vocal fold vibration (the source signal), identifying gender from whispered speech is more challenging than identifying it from neutral speech. In the absence of pitch, we study the use of the vocal tract filter, captured through the spectral envelope, for automatic gender classification (AGC) from whispered speech. The spectral envelope is represented by Mel frequency cepstral coefficients (MFCCs). We also compare the AGC performance on neutral speech using only MFCCs with that on whispered speech. An AGC experiment using a set of 33 sentences spoken in neutral and whispered modes by 16 female and 20 male speakers reveals that the AGC accuracy using neutral speech is, on average, higher (4.5% absolute) than that using whispered speech when only the spectral shape information is used. This holds even when we use a subset of MFCCs obtained by a forward cepstral coefficient selection algorithm. However, the AGC accuracy obtained using the MFCCs of neutral speech is found to be 2.8 3 % (absolute) lower than that obtained using pitch. These findings not only suggest that there is gender-specific information in the spectral shape but also indicate that the spectral shape carries less gender-specific information when a speaker whispers than when the speaker talks normally.
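
The forward cepstral coefficient selection mentioned in the abstract can be illustrated with a generic greedy forward selection loop. The sketch below is only an assumption about the general technique: the abstract does not specify the paper's selection criterion or classifier, so the nearest-class-mean classifier, the resubstitution accuracy score, the toy data, and the names `forward_select` and `nearest_mean_accuracy` are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    n_features: int,
    score: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Greedy forward selection over feature indices 0..n_features-1.

    At each step, add the single feature that gives the highest score
    for the enlarged subset; stop when no candidate improves the score.
    """
    selected: List[int] = []
    best_score = float("-inf")
    remaining = list(range(n_features))
    while remaining:
        # Score every candidate subset formed by adding one more feature.
        candidates = [(score(selected + [j]), j) for j in remaining]
        top_score, top_j = max(candidates, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no strict improvement, so stop
        selected.append(top_j)
        remaining.remove(top_j)
        best_score = top_score
    return selected, best_score


def nearest_mean_accuracy(
    X: Sequence[Sequence[float]], y: Sequence[int], subset: List[int]
) -> float:
    """Accuracy of a nearest-class-mean classifier on the data it was fit to.

    Hypothetical stand-in for a real classifier; it only looks at the
    feature indices listed in `subset`.
    """
    classes = sorted(set(y))
    means = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in subset]
    correct = 0
    for x, label in zip(X, y):
        def dist(c: int) -> float:
            # Squared Euclidean distance to the mean of class c.
            return sum((x[j] - m) ** 2 for j, m in zip(subset, means[c]))
        predicted = min(classes, key=dist)
        correct += int(predicted == label)
    return correct / len(X)


# Toy data (made up): 3 "coefficients" per utterance, 2 classes.
# Only coefficient 0 separates the classes.
X = [
    [1.0, 5.0, 0.2], [1.2, 4.0, 0.1], [0.9, 5.5, 0.3],  # class 0
    [3.0, 5.1, 0.2], [3.1, 4.2, 0.1], [2.9, 5.4, 0.3],  # class 1
]
y = [0, 0, 0, 1, 1, 1]

selected, best = forward_select(3, lambda s: nearest_mean_accuracy(X, y, s))
print(selected, best)
```

In a real experiment the score would normally be computed on held-out speakers rather than on the training data, and the feature vectors would be MFCCs extracted from the recorded sentences.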

Item Type: Conference Proceedings
Series: National Conference on Communications NCC
Additional Information: Copyright for this article belongs to the IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 24 Aug 2016 10:37
Last Modified: 24 Aug 2016 10:37
URI: http://eprints.iisc.ac.in/id/eprint/54566
