Phoneme state posteriorgram features for speech based automatic classification of speakers in cold and healthy condition

Suresh, AK and Srinivasa Raghavan, KM and Ghosh, PK (2017) Phoneme state posteriorgram features for speech based automatic classification of speakers in cold and healthy condition. In: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 20 - 24 August 2017, Stockholm, pp. 3462-3466.

Official URL: https://doi.org/10.21437/Interspeech.2017-1550

Abstract

We consider the problem of automatically detecting from speech whether a speaker is suffering from a common cold. When a speaker has cold symptoms, his/her voice quality changes compared to the normal one. We hypothesize that such a change in voice quality could be reflected in lower likelihoods from a model built using normal speech. To capture this, we compute a 120-dimensional posteriorgram feature in each frame using Gaussian mixture models from the 120 states of 40 three-state phonetic hidden Markov models trained on approximately 16.4 hours of normal English speech. Finally, a fixed 5160-dimensional phoneme state posteriorgram (PSP) feature vector is obtained for each utterance by computing statistics from the posteriorgram feature trajectory. Experiments on the 2017-Cold sub-challenge data show that when the decisions from bag-of-audio-words (BoAW) and end-to-end (e2e) systems are combined with those from PSP features using an unweighted majority rule, the unweighted average recall (UAR) on the development set becomes 69, which is 2.9 (absolute) better than the best UAR obtained by the baseline schemes. When the decisions from ComParE, BoAW and PSP features are combined with a simple majority rule, the result is a UAR of 68.52 on the test set.
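The pipeline described in the abstract (per-frame state posteriors, utterance-level statistics, majority-rule fusion, UAR scoring) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It rests on several assumptions: each state is modeled by a single diagonal Gaussian rather than a Gaussian mixture, the utterance statistics are only mean, standard deviation, minimum and maximum (giving 480 dimensions for 120 states, not the 5160 reported), and all function names are hypothetical.

```python
import numpy as np

def state_posteriors(frames, means, variances, priors):
    """Per-frame posterior over states (a posteriorgram).

    Assumption: one diagonal Gaussian per state, standing in for the
    per-state Gaussian mixture models mentioned in the abstract.
    frames: (T, D); means, variances: (S, D); priors: (S,).
    Returns a (T, S) array whose rows sum to 1.
    """
    diff = frames[:, None, :] - means[None, :, :]            # (T, S, D)
    log_lik = -0.5 * np.sum(
        diff**2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )                                                        # (T, S)
    log_post = log_lik + np.log(priors)
    log_post -= log_post.max(axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def utterance_vector(posteriorgram):
    """Fixed-length utterance vector from a (T, S) posteriorgram.

    Assumption: four illustrative statistics per state; the paper's
    actual set of statistics is not given in the abstract.
    """
    return np.concatenate([
        posteriorgram.mean(axis=0),
        posteriorgram.std(axis=0),
        posteriorgram.min(axis=0),
        posteriorgram.max(axis=0),
    ])

def majority_vote(decisions):
    """Unweighted majority rule over binary (0/1) decisions.

    decisions: (n_systems, n_utterances), with an odd number of systems.
    """
    decisions = np.asarray(decisions)
    return (2 * decisions.sum(axis=0) > decisions.shape[0]).astype(int)

def uar(y_true, y_pred):
    """Unweighted average recall: the mean of the per-class recalls."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```

Here `uar` returns a fraction between 0 and 1, whereas the abstract reports UAR on a 0 to 100 scale. With three systems, `majority_vote` labels an utterance as cold (1) when at least two of the systems do.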

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: The copyright for this article belongs to the International Speech Communication Association.
Keywords: Decision theory; Gaussian distribution; Hidden Markov models; Human computer interaction; Linguistics; Markov processes; Speech; Speech communication; Trellis codes, Automatic classification; Feature vectors; Gaussian Mixture Model; Majority rule; Paralinguistics; Posteriorgram; Simple majority; Voice quality, Speech recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 22 Jul 2022 10:52
Last Modified: 22 Jul 2022 10:52
URI: https://eprints.iisc.ac.in/id/eprint/74711