
Classification of story-telling and poem recitation using head gesture of the talker

Valliappan, CA and Das, A and Ghosh, PK (2019) Classification of story-telling and poem recitation using head gesture of the talker. In: 12th International Conference on Signal Processing and Communications, SPCOM 2018, 16-19 July 2018, Bangalore, pp. 36-40.

PDF
SPCOM_2018.pdf - Published Version (370kB)
Restricted to Registered users only
Official URL: https://doi.org/10.1109/SPCOM.2018.8724420

Abstract

In this work, we investigate the nature of head gestures in spontaneous speech during story-telling in comparison to that in poem recitation. We hypothesize that head gestures during poem recitation are more repetitive and structured than those in spontaneous speech. To quantify this, we propose a measure called degree of repetition (DoR). We also perform a story-telling vs. poem recitation classification experiment using a deep neural network (DNN). For the classification, both DoR and context-dependent raw head gesture data are used. Analysis and experiments are performed using a database of 24 subjects each telling five stories and a different set of 10 subjects each reciting 20 poems, three times each, yielding data of comparable durations for story-telling and poem recitation. Analysis using DoR reveals that the DoR, on average, is higher during poem recitation than during story-telling. A four-fold classification experiment between story-telling and poem recitation using the DNN demonstrates that the raw head gestures result in an average classification accuracy of 85.79% and an average F-score of 89.05%, while the DoR results in an average accuracy and F-score of 80.59% and 82.30%, respectively, indicating that the features learnt by the DNN from raw head gestures are more discriminative than DoR features. While these accuracy and F-score values are lower than those (94.67% and 95.60%, respectively) obtained using acoustic features such as Mel frequency cepstral coefficients (MFCCs), raw head gestures and MFCCs together yield a higher average accuracy (98.62%) and F-score (98.92%), indicating that head gestures are complementary to the acoustic features for this classification task.
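Since implementation details are not given on this page, the following is a minimal illustrative sketch (not the authors' code) of the combined-feature, four-fold classification described above. Randomly generated placeholder arrays stand in for the per-clip head-gesture and acoustic features, scikit-learn's MLPClassifier stands in for the paper's DNN, and the feature dimensions and the mfcc_features() helper are assumptions, not taken from the paper.

import numpy as np
import librosa
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score

def mfcc_features(wav_path, n_mfcc=13):
    # Mean MFCC vector over an utterance: one fixed-length acoustic
    # feature per clip (not called below, since the data is synthetic).
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Placeholder data for illustration: 120 clips, hypothetical feature sizes.
rng = np.random.default_rng(0)
X_gesture = rng.standard_normal((120, 30))  # raw head-gesture features per clip
X_mfcc = rng.standard_normal((120, 13))     # acoustic (MFCC) features per clip
y = rng.integers(0, 2, size=120)            # 0 = story-telling, 1 = poem recitation

X = np.hstack([X_gesture, X_mfcc])          # combined head-gesture + MFCC features

accs, f1s = [], []
for train_idx, test_idx in StratifiedKFold(
        n_splits=4, shuffle=True, random_state=0).split(X, y):
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    accs.append(accuracy_score(y[test_idx], pred))
    f1s.append(f1_score(y[test_idx], pred))

print(f"avg accuracy = {np.mean(accs):.2%}, avg F-score = {np.mean(f1s):.2%}")

With real data, X_gesture would hold per-clip head-gesture features (e.g., flattened context windows of head motion) and X_mfcc the output of mfcc_features() per recording; the stratified four-fold split mirrors the four-fold protocol reported in the abstract.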

Item Type: Conference Paper
Publication: SPCOM 2018 - 12th International Conference on Signal Processing and Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc.
Keywords: Deep neural networks; Acoustic features; Classification accuracy; Classification tasks; Context dependent; Head gestures; Mel-frequency cepstral coefficients; Spontaneous speech; Story telling; Signal processing
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 08 Aug 2022 06:06
Last Modified: 08 Aug 2022 06:06
URI: https://eprints.iisc.ac.in/id/eprint/75484
