Virtual home assistant for voice based controlling and scheduling with short speech speaker identification

Tiwari, V and Hashmi, MF and Keskar, A and Shivaprakash, NC (2020) Virtual home assistant for voice based controlling and scheduling with short speech speaker identification. In: Multimedia Tools and Applications, 79 (7-8). pp. 5243-5268.

Official URL: https://doi.org/10.1007/s11042-018-6358-x

Abstract

With the advancement of interface technologies in smart devices, voice-controlled assistants have quickly gained popularity. These assistants are designed to use voice commands to achieve a more human-friendly interaction. Along these lines, we propose a cloud-connected, voice-based home assistant in this paper. It accepts voice commands to control or monitor devices in a home. It can understand and schedule device operations based on time or sensor data through a simple voice-based approach. To enhance its capability, it is designed to identify the speakers. Mel-Frequency Cepstrum Coefficients (MFCC), in combination with other speech features, are used as the feature vector. We use Vector Quantization (VQ) and Principal Component Analysis (PCA) for dimensionality reduction of the feature vector, followed by a Gaussian Mixture Model (GMM) for classification. The short speech speaker identification is validated on a set of Indian speakers in an uncontrolled indoor environment. An accuracy greater than 92% is achieved for speech samples as short as 1 second. A database of more than 50 different commands per speaker is also created for validation of the proposed virtual assistant. IBM’s Bluemix and Google’s cloud service are used for speech-to-text conversion.
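
The classification step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic 3-dimensional vectors in place of real MFCC features, omits the VQ and PCA stages, and uses a single diagonal-covariance Gaussian per speaker (the simplest one-component case of a GMM). All function names, speaker labels, and numeric values are hypothetical and chosen only for illustration.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance from a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [
        # Floor the variance so the log-likelihood stays finite.
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dim)
    ]
    return mean, var


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def identify(frames, models):
    """Return the enrolled speaker whose model scores the frames highest."""
    return max(models, key=lambda name: avg_log_likelihood(frames, models[name]))


def synthetic_frames(rng, centre, n_frames, spread=1.0):
    """Stand-in for feature extraction: Gaussian noise around a speaker centre."""
    return [[rng.gauss(c, spread) for c in centre] for _ in range(n_frames)]


rng = random.Random(0)
# Hypothetical speakers with made-up feature centres (assumption, not paper data).
centres = {"speaker_a": [0.0, 2.0, -1.0], "speaker_b": [3.0, -1.0, 1.5]}

# Enrolment: fit one model per speaker from 200 training frames each.
models = {
    name: fit_diag_gaussian(synthetic_frames(rng, centre, 200))
    for name, centre in centres.items()
}

# Identification: score a short 50-frame utterance against every model.
test_utterance = synthetic_frames(rng, centres["speaker_b"], 50)
predicted = identify(test_utterance, models)
```

Averaging the log-likelihood over frames keeps scores comparable across utterances of different lengths, which matters when the test speech is short. A fuller version would replace each single Gaussian with a multi-component mixture trained by expectation-maximization.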

Item Type:	Journal Article
Publication:	Multimedia Tools and Applications
Publisher:	Springer
Additional Information:	The copyright for this article belongs to Springer.
Keywords:	Distributed database systems; Gaussian distribution; Internet of things; Loudspeakers; Object recognition; Principal component analysis; Speech; Vector quantization; Vectors; Cloud services; Dimensionality reduction; Gaussian Mixture Model; Indoor environment; Interface technology; Mel frequency cepstrum coefficients; Speaker identification; Speech-to-text conversion; Speech recognition
Department/Centre:	Division of Physical & Mathematical Sciences > Instrumentation Applied Physics
Date Deposited:	02 Feb 2023 09:39
Last Modified:	02 Feb 2023 09:39
URI:	https://eprints.iisc.ac.in/id/eprint/79759


