Tiwari, V and Hashmi, MF and Keskar, A and Shivaprakash, NC (2020) Virtual home assistant for voice based controlling and scheduling with short speech speaker identification. In: Multimedia Tools and Applications, 79 (7-8). pp. 5243-5268.
Abstract
With the advancement of interface technologies in smart devices, voice-controlled assistants have quickly gained popularity. These assistants are designed to use voice commands to achieve a more human-friendly interaction. Along these lines, we propose a cloud-connected, voice-based home assistant in this paper. It accepts voice commands to control or monitor devices in a home, and it can understand and schedule device operations based on time or sensor data through a simple voice-based approach. To enhance its capability, it is also designed to identify speakers. Mel-Frequency Cepstrum Coefficients (MFCC), in combination with other speech features, are used as the feature vector. We use Vector Quantization (VQ) and Principal Component Analysis (PCA) for dimensionality reduction of the feature vector, followed by a Gaussian Mixture Model (GMM) for classification. The short speech speaker identification is validated on a set of Indian speakers in an uncontrolled indoor environment. An accuracy greater than 92% is achieved for speech samples as short as 1 second. A database of more than 50 different commands per speaker is also created for validation of the proposed virtual assistant. IBM’s Bluemix and Google’s cloud service are used for speech-to-text conversion.
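The abstract names GMM classification as the final speaker identification step but does not give model details. As a minimal sketch of how closed-set GMM scoring generally works (not the authors' implementation), the Python below assumes that one diagonal-covariance GMM per enrolled speaker has already been trained (e.g. by EM) on the reduced MFCC-based feature vectors; the names `DiagonalGMM` and `identify_speaker` are hypothetical. The utterance is assigned to the speaker whose model gives the highest average per-frame log-likelihood.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiagonalGMM:
    """Hypothetical per-speaker GMM with diagonal covariances, assumed already trained."""
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def log_likelihood(self, x: Sequence[float]) -> float:
        """Return log p(x) for a single feature vector x."""
        comp_logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            # log N(x; mu, diag(var)), summed over independent dimensions
            log_gauss = 0.0
            for xi, mi, vi in zip(x, mu, var):
                log_gauss += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(math.log(w) + log_gauss)
        # log-sum-exp over components for numerical stability
        m = max(comp_logs)
        return m + math.log(sum(math.exp(c - m) for c in comp_logs))

    def average_log_likelihood(self, frames: Sequence[Sequence[float]]) -> float:
        """Mean per-frame log-likelihood, so utterances of different lengths compare fairly."""
        return sum(self.log_likelihood(f) for f in frames) / len(frames)


def identify_speaker(frames: Sequence[Sequence[float]],
                     models: Dict[str, DiagonalGMM]) -> str:
    """Return the enrolled speaker whose GMM best explains the utterance's frames."""
    return max(models, key=lambda name: models[name].average_log_likelihood(frames))
```

Averaging the frame scores rather than summing them keeps decisions comparable across utterances of different lengths, which is a reasonable choice when test samples may be as short as 1 second.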
Item Type: | Journal Article |
---|---|
Publication: | Multimedia Tools and Applications |
Publisher: | Springer |
Additional Information: | The copyright for this article belongs to Springer. |
Keywords: | Distributed database systems; Gaussian distribution; Internet of things; Loudspeakers; Object recognition; Principal component analysis; Speech; Vector quantization; Vectors; Cloud services; Dimensionality reduction; Gaussian Mixture Model; Indoor environment; Interface technology; Mel frequency cepstrum coefficients; Speaker identification; Speech-to-text conversion; Speech recognition |
Department/Centre: | Division of Physical & Mathematical Sciences > Instrumentation and Applied Physics |
Date Deposited: | 02 Feb 2023 09:39 |
Last Modified: | 02 Feb 2023 09:39 |
URI: | https://eprints.iisc.ac.in/id/eprint/79759 |