ePrints@IISc


Naini, AR and Singhal, B and Ghosh, PK (2022) DUAL ATTENTION POOLING NETWORK FOR RECORDING DEVICE CLASSIFICATION USING NEUTRAL AND WHISPERED SPEECH. In: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022, 23 - 27 May 2022, Virtual, Online at Singapore, pp. 8487-8491.

IEEE_ICASSP 2022_2022_8487-8491_2022.pdf - Published Version

Official URL: https://doi.org/10.1109/ICASSP43922.2022.9747700


In this work, we propose a method for classifying the recording device from the recorded speech signal. Mobile and professional recording devices have multiplied rapidly, and identifying the source device has many applications, both in forensics and in further improving speech-based applications. This paper proposes dual and single attention-pooling-based convolutional neural networks (CNNs) for recording device classification using neutral and whispered speech. The experiments use two sets of recordings. The first consists of simultaneous direct recordings on five recording devices from 88 speakers, each speaking in both neutral and whispered modes. The second consists of simultaneous playback recordings on 21 mobile devices. On these recordings, the proposed dual-attention-pooling-based CNN outperforms the best baseline scheme. We also show that recording device classification performs better on whispered speech recordings than on the corresponding neutral speech recordings. Finally, we demonstrate the importance of voiced/unvoiced speech and of different frequency bands in classifying recording devices.
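The abstract does not describe how the pooling layer works. The sketch below is a rough illustration of generic softmax attention pooling. This operation collapses a variable number of frame-level CNN feature vectors into a single fixed-length utterance vector. It does this by weighting each frame with a learned score. It is not the authors' architecture. The scoring vector `w`, the function names, and the input layout (a list of T frames, each a D-dimensional list) are hypothetical. This record also does not say how the paper's dual variant combines two pooling branches.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Generic attention pooling (illustrative only, not the paper's layer).

    frames: T frame-level feature vectors, each of length D.
    w:      hypothetical learned scoring vector of length D (supplied by hand here).

    Each frame h_t gets a scalar score s_t = w . h_t. The scores are
    softmax-normalised into weights alpha_t, and the output is the
    weighted sum  sum_t alpha_t * h_t  (a single D-dimensional vector).
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in frames]
    alphas = softmax(scores)
    pooled = [0.0] * dim
    for a, h in zip(alphas, frames):
        for d in range(dim):
            pooled[d] += a * h[d]
    return pooled


if __name__ == "__main__":
    # Two 2-D frames with an all-zero scoring vector: equal weights.
    print(attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # [0.5, 0.5]
```

When `w` is all zeros, every frame receives equal weight, so the operation reduces to plain average pooling. A trained `w` lets the network emphasise the frames it finds more informative for the classification task.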

Item Type: Conference Paper
Publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to the Authors.
Keywords: Audio recordings; Audio signal processing; Speech communication; Convolutional neural network; Device classifications; Dual attention pooling network; Neural network method; Performance; Recording devices; Speech recording; Speech signals; Voiced/unvoiced speech; Whispered speech; Convolutional neural networks
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 05 Aug 2022 09:00
Last Modified: 05 Aug 2022 09:03
URI: https://eprints.iisc.ac.in/id/eprint/75353
