
PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation

Takahashi, Naoya and Agrawal, Purvi and Goswami, Nabarun and Mitsufuji, Yuki (2018) PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation. In: 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), August 2-September 6, 2018, Hyderabad, India, pp. 2713-2717.

interspeech_2018.pdf - Published Version
Restricted to Registered users only

Official URL: https://doi.org/10.21437/Interspeech.2018-1773

Abstract

Previous research on audio source separation based on deep neural networks (DNNs) has mainly focused on estimating the magnitude spectrum of the target sources; typically, the phase of the mixture signal is combined with the estimated magnitude spectra in an ad hoc way. Although recovering the target phase is assumed to be important for improving separation quality, the periodic nature of phase is difficult to handle with a regression approach. Unwrapping the phase is one way to eliminate phase discontinuities; however, the range of values grows with the number of unwrapping steps, which makes the unwrapped phase difficult for DNNs to model. To overcome this difficulty, we propose to treat phase estimation as a classification problem by discretizing phase values and assigning class indices to them. Experimental results show that our classification-based approach 1) successfully recovers the phase of the target source in the discretized domain, 2) improves the signal-to-distortion ratio (SDR) over the regression-based approach in both a speech enhancement task and a music source separation (MSS) task, and 3) outperforms state-of-the-art MSS methods.
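The discretization step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the number of classes, the bin layout, or how a class index is mapped back to a phase value, so the uniform bins over [-pi, pi), the bin-centre reconstruction, and the function names `quantize_phase` and `dequantize_phase` below are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np


def quantize_phase(phase, num_classes):
    """Map phase values (radians) to integer class indices in [0, num_classes).

    Assumption: uniform bins covering [-pi, pi).
    """
    # Shift and wrap into [0, 2*pi) so phases that differ by 2*pi share a class.
    shifted = (np.asarray(phase, dtype=float) + np.pi) % (2 * np.pi)
    width = 2 * np.pi / num_classes
    idx = np.floor(shifted / width).astype(int)
    # Guard against floating-point rounding at the upper edge of the range.
    return np.clip(idx, 0, num_classes - 1)


def dequantize_phase(idx, num_classes):
    """Return the bin-centre phase (radians) for each class index."""
    width = 2 * np.pi / num_classes
    return -np.pi + (np.asarray(idx) + 0.5) * width


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phase = rng.uniform(-np.pi, np.pi, size=(4, 5))  # toy "phase spectrogram"
    num_classes = 8  # hypothetical value, chosen only for the demo

    targets = quantize_phase(phase, num_classes)      # classification targets
    recon = dequantize_phase(targets, num_classes)    # phase from class indices

    # Wrapped (circular) error between the original and reconstructed phase.
    err = np.abs(np.angle(np.exp(1j * (recon - phase))))
    print("max circular error (rad):", err.max())
```

With uniform bins and bin-centre reconstruction, the circular quantization error is at most pi divided by the number of classes, so a larger class count trades a harder classification problem for finer phase resolution.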

Item Type: Conference Proceedings
Series: Interspeech
Publisher: International Speech Communication Association (ISCA)
Additional Information: 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Hyderabad, India, August 2-September 6, 2018
Keywords: phase modeling; quantized phase; deep neural networks
Department/Centre: Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited: 09 Jun 2020 05:42
Last Modified: 09 Jun 2020 05:42
URI: http://eprints.iisc.ac.in/id/eprint/62926
