Air-tissue boundary segmentation in real time magnetic resonance imaging video using 3-d convolutional neural network

Mannem, R and Gaddam, N and Ghosh, PK (2020) Air-tissue boundary segmentation in real time magnetic resonance imaging video using 3-d convolutional neural network. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 25 October 2020 through 29 October 2020, Shanghai, China, pp. 1396-1400.

Official URL: https://dx.doi.org/10.21437/Interspeech.2020-2241

Abstract

Real-time magnetic resonance imaging (rtMRI) is often used in speech production research because it captures a complete view of the vocal tract during speech. Air-tissue boundaries (ATBs) are the contours that trace the transition between the high-intensity tissue region and the low-intensity airway cavity region in an rtMRI video. ATBs are used in several speech-related applications. However, ATB segmentation is a challenging task because rtMRI frames have low resolution and a low signal-to-noise ratio. Several approaches to ATB segmentation have been proposed in the past. Among these, supervised algorithms have been shown to perform well compared to unsupervised algorithms. However, supervised algorithms have limited generalizability to subjects not involved in training. In this work, we propose a 3-dimensional convolutional neural network (3D-CNN) that utilizes both spatial and temporal information from the rtMRI video for accurate ATB segmentation. The 3D-CNN model captures the vocal tract dynamics in an rtMRI video independent of the morphology of the subject, leading to accurate ATB segmentation for unseen subjects. In a leave-one-subject-out experimental setup, the proposed approach provides ~32% relative improvement in performance compared to the best (SegNet-based) baseline approach. © 2020 ISCA
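
The leave-one-subject-out setup and the relative improvement figure mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The subject IDs, video names, and the helper names `leave_one_subject_out` and `relative_improvement` are hypothetical, and the relative improvement formula assumes an error-like metric where lower values are better, which the abstract does not specify.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_subject_out(
    videos_by_subject: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_subject, train_videos, test_videos), one fold per subject."""
    subjects = sorted(videos_by_subject)
    for held_out in subjects:
        # Train on every subject except the held-out one.
        train = [v for s in subjects if s != held_out for v in videos_by_subject[s]]
        # Test only on the unseen subject.
        test = list(videos_by_subject[held_out])
        yield held_out, train, test


def relative_improvement(baseline_error: float, proposed_error: float) -> float:
    """Percent reduction in error relative to the baseline (assumes lower is better)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical subjects and videos, for illustration only.
data = {"F1": ["F1_v1", "F1_v2"], "M1": ["M1_v1"], "M2": ["M2_v1", "M2_v2"]}
folds = list(leave_one_subject_out(data))
```

Each fold keeps all videos of one subject out of training, so the score on that fold reflects performance on an unseen subject.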

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: Cited by 0; Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020; Conference Date: 25 October 2020 through 29 October 2020; Conference Code: 165507
Keywords: 3D modeling; Convolution; Magnetic resonance imaging; Natural language processing systems; Signal to noise ratio; Speech communication; Tissue; High intensity; Low resolution; Low signal-to-noise ratio; Speech production; Supervised algorithm; Temporal information; Tissue boundary; Unsupervised algorithms; Convolutional neural networks
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 11 Jan 2021 09:23
Last Modified: 11 Jan 2021 09:23
URI: http://eprints.iisc.ac.in/id/eprint/67648
