
Automatic glottis detection and segmentation in stroboscopic videos using convolutional networks

Degala, D and Achuth Rao, MV and Krishnamurthy, R and Gopikishore, P and Priyadharshini, V and Prakash, TK and Ghosh, PK (2020) Automatic glottis detection and segmentation in stroboscopic videos using convolutional networks. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 25 October 2020 through 29 October 2020, Shanghai, China, pp. 4801-4805.

Official URL: https://dx.doi.org/10.21437/Interspeech.2020-2599

Abstract

Laryngeal videostroboscopy is widely used for the analysis of glottal vibration patterns. This analysis plays a crucial role in the diagnosis of voice disorders. It is essential to study these patterns using automatic glottis segmentation methods to avoid subjectiveness in diagnosis. Glottis detection is an essential step before glottis segmentation. This paper considers the problem of automatic glottis segmentation using U-Net based deep convolutional networks. For accurate glottis detection, we train a fully convolutional network with a large amount of glottal and non-glottal images. In glottis segmentation, we consider U-Net with three different weight initialization schemes: 1) Random weight Initialization (RI), 2) Detection Network weight Initialization (DNI) and 3) Detection Network encoder frozen weight Initialization (DNIFr), using two different architectures: 1) U-Net without skip connection (UWSC) 2) U-Net with skip connection (USC). Experiments with 22 subjects' data reveal that the performance of glottis segmentation network can be increased by initializing its weights using those of the glottis detection network. Among all schemes, when DNI is used, the USC yields an average localization accuracy of 81.3 and a Dice score of 0.73, which are better than those from the baseline approach by 15.87 and 0.07 (absolute), respectively. Copyright © 2020 ISCA
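
The Dice score reported in the abstract is a standard overlap measure between a predicted segmentation mask and a reference mask: twice the number of pixels the two masks share, divided by the total number of foreground pixels in both. The following is a minimal sketch of that measure, not the authors' evaluation code; the function name `dice_score`, the nested-list mask format, and the convention for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    # Flatten both masks and coerce every pixel to 0 or 1.
    pred_flat = [int(bool(v)) for row in pred for v in row]
    target_flat = [int(bool(v)) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as glottis in both masks.
    intersection = sum(p & t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 2x3 masks: 3 foreground pixels each, 2 of them shared.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
score = dice_score(predicted, reference)  # 2 * 2 / (3 + 3)
```

A score of 1 means the masks coincide exactly and 0 means they do not overlap at all, which is the scale on which the reported 0.73 should be read.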

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Additional Information: Cited by 0; Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020; Conference Date: 25 October 2020 through 29 October 2020; Conference Code: 165507
Keywords: Convolution; Network coding; Speech communication; Vibration analysis; Convolutional networks; Detection networks; Diagnosis of voice disorders; Large amounts; Localization accuracy; Segmentation methods; Vibration pattern; Weight initialization; Convolutional neural networks
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 12 Jan 2021 11:20
Last Modified: 12 Jan 2021 11:20
URI: http://eprints.iisc.ac.in/id/eprint/67637
