Kumar, Deepak and Ramakrishnan, AG (2012) OTCYMIST: Otsu-Canny minimal spanning tree for born-digital images. In: 2012 10th IAPR International Workshop on Document Analysis Systems (DAS), 27-29 March 2012, Gold Coast, QLD.
Abstract
Text segmentation and localization algorithms are proposed for the born-digital image dataset. Binarization and edge detection are carried out separately on the three colour planes of the image. Connected components (CCs) obtained from the binarized image are thresholded based on their area and aspect ratio. CCs which contain sufficient edge pixels are retained. A novel approach is presented, where the text components are represented as nodes of a graph. Nodes correspond to the centroids of the individual CCs. Long edges are broken from the minimum spanning tree of the graph. Pairwise height ratio is also used to remove likely non-text components. A new minimum spanning tree is created from the remaining nodes. Horizontal grouping is performed on the CCs to generate bounding boxes of text strings. Overlapping bounding boxes are removed using an overlap area threshold. Non-overlapping and minimally overlapping bounding boxes are used for text segmentation. Vertical splitting is applied to generate bounding boxes at the word level. The proposed method is applied to all the images of the test dataset, and values of precision, recall and H-mean are obtained using different approaches.
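The graph step described in the abstract (centroids as nodes, long edges broken from the minimum spanning tree) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of Prim's algorithm, the Euclidean edge weight, and the `max_edge_length` threshold are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j, length) edges. Hypothetical helper, not from the paper.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax distances of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def group_by_cutting_long_edges(points, max_edge_length):
    """Drop MST edges longer than `max_edge_length` (an assumed threshold)
    and return the resulting connected groups as sorted lists of indices."""
    kept = [(i, j) for i, j, d in minimum_spanning_tree(points) if d <= max_edge_length]

    # Union-find over node indices to collect the components left after cutting.
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for i, j in kept:
        root[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up centroids of five CCs: three close together, two far to the right.
centroids = [(10, 10), (22, 10), (34, 11), (200, 10), (212, 12)]
groups = group_by_cutting_long_edges(centroids, max_edge_length=30)
print(groups)
```

With these made-up centroids, the single long edge linking the two clusters is cut, so the components fall into two candidate text groups. In the method described above, such groups would then be filtered further (for example by pairwise height ratio) before bounding boxes are generated.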
Item Type: | Conference Paper |
---|---|
Publisher: | IEEE |
Additional Information: | Copyright of this article belongs to IEEE. |
Keywords: | Binarization; Edge Detection; Minimum spanning Tree; Text Segmentation; Text Localization |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 02 Jul 2013 07:47 |
Last Modified: | 02 Jul 2013 07:47 |
URI: | http://eprints.iisc.ac.in/id/eprint/46544 |