
Script Independent Detection of Bold Words in Multi Font-size Documents

Saikrishna, Pedamalli and Ramakrishnan, A G (2013) Script Independent Detection of Bold Words in Multi Font-size Documents. In: Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Dec 18-21, 2013, Jodhpur, India.

Official URL: http://dx.doi.org/10.1109/NCVPRIPG.2013.6776180

Abstract

A script-independent, font-size-independent scheme is proposed for detecting bold words in printed pages. In OCR applications such as making minor modifications to an existing printed form, it is desirable to reproduce the font size and characteristics such as bold and italics in the OCR-recognized document. In this morphological opening based detection of bold (MOBDoB) method, the binarized image is segmented into sub-images of uniform font size using word height information. A rough estimate of the stroke width of the characters in each sub-image is obtained from the density. Each sub-image is then opened with a square structuring element whose size is determined by the respective stroke width. The union of all the opened sub-images is used to locate the bold words. Extracting all such words from the binarized image gives the final image. On a total of 65 Tamil, Kannada and English pages, at least 98% of bold words were detected, with a false alarm rate below 0.4%.
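
The opening step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic 7 x 10 image, and the structuring-element size of 2 are all assumptions chosen for illustration. In the paper the size is derived from the estimated stroke width of each sub-image, which is not reproduced here. The sketch only shows why opening with a square separates thick (bold-like) strokes from thin ones.

```python
def erode(img, k):
    """Binary erosion with a k x k square anchored at its top-left corner."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            # Keep the anchor only if the whole k x k square is foreground.
            if all(img[r + dr][c + dc] for dr in range(k) for dc in range(k)):
                out[r][c] = 1
    return out


def dilate(img, k):
    """Binary dilation that paints a k x k square at every surviving anchor."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for dr in range(k):
                    for dc in range(k):
                        if r + dr < h and c + dc < w:
                            out[r + dr][c + dc] = 1
    return out


def open_square(img, k):
    """Morphological opening: erosion followed by dilation with the same square."""
    return dilate(erode(img, k), k)


# Synthetic "page" (assumed data): a 1-pixel-wide stroke and a 3-pixel-wide stroke.
page = [[0] * 10 for _ in range(7)]
for r in range(1, 6):
    page[r][1] = 1                 # thin stroke, stands in for regular text
    for c in (5, 6, 7):
        page[r][c] = 1             # thick stroke, stands in for bold text

# Size 2 is an assumed value lying between the two stroke widths.
opened = open_square(page, 2)

for row in opened:
    print("".join("#" if v else "." for v in row))
```

With a square wider than the regular stroke but no wider than the bold stroke, the thin stroke is erased and the thick stroke is restored in full, so the surviving pixels mark where bold text lies.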

Item Type: Conference Proceedings
Additional Information: Copyright for this article belongs to the IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Depositing User: Id for Latest eprints
Date Deposited: 25 Aug 2016 10:35
Last Modified: 25 Aug 2016 10:35
URI: http://eprints.iisc.ac.in/id/eprint/54312
