Saikrishna, Pedamalli and Ramakrishnan, A G (2013) Script Independent Detection of Bold Words in Multi Font-size Documents. In: Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Dec 18-21, 2013, Jodhpur, India.
PDF: Fou_Nat_Con_2013.pdf (Published Version, 440 kB), restricted to registered users.
Abstract
A script-independent, font-size-independent scheme is proposed for detecting bold words in printed pages. In OCR applications such as making minor modifications to an existing printed form, it is desirable for the recognized document to reproduce the font size and attributes such as bold and italics. In the proposed morphological opening based detection of bold (MOBDoB) method, the binarized image is segmented into sub-images of uniform font size using word height information. A rough estimate of the stroke width of the characters in each sub-image is obtained from its density. Each sub-image is then opened with a square structuring element whose size is determined by the respective stroke width. The union of all the opened sub-images is used to locate the bold words, and extracting all such words from the binarized image gives the final image. At least 98% of bold words were detected in a total of 65 Tamil, Kannada and English pages, with a false alarm rate below 0.4%.
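The Python sketch below illustrates the pipeline described in the abstract on a toy binary image. It is not the authors' implementation, and several pieces are assumptions: word bounding boxes are taken as given (the hypothetical `Box` tuples), the height tolerance `tol` used to form font-size groups is a hypothetical choice, the structuring-element size of stroke width + 1 is a hypothetical rule, and the stroke width is estimated as the most frequent horizontal ink run length, a swapped-in technique rather than the paper's density-based estimate. Opening each word crop separately stands in for opening whole sub-images and taking their union.

```python
from collections import Counter
from typing import List, Tuple

Image = List[List[int]]          # binary image: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # hypothetical word box: (top, left, bottom, right), ends exclusive


def crop(img: Image, box: Box) -> Image:
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]


def group_by_height(boxes: List[Box], tol: int = 2) -> List[List[Box]]:
    """Bucket words whose heights differ by at most `tol` px (hypothetical font-size segmentation)."""
    groups: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[2] - b[0]):
        height = box[2] - box[0]
        if groups and height - (groups[-1][0][2] - groups[-1][0][0]) <= tol:
            groups[-1].append(box)
        else:
            groups.append([box])
    return groups


def estimate_stroke_width(img: Image) -> int:
    """Most frequent horizontal ink run length; stands in for the paper's density-based estimate."""
    runs: Counter = Counter()
    for row in img:
        length = 0
        for px in list(row) + [0]:  # trailing 0 closes a run touching the right edge
            if px:
                length += 1
            elif length:
                runs[length] += 1
                length = 0
    return runs.most_common(1)[0][0] if runs else 1


def opening(img: Image, k: int) -> Image:
    """Binary opening with a k x k square: union of all k x k squares that fit inside the ink."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h - k + 1):          # erosion: top-left corners where the square fits
        for x in range(w - k + 1):
            if all(img[y + i][x + j] for i in range(k) for j in range(k)):
                for i in range(k):      # dilation: paint the square back
                    for j in range(k):
                        out[y + i][x + j] = 1
    return out


def detect_bold(img: Image, boxes: List[Box]) -> List[Box]:
    bold: List[Box] = []
    for group in group_by_height(boxes):
        # Stroke width pooled over every word in the font-size group.
        rows = [row for box in group for row in crop(img, box)]
        k = estimate_stroke_width(rows) + 1  # hypothetical rule: square just wider than a regular stroke
        for box in group:
            # Simplification: open each word crop instead of the whole sub-image.
            if any(any(row) for row in opening(crop(img, box), k)):
                bold.append(box)
    return bold


# Toy page: two regular words (2-px strokes) and one bold word (4-px strokes), all 10 px tall.
page = [[0] * 30 for _ in range(10)]
for y in range(1, 9):
    for x in (1, 2, 5, 6, 11, 12, 15, 16, 21, 22, 23, 24):
        page[y][x] = 1
words = [(0, 0, 10, 8), (0, 10, 10, 18), (0, 20, 10, 30)]
print(detect_bold(page, words))  # -> [(0, 20, 10, 30)]
```

The opening erases any stroke narrower than the square, so only the thicker strokes of bold words survive. This relies on regular text dominating each font-size group: if most words in a group were bold, the estimated stroke width would itself be the bold width and nothing would survive the opening.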
Item Type: | Conference Proceedings |
---|---|
Series: | National Conference on Computer Vision Pattern Recognition Image Processing and Graphics |
Publisher: | IEEE |
Additional Information: | Copyright for this article belongs to the IEEE, 345 E 47th St, New York, NY 10017, USA |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 25 Aug 2016 10:35 |
Last Modified: | 25 Aug 2016 10:35 |
URI: | http://eprints.iisc.ac.in/id/eprint/54312 |