ePrints@IISc

Improved recognition of aged Kannada documents by effective segmentation of merged characters

Madhavaraj, A and Ramakrishnan, AG and Kumar, Shiva HR and Bhat, Nagaraj (2014) Improved recognition of aged Kannada documents by effective segmentation of merged characters. In: International Conference on Signal Processing and Communications (SPCOM), JUL 22-25, 2014, Bangalore, INDIA.

PDF: Int_Com_Sig_Pro_Com_2014.pdf - Published Version (406 kB)
Restricted to registered users only
Official URL: http://ieeexplore.ieee.org/xpl/abstractAuthors.jsp...


In optical character recognition (OCR) of very old books, recognition accuracy drops mainly because characters are merged or broken. In this paper, we propose the first algorithm to segment merged Kannada characters, using a hypothesis to select the positions to be cut. The method searches for the best positions at which to segment by taking into account both the recognition score of a support vector machine (SVM) classifier and the validity of the aspect ratio (width-to-height ratio) of the segment between every pair of cut positions. The hypothesis for selecting a cut position is based on the observation that a concave surface exists above and below the touching portion. These concave surfaces are identified by tracing the valleys in the top contour of the image and doing the same for the image rotated upside-down. The cut positions are then taken as closely matching valleys of the original and the rotated images. The proposed segmentation algorithm handles different font styles, shapes, and sizes better than the existing segmentation based on the vertical projection profile. Tested on 1125 word images from an old Kannada book, each containing multiple merged characters, the algorithm achieves 89.6% correct segmentation, and the character recognition accuracy on the merged words is 91.2%. A few merge points are still missed because the specific shapes of the characters meeting there do not produce a matching valley.
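The two ideas in the abstract (cut candidates from matched top and bottom valleys, then a search over every pair of cuts scored by recognition and aspect ratio) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary image given as rows of 0/1 pixels, and the helper names (`top_contour`, `valleys`, `candidate_cuts`, `segment_word`), the matching tolerance `tol`, the aspect-ratio bounds, and the choice to maximise the product of segment scores are all hypothetical. `dummy_score` is a placeholder for the SVM recognition score, whose exact form the abstract does not give.

```python
import math
from typing import Callable, List, Optional, Sequence

Image = List[List[int]]  # rows top-to-bottom, 1 = ink, 0 = background
ScoreFn = Callable[[Image, int, int], float]  # (image, left, right) -> score in (0, 1]


def parse(rows: Sequence[str]) -> Image:
    """Convert strings such as '##..#' into a 0/1 image ('#' = ink)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def rotate_180(img: Image) -> Image:
    """Rotate the image upside-down (180 degrees)."""
    return [row[::-1] for row in img[::-1]]


def top_contour(img: Image) -> List[int]:
    """Row of the first ink pixel in each column (image height if the column is empty)."""
    height, width = len(img), len(img[0])
    return [next((r for r in range(height) if img[r][c]), height) for c in range(width)]


def valleys(profile: List[int]) -> List[int]:
    """Centre columns of runs where the contour sits lower than both neighbouring columns."""
    found, c = [], 0
    while c < len(profile):
        end = c
        while end + 1 < len(profile) and profile[end + 1] == profile[c]:
            end += 1
        # A run touching the image border cannot lie between two characters.
        if c > 0 and end < len(profile) - 1:
            if profile[c - 1] < profile[c] and profile[end + 1] < profile[c]:
                found.append((c + end) // 2)
        c = end + 1
    return found


def candidate_cuts(img: Image, tol: int = 2) -> List[int]:
    """Cut columns where a top-contour valley closely matches a valley of the rotated image."""
    width = len(img[0])
    top = valleys(top_contour(img))
    # Valleys in the rotated image's top contour are valleys of the original bottom
    # contour; map their columns back to the original orientation.
    bottom = [width - 1 - c for c in valleys(top_contour(rotate_180(img)))]
    cuts = set()
    for t in top:
        close = [b for b in bottom if abs(t - b) <= tol]
        if close:
            b = min(close, key=lambda x: abs(t - x))
            cuts.add((t + b) // 2)
    return sorted(cuts)


def aspect_ratio(img: Image, left: int, right: int) -> float:
    """Width-to-height ratio of the ink inside columns [left, right)."""
    rows = [r for r, row in enumerate(img) if any(row[left:right])]
    return (right - left) / (rows[-1] - rows[0] + 1) if rows else 0.0


def segment_word(img: Image, score_fn: ScoreFn, min_ar: float = 0.3,
                 max_ar: float = 1.5, tol: int = 2) -> Optional[List[int]]:
    """Pick the cuts that maximise the product of segment scores (dynamic programming)."""
    cuts = [0] + candidate_cuts(img, tol) + [len(img[0])]
    best = [-math.inf] * len(cuts)
    back = [-1] * len(cuts)
    best[0] = 0.0
    for i in range(1, len(cuts)):
        for j in range(i):  # every pair of cut positions (j, i)
            if best[j] == -math.inf:
                continue
            if not min_ar <= aspect_ratio(img, cuts[j], cuts[i]) <= max_ar:
                continue  # reject segments with an implausible shape
            s = score_fn(img, cuts[j], cuts[i])
            if s > 0 and best[j] + math.log(s) > best[i]:
                best[i], back[i] = best[j] + math.log(s), j
    if best[-1] == -math.inf:
        return None  # no valid segmentation under these constraints
    path, i = [], len(cuts) - 1
    while i != -1:
        path.append(cuts[i])
        i = back[i]
    return path[::-1]


def dummy_score(img: Image, left: int, right: int) -> float:
    """Hypothetical stand-in for the SVM recognition score."""
    return 0.9


# Two blocks joined by a thin bridge: concave above and below the touching part.
WORD = parse(["####...####",
              "####...####",
              "###########",
              "####...####",
              "####...####"])
print(candidate_cuts(WORD))             # [5]
print(segment_word(WORD, dummy_score))  # [0, 5, 11]
```

In this toy word, the whole image (aspect ratio 2.2) is rejected by the aspect-ratio check, so the only valid path uses the matched valley at column 5. In a real system, `dummy_score` would be replaced by a classifier confidence for the segment, which is what lets the search choose between several plausible cut sets.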

Item Type: Conference Proceedings
Publisher: IEEE
Additional Information: Copyright for this article belongs to the IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA
Keywords: optical character recognition; aspect ratio; merged character segmentation; recognition based segmentation; support vector machine; recognition score; OCR; Kannada; matched valleys; segmentation path; vertical projection profile
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 30 Dec 2015 06:07
Last Modified: 30 Dec 2015 06:07
URI: http://eprints.iisc.ac.in/id/eprint/52979
