A font and size-independent OCR system for printed Kannada documents using support vector machines

Ashwin, TN and Sastry, PS (2002) A font and size-independent OCR system for printed Kannada documents using support vector machines. In: Sadhana, 27 (1). pp. 35-58.

Abstract

This paper describes an OCR system for printed text documents in Kannada, a South Indian language. The input to the system is the scanned image of a page of text, and the output is a machine-editable file compatible with most typesetting software. The system first extracts words from the document image and then segments the words into sub-character-level pieces. The segmentation algorithm is motivated by the structure of the script. We propose a novel set of features for the recognition problem which are computationally simple to extract. The final recognition is achieved by employing a number of 2-class classifiers based on the Support Vector Machine (SVM) method. The recognition is independent of the font and size of the printed text, and the system is seen to deliver reasonable performance.

Item Type: Journal Article
Publication: Sadhana
Publisher: Indian Academy of Sciences
Additional Information: Copyright of this article belongs to Indian Academy of Sciences.
Keywords: OCR; pattern recognition; support vector machines; Kannada script
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 09 Jun 2006
Last Modified: 19 Sep 2010 04:29
URI: http://eprints.iisc.ac.in/id/eprint/7573
