Word level multi-script identification

Pati, Peeta B and Ramakrishnan, AG (2008) Word level multi-script identification. In: Pattern Recognition Letters . (In Press)

Abstract

We report an algorithm to identify the script of each word in a document image. We start with a bi-script scenario, which is later extended to tri-script and then to eleven-script scenarios. A database of 20,000 words in different font styles and sizes has been collected and used for each script. The effectiveness of Gabor and discrete cosine transform (DCT) features has been independently evaluated using nearest neighbor, linear discriminant, and support vector machine (SVM) classifiers. The combination of Gabor features with a nearest neighbor or SVM classifier shows promising results: over 98% for the bi-script and tri-script cases and above 89% for the eleven-script scenario.
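
The pairing of Gabor features with a nearest neighbor classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the filter-bank parameters (two wavelengths, four orientations, a 15x15 kernel), the mean-energy feature, the Euclidean distance, and the synthetic stripe images standing in for word images are all assumptions made for illustration, as are the function names.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real (cosine) part of a 2-D Gabor filter. Parameter values are illustrative."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # axis along which the filter oscillates
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * xr / wavelength)

def gabor_features(image, wavelengths=(4.0, 8.0), n_orientations=4, size=15):
    """Mean response energy per (wavelength, orientation) pair (assumed feature)."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    shape = (h + size - 1, w + size - 1)            # zero-pad for linear convolution
    image_f = np.fft.rfft2(image, s=shape)
    feats = []
    for lam in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kern = gabor_kernel(size, lam, theta, sigma=0.5 * lam)
            resp = np.fft.irfft2(image_f * np.fft.rfft2(kern, s=shape), s=shape)
            feats.append(np.mean(resp ** 2))
    return np.array(feats)

def nearest_neighbor(train_X, train_y, x):
    """Label of the training vector closest to x in Euclidean distance."""
    dists = np.linalg.norm(train_X - x, axis=1)
    return train_y[int(np.argmin(dists))]

def stripes(vertical, period=8, shape=(32, 64), phase=0):
    """Synthetic stand-in for a word image: binary stripes (hypothetical data)."""
    h, w = shape
    y, x = np.mgrid[0:h, 0:w]
    coord = x if vertical else y
    return (((coord + phase) // (period // 2)) % 2).astype(float)

# Two toy "scripts": vertical strokes versus horizontal strokes.
train_X = np.stack([gabor_features(stripes(True)), gabor_features(stripes(False))])
train_y = ["script_A", "script_B"]

# A shifted vertical-stripe image is classified by its nearest training vector.
query = stripes(True, phase=2)
predicted = nearest_neighbor(train_X, train_y, gabor_features(query))
```

In this toy setting the two classes differ only in stroke orientation, so the orientation-selective filters separate them easily. Real word images would need size normalization and a much larger training set per script.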

Item Type: Journal Article
Publication: Pattern Recognition Letters
Publisher: Elsevier
Additional Information: Copyright of this article belongs to Elsevier. Online publication available. Publication details awaited.
Keywords: Gabor filter; DCT; Script identification
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 27 Mar 2008
Last Modified: 19 Sep 2010 04:43
URI: http://eprints.iisc.ac.in/id/eprint/13518
