Script identification in printed bilingual documents

Dhanya, D and Ramakrishnan, AG and Pati, Peeta Basa (2002) Script identification in printed bilingual documents. In: Sadhana, 27 (1). pp. 73-82.

Abstract

Identifying the script of the text in a multi-script document is an important step in designing an OCR system for page analysis and recognition. Much work has already been reported in this area for Roman, Arabic, Chinese, Korean and Japanese scripts. In the Indian context, although some results have been reported, the task is still in its infancy. This paper presents a successful attempt to identify the script, at the word level, in a bilingual document containing Roman and Tamil scripts. Two different approaches are proposed and thoroughly tested. In the first method, each word is divided into three distinct spatial zones, and the spatial spread of the word in the upper and lower zones, together with the character density, is used to identify the script. The second technique analyses the directional energy distribution of a word using Gabor filters with suitable frequencies and orientations. Both algorithms were tested on words in various font styles and sizes, and the results are quite encouraging.
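The first method can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions, so everything below is an assumption made for illustration: the function name `zone_features`, the rule that the middle zone spans the rows whose ink count is at least half the peak of the horizontal projection profile, and the definitions of "spread" (fraction of columns with ink in a zone) and "density" (ink pixels over bounding-box area). It is not the authors' implementation, and the classification step is omitted.

```python
def zone_features(word):
    """Compute zone-based features for a binarised word image.

    `word` is a list of equal-length rows of 0/1 ints (1 = ink).
    All definitions here are illustrative assumptions, not the paper's.
    """
    # Horizontal projection profile: ink pixels per row.
    profile = [sum(row) for row in word]
    inked_rows = [i for i, p in enumerate(profile) if p > 0]
    if not inked_rows:
        raise ValueError("word image contains no ink")
    top, bottom = inked_rows[0], inked_rows[-1]

    # Assumed heuristic: the middle zone spans the rows whose ink count
    # is at least half of the peak row count.
    peak = max(profile)
    dense = [i for i in range(top, bottom + 1) if 2 * profile[i] >= peak]
    mid_top, mid_bottom = dense[0], dense[-1]

    # Horizontal extent of the word (columns containing any ink).
    width = len(word[0])
    inked_cols = [c for c in range(width)
                  if any(word[r][c] for r in range(top, bottom + 1))]
    left, right = inked_cols[0], inked_cols[-1]
    n_cols = right - left + 1

    def spread(r0, r1):
        """Fraction of word columns with ink in rows r0 .. r1-1."""
        if r0 >= r1:
            return 0.0
        hits = sum(1 for c in range(left, right + 1)
                   if any(word[r][c] for r in range(r0, r1)))
        return hits / n_cols

    return {
        "upper_spread": spread(top, mid_top),            # ascender zone
        "lower_spread": spread(mid_bottom + 1, bottom + 1),  # descender zone
        "density": sum(profile) / (n_cols * (bottom - top + 1)),
    }


# Hypothetical 6x6 word image: one ascender stroke and one descender stroke.
demo = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
]
features = zone_features(demo)
```

The resulting three numbers could then be fed to any two-class classifier; which classifier and thresholds the paper uses is not stated in the abstract.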

Item Type: Journal Article
Publication: Sadhana
Publisher: Indian Academy of Sciences
Additional Information: Copyright of this article belongs to Indian Academy of Sciences.
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 11 Jun 2006
Last Modified: 19 Sep 2010 04:29
URI: http://eprints.iisc.ac.in/id/eprint/7580