
Gabor Filter Based Block Energy Analysis for Text Extraction from Digital Document Images

Raju, Sabari S and Pati, Peeta Basa and Ramakrishnan, AG (2004) Gabor Filter Based Block Energy Analysis for Text Extraction from Digital Document Images. In: First International Workshop on Document Image Analysis for Libraries (DIAL 2004), Jan 23-24, 2004, Palo Alto, California, USA, pp. 233-243.

Abstract

Extraction of text areas is a necessary first step in preparing a complex document image for character recognition. In digital libraries, the resulting OCR'ed text facilitates access to the image of a document page through keyword search. Gabor filters, known to simulate certain characteristics of the Human Visual System (HVS), have been employed for this task by a large number of scientists on scanned document images. Adapting such a scheme to camera-based document images is a relatively new approach. Moreover, designing appropriate filters to separate text areas, which are assumed to be rich in high-frequency components, from non-text areas is a difficult task. The difficulty increases if the clutter is also rich in high-frequency components. Other reported works on separating text from non-text areas have used geometrical/structural information, such as the shape and size of regions in binarized document images. In this work, we use a combination of the above-mentioned approaches. We use connected component analysis (CCA) on binarized images to segment non-text areas based on the size of the connected regions. A Gabor function based filter bank is used to separate text and non-text areas of comparable size. The technique is shown to work efficiently on different kinds of scanned document images, camera-captured document images and sometimes on scenic images.

Key Words: Gabor filter, connected component analysis, document image, multi-channel filtering.
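
The block energy idea named in the title can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no filter parameters, so the kernel size, wavelength, Gaussian width, the four orientations, the block size and all function names below are assumptions chosen for illustration only. The sketch filters a grayscale image with a small bank of real Gabor kernels and sums the squared responses inside each block, so that blocks rich in high-frequency content (as text is assumed to be) score higher than flat blocks. The CCA size-filtering step is not covered.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Real Gabor kernel: Gaussian envelope times a cosine carrier (assumed form)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    # Subtract the mean so that uniform regions produce zero response.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def make_filter_bank(size=7, wavelength=4.0, sigma=2.0, orientations=4):
    """Hypothetical bank: one radial frequency, evenly spaced orientations."""
    return [gabor_kernel(size, wavelength, k * math.pi / orientations, sigma)
            for k in range(orientations)]


def filter_image(image, kernel):
    """Correlate image with kernel, replicating edge pixels at the border."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di, krow in enumerate(kernel):
                ii = min(max(i + di - half, 0), h - 1)
                for dj, kv in enumerate(krow):
                    jj = min(max(j + dj - half, 0), w - 1)
                    acc += kv * image[ii][jj]
            out[i][j] = acc
    return out


def block_energy(image, block, kernels):
    """Sum of squared filter responses per block, accumulated over all kernels."""
    h, w = len(image), len(image[0])
    rows, cols = h // block, w // block
    energy = [[0.0] * cols for _ in range(rows)]
    for kernel in kernels:
        response = filter_image(image, kernel)
        for i in range(rows * block):
            for j in range(cols * block):
                energy[i // block][j // block] += response[i][j] ** 2
    return energy
```

Calling `block_energy(image, 8, make_filter_bank())` on a list-of-lists grayscale image returns one energy value per 8x8 block. A threshold on that map would then label blocks as text or non-text; how such a threshold should be chosen is not stated in the abstract and is left open here.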

Item Type: Conference Paper
Publisher: IEEE
Additional Information: ©2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 13 Feb 2007
Last Modified: 19 Sep 2010 04:13
URI: http://eprints.iisc.ac.in/id/eprint/490
