
QUAD: quality assessment of documents

Deepak, Kumar and Ramakrishnan, AG (2011) QUAD: quality assessment of documents. In: Proc. 4th International Workshop on Camera-based Document Analysis and Recognition (CBDAR 2011), 2011.

Official URL: http://imlab.jp/cbdar2011/

Abstract

We propose a set of metrics that evaluate the uniformity, sharpness, continuity, noise, stroke width variance, pulse width ratio, transient pixel density, entropy, and variance of components to quantify the quality of a document image. The measures are intended for use with any optical character recognition (OCR) engine to estimate a priori the expected performance of the OCR. The suggested measures have been evaluated on many document images in different scripts. The quality of each document image is manually annotated by users to create a ground truth. The idea is to correlate the values of the measures with the user-annotated data. If the calculated measure matches the annotated description, the metric is accepted; otherwise it is rejected. Of the proposed metrics, some are accepted and the rest are rejected. We have defined metrics that are easy to estimate. The metrics proposed in this paper are based on feedback from home-grown OCR engines for Indic (Tamil and Kannada) languages. The metrics are independent of the script and depend only on the quality and age of the paper and the printing. Experiments and results for each proposed metric are discussed. Actual recognition of the printed text is not performed to evaluate the proposed metrics. Sometimes a document image containing broken characters is rated as a good document image by the evaluated metrics; this remains an unsolved challenge. The proposed measures work on grayscale document images and fail to provide reliable information on binarized document images.
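
The accept/reject procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the metric definitions or the matching rule, so everything here is an assumption made for illustration: entropy is taken as the Shannon entropy of the gray-level histogram, "matching" the annotations is modeled as a Spearman rank correlation, and the names `gray_entropy`, `spearman`, `accept_metric`, and the 0.7 threshold are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def gray_entropy(pixels):
    """Shannon entropy (in bits) of the gray-level histogram.

    `pixels` is a flat list of gray values; this definition is an
    assumption, not necessarily the entropy measure used in the paper.
    """
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def accept_metric(metric_values, user_scores, threshold=0.7):
    """Accept a metric when it tracks the user annotations closely enough.

    The threshold is a hypothetical choice for this sketch.
    """
    return abs(spearman(metric_values, user_scores)) >= threshold
```

In this sketch, `metric_values` would hold one metric value per document image and `user_scores` the corresponding manual quality annotations; a metric whose ranking of the images agrees with the annotators' ranking is accepted, and one that does not is rejected.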

Item Type: Conference Paper
Publisher: National Association of Theatre Nurses
Additional Information: Copyright of this article belongs to National Association of Theatre Nurses.
Keywords: Quality Metrics; Document Images; Multi-Script; Document Image Quality Analysis; Optical Character Recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 19 Apr 2013 09:19
Last Modified: 19 Apr 2013 09:19
URI: http://eprints.iisc.ac.in/id/eprint/46229
