Kumar, HRS and Ramakrishnan, AG (2019) Gamma enhanced binarization - An adaptive nonlinear enhancement of degraded word images for improved recognition of split characters. In: 25th National Conference on Communications, NCC 2019, 20 - 23 February 2019, Bangalore.
Abstract
The recognition performance of any OCR system suffers from the merged and split characters that occur in scanned images of degraded printed documents. We propose an elegant method of non-linearly enhancing such degraded, gray-scale word images. The enhancement connects the broken strokes of the characters, so that binarization of the processed word images gives components with better connectivity for most characters or recognizable units. Gamma, the parameter determining the enhancement, starts at one and is decreased in powers of 2, and the right value of gamma is chosen based on the recognition score of our character classifier. We have created a benchmark dataset of 1685 degraded word images obtained from scanned pages of several old Kannada books. The word images have been recognized before and after the proposed nonlinear enhancement. On this dataset, the proposed enhancement of the gray-scale word images yields an absolute improvement of 14.8 in the Unicode-level recognition accuracy of our SVM-based character classifier. Even on Google's Tesseract OCR for Kannada, our gamma-enhanced binarization results in an improvement of 5.6 in the Unicode-level accuracy.
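The gamma search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: the power law is applied to "ink strength" (the inverted, normalized gray level, so that dark text maps to values near 1), binarization uses a fixed threshold of 0.5, "decreased in powers of 2" is read as gamma = 1, 1/2, 1/4, ..., and `score_fn` is a hypothetical stand-in for the recognition score returned by a character classifier.

```python
import numpy as np


def gamma_enhance(gray, gamma):
    """Apply a power-law (gamma) transform to a uint8 gray-scale image.

    Assumption: dark text on a light background, so the transform is
    applied to ink strength = 1 - gray/255. With gamma < 1, faint ink
    pixels are boosted, which tends to bridge broken strokes.
    """
    ink = 1.0 - gray.astype(np.float64) / 255.0
    return ink ** gamma


def binarize(ink, threshold=0.5):
    """Fixed-threshold binarization (an assumption); True marks ink."""
    return ink >= threshold


def select_gamma(gray, score_fn, num_steps=5):
    """Try gamma = 1, 1/2, 1/4, ... and keep the best-scoring result.

    score_fn is a hypothetical callable that takes a binary image and
    returns a recognition score (higher is better).
    Returns (gamma, score, binary_image).
    """
    best = None
    for k in range(num_steps):
        gamma = 2.0 ** (-k)
        binary = binarize(gamma_enhance(gray, gamma))
        score = score_fn(binary)
        # Keep the first gamma that reaches the highest score.
        if best is None or score > best[1]:
            best = (gamma, score, binary)
    return best
```

In practice `score_fn` would wrap a recognizer's confidence output. Because ties keep the earlier candidate, the search prefers the mildest enhancement among equally scored values of gamma.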
Item Type: | Conference Paper |
---|---|
Publication: | 25th National Conference on Communications, NCC 2019 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Additional Information: | The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc. |
Keywords: | Classification (of information); Optical character recognition; Support vector machines; Binarizations; Kannada; Old books; Power-law; Printed texts; Split characters; Tesseract; Word images; Image enhancement |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 29 Nov 2022 05:48 |
Last Modified: | 29 Nov 2022 05:48 |
URI: | https://eprints.iisc.ac.in/id/eprint/78066 |