Splitting merged characters of Kannada benchmark dataset using simplified paired-valleys and L-cut

Kumar, HRS and Madhavaraj, A and Ramakrishnan, AG (2019) Splitting merged characters of Kannada benchmark dataset using simplified paired-valleys and L-cut. In: 25th National Conference on Communications, NCC 2019, 20 - 23 February 2019, Bangalore.

Official URL: https://doi.org/10.1109/NCC.2019.8732239

Abstract

We reduce the computational complexity of the paired-valley algorithm for splitting merged characters from Θ(N²) to Θ(N), where N is the number of symbols merged. We also propose an effective way (the L-cut algorithm) to separate merged half-consonants (known in Kannada as ottus) from the base symbols. We have created a benchmark dataset of 4033 sub-word images in Kannada, each comprising two or more merged characters. We test the recognition accuracy of Tesseract OCR on this benchmark dataset before and after applying our technique. The accuracy of Tesseract v3 OCR on the dataset rises from 61.6% to 81.7%, a gain of about 20 percentage points, after the characters are split by our method. The algorithm's scalability to other scripts has been explored through limited experiments on Telugu and Tamil.

Item Type: Conference Paper
Publication: 25th National Conference on Communications, NCC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to the Institute of Electrical and Electronics Engineers Inc.
Keywords: Computational complexity; Landforms; Optical character recognition; Kannada; Merged characters; Old books; Ottu; Paired valleys; Printed texts; Tamil; Telugu; Tesseract; Statistical tests
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 29 Nov 2022 05:31
Last Modified: 29 Nov 2022 05:31
URI: https://eprints.iisc.ac.in/id/eprint/78061
