Text segmentation in degraded historical document images

Kavitha, A.S.; Shivakumara, P.; Kumar, G.H.; Lu, Tong

Authors

A.S. Kavitha

P. Shivakumara

G.H. Kumar

Tong Lu



Abstract

Segmenting text from degraded historical Indus script images helps an Optical Character Recognizer (OCR) achieve good recognition rates for Indus scripts; however, it is challenging because of the complex backgrounds in such images. In this paper, we present a new method for separating text from non-text in Indus documents, based on the observation that text components are less cursive than non-text ones. To achieve this, we propose a new combination of Sobel and Laplacian operators for enhancing degraded, low-contrast pixels. The method then generates skeletons of the text components in the enhanced images to reduce the computational burden, which in turn makes it efficient to study component structure. We study the cursiveness of components using branch information in order to remove false text components. The method introduces a nearest-neighbor criterion for grouping components that lie on the same line, which yields clusters. It then classifies these clusters as text or non-text based on the characteristics of text components. We evaluate the method on a large dataset containing a variety of images and compare the results with existing methods, showing that the proposed method is effective in terms of recall and precision.
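The enhancement step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the Sobel and Laplacian responses are combined, so the averaging of normalized magnitudes below is an assumption made purely for illustration, not the authors' formulation. The function names (`filter3x3`, `normalize`, `enhance`) are hypothetical, and the code uses plain Python lists rather than an image library.

```python
# Illustrative sketch only: the combination rule (mean of normalized Sobel
# and Laplacian magnitudes) is an assumption, not the paper's method.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]


def filter3x3(img, kernel):
    """Correlate a 2D list with a 3x3 kernel, replicating border pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy = min(max(y + ky - 1, 0), h - 1)  # clamp row index
                    xx = min(max(x + kx - 1, 0), w - 1)  # clamp column index
                    acc += kernel[ky][kx] * img[yy][xx]
            out[y][x] = acc
    return out


def normalize(m):
    """Scale a non-negative 2D list to [0, 1]; an all-zero input stays zero."""
    peak = max(max(row) for row in m)
    if peak == 0:
        return [[0.0] * len(row) for row in m]
    return [[v / peak for v in row] for row in m]


def enhance(img):
    """Average the normalized Sobel gradient magnitude and |Laplacian|."""
    h, w = len(img), len(img[0])
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    lap = filter3x3(img, LAPLACIAN)
    sobel_mag = [[(gx[y][x] ** 2 + gy[y][x] ** 2) ** 0.5 for x in range(w)]
                 for y in range(h)]
    lap_mag = [[abs(lap[y][x]) for x in range(w)] for y in range(h)]
    s, l = normalize(sobel_mag), normalize(lap_mag)
    return [[(s[y][x] + l[y][x]) / 2 for x in range(w)] for y in range(h)]


# A low-contrast vertical stroke edge: gray level 10 next to gray level 12.
low_contrast = [[10, 10, 10, 12, 12, 12] for _ in range(5)]
enhanced = enhance(low_contrast)
```

In this sketch, normalizing each response before averaging means a faint edge in the input is stretched to the full output range, while flat background regions stay at zero. A real pipeline would follow this with binarization and skeletonization, which are not shown here.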

Citation

Kavitha, A., Shivakumara, P., Kumar, G., & Lu, T. (2016). Text segmentation in degraded historical document images. Egyptian Informatics Journal, 17(2), 189-197. https://doi.org/10.1016/j.eij.2015.11.003

Journal Article Type: Article
Acceptance Date: Nov 6, 2015
Online Publication Date: Jun 20, 2016
Publication Date: 2016-07
Deposit Date: Feb 2, 2024
Publicly Available Date: Feb 5, 2024
Journal: Egyptian Informatics Journal
Print ISSN: 1110-8665
Publisher: Elsevier
Peer Reviewed: Yes
Volume: 17
Issue: 2
Pages: 189-197
DOI: https://doi.org/10.1016/j.eij.2015.11.003
