Text line segmentation from struck-out handwritten document images
Authors
Shivakumara, P; Jain, T; Pal, U; Surana, N; Antonacopoulos, A; Lu, T
Abstract
In freestyle everyday handwritten documents, writing, erasing, striking out, and overwriting are common writer behaviors. Text that is not cleanly written poses significant challenges for text line segmentation. Accurate text line segmentation in handwritten documents is essential to several real-world applications, such as answer script evaluation, fraudulent document identification, writer identification, document age estimation, and writer gender classification. This paper proposes what is, to the authors’ best knowledge, the first text line segmentation approach that is applicable in the presence of both cleanly written and struck-out text. The approach consists of three steps. In the first step, word-level components are detected in the input handwritten document images (containing both cleanly written and struck-out text) based on stroke width estimation, noise filtering, and morphological operations. In the second step, the struck-out components are identified using the DenseNet deep learning model and treated differently from clean text in further analysis. In the third step, geometrical spatial features, the direction between candidate components and the overall text line, and the common overlapping region between adjacent components are evaluated to progressively form text lines. To evaluate the proposed steps and compare the proposed method with the state of the art, experiments were conducted on a new problem-focused dataset containing instances of struck-out text in handwritten documents, as well as on two standard datasets (the ICDAR2013 text line segmentation contest dataset and the ICDAR2019 HDRC dataset). The results show that the proposed steps are effective and useful, with superior performance compared to existing methods.
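The third step, which forms text lines from the overlap between adjacent components, can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a simplified greedy grouping that assumes each word-level component is already available as an axis-aligned bounding box, and it uses only vertical overlap (no direction or other geometrical features). The names `Box`, `vertical_overlap`, `group_into_lines`, the `struck_out` flag, and the `min_overlap` threshold are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one word-level component, in pixels."""
    x: int
    y: int
    w: int
    h: int
    struck_out: bool = False  # hypothetical flag, e.g. set by a classifier


def vertical_overlap(a: Box, b: Box) -> float:
    """Shared vertical extent divided by the smaller box height (assumes h > 0)."""
    top = max(a.y, b.y)
    bottom = min(a.y + a.h, b.y + b.h)
    return max(0, bottom - top) / min(a.h, b.h)


def group_into_lines(boxes, min_overlap=0.5):
    """Greedily attach each box, left to right, to the line whose last box
    it overlaps most; start a new line when no overlap reaches min_overlap."""
    lines = []
    for box in sorted(boxes, key=lambda b: b.x):
        best, best_score = None, min_overlap
        for line in lines:
            score = vertical_overlap(line[-1], box)
            if score >= best_score:
                best, best_score = line, score
        if best is None:
            lines.append([box])
        else:
            best.append(box)
    # Order the lines top to bottom by the first box in each line.
    return sorted(lines, key=lambda line: line[0].y)
```

In this sketch a struck-out component is grouped like any other box; the `struck_out` flag is only carried along so that later processing could treat such components differently, as the abstract describes.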
Citation
Shivakumara, P., Jain, T., Pal, U., Surana, N., Antonacopoulos, A., & Lu, T. (2022). Text line segmentation from struck-out handwritten document images. Expert Systems with Applications, 210, 118266. https://doi.org/10.1016/j.eswa.2022.118266
Journal Article Type | Article |
---|---|
Acceptance Date | Jul 21, 2022 |
Online Publication Date | Aug 18, 2022 |
Publication Date | Aug 18, 2022 |
Deposit Date | Nov 17, 2022 |
Publicly Available Date | Aug 19, 2024 |
Journal | Expert Systems with Applications |
Print ISSN | 0957-4174 |
Publisher | Elsevier |
Volume | 210 |
Pages | 118266 |
DOI | https://doi.org/10.1016/j.eswa.2022.118266 |
Publisher URL | https://doi.org/10.1016/j.eswa.2022.118266 |