Document representation refinement for precise region description
Clausner, C; Pletschacher, S; Antonacopoulos, A
Authors
Mr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Mr Stefan Pletschacher S.Pletschacher@salford.ac.uk
Lecturer
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
Contributors
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Editor
KU Schulz
Editor
Abstract
Precise description of layout entities (content regions on a page) is crucial for all but the most trivial document analysis and recognition applications. The output of layout analysis methods and state-of-the-art OCR systems varies significantly, from bounding boxes (e.g. Tesseract) to stacks of text line rectangles (e.g. ABBYY FineReader). There is a clear need for a consistent and accurate representation of regions (e.g. text paragraphs, graphics entities etc.) for further processing, correction and performance evaluation (comparison of segmentation results with ground truth regions). This paper describes a method for refinement of document representations by fitting polygons around lower-level layout objects (such as text lines, words and glyphs) in a systematic way that reconstructs region outlines and preserves the fine details of complex layouts. Experimental results on a standard dataset demonstrate the validity and usefulness of the proposed approach.
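To illustrate why a fitted polygon can describe a region more precisely than a bounding box, the following is a minimal sketch only. It is not the authors' algorithm, which the abstract does not specify. It assumes the lower-level objects are axis-aligned text line rectangles stacked vertically, with y growing downward, and all names (`Rect`, `outline_from_lines`, `polygon_area`) are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom); y grows downward (assumption)
Point = Tuple[int, int]


def outline_from_lines(lines: List[Rect]) -> List[Point]:
    """Trace a rectilinear polygon around vertically stacked text line rectangles."""
    if not lines:
        return []
    rows = sorted(lines, key=lambda r: r[1])  # order lines top to bottom
    # Each row runs from its own top to the next row's top, so gaps between
    # lines are absorbed and every edge stays horizontal or vertical.
    ys = [row[1] for row in rows] + [rows[-1][3]]
    right: List[Point] = []
    left: List[Point] = []
    for i, (l, _, r, _) in enumerate(rows):
        right += [(r, ys[i]), (r, ys[i + 1])]
        left += [(l, ys[i]), (l, ys[i + 1])]
    # Walk down the right side, then back up the left side.
    ring = right + left[::-1]
    # Drop consecutive duplicate points.
    dedup: List[Point] = []
    for p in ring:
        if not dedup or dedup[-1] != p:
            dedup.append(p)
    if len(dedup) > 1 and dedup[0] == dedup[-1]:
        dedup.pop()
    # Drop points that lie on a straight line between their neighbours.
    out: List[Point] = []
    n = len(dedup)
    for i in range(n):
        a, b, c = dedup[i - 1], dedup[i], dedup[(i + 1) % n]
        cross = (b[0] - a[0]) * (c[1] - b[1]) - (b[1] - a[1]) * (c[0] - b[0])
        if cross != 0:
            out.append(b)
    return out


def polygon_area(points: List[Point]) -> float:
    """Shoelace formula; returns the absolute area of a simple polygon."""
    total = 0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# A paragraph whose last line is shorter than the others.
paragraph = [(10, 0, 100, 10), (10, 12, 100, 22), (10, 24, 60, 34)]
outline = outline_from_lines(paragraph)
```

For this made-up paragraph the bounding box covers 90 × 34 = 3060 pixels, while the traced outline has six vertices and excludes the empty area to the right of the short last line. Real layouts with overlapping, skewed or multi-column content would need a more general approach than this sketch.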
Citation
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2014). Document representation refinement for precise region description. In A. Antonacopoulos, & K. Schulz (Eds.), DATeCH '14: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage. https://doi.org/10.1145/2595188.2595198
Field | Value |
---|---|
Conference Name | DATeCH 2014: Digital Access to Textual Cultural Heritage 2014 |
Conference Location | Madrid, Spain |
Start Date | May 19, 2014 |
End Date | May 20, 2014 |
Publication Date | May 19, 2014 |
Deposit Date | Jan 28, 2015 |
Book Title | DATeCH '14: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage |
ISBN | 9781450325882 |
DOI | https://doi.org/10.1145/2595188.2595198 |
Publisher URL | http://dx.doi.org/10.1145/2595188.2595198 |
Related Public URLs | http://dl.acm.org/dl.cfm?CFID=473908569&CFTOKEN=68594310 |