Dr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
S Pletschacher
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Editor
KU Schulz
Editor
Precise description of layout entities (content regions on a page) is crucial for all but the most trivial document analysis and recognition applications. The output of layout analysis methods and state-of-the-art OCR systems varies significantly, from bounding boxes (e.g. Tesseract) to stacks of text line rectangles (e.g. ABBYY FineReader). There is a clear need for a consistent and accurate representation of regions (e.g. text paragraphs, graphics entities etc.) for further processing, correction and performance evaluation (comparison of segmentation results with ground truth regions). This paper describes a method for refinement of document representations by fitting polygons around lower-level layout objects (such as text lines, words and glyphs) in a systematic way that reconstructs region outlines and preserves the fine details of complex layouts. Experimental results on a standard dataset demonstrate the validity and usefulness of the proposed approach.
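As a rough illustration of the idea of fitting a polygon around lower-level layout objects, the sketch below builds a rectilinear outline from text-line bounding boxes. It is not the method described in the paper. It assumes axis-aligned boxes in image coordinates (y grows downwards) that belong to a single column and are stacked vertically. The name `outline_from_lines` and the `(left, top, right, bottom)` tuple layout are hypothetical choices made for this example.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), assumed layout
Point = Tuple[int, int]


def outline_from_lines(lines: List[Rect]) -> List[Point]:
    """Return a rectilinear polygon enclosing vertically stacked line boxes.

    Illustrative sketch only: assumes one column of axis-aligned boxes.
    """
    if not lines:
        return []

    rows = sorted(lines, key=lambda r: r[1])  # order lines top to bottom

    # Horizontal cut positions: top of the first line, the midpoint of each
    # gap between consecutive lines, and the bottom of the last line.
    cuts = [rows[0][1]]
    for upper, lower in zip(rows, rows[1:]):
        cuts.append((upper[3] + lower[1]) // 2)
    cuts.append(rows[-1][3])

    right_side: List[Point] = []
    left_side: List[Point] = []
    for i, (left, _top, right, _bottom) in enumerate(rows):
        right_side += [(right, cuts[i]), (right, cuts[i + 1])]
        left_side += [(left, cuts[i]), (left, cuts[i + 1])]

    # Walk down the right edge, then back up the left edge, dropping
    # consecutive duplicate points.
    polygon: List[Point] = []
    for point in right_side + left_side[::-1]:
        if not polygon or polygon[-1] != point:
            polygon.append(point)
    return polygon


if __name__ == "__main__":
    # Three lines: an indented first line, a full line, a short last line.
    sample = [(10, 0, 100, 10), (0, 12, 100, 22), (0, 24, 60, 34)]
    print(outline_from_lines(sample))
```

Unlike a single bounding box, the resulting outline follows the indentation of the first line and the short last line, which is the kind of detail a polygonal region representation is meant to preserve.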
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | DATeCH 2014: Digital Access to Textual Cultural Heritage 2014 |
Start Date | May 19, 2014 |
End Date | May 20, 2014 |
Publication Date | May 19, 2014 |
Deposit Date | Jan 28, 2015 |
Book Title | DATeCH '14: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage |
ISBN | 9781450325882 |
DOI | https://doi.org/10.1145/2595188.2595198 |
Publisher URL | http://dx.doi.org/10.1145/2595188.2595198 |
Related Public URLs | http://dl.acm.org/dl.cfm?CFID=473908569&CFTOKEN=68594310 |
Efficient and effective OCR engine training
(2019)
Journal Article
The ENP image and ground truth dataset of historical newspapers
(n.d.)
Book Chapter
A survey of OCR evaluation tools and metrics
(2021)
Presentation / Conference Contribution