Dr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Mr Stefan Pletschacher S.Pletschacher@salford.ac.uk
Lecturer
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Editor
KU Schulz
Editor
Precise description of layout entities (content regions on a page) is crucial for all but the most trivial document analysis and recognition applications. The output of layout analysis methods and state-of-the-art OCR systems varies significantly, from bounding boxes (e.g. Tesseract) to stacks of text line rectangles (e.g. ABBYY FineReader). There is a clear need for a consistent and accurate representation of regions (e.g. text paragraphs, graphics entities etc.) for further processing, correction and performance evaluation (comparison of segmentation results with ground truth regions). This paper describes a method for refinement of document representations by fitting polygons around lower-level layout objects (such as text lines, words and glyphs) in a systematic way that reconstructs region outlines and preserves the fine details of complex layouts. Experimental results on a standard dataset demonstrate the validity and usefulness of the proposed approach.
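To illustrate the gap between a plain bounding box and a polygon fitted around lower-level objects, the following is a minimal sketch and not the method described in the paper. It assumes the text lines of one region are given as axis-aligned rectangles stacked vertically, and it builds a simple rectilinear ("staircase") outline around them. The names `outline_from_lines` and `polygon_area`, the `(left, top, right, bottom)` tuple layout, and the midpoint rule for closing gaps between lines are all assumptions made for this example.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), hypothetical layout
Point = Tuple[int, int]


def outline_from_lines(lines: List[Rect]) -> List[Point]:
    """Build a rectilinear polygon around vertically stacked text-line boxes.

    Illustrative only: each line is stretched vertically to the midpoint of the
    gap to its neighbours, then the right edges are traced downwards and the
    left edges upwards.
    """
    if not lines:
        return []
    rows = sorted(lines, key=lambda r: r[1])  # order lines top to bottom

    # Horizontal cut positions: top of first line, midpoints between
    # consecutive lines, bottom of last line.
    cuts = [rows[0][1]]
    for prev, nxt in zip(rows, rows[1:]):
        cuts.append((prev[3] + nxt[1]) // 2)
    cuts.append(rows[-1][3])

    points: List[Point] = []
    # Trace the right side from top to bottom.
    for i, (_, _, right, _) in enumerate(rows):
        points.append((right, cuts[i]))
        points.append((right, cuts[i + 1]))
    # Trace the left side from bottom to top.
    for i in range(len(rows) - 1, -1, -1):
        left = rows[i][0]
        points.append((left, cuts[i + 1]))
        points.append((left, cuts[i]))

    # Drop consecutive duplicate vertices.
    cleaned: List[Point] = []
    for p in points:
        if not cleaned or cleaned[-1] != p:
            cleaned.append(p)
    return cleaned


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Three text lines; the last one is short, as at the end of a paragraph.
lines = [(10, 0, 100, 10), (10, 12, 100, 22), (10, 24, 60, 34)]
outline = outline_from_lines(lines)
```

For a paragraph whose last line is short, the outline steps in around that line, so the polygon covers less area than the bounding box of the same lines. That empty corner is the kind of detail a single rectangle cannot represent.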
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | DATeCH 2014: Digital Access to Textual Cultural Heritage 2014 |
Start Date | May 19, 2014 |
End Date | May 20, 2014 |
Publication Date | May 19, 2014 |
Deposit Date | Jan 28, 2015 |
Book Title | DATeCH '14: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage |
ISBN | 9781450325882 |
DOI | https://doi.org/10.1145/2595188.2595198 |
Publisher URL | http://dx.doi.org/10.1145/2595188.2595198 |
Related Public URLs | http://dl.acm.org/dl.cfm?CFID=473908569&CFTOKEN=68594310 |