Aletheia - An advanced document layout and text ground-truthing system for production environments
Clausner, C; Pletschacher, S; Antonacopoulos, A
Authors
Mr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Mr Stefan Pletschacher S.Pletschacher@salford.ac.uk
Lecturer
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
Abstract
Large-scale digitisation has led to a number of new possibilities with regard to adaptive and learning-based methods in the field of Document Image Analysis and OCR. For ground truth production of large corpora, however, there is still a gap in terms of productivity. Ground truth is not only crucial for training and evaluation at the development stage of tools but also for quality assurance in the scope of production workflows for digital libraries. This paper describes Aletheia, an advanced system for accurate and yet cost-effective ground truthing of large amounts of documents. It aids the user with a number of automated and semi-automated tools, which were partly developed and improved based on feedback from major libraries across Europe and from their digitisation service providers, which are using the tool in a production environment. Novel features are, among others, the support of top-down ground truthing with sophisticated split and shrink tools as well as bottom-up ground truthing supporting the aggregation of lower-level elements to more complex structures. Special features have been developed to support working with the complexities of historical documents. The integrated rules and guidelines validator, in combination with powerful correction tools, enables efficient production of highly accurate ground truth.
Citation
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2011). Aletheia - An advanced document layout and text ground-truthing system for production environments. In 2011 International Conference on Document Analysis and Recognition ICDAR 2011. https://doi.org/10.1109/ICDAR.2011.19
Conference Name | Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR) |
---|---|
Conference Location | Beijing, China |
Start Date | Sep 18, 2011 |
End Date | Sep 21, 2011 |
Online Publication Date | Nov 3, 2011 |
Publication Date | Nov 3, 2011 |
Deposit Date | Oct 5, 2012 |
Series Title | Proceedings of the International Conference on Document Analysis and Recognition |
Book Title | 2011 International Conference on Document Analysis and Recognition ICDAR 2011 |
ISBN | 9781457713507 |
DOI | https://doi.org/10.1109/ICDAR.2011.19 |
Publisher URL | http://dx.doi.org/10.1109/ICDAR.2011.19 |
Related Public URLs | https://ieeexplore.ieee.org/xpl/conhome/6065245/proceeding |