A survey of OCR evaluation tools and metrics
Neudecker, C; Baierer, K; Gerber, M; Clausner, C; Antonacopoulos, A; Pletschacher, S
Authors
C Neudecker
K Baierer
M Gerber
Mr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
S Pletschacher
Abstract
The millions of pages of historical documents that are digitized in libraries are increasingly used in contexts that have more specific requirements for OCR quality than keyword search. How can the quality of OCR results be assessed comprehensively, efficiently and reliably against the background of mass digitization, when ground truth can only ever be produced for a very small number of pages? Due to gaps in specifications, OCR evaluation tools can return different results, and due to differences in implementation, even commonly used error rates are often not directly comparable. OCR evaluation metrics and sampling methods are also insufficient where they do not take into account the accuracy of layout analysis, since for advanced use cases like Natural Language Processing or the Digital Humanities, accurate layout analysis and detection of the reading order are crucial. We provide an overview of OCR evaluation metrics and tools, describe two advanced use cases for OCR results, and perform an OCR evaluation experiment with multiple evaluation tools and different metrics for two distinct datasets. We analyze the differences and commonalities in light of the presented use cases and suggest areas for future work.
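As an illustration of why a commonly used error rate may not be directly comparable across tools, the following minimal sketch computes a character error rate (CER) as the edit distance divided by the length of the ground truth, once on raw strings and once after Unicode NFC normalization. The function names, the `normalize` flag and the sample strings are assumptions made for this example; they are not taken from the paper or from any of the tools it surveys.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str, normalize: bool = False) -> float:
    """Character error rate: edit distance / number of reference code points.

    Assumption: `normalize` applies Unicode NFC to both strings first.
    Real tools may make other choices (e.g. counting grapheme clusters).
    """
    if normalize:
        reference = unicodedata.normalize("NFC", reference)
        hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical sample: the same visible word in two Unicode encodings.
ground_truth = "B\u00fccher"   # precomposed u-umlaut (U+00FC)
ocr_output = "Bu\u0308cher"    # 'u' followed by combining diaeresis (U+0308)

raw_cer = cer(ground_truth, ocr_output)
normalized_cer = cer(ground_truth, ocr_output, normalize=True)
print(f"CER without normalization: {raw_cer:.3f}")
print(f"CER with NFC normalization: {normalized_cer:.3f}")
```

Under these assumptions, two visually identical strings yield a nonzero CER unless both are normalized first, so two tools that differ only in this one choice would report different error rates for the same OCR output.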
Citation
Neudecker, C., Baierer, K., Gerber, M., Clausner, C., Antonacopoulos, A., & Pletschacher, S. (2021). A survey of OCR evaluation tools and metrics. In HIP '21: The 6th International Workshop on Historical Document Imaging and Processing. https://doi.org/10.1145/3476887.3476888
Conference Name | HIP '21: The 6th International Workshop on Historical Document Imaging and Processing |
---|---|
Conference Location | Lausanne, Switzerland |
End Date | Sep 6, 2021 |
Acceptance Date | Jul 4, 2021 |
Online Publication Date | Sep 5, 2021 |
Publication Date | Oct 31, 2021 |
Deposit Date | Nov 10, 2021 |
Publicly Available Date | Nov 10, 2021 |
Publisher | Association for Computing Machinery (ACM) |
Book Title | HIP '21: The 6th International Workshop on Historical Document Imaging and Processing |
ISBN | 9781450386906 |
DOI | https://doi.org/10.1145/3476887.3476888 |
Publisher URL | https://doi.org/10.1145/3476887.3476888 |
Related Public URLs | https://dl.acm.org/doi/proceedings/10.1145/3476887 |
Additional Information | From Crossref proceedings articles via Jisc Publications Router. History: published_online 31-10-2021; issued 05-09-2021; published 05-09-2021. Event Type: Conference. Funders: German Research Foundation (DFG); Federal German Ministry of Education and Research (BMBF). Projects: OCR-D; QURATOR. Grant Number: 409784275. Grant Number: 03WKDA1A |