Europeana newspapers OCR workflow evaluation
Pletschacher, S; Clausner, C; Antonacopoulos, A
Authors
Mr Stefan Pletschacher S.Pletschacher@salford.ac.uk
Lecturer
Mr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
Abstract
This paper summarises the final performance evaluation results of the OCR workflow which was employed for large-scale production in the Europeana Newspapers project. It gives a detailed overview of how the involved software performed on a representative dataset of historical newspaper pages (for which ground truth was created) with regard to general text accuracy as well as layout-related factors which have an impact on how the material can be used in specific use scenarios. Specific types of errors are examined and evaluated in order to identify possible improvements related to the employed document image analysis and recognition methods. Moreover, alternatives to the standard production workflow are assessed to determine future directions and give advice on best practice related to OCR projects.
Citation
Pletschacher, S., Clausner, C., & Antonacopoulos, A. (2015). Europeana newspapers OCR workflow evaluation. https://doi.org/10.1145/2809544.2809554
Conference Name | 2015 Workshop on Historical Document Imaging and Processing (HIP2015) |
---|---|
Conference Location | Nancy, France |
Start Date | Aug 1, 2015 |
Publication Date | Aug 1, 2015 |
Deposit Date | Dec 23, 2015 |
DOI | https://doi.org/10.1145/2809544.2809554 |
Publisher URL | http://dx.doi.org/10.1145/2809544.2809554 |
Related Public URLs | http://hip2015.irisa.fr/ |
Additional Information | Event Type: Workshop; Funders: EU Competitiveness and Innovation Framework Programme; Projects: Europeana Newspapers; Grant Number: 297380 |