
Research Repository


Europeana newspapers OCR workflow evaluation

Authors

Pletschacher, S; Clausner, C; Antonacopoulos, A

Abstract

This paper summarises the final performance evaluation results of the OCR workflow which was employed for large-scale production in the Europeana Newspapers project. It gives a detailed overview of how the involved software performed on a representative dataset of historical newspaper pages (for which ground truth was created) with regard to general text accuracy as well as layout-related factors which have an impact on how the material can be used in specific use scenarios. Specific types of errors are examined and evaluated in order to identify possible improvements related to the employed document image analysis and recognition methods. Moreover, alternatives to the standard production workflow are assessed to determine future directions and give advice on best practice related to OCR projects.
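The abstract refers to "general text accuracy" without defining how it is measured. As a rough illustration only, one common character-level definition is 1 minus the Levenshtein edit distance between ground truth and OCR output, divided by the ground-truth length. The sketch below assumes that definition; it is not necessarily the metric or tooling used in the evaluation reported in the paper, and the helper names `levenshtein` and `character_accuracy` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Hypothetical helper (assumed definition): 1 - errors / len(ground_truth),
    clamped at 0 so that very noisy output cannot score below zero."""
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    errors = levenshtein(ground_truth, ocr_output)
    return max(0.0, 1.0 - errors / len(ground_truth))


if __name__ == "__main__":
    # Two 'e' -> 'c' substitutions in a 20-character ground-truth string.
    gt = "Europeana Newspapers"
    ocr = "Europcana Ncwspapers"
    print(f"{character_accuracy(gt, ocr):.3f}")  # prints 0.900
```

Character-level accuracy alone says nothing about reading order or region segmentation, which is why the abstract treats layout-related factors as a separate dimension of the evaluation.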

Citation

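Pletschacher, S., Clausner, C., & Antonacopoulos, A. (2015). Europeana newspapers OCR workflow evaluation. Paper presented at the 2015 Workshop on Historical Document Imaging and Processing (HIP2015), Nancy, France. https://doi.org/10.1145/2809544.2809554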

Conference Name: 2015 Workshop on Historical Document Imaging and Processing (HIP2015)
Conference Location: Nancy, France
Start Date: Aug 1, 2015
Publication Date: Aug 1, 2015
Deposit Date: Dec 23, 2015
DOI: https://doi.org/10.1145/2809544.2809554
Publisher URL: http://dx.doi.org/10.1145/2809544.2809554
Related Public URLs: http://hip2015.irisa.fr/
Additional Information:
Event Type: Workshop
Funders: EU Competitiveness and Innovation Framework Programme
Projects: Europeana Newspapers
Grant Number: 297380