C Neudecker
Making Europe’s historical newspapers searchable
Neudecker, C; Antonacopoulos, Apostolos
Abstract
This paper provides a rare glimpse into the overall approach to refinement, i.e. the enrichment of scanned historical newspapers with text and layout recognition, in the Europeana Newspapers project. Within three years, the project processed more than 10 million pages of historical newspapers from 12 national and major libraries to produce the largest open access and fully searchable text collection of digital historical newspapers in Europe. In the process, a wide variety of legal, logistical, technical and other challenges were encountered. After introducing the background issues in newspaper digitization in Europe, the paper discusses the technical aspects of refinement in greater detail. It explains which decisions were taken in the design of the large-scale processing workflow to address these challenges, what results were produced, and what was identified as best practice.
Citation
Neudecker, C., & Antonacopoulos, A. (2016). Making Europe’s historical newspapers searchable. https://doi.org/10.1109/DAS.2016.83
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Dec 14, 2015 |
Publication Date | Jun 13, 2016 |
Deposit Date | Mar 22, 2016 |
Journal | Proceedings of the 12th IAPR International Workshop on Document Analysis Systems (DAS2016) |
Volume | 2016 |
Pages | 405-410 |
DOI | https://doi.org/10.1109/DAS.2016.83 |
Publisher URL | http://dx.doi.org/10.1109/DAS.2016.83 |
Related Public URLs | http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=7485953 |
Additional Information | Funders : European Commission |