Quality prediction system for large-scale digitisation workflows
Clausner, C; Pletschacher, S; Antonacopoulos, Apostolos
Authors
Mr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Mr Stefan Pletschacher S.Pletschacher@salford.ac.uk
Lecturer
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
Abstract
The feasibility of large-scale OCR projects can so far only be assessed by running pilot studies on subsets of the target document collections and measuring the success of different workflows based on precise ground truth, which can be very costly to produce in the required volume. The premise of this paper is that, as an alternative, quality prediction may be used to approximate the success of a given OCR workflow. A new system is thus presented where a classifier is trained using metadata, image and layout features in combination with measured success rates (based on minimal ground truth). Subsequently, only document images are required as input for the numeric prediction of the quality score (no ground truth required). This way, the system can be applied to any number of similar (unseen) documents in order to assess their suitability for being processed using the particular workflow. The usefulness of the system has been validated using a realistic dataset of historical newspaper pages.
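The train-then-predict idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a simple k-nearest-neighbour average stands in for the trained model, and the function name `predict_quality`, the three features, and all numeric values are hypothetical placeholders, not data from the paper. Features are assumed to be pre-scaled to comparable ranges.

```python
from math import sqrt


def predict_quality(train, features, k=3):
    """Predict a quality score for an unseen page.

    train    -- list of (feature_vector, measured_score) pairs, where the
                score was measured against a small amount of ground truth
    features -- feature vector of the unseen page (no ground truth needed)
    k        -- number of nearest training pages to average over

    NOTE: k-NN averaging is a stand-in technique chosen for brevity; the
    paper does not state that this is the model it uses.
    """
    def distance(pair):
        # Euclidean distance between a training page and the unseen page
        return sqrt(sum((a - b) ** 2 for a, b in zip(pair[0], features)))

    nearest = sorted(train, key=distance)[:k]
    return sum(score for _, score in nearest) / len(nearest)


# Hypothetical training pages: (resolution / 1000 dpi, contrast, text fraction)
# paired with an illustrative measured success rate in [0, 1].
TRAIN = [
    ((0.30, 0.80, 0.70), 0.92),
    ((0.30, 0.75, 0.65), 0.88),
    ((0.15, 0.40, 0.50), 0.55),
    ((0.15, 0.35, 0.45), 0.50),
]

if __name__ == "__main__":
    # An unseen page that resembles the two well-scanned training pages.
    print(round(predict_quality(TRAIN, (0.30, 0.78, 0.68), k=2), 3))
```

Once the training pairs exist, only the feature vector of a new page is needed to obtain a score, which mirrors the abstract's point that no ground truth is required at prediction time.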
Citation
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2016). Quality prediction system for large-scale digitisation workflows. In 2016 12th IAPR Workshop on Document Analysis Systems (DAS) (pp. 138-143). https://doi.org/10.1109/das.2016.82
Conference Name | 2016 12th IAPR Workshop on Document Analysis Systems (DAS) |
---|---|
Conference Location | Santorini, Greece |
Start Date | Aug 11, 2016 |
End Date | Aug 14, 2016 |
Acceptance Date | Dec 14, 2015 |
Publication Date | Jun 13, 2016 |
Deposit Date | Mar 22, 2016 |
Volume | 2016 |
Pages | 138-143 |
DOI | https://doi.org/10.1109/das.2016.82 |
Publisher URL | http://dx.doi.org/10.1109/das.2016.82 |
Related Public URLs | http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=7485953 |
Additional Information | This work has been funded through the EU Competitiveness and Innovation Framework Programme grant Europeana Newspapers (Ref. 297380) |