Mr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Mr Stefan Pletschacher S.Pletschacher@salford.ac.uk
Lecturer
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
The feasibility of large-scale OCR projects can so far only be assessed by running pilot studies on subsets of the target document collections and measuring the success of different workflows based on precise ground truth, which can be very costly to produce in the required volume. The premise of this paper is that, as an alternative, quality prediction may be used to approximate the success of a given OCR workflow. A new system is thus presented where a classifier is trained using metadata, image and layout features in combination with measured success rates (based on minimal ground truth). Subsequently, only document images are required as input for the numeric prediction of the quality score (no ground truth required). This way, the system can be applied to any number of similar (unseen) documents in order to assess their suitability for being processed using the particular workflow. The usefulness of the system has been validated using a realistic dataset of historical newspaper pages.
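The train-then-predict idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier or the exact features, so the example below swaps in a plain k-nearest-neighbour regressor and uses made-up feature vectors; the function names (`train`, `predict_quality`) and all numbers are assumptions for illustration only, not the authors' implementation.

```python
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(feature_vectors, success_rates):
    """'Training' for k-NN simply stores the labelled examples.

    feature_vectors: per-page features (hypothetical: metadata, image
                     and layout values, each scaled to [0, 1]).
    success_rates:   measured OCR success for the same pages, obtained
                     from a small amount of ground truth.
    """
    if len(feature_vectors) != len(success_rates):
        raise ValueError("need one success rate per feature vector")
    return list(zip(feature_vectors, success_rates))


def predict_quality(model, features, k=3):
    """Predict a numeric quality score for an unseen page.

    No ground truth is needed here: the score is the mean success
    rate of the k most similar training pages.
    """
    nearest = sorted(model, key=lambda ex: distance(ex[0], features))[:k]
    return sum(rate for _, rate in nearest) / len(nearest)


# Hypothetical training data: [image contrast, layout complexity]
training_features = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
training_rates = [0.9, 0.5, 0.7]

model = train(training_features, training_rates)
score = predict_quality(model, [0.0, 0.1], k=1)
print(f"predicted quality score: {score:.2f}")
```

In this sketch the ground truth is only needed to produce `training_rates`; once the model exists, any number of similar unseen pages can be scored from their features alone.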
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2016). Quality prediction system for large-scale digitisation workflows. In 2016 12th IAPR Workshop on Document Analysis Systems (DAS) (pp. 138-143). https://doi.org/10.1109/das.2016.82
Conference Name | 2016 12th IAPR Workshop on Document Analysis Systems (DAS) |
---|---|
Conference Location | Santorini, Greece |
Start Date | Aug 11, 2016 |
End Date | Aug 14, 2016 |
Acceptance Date | Dec 14, 2015 |
Publication Date | Jun 13, 2016 |
Deposit Date | Mar 22, 2016 |
Volume | 2016 |
Pages | 138-143 |
DOI | https://doi.org/10.1109/das.2016.82 |
Publisher URL | http://dx.doi.org/10.1109/das.2016.82 |
Related Public URLs | http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=7485953 |
Additional Information | This work has been funded through the EU Competitiveness and Innovation Framework Programme grant Europeana Newspapers (Ref. 297380). |