Creating a complete workflow for digitising historical census documents: considerations and evaluation
Clausner, C; Hayes, J; Antonacopoulos, A; Pletschacher, S
Authors
Mr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
J Hayes
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
Mr Stefan Pletschacher S.Pletschacher@salford.ac.uk
Lecturer
Abstract
The 1961 Census of England and Wales was the first UK census to make use of computers. However, only bound volumes and microfilm copies of the printouts remain, locking a wealth of information in a form that is practically unusable for research. In this paper, we describe the process of creating the digitisation workflow that was developed as part of a pilot study for the Office for National Statistics. The emphasis of the paper is on the issues arising from the historical nature of the material and how they were resolved. The steps described include image pre-processing, OCR setup, table recognition, post-processing, data ingestion, crowdsourcing, and quality assurance. Evaluation methods and results are presented for all steps.
Citation
Clausner, C., Hayes, J., Antonacopoulos, A., & Pletschacher, S. (2017). Creating a complete workflow for digitising historical census documents: considerations and evaluation. In 2017 Workshop on Historical Document Imaging and Processing (HIP2017). https://doi.org/10.1145/3151509.3151525
Field | Value |
---|---|
Conference Name | 2017 Workshop on Historical Document Imaging and Processing (HIP2017) |
Conference Location | Kyoto, Japan |
Start Date | Nov 10, 2017 |
End Date | Nov 11, 2017 |
Publication Date | Nov 11, 2017 |
Deposit Date | Nov 20, 2017 |
Publicly Available Date | Nov 21, 2017 |
ISBN | 9781450353908 |
DOI | https://doi.org/10.1145/3151509.3151525 |
Related Public URLs | http://events.unifr.ch/hip2017/ |
Files
HIP2017 - Census 1961 camera-ready 3.pdf (PDF, 1 MB)
Version: Author's accepted manuscript
You might also like
Text line segmentation from struck-out handwritten document images (2022), Journal Article
A survey of OCR evaluation tools and metrics (2021), Conference Proceeding
VISE: an interface for Visual Search and Exploration of museum collections (2019), Journal Article