Dr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
J Hayes
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
This paper describes how crowdsourcing can be incorporated as an integral part of a comprehensive technical workflow to identify, extract and validate data from large volumes of printed tabular statistics, and transform them into operable digital datasets using current structural and descriptive standards. The recently completed digitisation project for the 1961 Census of England and Wales (commissioned by the UK's Office for National Statistics) is used to provide details on data processing, the crowdsourcing platform and tasks, crowd interaction, and validation of results. The multi-modal approach proved very successful: because the source material is so challenging, it delivered far more complete and validated data than automated processes alone could have produced.
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | 5th International Workshop on Historical Document Imaging and Processing - HIP'19 |
Start Date | Sep 20, 2019 |
End Date | Sep 21, 2019 |
Online Publication Date | Sep 20, 2019 |
Publication Date | Sep 20, 2019 |
Deposit Date | Nov 12, 2019 |
Publicly Available Date | Nov 12, 2019 |
Series Title | ACM International Conference Proceeding Series: HIP: Historical Document Imaging and Processing |
Series Number | 02155 |
Book Title | Proceedings of the 5th International Workshop on Historical Document Imaging and Processing - HIP '19 |
ISBN | 9781450376686 |
DOI | https://doi.org/10.1145/3352631.3352643 |
Publisher URL | https://doi.org/10.1145/3352631.3352643 |
Related Public URLs | https://www.primaresearch.org/hip2019/ https://dl.acm.org/citation.cfm?id=3352631&picked=prox |