Mr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
C Henshaw
J Hayes
Numerical data of considerable significance is present in historical documents in tabular form. Due to the challenges involved in the extraction of this data from the scanned documents it is not available to researchers in a useful representation that unlocks the underlying statistical information. This paper sets out to create a better understanding of the problem of extracting and representing statistical information from numerical tables, in order to enable the creation of appropriate technical solutions and also for collection holders to appropriately plan their digitisation projects to better serve their readers. To that effect, after an initial overview of current practices in digitisation and representation of historical numerical data, the authors’ findings are presented from a scoping exercise of the Wellcome Library’s high-profile collection of the Medical Officer of Health reports. In addition to users’ perspectives and a detailed examination of the nature and structure of the data in the reports, a study of the extraction and integration of the data is also described.
Clausner, C., Antonacopoulos, A., Henshaw, C., & Hayes, J. (2019). Towards the extraction of statistical information from digitised numerical tables - the Medical Officer of Health reports scoping study. In DATeCH2019 Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage. https://doi.org/10.1145/3322905.3322932
Conference Name | Third International Conference on Digital Access to Textual Cultural Heritage (DATeCH 2019) |
---|---|
Conference Location | Brussels, Belgium |
Start Date | May 8, 2019 |
End Date | May 10, 2019 |
Publication Date | May 8, 2019 |
Deposit Date | Oct 28, 2019 |
Publicly Available Date | Oct 28, 2019 |
Book Title | DATeCH2019 Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage |
ISBN | 9781450371940 |
DOI | https://doi.org/10.1145/3322905.3322932 |
Publisher URL | http://dx.doi.org/10.1145/3322905.3322932 |
Related Public URLs | https://dl.acm.org/citation.cfm?id=3322905 http://datech.digitisation.eu/ |