Towards the extraction of statistical information from digitised numerical tables - the Medical Officer of Health reports scoping study

Clausner, C; Antonacopoulos, A; Henshaw, C; Hayes, J

Abstract

Numerical data of considerable significance is present in tabular form in historical documents. Because of the challenges involved in extracting this data from scanned documents, it is not available to researchers in a useful representation that unlocks the underlying statistical information. This paper sets out to create a better understanding of the problem of extracting and representing statistical information from numerical tables. The aim is to enable the creation of appropriate technical solutions and to help collection holders plan their digitisation projects so that they better serve their readers. To that end, after an initial overview of current practices in the digitisation and representation of historical numerical data, the authors present their findings from a scoping exercise on the Wellcome Library's high-profile collection of Medical Officer of Health reports. In addition to users' perspectives and a detailed examination of the nature and structure of the data in the reports, a study of the extraction and integration of the data is also described.

Citation

Clausner, C., Antonacopoulos, A., Henshaw, C., & Hayes, J. (2019). Towards the extraction of statistical information from digitised numerical tables - the Medical Officer of Health reports scoping study. In DATeCH2019 Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage. https://doi.org/10.1145/3322905.3322932

Conference Name: Third International Conference on Digital Access to Textual Cultural Heritage (DATeCH 2019)
Conference Location: Brussels, Belgium
Start Date: May 8, 2019
End Date: May 10, 2019
Publication Date: May 8, 2019
Deposit Date: Oct 28, 2019
Publicly Available Date: Oct 28, 2019
Book Title: DATeCH2019 Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage
ISBN: 9781450371940
DOI: https://doi.org/10.1145/3322905.3322932
Publisher URL: http://dx.doi.org/10.1145/3322905.3322932
Related Public URLs: https://dl.acm.org/citation.cfm?id=3322905, http://datech.digitisation.eu/
