Dr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
C Henshaw
J Hayes
Numerical data of considerable significance is present in historical documents in tabular form. Due to the challenges involved in extracting this data from scanned documents, it is not available to researchers in a useful representation that unlocks the underlying statistical information. This paper sets out to create a better understanding of the problem of extracting and representing statistical information from numerical tables, both to enable the creation of appropriate technical solutions and to help collection holders plan their digitisation projects to better serve their readers. To that end, after an initial overview of current practices in digitisation and representation of historical numerical data, the authors’ findings are presented from a scoping exercise of the Wellcome Library’s high-profile collection of the Medical Officer of Health reports. In addition to users’ perspectives and a detailed examination of the nature and structure of the data in the reports, a study of the extraction and integration of the data is also described.
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | Third International Conference on Digital Access to Textual Cultural Heritage (DATeCH 2019) |
Start Date | May 8, 2019 |
End Date | May 10, 2019 |
Publication Date | May 8, 2019 |
Deposit Date | Oct 28, 2019 |
Publicly Available Date | Oct 28, 2019 |
Book Title | DATeCH2019 Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage |
ISBN | 9781450371940 |
DOI | https://doi.org/10.1145/3322905.3322932 |
Publisher URL | http://dx.doi.org/10.1145/3322905.3322932 |
Related Public URLs | https://dl.acm.org/citation.cfm?id=3322905 http://datech.digitisation.eu/ |