Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
The lifecycle of a digital historical document: structure and content
Antonacopoulos, A; Wiszniewski, B; Krawczyk, H; Karatzas, D
Authors
B Wiszniewski
H Krawczyk
D Karatzas
Abstract
This paper describes the lifecycle of a digital historical document, from template-based structure definition through content extraction from the scanned pages to its final reconstitution as an electronic document (combining content and semantic information), along with the tools that have been created to realise each stage in the lifecycle. The whole approach is described in the context of different types of typewritten documents relating to prisoners in World War II concentration camps and is the result of a multinational collaboration under the MEMORIAL project, funded (€1.5M) by the European Union (www.memorialproject.info). Extensive tests with historians/archivists and evaluation of the content extraction results indicate the superior performance of the whole semantics-driven approach both over manual transcription and over the semi-automated application of off-the-shelf OCR and the use of a conventional (text and layout) document format.
Citation
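Antonacopoulos, A., Wiszniewski, B., Krawczyk, H., & Karatzas, D. (2004). The lifecycle of a digital historical document: structure and content. In ACM Symposium on Document Engineering (DocEng'04), Milwaukee, Wisconsin, USA. ACM Press.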
Field | Value |
---|---|
Conference Name | ACM Symposium on Document Engineering (DocEng'04) |
Conference Location | Milwaukee, Wisconsin, USA |
Start Date | Oct 28, 2004 |
End Date | Oct 30, 2004 |
Deposit Date | Jan 5, 2009 |
Publisher URL | http://eprints.ecs.soton.ac.uk/13538/ |
Additional Information | Publisher: ACM Press; Event Type: Conference |
You might also like
Text line segmentation from struck-out handwritten document images
(2022)
Journal Article
A new deep wavefront based model for text localization in 3D video
(2021)
Journal Article
A survey of OCR evaluation tools and metrics
(2021)
Conference Proceeding