Research Repository

All Outputs (24)

NAME – A Rich XML Format for Named Entity and Relation Tagging (2023)
Conference Proceeding
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2023). NAME – A Rich XML Format for Named Entity and Relation Tagging. In HIP '23: Proceedings of the 7th International Workshop on Historical Document Imaging and Processing (91-96). https://doi.org/10.1145/3604951.3605521

We present NAME XML, a schema for named entities and relations in documents. The standout features are: option to reference a variety of document formats (such as PAGE XML or plain text), support of entity hierarchies, custom entity types via ontolog...

A survey of OCR evaluation tools and metrics (2021)
Conference Proceeding
Neudecker, C., Baierer, K., Gerber, M., Clausner, C., Antonacopoulos, A., & Pletschacher, S. (2021). A survey of OCR evaluation tools and metrics. In HIP '21: The 6th International Workshop on Historical Document Imaging and Processing. https://doi.org/10.1145/3476887.3476888

The millions of pages of historical documents that are digitized in libraries are increasingly used in contexts that have more specific requirements for OCR quality than keyword search. How to comprehensively, efficiently and reliably assess the qual...

Identifying information needs of patients with IgA Nephropathy, using an innovative social media stepped analytical approach (2021)
Journal Article
Vasilica, C., Oates, T., Clausner, C., Ormandy, P., Barratt, J., & Graham-Brown, M. (2021). Identifying information needs of patients with IgA Nephropathy, using an innovative social media stepped analytical approach. Kidney International Reports, 6(5), 1317-1325. https://doi.org/10.1016/j.ekir.2021.02.030

Introduction: Increasingly, people with kidney disease are using social media to search for medical information and to find peer-support. IgA nephropathy (IgAN) predominantly affects young adults, demographically the biggest users of...

Flexible character accuracy measure for reading-order-independent evaluation (2020)
Journal Article
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2020). Flexible character accuracy measure for reading-order-independent evaluation. Pattern Recognition Letters, 131, 390-397. https://doi.org/10.1016/j.patrec.2020.02.003

The extraction of textual information from scanned document pages is a fundamental stage in any digitisation effort and directly determines the success of the overall document analysis and understanding application scenarios. To evaluate and improve...

A cloud-hosted MapReduce architecture for syntactic parsing (2019)
Conference Proceeding
Woldemariam, Y., Pletschacher, S., Clausner, C., & Bass, J. (2019). A cloud-hosted MapReduce architecture for syntactic parsing. In Kallithea, Greece. https://doi.org/10.1109/SEAA.2019.00024

Syntactic parsing is a time-consuming task in natural language processing, particularly where a large number of text files are being processed. Parsing algorithms are conventionally designed to operate on a single machine in a sequenti...

Efficient and effective OCR engine training (2019)
Journal Article
Clausner, C., Antonacopoulos, A., & Pletschacher, S. (2020). Efficient and effective OCR engine training. International Journal on Document Analysis and Recognition, 23(1), 73-78. https://doi.org/10.1007/s10032-019-00347-8

We present an efficient and effective approach to train OCR engines using the Aletheia document analysis system. All components required for training are seamlessly integrated into Aletheia: training data preparation, the OCR engine’s training proces...

Crowdsourcing historical tabular data : 1961 census of England and Wales (2019)
Conference Proceeding
Clausner, C., Hayes, J., & Antonacopoulos, A. (2019). Crowdsourcing historical tabular data : 1961 census of England and Wales. In Proceedings of the 5th International Workshop on Historical Document Imaging and Processing - HIP '19. https://doi.org/10.1145/3352631.3352643

This paper describes how crowdsourcing can be incorporated as an integral part of a comprehensive technical workflow to identify, extract and validate data from large volumes of printed tabular statistics, and transform them into operable digital dat...

Towards the extraction of statistical information from digitised numerical tables - the Medical Officer of Health reports scoping study (2019)
Conference Proceeding
Clausner, C., Antonacopoulos, A., Henshaw, C., & Hayes, J. (2019). Towards the extraction of statistical information from digitised numerical tables - the Medical Officer of Health reports scoping study. In DATeCH2019 Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage. https://doi.org/10.1145/3322905.3322932

Numerical data of considerable significance is present in historical documents in tabular form. Due to the challenges involved in the extraction of this data from the scanned documents, it is not available to researchers in a useful representation tha...

Highlights of the novel dewaterability estimation test (DET) device (2019)
Journal Article
Scholz, M., Almuktar, S., Clausner, C., & Antonacopoulos, A. (2020). Highlights of the novel dewaterability estimation test (DET) device. Environmental Technology, 41(20), 2594-2602. https://doi.org/10.1080/09593330.2019.1575916

Many industries, which are producing sludge in large quantities, depend on sludge dewatering technology to reduce the corresponding water content. A key design parameter for dewatering equipment is the capillary suction time (CST) test, which has, ho...

ICFHR 2018 Competition on recognition of historical Arabic scientific manuscripts - RASM2018 (2018)
Conference Proceeding
Clausner, C., Antonacopoulos, A., McGregor, N., & Wilson-Nunn, D. (2018). ICFHR 2018 Competition on recognition of historical Arabic scientific manuscripts - RASM2018. https://doi.org/10.1109/ICFHR-2018.2018.00088

This paper presents an objective comparative evaluation of page analysis and recognition methods for historical scientific manuscripts with text in Arabic language and script. It describes the competition (modus operandi, dataset and evaluation metho...

Ontology and framework for semantic labelling of document data and software methods (2018)
Conference Proceeding
Clausner, C., & Antonacopoulos, A. (2018). Ontology and framework for semantic labelling of document data and software methods. https://doi.org/10.1109/DAS.2018.46

We present a metadata labelling framework for datasets, software tools, and workflows. An ontology for document image analysis was developed with deep support for historical data. An accompanying open source software framework was implemented to enab...

Study protocol : responding to the needs of patients with IgA nephropathy, a social media approach (2017)
Journal Article
Graham-Brown, M., Vasilica, C., Oates, T., Light, B., Clausner, C., Antonacopoulos, A., …Barratt, J. (2017). Study protocol : responding to the needs of patients with IgA nephropathy, a social media approach. Clinical Kidney Journal, 11(4), 474-478. https://doi.org/10.1093/ckj/sfx131

Background: IgA nephropathy is the most common cause of glomerulonephritis in the Western world and predominantly affects young adults. Demographically these patients are the biggest users of social media. With increasing numbers of patients turning...

ICDAR2017 Competition on Recognition of Early Indian Printed Documents – REID2017 (2017)
Conference Proceeding
Clausner, C., Antonacopoulos, A., Derrick, T., & Pletschacher, S. (2017). ICDAR2017 Competition on Recognition of Early Indian Printed Documents – REID2017. https://doi.org/10.1109/ICDAR.2017.230

This paper presents an objective comparative evaluation of page analysis and recognition methods for historical documents with text mainly in Bengali language and script. It describes the competition (modus operandi, dataset and evaluation methodolog...

ICDAR2017 Competition on Recognition of Documents with Complex Layouts – RDCL2017 (2017)
Conference Proceeding
Clausner, C., Antonacopoulos, A., & Pletschacher, S. (2017). ICDAR2017 Competition on Recognition of Documents with Complex Layouts – RDCL2017. https://doi.org/10.1109/ICDAR.2017.229

This paper presents an objective comparative evaluation of page segmentation and region classification methods for documents with complex layouts. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context o...

Creating a complete workflow for digitising historical census documents : considerations and evaluation (2017)
Conference Proceeding
Clausner, C., Hayes, J., Antonacopoulos, A., & Pletschacher, S. (2017). Creating a complete workflow for digitising historical census documents : considerations and evaluation. https://doi.org/10.1145/3151509.3151525

The 1961 Census of England and Wales was the first UK census to make use of computers. However, only bound volumes and microfilm copies of printouts remain, locking a wealth of information in a form that is practically unusable for research. In this...

Unearthing the recent past : digitising and understanding statistical information from census tables (2017)
Conference Proceeding
Clausner, C., Hayes, J., Antonacopoulos, A., & Pletschacher, S. (2017). Unearthing the recent past : digitising and understanding statistical information from census tables. In Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage - DATeCH2017. https://doi.org/10.1145/3078081.3078106

Censuses comprise a wealth of information at a large (national) scale that allow governments (who commission them) and the public to have a detailed snapshot of how people live (geographical distribution and characteristics). In addition to underpinn...

Effective geometric restoration of distorted historical documents for large-scale digitization (2017)
Journal Article
Yang, P., Antonacopoulos, A., Clausner, C., Pletschacher, S., & Qi, J. (2017). Effective geometric restoration of distorted historical documents for large-scale digitization. IET Image Processing, 11(10), 841-853. https://doi.org/10.1049/iet-ipr.2016.0973

Due to storage conditions and the material’s non-planar shape, geometric distortion of the 2-D content is widely present in scanned document images. Effective geometric restoration of these distorted document images considerably increases character recog...

Quality prediction system for large-scale digitisation workflows (2016)
Conference Proceeding
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2016). Quality prediction system for large-scale digitisation workflows. https://doi.org/10.1109/das.2016.82

The feasibility of large-scale OCR projects can so far only be assessed by running pilot studies on subsets of the target document collections and measuring the success of different workflows based on precise ground truth, which can be very costly to...

Europeana newspapers OCR workflow evaluation (2015)
Conference Proceeding
Pletschacher, S., Clausner, C., & Antonacopoulos, A. (2015). Europeana newspapers OCR workflow evaluation. https://doi.org/10.1145/2809544.2809554

This paper summarises the final performance evaluation results of the OCR workflow which was employed for large-scale production in the Europeana Newspapers project. It gives a detailed overview of how the involved software performed...

ICDAR2015 competition on recognition of documents with complex layouts - RDCL2015 (2015)
Book Chapter
Antonacopoulos, A., Clausner, C., Papadopoulos, C., & Pletschacher, S. (2015). ICDAR2015 competition on recognition of documents with complex layouts - RDCL2015. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (1151-1155). IEEE. https://doi.org/10.1109/ICDAR.2015.7333941

This paper presents an objective comparative evaluation of page segmentation and region classification methods for documents with complex layouts. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context o...