Research Repository

All Outputs (5)

A cloud-hosted MapReduce architecture for syntactic parsing (2019)
Conference Proceeding
Woldemariam, Y., Pletschacher, S., Clausner, C., & Bass, J. (2019). A cloud-hosted MapReduce architecture for syntactic parsing. In Kallithea, Greece. https://doi.org/10.1109/SEAA.2019.00024

Syntactic parsing is a time-consuming task in natural language processing, particularly where a large number of text files are being processed. Parsing algorithms are conventionally designed to operate on a single machine in a sequenti...

Efficient and effective OCR engine training (2019)
Journal Article
Clausner, C., Antonacopoulos, A., & Pletschacher, S. (2020). Efficient and effective OCR engine training. International Journal on Document Analysis and Recognition, 23(1), 73-78. https://doi.org/10.1007/s10032-019-00347-8

We present an efficient and effective approach to train OCR engines using the Aletheia document analysis system. All components required for training are seamlessly integrated into Aletheia: training data preparation, the OCR engine’s training proces...

Crowdsourcing historical tabular data : 1961 census of England and Wales (2019)
Conference Proceeding
Clausner, C., Hayes, J., & Antonacopoulos, A. (2019). Crowdsourcing historical tabular data : 1961 census of England and Wales. In Proceedings of the 5th International Workshop on Historical Document Imaging and Processing - HIP '19. https://doi.org/10.1145/3352631.3352643

This paper describes how crowdsourcing can be incorporated as an integral part of a comprehensive technical workflow to identify, extract and validate data from large volumes of printed tabular statistics, and transform them into operable digital dat...

Towards the extraction of statistical information from digitised numerical tables - the Medical Officer of Health reports scoping study (2019)
Conference Proceeding
Clausner, C., Antonacopoulos, A., Henshaw, C., & Hayes, J. (2019). Towards the extraction of statistical information from digitised numerical tables - the Medical Officer of Health reports scoping study. In DATeCH2019 Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage. https://doi.org/10.1145/3322905.3322932

Numerical data of considerable significance is present in historical documents in tabular form. Due to the challenges involved in the extraction of this data from the scanned documents it is not available to researchers in a useful representation tha...

Highlights of the novel dewaterability estimation test (DET) device (2019)
Journal Article
Scholz, M., Almuktar, S., Clausner, C., & Antonacopoulos, A. (2020). Highlights of the novel dewaterability estimation test (DET) device. Environmental Technology, 41(20), 2594-2602. https://doi.org/10.1080/09593330.2019.1575916

Many industries, which are producing sludge in large quantities, depend on sludge dewatering technology to reduce the corresponding water content. A key design parameter for dewatering equipment is the capillary suction time (CST) test, which has, ho...