
Research Repository


Ontology and framework for semantic labelling of document data and software methods (2018)
Conference Proceeding
Clausner, C., & Antonacopoulos, A. (2018). Ontology and framework for semantic labelling of document data and software methods. https://doi.org/10.1109/DAS.2018.46

We present a metadata labelling framework for datasets, software tools, and workflows. An ontology for document image analysis was developed with deep support for historical data. An accompanying open source software framework was implemented to enab...

Document representation refinement for precise region description (2014)
Conference Proceeding
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2014). Document representation refinement for precise region description. In A. Antonacopoulos, & K. Schulz (Eds.), DATeCH '14: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage. https://doi.org/10.1145/2595188.2595198

Precise description of layout entities (content regions on a page) is crucial for all but the most trivial document analysis and recognition applications. The output of layout analysis methods and state-of-the-art OCR systems varies significantly, fr...

The significance of reading order in document recognition and its evaluation (2013)
Conference Proceeding
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2013). The significance of reading order in document recognition and its evaluation. In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition. https://doi.org/10.1109/ICDAR.2013.141

Reading order detection and representation is an important task in many digitisation scenarios involving the preservation of the logical structure of a document. The corresponding need for the evaluation of reading order results generated by layout a...

Aletheia - An advanced document layout and text ground-truthing system for production environments (2011)
Conference Proceeding
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2011). Aletheia - An advanced document layout and text ground-truthing system for production environments. In 2011 International Conference on Document Analysis and Recognition (ICDAR 2011). https://doi.org/10.1109/ICDAR.2011.19

Large-scale digitisation has led to a number of new possibilities with regard to adaptive and learning-based methods in the field of Document Image Analysis and OCR. For ground truth production of large corpora, however, there is still a gap in terms...

The PAGE (Page Analysis and Ground-Truth Elements) format framework (2010)
Conference Proceeding
Pletschacher, S., & Antonacopoulos, A. (2010). The PAGE (Page Analysis and Ground-Truth Elements) format framework. In 2010 20th International Conference on Pattern Recognition. https://doi.org/10.1109/ICPR.2010.72

There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to...

Word-Based adaptive OCR for historical books (2009)
Conference Proceeding
Kluzner, V., Tzadok, A., Shimony, Y., Walach, E., & Antonacopoulos, A. (2009). Word-Based adaptive OCR for historical books. In 2009 10th International Conference on Document Analysis and Recognition. https://doi.org/10.1109/ICDAR.2009.133

The aim of this work is to propose a new approach to the recognition of historical texts by providing an adaptive mechanism that automatically tunes itself to a specific book. The system is based on clustering together all the similar words in a book...

The lifecycle of a digital historical document: structure and content
Conference Proceeding
Antonacopoulos, A., Wiszniewski, B., Krawczyk, H., & Karatzas, D. The lifecycle of a digital historical document: structure and content.

This paper describes the lifecycle of a digital historical document, from template-based structure definition through to content extraction from the scanned pages and its final reconstitution as an electronic document (combining content and semantic...