Research Repository

Flexible character accuracy measure for reading-order-independent evaluation (2020)
Journal Article
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2020). Flexible character accuracy measure for reading-order-independent evaluation. Pattern Recognition Letters, 131, 390-397. https://doi.org/10.1016/j.patrec.2020.02.003

The extraction of textual information from scanned document pages is a fundamental stage in any digitisation effort and directly determines the success of the overall document analysis and understanding application scenarios. To evaluate and improve...

A cloud-hosted MapReduce architecture for syntactic parsing (2019)
Conference Proceeding
Woldemariam, Y., Pletschacher, S., Clausner, C., & Bass, J. (2019). A cloud-hosted MapReduce architecture for syntactic parsing. In Kallithea, Greece. https://doi.org/10.1109/SEAA.2019.00024

Syntactic parsing is a time-consuming task in natural language processing, particularly where a large number of text files are being processed. Parsing algorithms are conventionally designed to operate on a single machine in a sequenti...

Ontology and framework for semantic labelling of document data and software methods (2018)
Conference Proceeding
Clausner, C., & Antonacopoulos, A. (2018). Ontology and framework for semantic labelling of document data and software methods. https://doi.org/10.1109/DAS.2018.46

We present a metadata labelling framework for datasets, software tools, and workflows. An ontology for document image analysis was developed with deep support for historical data. An accompanying open source software framework was implemented to enab...

Effective geometric restoration of distorted historical documents for large-scale digitization (2017)
Journal Article
Yang, P., Antonacopoulos, A., Clausner, C., Pletschacher, S., & Qi, J. (2017). Effective geometric restoration of distorted historical documents for large-scale digitization. IET Image Processing, 11(10), 841-853. https://doi.org/10.1049/iet-ipr.2016.0973

Due to storage conditions and material’s non-planar shape, geometric distortion of the 2-D content is widely present in scanned document images. Effective geometric restoration of these distorted document images considerably increases character recog...

Document representation refinement for precise region description (2014)
Conference Proceeding
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2014). Document representation refinement for precise region description. In A. Antonacopoulos, & K. Schulz (Eds.), DATeCH '14: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage. https://doi.org/10.1145/2595188.2595198

Precise description of layout entities (content regions on a page) is crucial for all but the most trivial document analysis and recognition applications. The output of layout analysis methods and state-of-the-art OCR systems varies significantly, fr...

The significance of reading order in document recognition and its evaluation (2013)
Conference Proceeding
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2013). The significance of reading order in document recognition and its evaluation. In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition. https://doi.org/10.1109/ICDAR.2013.141

Reading order detection and representation is an important task in many digitisation scenarios involving the preservation of the logical structure of a document. The corresponding need for the evaluation of reading order results generated by layout a...

Aletheia - An advanced document layout and text ground-truthing system for production environments (2011)
Conference Proceeding
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2011). Aletheia - An advanced document layout and text ground-truthing system for production environments. In 2011 International Conference on Document Analysis and Recognition ICDAR 2011. https://doi.org/10.1109/ICDAR.2011.19

Large-scale digitisation has led to a number of new possibilities with regard to adaptive and learning based methods in the field of Document Image Analysis and OCR. For ground truth production of large corpora, however, there is still a gap in terms...