Research Repository

All Outputs (54)

Document analysis and text recognition (2018)
Book
Märgner, V., Pal, U., & Antonacopoulos, A. (Eds.). (2018). Document analysis and text recognition. World Scientific. https://doi.org/10.1142/10689

The compendium presents the latest results of the most prominent competitions held in the field of Document Analysis and Text Recognition. It includes a description of the participating systems and the underlying methods on the one hand and the datasets...

Creating a complete workflow for digitising historical census documents : considerations and evaluation (2017)
Presentation / Conference Contribution

The 1961 Census of England and Wales was the first UK census to make use of computers. However, only bound volumes and microfilm copies of printouts remain, locking a wealth of information in a form that is practically unusable for research. In this...

Unearthing the recent past : digitising and understanding statistical information from census tables (2017)
Presentation / Conference Contribution

Censuses comprise a wealth of information at a large (national) scale that allow governments (who commission them) and the public to have a detailed snapshot of how people live (geographical distribution and characteristics). In addition to underpinn...

Effective geometric restoration of distorted historical documents for large-scale digitization (2017)
Journal Article
Yang, P., Antonacopoulos, A., Clausner, C., Pletschacher, S., & Qi, J. (2017). Effective geometric restoration of distorted historical documents for large-scale digitization. IET Image Processing, 11(10), 841-853. https://doi.org/10.1049/iet-ipr.2016.0973

Due to storage conditions and the material's non-planar shape, geometric distortion of the 2-D content is widely present in scanned document images. Effective geometric restoration of these distorted document images considerably increases character recog...

Making Europe’s historical newspapers searchable (2016)
Journal Article
Neudecker, C., & Antonacopoulos, A. (2016). Making Europe’s historical newspapers searchable. https://doi.org/10.1109/DAS.2016.83

This paper provides a rare glimpse into the overall approach for the refinement, i.e. the enrichment of scanned historical newspapers with text and layout recognition, in the Europeana Newspapers project. Within three years, the project processed mor...

ICDAR2015 competition on recognition of documents with complex layouts - RDCL2015 (2015)
Book Chapter
Antonacopoulos, A., Clausner, C., Papadopoulos, C., & Pletschacher, S. (2015). ICDAR2015 competition on recognition of documents with complex layouts - RDCL2015. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (1151-1155). IEEE. https://doi.org/10.1109/ICDAR.2015.7333941

This paper presents an objective comparative evaluation of page segmentation and region classification methods for documents with complex layouts. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context o...

Historical typewritten document recognition using minimal user interaction (2015)
Book Chapter
Retsinas, G., Gatos, B., Antonacopoulos, A., Louloudis, G., & Stamatopoulos, N. (2015). Historical typewritten document recognition using minimal user interaction. In Proceedings of the 3rd Workshop on Historical Document Imaging and Processing (HIP2015) (31-38). ACM Digital Library. https://doi.org/10.1145/2809544.2809559

Recognition of low-quality historical typewritten documents can still be considered a challenging task due to several issues, e.g. faint and degraded characters, stains, tears and punch holes. In this paper, we explo...

Distinction between handwritten and machine-printed text based on the bag of visual words model (2014)
Journal Article
Zagoris, K., Pratikakis, I., Antonacopoulos, A., Gatos, B., & Papamarkos, N. (2014). Distinction between handwritten and machine-printed text based on the bag of visual words model. Pattern Recognition, 47(3), 1051-1062. https://doi.org/10.1016/j.patcog.2013.09.005

In a variety of documents, ranging from forms to archive documents and books with annotations, machine-printed and handwritten text may coexist in the same document image, raising significant issues within the recognition pipeline. It is, therefore,...

Aletheia - An advanced document layout and text ground-truthing system for production environments (2011)
Presentation / Conference Contribution

Large-scale digitisation has led to a number of new possibilities with regard to adaptive and learning based methods in the field of Document Image Analysis and OCR. For ground truth production of large corpora, however, there is still a gap in terms...