Research Repository

All Outputs (46)

Creating a complete workflow for digitising historical census documents : considerations and evaluation (2017)
Conference Proceeding
Clausner, C., Hayes, J., Antonacopoulos, A., & Pletschacher, S. (2017). Creating a complete workflow for digitising historical census documents : considerations and evaluation. https://doi.org/10.1145/3151509.3151525

The 1961 Census of England and Wales was the first UK census to make use of computers. However, only bound volumes and microfilm copies of printouts remain, locking a wealth of information in a form that is practically unusable for research. In this...

Unearthing the recent past : digitising and understanding statistical information from census tables (2017)
Conference Proceeding
Clausner, C., Hayes, J., Antonacopoulos, A., & Pletschacher, S. (2017). Unearthing the recent past : digitising and understanding statistical information from census tables. In Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage - DATeCH2017. https://doi.org/10.1145/3078081.3078106

Censuses comprise a wealth of information at a large (national) scale that allow governments (who commission them) and the public to have a detailed snapshot of how people live (geographical distribution and characteristics). In addition to underpinn...

Effective geometric restoration of distorted historical documents for large-scale digitization (2017)
Journal Article
Yang, P., Antonacopoulos, A., Clausner, C., Pletschacher, S., & Qi, J. (2017). Effective geometric restoration of distorted historical documents for large-scale digitization. IET Image Processing, 11(10), 841-853. https://doi.org/10.1049/iet-ipr.2016.0973

Due to storage conditions and material’s non-planar shape, geometric distortion of the 2-D content is widely present in scanned document images. Effective geometric restoration of these distorted document images considerably increases character recog...

Quality prediction system for large-scale digitisation workflows (2016)
Conference Proceeding
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2016). Quality prediction system for large-scale digitisation workflows. https://doi.org/10.1109/das.2016.82

The feasibility of large-scale OCR projects can so far only be assessed by running pilot studies on subsets of the target document collections and measuring the success of different workflows based on precise ground truth, which can be very costly to...

Making Europe’s historical newspapers searchable (2016)
Journal Article
Neudecker, C., & Antonacopoulos, A. (2016). Making Europe’s historical newspapers searchable. https://doi.org/10.1109/DAS.2016.83

This paper provides a rare glimpse into the overall approach for the refinement, i.e. the enrichment of scanned historical newspapers with text and layout recognition, in the Europeana Newspapers project. Within three years, the project processed mor...

Navigating the storm : IMPACT, eMOP, and Agile Steering Standards (2015)
Journal Article
Mandell, L., Neudecker, C., Antonacopoulos, A., Grumbach, E., Auvil, L., Christy, M., …Samuelson, T. (2015). Navigating the storm : IMPACT, eMOP, and Agile Steering Standards. Digital Scholarship in the Humanities, 32(1), 189-194. https://doi.org/10.1093/llc/fqv062

This article discusses two major initiatives tasked with developing tools to improve optical character recognition (OCR) or the mechanical keying of texts that are digitally available only as page images. The two initiatives are the IMProving ACces...

Europeana newspapers OCR workflow evaluation (2015)
Conference Proceeding
Pletschacher, S., Clausner, C., & Antonacopoulos, A. (2015). Europeana newspapers OCR workflow evaluation. https://doi.org/10.1145/2809544.2809554

This paper summarises the final performance evaluation results of the OCR workflow which was employed for large-scale production in the Europeana Newspapers project. It gives a detailed overview of how the involved software performed...

ICDAR2015 competition on recognition of documents with complex layouts - RDCL2015 (2015)
Book Chapter
Antonacopoulos, A., Clausner, C., Papadopoulos, C., & Pletschacher, S. (2015). ICDAR2015 competition on recognition of documents with complex layouts - RDCL2015. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (1151-1155). IEEE. https://doi.org/10.1109/ICDAR.2015.7333941

This paper presents an objective comparative evaluation of page segmentation and region classification methods for documents with complex layouts. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context o...

Historical typewritten document recognition using minimal user interaction (2015)
Book Chapter
Retsinas, G., Gatos, B., Antonacopoulos, A., Louloudis, G., & Stamatopoulos, N. (2015). Historical typewritten document recognition using minimal user interaction. In Proceedings of the 3rd Workshop on Historical Document Imaging and Processing (HIP2015) (31-38). ACM Digital Library. https://doi.org/10.1145/2809544.2809559

Recognition of low-quality historical typewritten documents can still be considered as a challenging and difficult task due to several issues i.e. the existence of faint and degraded characters, stains, tears, punch holes etc. In this paper, we explo...

Document representation refinement for precise region description (2014)
Conference Proceeding
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2014). Document representation refinement for precise region description. In A. Antonacopoulos, & K. Schulz (Eds.), DATeCH '14: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage. https://doi.org/10.1145/2595188.2595198

Precise description of layout entities (content regions on a page) is crucial for all but the most trivial document analysis and recognition applications. The output of layout analysis methods and state-of-the-art OCR systems varies significantly, fr...

Distinction between handwritten and machine-printed text based on the bag of visual words model (2014)
Journal Article
Zagoris, K., Pratikakis, I., Antonacopoulos, A., Gatos, B., & Papamarkos, N. (2014). Distinction between handwritten and machine-printed text based on the bag of visual words model. Pattern recognition, 47(3), 1051-1062. https://doi.org/10.1016/j.patcog.2013.09.005

In a variety of documents, ranging from forms to archive documents and books with annotations, machine printed and handwritten text may coexist in the same document image, raising significant issues within the recognition pipeline. It is, therefore,...

The significance of reading order in document recognition and its evaluation (2013)
Conference Proceeding
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2013). The significance of reading order in document recognition and its evaluation. In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition. https://doi.org/10.1109/ICDAR.2013.141

Reading order detection and representation is an important task in many digitisation scenarios involving the preservation of the logical structure of a document. The corresponding need for the evaluation of reading order results generated by layout a...

Aletheia - An advanced document layout and text ground-truthing system for production environments (2011)
Conference Proceeding
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2011). Aletheia - An advanced document layout and text ground-truthing system for production environments. In 2011 International Conference on Document Analysis and Recognition ICDAR 2011. https://doi.org/10.1109/ICDAR.2011.19

Large-scale digitisation has led to a number of new possibilities with regard to adaptive and learning based methods in the field of Document Image Analysis and OCR. For ground truth production of large corpora, however, there is still a gap in terms...

A real-life system for identifying and monitoring objects for user-specified scenarios in live CCTV (2011)
Thesis
Fairchild, A. A real-life system for identifying and monitoring objects for user-specified scenarios in live CCTV. (Thesis). Salford : University of Salford

This thesis presents the research and subsequent development of a real life system capable of identifying and monitoring objects for user-specified scenarios in live CCTV video. More specifically, after a review of the state of the art in...

Restoration of arbitrarily warped historical document images using flow lines (2011)
Conference Proceeding
Rahnemoonfar, M., & Antonacopoulos, A. (2011). Restoration of arbitrarily warped historical document images using flow lines. In 2011 International Conference on Document Analysis and Recognition. https://doi.org/10.1109/ICDAR.2011.184

Historical documents frequently suffer from arbitrary geometric distortions (warping and folds) due to storage conditions, use and, to some extent, the printing process of the time. In addition, page curl can be prominent due to the scanning tech...

The PAGE (Page Analysis and Ground-Truth Elements) format framework (2010)
Conference Proceeding
Pletschacher, S., & Antonacopoulos, A. (2010). The PAGE (Page Analysis and Ground-Truth Elements) format framework. In 2010 20th International Conference on Pattern Recognition. https://doi.org/10.1109/ICPR.2010.72

There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to...

Correction of arbitrary geometric artefacts in historical documents (2010)
Thesis
Rahnemoonfar, M. Correction of arbitrary geometric artefacts in historical documents. (Thesis). Salford : University of Salford

The research presented in this thesis addresses the problem of correction of arbitrary geometric artefacts in historical documents. Geometric distortions in historical documents may be introduced at any time during the...

Word-Based adaptive OCR for historical books (2009)
Conference Proceeding
Kluzner, V., Tzadok, A., Shimony, Y., Walach, E., & Antonacopoulos, A. (2009). Word-Based adaptive OCR for historical books. In 2009 10th International Conference on Document Analysis and Recognition. https://doi.org/10.1109/ICDAR.2009.133

The aim of this work is to propose a new approach to the recognition of historical texts by providing an adaptive mechanism that automatically tunes itself to a specific book. The system is based on clustering together all the similar words in a book...