The PAGE (Page Analysis and Ground-Truth Elements) format framework

Pletschacher, S; Antonacopoulos, A

Abstract

There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series.
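As a rough illustration of the kind of record the abstract describes, the sketch below reads a simplified PAGE-like XML fragment and extracts the image dimensions, the page border, and the text regions with their outlines and content. The fragment is hypothetical: the element and attribute names (`PcGts`, `Page`, `Border`, `TextRegion`, `Coords`, `TextEquiv`, `Unicode`) are modelled loosely on published PAGE schema versions rather than taken from this abstract, the namespace is omitted, and all values and the helper names `parse_points` and `read_page` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified PAGE-like fragment. No namespace, invented values;
# real PAGE files declare a schema namespace and carry more metadata.
SAMPLE = """
<PcGts>
  <Page imageFilename="page_0001.tif" imageWidth="2000" imageHeight="3000">
    <Border><Coords points="50,60 1950,60 1950,2940 50,2940"/></Border>
    <TextRegion id="r1" type="heading">
      <Coords points="100,100 900,100 900,200 100,200"/>
      <TextEquiv><Unicode>Example heading</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r2" type="paragraph">
      <Coords points="100,250 900,250 900,800 100,800"/>
      <TextEquiv><Unicode>Example body text.</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>
"""


def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_page(xml_text):
    """Collect image size, border polygon, and text regions from the fragment."""
    root = ET.fromstring(xml_text)
    page = root.find("Page")

    regions = []
    for region in page.findall("TextRegion"):
        regions.append({
            "id": region.get("id"),
            "type": region.get("type"),
            "polygon": parse_points(region.find("Coords").get("points")),
            "text": region.findtext("TextEquiv/Unicode", default=""),
        })

    return {
        "image": page.get("imageFilename"),
        "width": int(page.get("imageWidth")),
        "height": int(page.get("imageHeight")),
        "border": parse_points(page.find("Border/Coords").get("points")),
        "regions": regions,
    }


if __name__ == "__main__":
    page = read_page(SAMPLE)
    print(page["image"], page["width"], page["height"])
    for r in page["regions"]:
        print(r["id"], r["type"], r["polygon"][0], r["text"])
```

Keeping image-level information (the border) next to layout regions and their text in one record is what lets a single file serve as ground truth for several stages of a workflow, which is the property the abstract emphasises.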

Citation

Pletschacher, S., & Antonacopoulos, A. (2010). The PAGE (Page Analysis and Ground-Truth Elements) format framework. In 2010 20th International Conference on Pattern Recognition. https://doi.org/10.1109/ICPR.2010.72

Conference Name: 20th International Conference on Pattern Recognition (ICPR2010)
Conference Location: Istanbul, Turkey
Start Date: Aug 23, 2010
End Date: Aug 26, 2010
Online Publication Date: Oct 7, 2010
Publication Date: Aug 26, 2010
Deposit Date: Oct 7, 2011
Publicly Available Date: Apr 5, 2016
Book Title: 2010 20th International Conference on Pattern Recognition
ISBN: 9781424475421
DOI: https://doi.org/10.1109/ICPR.2010.72
Publisher URL: http://dx.doi.org/10.1109/ICPR.2010.72
Related Public URLs: https://ieeexplore.ieee.org/xpl/conhome/5595335/proceeding
