Correction of arbitrary geometric artefacts in historical documents

Rahnemoonfar, M

Authors

M Rahnemoonfar



Abstract

The research presented in this thesis addresses the problem of correction of arbitrary
geometric artefacts in historical documents. Geometric distortions in historical
documents may be introduced at any time during the life cycle of a document, from
when it was first printed to the time it is digitised by an imaging device. Such
distortions appear as arbitrary warping, folds and page curl, and have detrimental effects
on recognition (OCR) and readability (e.g. for print-on-demand). This thesis also
critically examines the state-of-the-art methods and identifies opportunities for
significant improvement.
Firstly, the present work focuses on the main issues in text line segmentation and
proposes a method which is robust in the presence of various geometric distortions,
other artefacts in historical documents, and dense and complex layouts. Secondly, a
precise baseline detection method based on geometric features of the parametric model
of the segmented line is presented. The proposed baseline detection
method not only takes into consideration unexpected geometric distortions, which are
common in historical document images, but also identifies certain main components
of the text line, such as ascenders, descenders, and certain decorative marks, and makes
intelligent distinctions between such native (but potentially misleading) components of
the line and other global and local distortions of the whole page.
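
As an illustration only, the idea of keeping descenders from pulling a baseline estimate downwards can be sketched as a polynomial fit with iterative outlier rejection. This is not the method of the thesis, whose parametric model is not specified in the abstract. The function names, the quadratic degree, the rejection factor `k` and the half-pixel floor `min_scale` are all assumptions made for this sketch.

```python
from statistics import median

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (small degrees only)."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the monomial basis 1, x, x^2, ...
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]  # lowest order first

def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at column x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def fit_baseline(points, degree=2, k=3.0, min_scale=0.5, max_iter=10):
    """Fit a baseline to (x, y_bottom) points of one text line.

    Hypothetical helper: points whose residual exceeds k times the median
    residual (with a floor of min_scale pixels) are treated as descenders
    or decorative marks and dropped before refitting.
    """
    kept = list(points)
    coeffs = polyfit([p[0] for p in kept], [p[1] for p in kept], degree)
    for _ in range(max_iter):
        residuals = [abs(y - evaluate(coeffs, x)) for x, y in kept]
        scale = max(median(residuals), min_scale)
        inliers = [p for p, r in zip(kept, residuals) if r <= k * scale]
        # Stop when nothing was rejected or too few points would remain.
        if len(inliers) == len(kept) or len(inliers) <= degree + 1:
            break
        kept = inliers
        coeffs = polyfit([p[0] for p in kept], [p[1] for p in kept], degree)
    return coeffs
```

Here `points` would hold the bottom-most coordinate of each character component in image coordinates (y grows downwards), so descenders show up as points lying below the bulk of the line. Top lines could be handled the same way using the top-most coordinates, with ascenders as the outliers.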
Such precise derivation of the baselines (and in certain instances the top lines) serves
as a building block for a major correction stage, namely the de-warping procedure. At
its starting point, the proposed de-warping method takes into account both global and
local characteristics of the text image and models the smooth deformations between text
lines; by taking advantage of the proposed line segmentation and baseline detection
stages, it can cope with a variety of distortions, such as page curl, arbitrary warping and
folds, in a reliable, robust, and flexible manner.
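
To make the notion of modelling smooth deformations between text lines more concrete, the following minimal sketch straightens an image by interpolating the vertical displacement linearly between adjacent baselines. It is an assumption-laden illustration rather than the de-warping method of the thesis: it corrects vertical displacement only, uses nearest-neighbour sampling, and the names `source_row`, `dewarp`, `baselines` and `targets` are hypothetical.

```python
def source_row(x, y_target, baselines, targets):
    """Map a row of the flattened output back to the warped source at column x.

    baselines: callables giving each detected baseline's y at column x,
               ordered top to bottom (assumed to come from an earlier stage).
    targets:   strictly increasing straight y positions assigned to those
               baselines in the output.
    """
    ys = [b(x) for b in baselines]
    # Above the first or below the last baseline: shift rigidly with that line.
    if y_target <= targets[0]:
        return ys[0] + (y_target - targets[0])
    if y_target >= targets[-1]:
        return ys[-1] + (y_target - targets[-1])
    # Between two baselines: blend their source positions linearly.
    for k in range(len(targets) - 1):
        if targets[k] <= y_target <= targets[k + 1]:
            w = (y_target - targets[k]) / (targets[k + 1] - targets[k])
            return (1 - w) * ys[k] + w * ys[k + 1]

def dewarp(image, baselines, targets, background=0):
    """Resample a list-of-rows image so every baseline becomes a straight row."""
    height, width = len(image), len(image[0])
    out = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sy = round(source_row(x, y, baselines, targets))
            if 0 <= sy < height:  # leave background where the source is off-page
                out[y][x] = image[sy][x]
    return out
```

Each callable in `baselines` could be a fitted curve from a baseline detection stage, and `targets` could be chosen as, for example, the mean height of each baseline. A fuller treatment would also need horizontal correction and smoother interpolation, both of which this sketch leaves out.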

Citation

Rahnemoonfar, M. Correction of arbitrary geometric artefacts in historical documents. (Thesis). Salford : University of Salford

Thesis Type: Thesis
Deposit Date: Oct 3, 2012
Award Date: Jan 1, 2010