M Rahnemoonfar
Correction of arbitrary geometric artefacts in historical documents
Rahnemoonfar, M
Abstract
The research presented in this thesis addresses the correction of arbitrary geometric
artefacts in historical documents. Geometric distortions may be introduced at any point
in a document's life cycle, from when it was first printed to when it is digitised by an
imaging device. Such distortions appear as arbitrary warping, folds and page curl, and
have detrimental effects on recognition (OCR) and on readability (e.g. for
print-on-demand). This thesis also critically examines state-of-the-art methods and
identifies opportunities for significant improvement.
Firstly, the present work focuses on the main issues in text line segmentation and
proposes a method that is robust in the presence of various geometric distortions, other
artefacts in historical documents, and dense and complex layouts. Secondly, a precise
baseline detection method based on geometric features of the parametric model of the
segmented line is presented. This method not only takes into consideration unexpected
geometric distortions, which are common in historical document images, but also
identifies the main components of the text line, such as ascenders, descenders and certain
decorative marks, and distinguishes between these native (but potentially misleading)
components of the line and the global and local distortions of the whole page.
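To make the idea of separating native line components from the baseline itself more concrete, here is a minimal sketch under stated assumptions; it is not the method from the thesis. It assumes the segmented line has already been reduced to the bottom points (x, y) of its connected components, models the baseline as a quadratic (an assumed stand-in for the thesis's parametric model), and treats descenders as outliers that are removed one at a time by refitting. The names `QuadraticBaseline`, `fit_quadratic`, `robust_baseline` and the tolerance `tol` (in pixels) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downwards


@dataclass
class QuadraticBaseline:
    """y = c0 + c1*u + c2*u**2 with u = (x - x0) / scale (normalised for stability)."""
    c: List[float]
    x0: float
    scale: float

    def __call__(self, x: float) -> float:
        u = (x - self.x0) / self.scale
        return self.c[0] + self.c[1] * u + self.c[2] * u * u


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("need at least 3 distinct x positions")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    sol = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        sol[r] = (m[r][3] - sum(m[r][c] * sol[c] for c in range(r + 1, 3))) / m[r][r]
    return sol


def fit_quadratic(points: Sequence[Point]) -> QuadraticBaseline:
    """Ordinary least-squares quadratic fit via the normal equations."""
    xs = [x for x, _ in points]
    x0 = sum(xs) / len(xs)
    scale = max(abs(x - x0) for x in xs) or 1.0
    us = [(x - x0) / scale for x in xs]
    s = [sum(u ** k for u in us) for k in range(5)]  # sums of u^0 .. u^4
    t = [sum(y * u ** k for u, (_, y) in zip(us, points)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    return QuadraticBaseline(_solve3(a, t), x0, scale)


def robust_baseline(points: Sequence[Point], tol: float = 3.0,
                    max_iter: int = 50) -> Tuple[QuadraticBaseline, List[Point]]:
    """Drop the worst-fitting point (e.g. a descender) and refit until all residuals <= tol."""
    kept = list(points)
    line = fit_quadratic(kept)
    for _ in range(max_iter):
        if len(kept) <= 3:  # keep enough points for a quadratic
            break
        res = [abs(y - line(x)) for x, y in kept]
        worst = max(range(len(kept)), key=res.__getitem__)
        if res[worst] <= tol:
            break
        kept.pop(worst)
        line = fit_quadratic(kept)
    return line, kept
```

For example, on a gently curled line whose component bottoms lie on `y = 100 + 0.0001 * (x - 500) ** 2`, with two descenders reaching 15 px lower, the two descender points are discarded and the curve is recovered. Removing one point per iteration, rather than thresholding all residuals at once, limits how much a few deep descenders can drag the initial fit.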
Such precise derivation of the baselines (and, in certain instances, the top lines) serves
as the basis for a major correction stage, namely the de-warping procedure. The proposed
de-warping method takes into account both global and local characteristics of the text
image and models the smooth deformations between text lines. By building on the
proposed line segmentation and baseline detection stages, it can cope with a variety of
distortions, such as page curl, arbitrary warping and folds, in a reliable, robust and
flexible manner.
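One simple way to picture how detected baselines can drive de-warping is a piecewise-linear vertical mapping; the sketch below is an illustrative assumption, not the deformation model used in the thesis. It assumes the baselines are available as non-crossing functions y = f(x), ordered top to bottom, and that each is assigned a hypothetical straight target position `targets[i]`. A point lying between two baselines is moved by interpolating linearly between them, so every baseline becomes horizontal. The function name `straighten_y` is hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence

BaselineFn = Callable[[float], float]  # y = f(x) for one detected baseline


def straighten_y(x: float, y: float,
                 baselines: Sequence[BaselineFn],
                 targets: Sequence[float]) -> float:
    """Map a point of the warped page to its rectified y coordinate.

    Assumes baselines are ordered top to bottom and do not cross, and that
    targets[i] is the straight, horizontal y position chosen for baselines[i].
    """
    if not baselines or len(baselines) != len(targets):
        raise ValueError("need one target per baseline")
    ys = [f(x) for f in baselines]  # where each baseline passes through column x
    if y <= ys[0]:   # above the first line: reuse that line's vertical shift
        return y + (targets[0] - ys[0])
    if y >= ys[-1]:  # below the last line: reuse that line's vertical shift
        return y + (targets[-1] - ys[-1])
    i = bisect_right(ys, y) - 1  # enclosing pair: ys[i] <= y < ys[i + 1]
    frac = (y - ys[i]) / (ys[i + 1] - ys[i])
    return targets[i] + frac * (targets[i + 1] - targets[i])
```

In a complete de-warping stage such a mapping would typically be inverted and used to resample the image. Horizontal stretching, which page curl also introduces, is not modelled in this sketch.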
Citation
Rahnemoonfar, M. Correction of arbitrary geometric artefacts in historical documents. (Thesis). Salford: University of Salford.
Thesis Type | Thesis |
---|---|
Deposit Date | Oct 3, 2012 |
Award Date | Jan 1, 2010 |
This file is under embargo due to copyright reasons.
Contact Library-ThesesRequest@salford.ac.uk to request a copy for personal use.
You might also like
A survey of OCR evaluation tools and metrics
(2021)
Conference Proceeding
VISE : an interface for Visual Search and Exploration of museum collections
(2019)
Journal Article
Efficient and effective OCR engine training
(2019)
Journal Article