A new framework for recognition of heavily degraded characters in historical typewritten documents based on semi-supervised clustering

Pletschacher, S; Hu, J; Antonacopoulos, A

Abstract

This paper presents a new semi-supervised clustering framework for the recognition of heavily degraded characters in historical typewritten documents, where off-the-shelf OCR typically fails. The constraints are generated using typographical (collection-independent) domain knowledge and are used to guide both sample (glyph set) partitioning and metric learning. Experimental results using simple features provide encouraging evidence that this approach can lead to significantly improved clustering results compared to simple K-Means clustering, as well as to clustering using a state-of-the-art OCR engine.
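
To make the idea of constraint-guided partitioning concrete, the following is a minimal sketch of K-Means with pairwise must-link and cannot-link constraints, in the spirit of COP-KMeans. It is not the authors' algorithm: the function names, the toy feature vectors, and the hand-written constraints are all assumptions for illustration. The abstract states that constraints are derived from typographical knowledge and also drive metric learning, and neither of those steps is shown here.

```python
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbours(pairs: Sequence[Pair]) -> Dict[int, Set[int]]:
    """Turn a list of index pairs into a symmetric adjacency map."""
    adj: Dict[int, Set[int]] = {}
    for i, j in pairs:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj


def violates(i: int, cluster: int, labels: List[Optional[int]],
             must: Dict[int, Set[int]], cannot: Dict[int, Set[int]]) -> bool:
    """True if putting sample i into `cluster` breaks a constraint
    against a sample that has already been assigned in this pass."""
    for j in must.get(i, ()):
        if labels[j] is not None and labels[j] != cluster:
            return True
    for j in cannot.get(i, ()):
        if labels[j] == cluster:
            return True
    return False


def constrained_kmeans(points: Sequence[Point], k: int,
                       must_link: Sequence[Pair],
                       cannot_link: Sequence[Pair],
                       iters: int = 20, seed: int = 0) -> List[Optional[int]]:
    """Hypothetical helper: greedy constrained K-Means.

    Simplification: only direct constraint pairs are checked; the
    transitive closure of must-link pairs is not computed.
    """
    rng = random.Random(seed)
    must, cannot = neighbours(must_link), neighbours(cannot_link)
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels: List[Optional[int]] = []
    for _ in range(iters):
        new_labels: List[Optional[int]] = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest; take the first feasible one.
            order = sorted(range(k), key=lambda c: sq_dist(p, centers[c]))
            for c in order:
                if not violates(i, c, new_labels, must, cannot):
                    new_labels[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for sample {i}")
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

As a usage illustration, calling `constrained_kmeans(points, 2, [(0, 1), (2, 3)], [(1, 2)])` on four toy glyph feature vectors asks that samples 0 and 1 share a cluster, that samples 2 and 3 share a cluster, and that samples 1 and 2 are kept apart. The greedy assignment can fail on inconsistent constraint sets, in which case the sketch raises `ValueError` rather than returning a partial result.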

Citation

Pletschacher, S., Hu, J., & Antonacopoulos, A. (2009). A new framework for recognition of heavily degraded characters in historical typewritten documents based on semi-supervised clustering. In 2009 10th International Conference on Document Analysis and Recognition. https://doi.org/10.1109/ICDAR.2009.267

Conference Name: 10th International Conference on Document Analysis and Recognition
Conference Location: Barcelona
Start Date: Jul 26, 2009
End Date: Jul 29, 2009
Publication Date: Jan 1, 2009
Deposit Date: Dec 21, 2011
Book Title: 2009 10th International Conference on Document Analysis and Recognition
ISBN: 9781424445004
DOI: https://doi.org/10.1109/ICDAR.2009.267
Publisher URL: http://dx.doi.org/10.1109/ICDAR.2009.267