S Pletschacher
A new framework for recognition of heavily degraded characters in historical typewritten documents based on semi-supervised clustering
Pletschacher, S; Hu, J; Antonacopoulos, A
Abstract
This paper presents a new semi-supervised clustering framework for the recognition of heavily degraded characters in historical typewritten documents, where off-the-shelf OCR typically fails. The constraints are generated using typographical (collection-independent) domain knowledge and are used to guide both sample (glyph set) partitioning and metric learning. Experimental results using simple features provide encouraging evidence that this approach can lead to significantly improved clustering results compared to simple K-Means clustering, as well as to clustering using a state-of-the-art OCR engine.
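
To make the idea of constraint-guided partitioning concrete, here is a minimal sketch in the style of constrained k-means (COP-KMeans). It is not the framework described in the paper: the function names, the toy one-dimensional "features", and the hand-written must-link and cannot-link pairs are all assumptions made for illustration, and the metric learning step mentioned in the abstract is not covered.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]


def violates(i: int, cluster: int, labels: List[int],
             must_link: List[Pair], cannot_link: List[Pair]) -> bool:
    """Return True if putting sample i into `cluster` breaks a constraint."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        # A must-link partner that is already assigned has to share the cluster.
        if other is not None and labels[other] not in (-1, cluster):
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        # A cannot-link partner must not already sit in this cluster.
        if other is not None and labels[other] == cluster:
            return True
    return False


def constrained_kmeans(points: List[Point], centers: List[Point],
                       must_link: List[Pair], cannot_link: List[Pair],
                       iterations: int = 10):
    """Hypothetical helper: k-means whose assignment step respects constraints."""
    centers = [list(c) for c in centers]
    labels = [-1] * len(points)
    for _ in range(iterations):
        labels = [-1] * len(points)  # -1 marks "not yet assigned"
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest, take the first legal one.
            order = sorted(range(len(centers)),
                           key=lambda k: math.dist(p, centers[k]))
            for k in order:
                if not violates(i, k, labels, must_link, cannot_link):
                    labels[i] = k
                    break
            else:
                raise ValueError(f"no cluster satisfies the constraints for sample {i}")
        # Standard k-means update: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [points[i] for i, lab in enumerate(labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy data (assumed): samples 0 and 1 look alike but are known to differ.
points = [(0.0,), (0.1,), (5.0,), (5.1,)]
labels, centers = constrained_kmeans(
    points, centers=[(0.0,), (5.0,)],
    must_link=[(2, 3)], cannot_link=[(0, 1)],
)
print(labels)
```

With no constraints, the two nearby samples 0 and 1 fall into the same cluster. With the cannot-link pair, they are forced apart even though their features are almost identical, which is the kind of correction that domain knowledge can supply when degraded glyphs look alike.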
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | 10th International Conference on Document Analysis and Recognition |
Start Date | Jul 26, 2009 |
End Date | Jul 29, 2009 |
Publication Date | Jan 1, 2009 |
Deposit Date | Dec 21, 2011 |
Book Title | 2009 10th International Conference on Document Analysis and Recognition |
ISBN | 9781424445004 |
DOI | https://doi.org/10.1109/ICDAR.2009.267 |
Publisher URL | http://dx.doi.org/10.1109/ICDAR.2009.267 |