Flexible character accuracy measure for reading-order-independent evaluation
Clausner, C; Pletschacher, S; Antonacopoulos, A
Authors
Mr Christian Clausner C.Clausner@salford.ac.uk
Senior Research Fellow
Mr Stefan Pletschacher S.Pletschacher@salford.ac.uk
Lecturer
Prof Apostolos Antonacopoulos A.Antonacopoulos@salford.ac.uk
Professor
Abstract
The extraction of textual information from scanned document pages is a fundamental stage in any digitisation effort and directly determines the success of the overall document analysis and understanding application scenarios. To evaluate and improve the performance of optical character recognition (OCR), it is necessary to measure the accuracy of that step alone, without the influence of the processing steps that precede it (e.g. text block segmentation and ordering). Current OCR performance evaluation measures (based on edit distance) are strongly subjective as they need to first serialise the entire text in the documents – a process influenced heavily by the specific reading order determined (often wrongly, especially in cases of multicolumn and complex layouts) by processing steps prior to OCR. This paper presents a new objective and practical edit-distance-based character recognition accuracy measure which overcomes those limitations. It achieves its independence from the reading order by comparing sub-strings of text in a flexible way (i.e. allowing for ordering variations). The precision of the flexible character accuracy measure enables the effective tuning of complete digitisation workflows (as OCR errors are isolated and other steps can be evaluated and optimised separately). For the same reason, it also enables a better estimation of post-OCR (manual) correction effort required. The proposed character accuracy measure has been systematically analysed and validated under lab conditions as well as successfully used in practice in a number of high-profile international competitions since 2017.
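The reading-order dependence described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, which this record does not detail. It only contrasts a conventional edit-distance character accuracy computed on serialised text with a simplified, order-tolerant variant. The function names (`levenshtein`, `character_accuracy`, `order_independent_accuracy`) and the greedy block-matching strategy are assumptions made for illustration.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_result: str) -> float:
    """Conventional measure on serialised text: 1 - distance / len(ground truth)."""
    if not ground_truth:
        return 1.0 if not ocr_result else 0.0
    distance = levenshtein(ground_truth, ocr_result)
    return max(0.0, 1.0 - distance / len(ground_truth))


def order_independent_accuracy(gt_blocks: List[str], ocr_blocks: List[str]) -> float:
    """Illustrative simplification (an assumption, not the published method):
    greedily pair each ground-truth block with its closest unused OCR block,
    so that the order in which blocks were emitted does not affect the score."""
    remaining = list(ocr_blocks)
    total_chars = sum(len(block) for block in gt_blocks)
    errors = 0
    for gt in gt_blocks:
        if remaining:
            best = min(remaining, key=lambda candidate: levenshtein(gt, candidate))
            errors += levenshtein(gt, best)
            remaining.remove(best)
        else:
            errors += len(gt)  # ground-truth block with no OCR counterpart
    errors += sum(len(block) for block in remaining)  # spurious OCR blocks
    if total_chars == 0:
        return 1.0 if errors == 0 else 0.0
    return max(0.0, 1.0 - errors / total_chars)


if __name__ == "__main__":
    gt = ["left column", "right column"]
    ocr = ["right column", "left column"]  # same text, wrong reading order
    print(character_accuracy("\n".join(gt), "\n".join(ocr)))
    print(order_independent_accuracy(gt, ocr))
```

In this toy case every character is recognised correctly, yet the serialised measure penalises the swapped columns, while the block-matching variant does not. A greedy pairing of whole blocks is only a rough stand-in for the flexible sub-string comparison the abstract describes.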
Citation
Clausner, C., Pletschacher, S., & Antonacopoulos, A. (2020). Flexible character accuracy measure for reading-order-independent evaluation. Pattern Recognition Letters, 131, 390-397. https://doi.org/10.1016/j.patrec.2020.02.003
Journal Article Type | Article |
---|---|
Acceptance Date | Feb 1, 2020 |
Online Publication Date | Feb 3, 2020 |
Publication Date | Mar 1, 2020 |
Deposit Date | Feb 7, 2020 |
Publicly Available Date | Apr 3, 2020 |
Journal | Pattern Recognition Letters |
Print ISSN | 0167-8655 |
Electronic ISSN | 1872-7344 |
Publisher | Elsevier |
Volume | 131 |
Pages | 390-397 |
DOI | https://doi.org/10.1016/j.patrec.2020.02.003 |
Publisher URL | https://doi.org/10.1016/j.patrec.2020.02.003 |
Related Public URLs | https://www.sciencedirect.com/journal/pattern-recognition-letters |
Files
1-s2.0-S0167865520300416-main.pdf (PDF, 2.3 MB)
Licence
http://creativecommons.org/licenses/by/4.0/
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/