NDOrder: Exploring a Novel Decoding Order for Scene Text Recognition
Zhong, Dajian; Zhan, Hongjian; Lyu, Shujing; Liu, Cong; Yin, Bing; Palaiahnakote, Shivakumara; Pal, Umapada; Lu, Yue
Authors
Dajian Zhong
Hongjian Zhan
Shujing Lyu
Cong Liu
Bing Yin
Dr Shivakumara Palaiahnakote S.Palaiahnakote@salford.ac.uk
Lecturer in Computer Vision
Umapada Pal
Yue Lu
Contributors
Dr Shivakumara Palaiahnakote S.Palaiahnakote@salford.ac.uk
Supervisor
Abstract
Text recognition in scene images is still considered a challenging task for the computer vision and pattern recognition community. For text images affected by multiple adverse factors, such as occlusion (due to obstacles) and poor quality (due to blur and low resolution), the performance of state-of-the-art scene text recognition methods degrades. The key reason is that the existing encoder-decoder framework follows a fixed left-to-right decoding order, which lacks sufficient contextual information. In this paper, we present a novel decoding order in which good-quality characters are decoded first, followed by low-quality characters, which preserves contextual information irrespective of the aforementioned difficult scenarios. Our method, named NDOrder, extracts visual features with a ViT encoder and then decodes with two modules: the Random Order Generation module (ROG), which learns to decode with random decoding orders, and the Vision-Content-Position module (VCP), which exploits the connections among visual information, content and position. In addition, a new dataset named OLQT (Occluded and Low-Quality Text) is created by manually collecting text images that suffer from occlusion or low quality from several standard text recognition datasets. The dataset is now available at https://github.com/djzhong1/OLQT. Experiments on OLQT and public scene text recognition benchmarks show that the proposed method achieves state-of-the-art performance.
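As a rough illustration of what a non-left-to-right decoding order means, the sketch below treats a decoding order as a permutation of character positions. It is not the authors' ROG or VCP module: the function names, the per-position confidence scores, and the toy `predict` callback are all assumptions introduced here for illustration only.

```python
import random
from typing import Callable, List, Optional, Sequence


def random_decoding_order(length: int, rng: random.Random) -> List[int]:
    """Sample a random permutation of the positions 0..length-1 (hypothetical helper)."""
    order = list(range(length))
    rng.shuffle(order)
    return order


def quality_first_order(confidences: Sequence[float]) -> List[int]:
    """Order positions from highest to lowest confidence.

    The confidence scores are an assumed stand-in for "character quality";
    ties keep their left-to-right order because Python's sort is stable.
    """
    return sorted(range(len(confidences)), key=lambda i: -confidences[i])


def decode_in_order(
    order: Sequence[int],
    predict: Callable[[int, List[Optional[str]]], str],
) -> str:
    """Fill character positions one at a time in the given order.

    `predict` receives the position to decode and the partially filled
    result, so already-decoded characters can serve as context.
    """
    result: List[Optional[str]] = [None] * len(order)
    for pos in order:
        result[pos] = predict(pos, result)
    return "".join(ch for ch in result if ch is not None)


if __name__ == "__main__":
    target = "HELLO"
    # Toy oracle: a real recogniser would predict from image features.
    oracle = lambda pos, partial: target[pos]
    order = random_decoding_order(len(target), random.Random(0))
    print(order, decode_in_order(order, oracle))
    print(quality_first_order([0.2, 0.9, 0.5]))
```

Because each character is written to its own position, the final string is the same for any order; what changes is which characters are already available as context when a difficult position is decoded.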
Journal Article Type | Article |
---|---|
Acceptance Date | Mar 18, 2024 |
Online Publication Date | Mar 23, 2024 |
Publication Date | 2024-09 |
Deposit Date | Mar 19, 2024 |
Publicly Available Date | Mar 24, 2026 |
Print ISSN | 0957-4174 |
Publisher | Elsevier |
Peer Reviewed | Peer Reviewed |
Volume | 249 |
DOI | https://doi.org/10.1016/j.eswa.2024.123771 |
Keywords | Scene text recognition; transformer; decoding order optimization; random order generation; contextual information |
Files
This file is under embargo until Mar 24, 2026 due to copyright reasons.
Contact S.Palaiahnakote@salford.ac.uk to request a copy for personal use.