Ayush Roy
A novel domain independent scene text localizer
Authors
Roy, Ayush; Palaiahnakote, Shivakumara; Pal, Umapada; Liu, Cheng-Lin
Abstract
Text localization across multiple domains is crucial for applications such as autonomous driving and tracking marathon runners. This work introduces DIPCYT, a novel model that combines Domain Independent Partial Convolution with a YOLOv5-based Transformer to localize text in scene images from several domains, including natural scenes, underwater images, and drone images. Each domain presents its own challenges: underwater images suffer from poor quality and degradation, drone images contain tiny text whose shapes are often lost, and natural scene images contain arbitrarily oriented and arbitrarily shaped text. In addition, license plates in drone images may carry less semantic information than other text types, because the contextual relationships between characters are lost. To tackle these challenges, DIPCYT adds new partial convolution layers to YOLOv5 and integrates Transformer detection heads with a novel Fourier Positional Convolutional Block Attention Module (FPCBAM). This design exploits text properties shared across domains, such as contextual (global) and spatial (local) relationships. Experimental results show that DIPCYT outperforms existing methods, achieving F-scores of 0.90, 0.90, 0.77, 0.85, 0.85, and 0.88 on the Total-Text, ICDAR 2015, ICDAR 2019 MLT, CTW1500, Drone, and Underwater datasets, respectively.
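FPCBAM builds on the Convolutional Block Attention Module (CBAM). As a point of reference only, the sketch below implements the standard CBAM design (Woo et al., ECCV 2018) in NumPy: channel attention computed from average- and max-pooled descriptors passed through a shared two-layer MLP, followed by spatial attention computed from a 7x7 convolution over channel-wise mean and max maps. It is not the authors' FPCBAM. The Fourier positional component, the Transformer heads and any trained weights are omitted, and the reduction ratio `r`, the feature sizes and the random weights are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, kernel):
    """Multi-channel 'same' cross-correlation with a single output channel.

    x: (in_ch, H, W); kernel: (in_ch, k, k) with odd k. Returns (H, W).
    """
    _, k, _ = kernel.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j), summed over input channels.
            out += np.einsum("c,chw->hw", kernel[:, i, j], xp[:, i:i + h, j:j + w])
    return out


def channel_attention(feat, w1, w2):
    """Per-channel weights in (0, 1): shared MLP over average- and max-pooled descriptors."""
    avg = feat.mean(axis=(1, 2))  # (C,)
    mx = feat.max(axis=(1, 2))    # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # C -> C/r -> C with ReLU in between

    return sigmoid(mlp(avg) + mlp(mx))


def spatial_attention(feat, kernel):
    """Per-pixel weights in (0, 1) from a 7x7 conv over channel-wise mean and max maps."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    return sigmoid(conv2d_same(pooled, kernel))


def cbam(feat, w1, w2, kernel):
    """Standard CBAM: channel attention first, then spatial attention (sequential)."""
    refined = feat * channel_attention(feat, w1, w2)[:, None, None]
    return refined * spatial_attention(refined, kernel)[None, :, :]


# Toy example with random weights (assumed sizes, not the paper's configuration).
rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
print(out.shape)  # (16, 8, 8)
```

Applying the channel step before the spatial step follows the sequential ordering used in standard CBAM; how FPCBAM injects Fourier positional information, and where it sits within the Transformer detection heads, is not specified in this abstract.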
Citation
Roy, A., Palaiahnakote, S., Pal, U., & Liu, C.-L. (2024). A novel domain independent scene text localizer. Pattern Recognition, 158, Article 111015. https://doi.org/10.1016/j.patcog.2024.111015
| Field | Value |
|---|---|
| Journal Article Type | Article |
| Acceptance Date | Sep 10, 2024 |
| Online Publication Date | Sep 15, 2024 |
| Publication Date | Sep 18, 2024 |
| Deposit Date | Sep 12, 2024 |
| Publicly Available Date | Sep 23, 2024 |
| Journal | Pattern Recognition |
| Print ISSN | 0031-3203 |
| Publisher | Elsevier |
| Peer Reviewed | Peer Reviewed |
| Volume | 158 |
| Article Number | 111015 |
| DOI | https://doi.org/10.1016/j.patcog.2024.111015 |
| Keywords | Scene text detection; Transformer; Attention module; Drone images; Underwater images |
Files
Published Version (PDF, 47 Kb)
Publisher Licence URL: http://creativecommons.org/licenses/by/4.0/
You might also like
An Adaptive Xception Model for Classification of Brain Tumors (2024). Journal Article
Altered Handwritten Text Detection in Document Images Using Deep Learning (2024). Journal Article