A New Unsupervised Approach for Text Localization in Shaky and Non-shaky Scene Video
Halder, Arnab; Palaiahnakote, Shivakumara; Pal, Umapada; Blumenstein, Michael; Liu, Cheng-Lin
Authors
Arnab Halder
Dr Shivakumara Palaiahnakote S.Palaiahnakote@salford.ac.uk
Lecturer in Computer Vision
Umapada Pal
Michael Blumenstein
Cheng-Lin Liu
Abstract
Text detection in shaky and non-shaky videos is challenging due to poor video quality and the presence of static and dynamic obstacles. Video captured by a camera shaken by wind is considered shaky video, while video captured by a fixed camera is considered non-shaky video. Most state-of-the-art methods achieve their best results by exploiting deep learning. The present study proposes an unsupervised approach for text spotting in shaky and non-shaky videos. In the first stage, our method selects keyframes, which we call activation frames, from the input video by estimating the similarity between temporal frames. For each activation frame, the proposed method extracts statistical features such as orientation, spectral, edge density and intensity features that represent text information. The extracted features are fed to a K-means clustering method to obtain text clusters, which yield text regions in the activation frames. For each region, the proposed method uses optical flow to extract spatial consistency, motion consistency and depth map consistency for localizing text using temporal voting-non-maximum suppression. Experiments are conducted on our shaky and non-shaky dataset and on the ICDAR 2015 benchmark dataset. The experiments show that the proposed method is superior to existing methods.
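The first two stages described in the abstract (keyframe selection by frame similarity, then K-means clustering of per-region features) can be illustrated with a minimal sketch. This is not the authors' implementation. The similarity measure (one minus mean absolute difference), the threshold value, the two-dimensional feature vectors, and the function names `frame_similarity`, `select_activation_frames` and `kmeans` are all assumptions chosen for illustration. The optical-flow and temporal voting steps are not covered.

```python
from typing import List, Sequence

Frame = Sequence[float]  # hypothetical: a flattened grayscale frame, values in [0, 1]


def frame_similarity(a: Frame, b: Frame) -> float:
    """Assumed similarity: 1 - mean absolute difference (1.0 means identical)."""
    diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - diff


def select_activation_frames(frames: List[Frame], threshold: float = 0.9) -> List[int]:
    """Keep a frame when its similarity to the last kept frame drops below threshold."""
    if not frames:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if frame_similarity(frames[kept[-1]], frames[i]) < threshold:
            kept.append(i)
    return kept


def kmeans(points: List[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means; centroids are initialised to the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy input: five tiny "frames" and four hypothetical (edge density, intensity) vectors.
frames = [[0.0] * 4, [0.01] * 4, [0.5] * 4, [0.52] * 4, [1.0] * 4]
keyframes = select_activation_frames(frames, threshold=0.9)
features = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.1], [0.85, 0.9]]
labels = kmeans(features, k=2)
```

In this toy run, near-duplicate frames are skipped and the feature vectors fall into two groups. Deciding which cluster corresponds to text would need an extra rule that the abstract does not specify.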
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | Document Analysis and Recognition - ICDAR 2024 |
Start Date | Aug 30, 2024 |
End Date | Sep 4, 2024 |
Acceptance Date | Aug 30, 2024 |
Online Publication Date | Sep 9, 2024 |
Publication Date | 2024 |
Deposit Date | Nov 15, 2024 |
Publicly Available Date | Sep 10, 2025 |
Publisher | Springer |
Series ISSN | 0302-9743 |
ISBN | 978-3-031-70548-9 |
DOI | https://doi.org/10.1007/978-3-031-70549-6_10 |
Files
This file is under embargo until Sep 10, 2025 due to copyright reasons.
Contact S.Palaiahnakote@salford.ac.uk to request a copy for personal use.