Lokesh Nandanwar
A New DCT-FFT Fusion Based Method for Caption and Scene Text Classification in Action Video Images
Nandanwar, Lokesh; Shivakumara, Palaiahnakote; Manna, Suvojit; Pal, Umapada; Lu, Tong; Blumenstein, Michael
Authors
Dr Shivakumara Palaiahnakote S.Palaiahnakote@salford.ac.uk
Lecturer
Suvojit Manna
Umapada Pal
Tong Lu
Michael Blumenstein
Abstract
Achieving a better recognition rate for text in action video images is challenging because multiple types of text appear against unpredictable backgrounds. We propose a new method for classifying caption text (edited text) and scene text (text that is part of the image) in video images of the Yoga, Concert, Teleshopping, Craft, and Recipe classes. The proposed method introduces a new fusion criterion based on DCT and Fourier coefficients to extract features that represent the clarity and visibility of captions, in order to separate them from scene text. The variances of the DCT and Fourier coefficients at corresponding pixels are computed to derive the respective weights, and the weights and coefficients are then used to generate a fused image. The method next estimates sparsity in the Canny edge image of each fused image to derive rules for classifying caption and scene text. Finally, the method is evaluated on images from the five action image classes listed above to validate the derived rules. Comparative studies with state-of-the-art methods on standard databases show that the proposed method outperforms existing methods in classification. Recognition experiments before and after classification show that the recognition rate improves significantly after classification.
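The pipeline outlined in the abstract (transform, variance-derived weights, fusion, edge sparsity) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following are assumptions made only for illustration: each weight is that transform's share of the total variance of the coefficient magnitudes, the fused image is a weighted sum of those magnitudes, and a thresholded forward-difference gradient stands in for the Canny edge detector. All function names and the threshold value are hypothetical, and the paper's classification rules are not reproduced. The sketch uses only the Python standard library so that it runs without dependencies. A practical implementation would more likely use `scipy.fft.dctn`, `numpy.fft.fft2`, and `cv2.Canny`.

```python
import cmath
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def dft_1d(x):
    """Plain O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * i * k / n) for i in range(n))
            for k in range(n)]


def transform_2d(img, transform_1d):
    """Apply a separable 1-D transform along rows, then along columns."""
    rows = [transform_1d(list(r)) for r in img]
    cols = [transform_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def variance(grid):
    """Population variance over all entries of a 2-D list."""
    vals = [v for row in grid for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def fuse(img):
    """Variance-weighted fusion of DCT and Fourier coefficient magnitudes.

    ASSUMPTION: weights are each transform's share of the total variance;
    the paper's actual weighting scheme may differ.
    """
    dct_mag = [[abs(v) for v in row] for row in transform_2d(img, dct_1d)]
    fft_mag = [[abs(v) for v in row] for row in transform_2d(img, dft_1d)]
    var_d, var_f = variance(dct_mag), variance(fft_mag)
    total = var_d + var_f
    w_d = var_d / total if total > 0 else 0.5
    w_f = 1.0 - w_d
    fused = [[w_d * d + w_f * f for d, f in zip(rd, rf)]
             for rd, rf in zip(dct_mag, fft_mag)]
    return fused, (w_d, w_f)


def edge_sparsity(grid, threshold):
    """Fraction of pixels whose forward-difference gradient magnitude is
    at or below `threshold` (a simple stand-in for a Canny edge map)."""
    h, w = len(grid), len(grid[0])
    non_edges = 0
    for y in range(h):
        for x in range(w):
            gx = grid[y][min(x + 1, w - 1)] - grid[y][x]
            gy = grid[min(y + 1, h - 1)][x] - grid[y][x]
            if math.hypot(gx, gy) <= threshold:
                non_edges += 1
    return non_edges / (h * w)


# Hypothetical 8x8 patch: a bright square on a dark background.
patch = [[255.0 if 2 <= x <= 5 and 2 <= y <= 5 else 0.0 for x in range(8)]
         for y in range(8)]
fused, (w_dct, w_fft) = fuse(patch)
sparsity = edge_sparsity(fused, threshold=50.0)
```

In this sketch the sparsity value is only computed, not interpreted. Any decision threshold separating caption from scene text would have to come from the rules derived in the paper itself.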
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | Pattern Recognition and Artificial Intelligence International Conference, ICPRAI 2020 |
Start Date | Oct 19, 2020 |
End Date | Oct 23, 2020 |
Online Publication Date | Oct 9, 2020 |
Publication Date | Oct 9, 2020 |
Deposit Date | Nov 15, 2024 |
Publisher | Springer |
Series Title | Lecture Notes in Computer Science |
Series ISSN | 1611-3349 |
Book Title | Pattern Recognition and Artificial Intelligence |
ISBN | 978-3-030-59829-7 |
DOI | https://doi.org/10.1007/978-3-030-59830-3_7 |