A New DCT-FFT Fusion Based Method for Caption and Scene Text Classification in Action Video Images

Nandanwar, Lokesh; Shivakumara, Palaiahnakote; Manna, Suvojit; Pal, Umapada; Lu, Tong; Blumenstein, Michael

doi:10.1007/978-3-030-59830-3_7

Authors

Lokesh Nandanwar

Dr Shivakumara Palaiahnakote (S.Palaiahnakote@salford.ac.uk), Lecturer in Computer Vision

Suvojit Manna

Umapada Pal

Tong Lu

Michael Blumenstein

Abstract

Achieving a better recognition rate for text in action video images is challenging because such images contain multiple types of text against unpredictable backgrounds. We propose a new method for classifying captions (edited text) and scene texts (text that is part of the image) in video images of the Yoga, Concert, Teleshopping, Craft, and Recipe classes. The proposed method introduces a new fusion criterion based on DCT and Fourier coefficients to extract features that represent the good clarity and visibility of captions and so separate them from scene texts. The variances of the coefficients at corresponding pixels of the DCT and Fourier images are computed to derive the respective weights. The weights and coefficients are then used to generate a fused image. The method next estimates sparsity in the Canny edge image of each fused image to derive rules for classifying caption and scene texts. Finally, the method is evaluated on images of the five action image classes listed above to validate the derived rules. Comparative studies with state-of-the-art methods on standard databases show that the proposed method outperforms existing methods in terms of classification. Recognition experiments before and after classification show that the recognition rate improves significantly after classification.
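The fusion and sparsity steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the weights are taken as each transform's share of the total coefficient variance, the variances are computed globally over coefficient magnitudes rather than per pixel neighbourhood, and sparsity is taken as the fraction of zero pixels in a binary edge map. The function names `dct2`, `variance_weighted_fusion`, and `edge_sparsity` are hypothetical, and the Canny step itself is not implemented here; the edge map is expected to come from an external edge detector.

```python
import numpy as np


def dct2(img):
    """Orthonormal 2-D DCT-II, built from an explicit basis matrix (NumPy only)."""
    def basis(n):
        k = np.arange(n)[:, None]   # frequency index
        i = np.arange(n)[None, :]   # sample index
        m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        m[0, :] = np.sqrt(1.0 / n)  # DC row scaling for orthonormality
        return m

    rows, cols = img.shape
    return basis(rows) @ img @ basis(cols).T


def variance_weighted_fusion(img):
    """Fuse DCT and FFT coefficient magnitudes with variance-derived weights.

    Assumption: each weight is that transform's share of the summed
    global variances. The paper may define the weights differently.
    """
    img = np.asarray(img, dtype=float)
    d = np.abs(dct2(img))                       # DCT coefficient magnitudes
    f = np.abs(np.fft.fft2(img, norm="ortho"))  # Fourier coefficient magnitudes
    var_d, var_f = d.var(), f.var()
    total = var_d + var_f
    if total == 0:
        w_d = w_f = 0.5                         # degenerate case: all-zero input
    else:
        w_d, w_f = var_d / total, var_f / total
    fused = w_d * d + w_f * f
    return fused, w_d, w_f


def edge_sparsity(edge_map):
    """Fraction of zero (non-edge) pixels in a binary edge map.

    Assumption: this is one plausible sparsity measure, not necessarily
    the one used in the paper.
    """
    edge_map = np.asarray(edge_map)
    return float(np.mean(edge_map == 0))
```

In this sketch a classification rule would compare `edge_sparsity` of the edge image against a threshold; the actual rules and thresholds are derived in the paper and are not reproduced here.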

Presentation Conference Type	Conference Paper (published)
Conference Name	Pattern Recognition and Artificial Intelligence International Conference, ICPRAI 2020
Start Date	Oct 19, 2020
End Date	Oct 23, 2020
Online Publication Date	Oct 9, 2020
Publication Date	Oct 9, 2020
Deposit Date	Nov 15, 2024
Publisher	Springer
Series Title	Lecture Notes in Computer Science
Series ISSN	1611-3349
Book Title	Pattern Recognition and Artificial Intelligence
ISBN	978-3-030-59829-7
DOI	https://doi.org/10.1007/978-3-030-59830-3_7

