Abhra Chaudhuri
A deep action-oriented video image classification system for text detection and recognition
Chaudhuri, Abhra; Shivakumara, Palaiahnakote; Nath Chowdhury, Pinaki; Pal, Umapada; Lu, Tong; Lopresti, Daniel; Hemantha Kumar, G.
Authors
Dr Shivakumara Palaiahnakote, Lecturer (S.Palaiahnakote@salford.ac.uk)
Pinaki Nath Chowdhury
Umapada Pal
Tong Lu
Daniel Lopresti
G. Hemantha Kumar
Abstract
For video images with complex actions, achieving accurate text detection and recognition is very challenging. This paper presents a hybrid model for classifying action-oriented video images, which reduces the complexity of the problem and thereby improves text detection and recognition performance. We consider five genre categories: concert, cooking, craft, teleshopping and yoga. To classify action-oriented video images, we use ResNet50 to learn general pixel-distribution-level information, a VGG16 network to learn features of Maximally Stable Extremal Regions (MSER), and a second VGG16 to learn facial components obtained by a multitask cascaded convolutional network (MTCNN). The approach integrates the outputs of these three models using a fully connected neural network to classify the five action-oriented image classes. We demonstrate the efficacy of the proposed method on our own dataset and on two standard datasets: the Scene Text Dataset, which contains 10 classes of scene images with text information, and the Stanford 40 Actions dataset, which contains 40 action classes without text information. Our method outperforms related existing work and significantly enhances the class-specific performance of text detection and recognition.
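The fusion step described in the abstract (three branch outputs combined by a fully connected network into five classes) can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: it assumes each branch (ResNet50, MSER-VGG16, face-VGG16) has already produced a fixed-length vector, and the vector sizes, the hidden width, and the randomly initialised, untrained weights (`random_params`) are all hypothetical placeholders.

```python
import math
import random

# The five action-oriented genre classes named in the abstract.
CLASSES = ["concert", "cooking", "craft", "teleshopping", "yoga"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, bias):
    """Fully connected layer: y_j = sum_i x_i * W[j][i] + b_j."""
    return [sum(xi * wji for xi, wji in zip(x, row)) + bj
            for row, bj in zip(weights, bias)]


def relu(x):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in x]


def fuse_and_classify(resnet_out, mser_out, face_out, params):
    """Concatenate the three branch outputs and classify with a small MLP.

    ASSUMPTION: each branch emits a fixed-length vector; the real branch
    sizes, hidden width and trained weights are not given in the abstract.
    """
    fused = resnet_out + mser_out + face_out  # late fusion by concatenation
    hidden = relu(dense(fused, params["W1"], params["b1"]))
    probs = softmax(dense(hidden, params["W2"], params["b2"]))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


def random_params(in_dim, hidden_dim, n_classes, seed=0):
    """Hypothetical untrained weights, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": mat(hidden_dim, in_dim), "b1": [0.0] * hidden_dim,
        "W2": mat(n_classes, hidden_dim), "b2": [0.0] * n_classes,
    }


if __name__ == "__main__":
    # Placeholder branch outputs (here, five per-class scores per branch).
    resnet_out = [0.7, 0.1, 0.1, 0.05, 0.05]
    mser_out = [0.2, 0.3, 0.2, 0.2, 0.1]
    face_out = [0.6, 0.1, 0.1, 0.1, 0.1]
    params = random_params(in_dim=15, hidden_dim=8, n_classes=len(CLASSES))
    label, probs = fuse_and_classify(resnet_out, mser_out, face_out, params)
    print(label, [round(p, 3) for p in probs])
```

Because the weights are untrained, the predicted label here is arbitrary; the sketch only shows the data flow of concatenating branch outputs and mapping them through a fully connected layer to a five-way softmax.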
Citation
Chaudhuri, A., Shivakumara, P., Nath Chowdhury, P., Pal, U., Lu, T., Lopresti, D., & Hemantha Kumar, G. (2021). A deep action-oriented video image classification system for text detection and recognition. SN Applied Sciences, 3, Article 838. https://doi.org/10.1007/s42452-021-04821-z
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Sep 22, 2021 |
Publication Date | Nov 2021 |
Deposit Date | Feb 2, 2024 |
Publicly Available Date | Feb 5, 2024 |
Journal | SN Applied Sciences |
Print ISSN | 2523-3971 |
Publisher | Springer |
Peer Reviewed | Yes |
Volume | 3 |
Article Number | 838 |
DOI | https://doi.org/10.1007/s42452-021-04821-z |
Files
Published Version (PDF, 3.8 MB)
Publisher Licence URL: http://creativecommons.org/licenses/by/4.0/
You might also like
An Adaptive Xception Model for Classification of Brain Tumors
(2024)
Journal Article
Altered Handwritten Text Detection in Document Images Using Deep Learning
(2024)
Journal Article
NDOrder: Exploring a Novel Decoding Order for Scene Text Recognition
(2024)
Journal Article