
Capsule network with using shifted windows for 3D human pose estimation

Liu, Xiufeng; Zhao, Zhongqiu; Tian, Weidong; Liu, Binbin; He, Hongmei

Authors

Xiufeng Liu

Zhongqiu Zhao

Weidong Tian

Binbin Liu


Prof Mary He H.He5@salford.ac.uk
Professor in A.I. for Robotics



Abstract

3D human pose estimation (HPE) is a vital technology with diverse applications, enhancing precision in tracking, analyzing, and understanding human movements. However, 3D HPE from monocular videos presents significant challenges, primarily due to self-occlusion, which can partially hinder traditional neural networks’ ability to accurately predict joint positions. To address this challenge, we propose a novel approach using a capsule network integrated with the shifted windows attention model (SwinCAP). It improves prediction accuracy by effectively capturing the spatial hierarchical relationships between different parts and objects. A Parallel Double Attention mechanism applied in SwinCAP enhances both computational efficiency and modeling capacity, and a Multi-Attention Collaborative module is introduced to capture a diverse range of information, including both coarse and fine details. Extensive experiments demonstrate that our SwinCAP achieves better or comparable results to state-of-the-art models in the challenging task of viewpoint transfer on two commonly used datasets: Human3.6M and MPI-INF-3DHP.
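The shifted-window mechanism the abstract refers to follows the standard Swin Transformer recipe: attention is computed within non-overlapping local windows, and alternate layers cyclically shift the feature map by half a window so that tokens can mix across the previous window boundaries. The sketch below illustrates only that generic partition-and-shift step with numpy; it is not the authors' SwinCAP code, and the window size `win` and half-window shift are assumptions taken from the standard Swin formulation.

```python
import numpy as np

def window_partition(x, win):
    # Split an (H, W, C) feature map into non-overlapping (win, win) windows;
    # attention would then be computed independently inside each window.
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, C)

def shift_windows(x, win):
    # Cyclic shift by half a window (Swin's torch.roll step): after
    # re-partitioning, each window now straddles the old window borders.
    return np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))

# Toy 4x4 single-channel map with window size 2.
x = np.arange(16, dtype=float).reshape(4, 4, 1)
plain = window_partition(x, 2)                    # 4 windows of shape (2, 2, 1)
shifted = window_partition(shift_windows(x, 2), 2)
```

With the toy map, the first unshifted window holds tokens {0, 1, 4, 5}, while after the cyclic shift the first window holds {5, 6, 9, 10}, i.e. tokens drawn from two previously separate windows, which is what lets successive layers propagate information globally at local-attention cost.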

Journal Article Type Article
Acceptance Date Feb 1, 2025
Online Publication Date Feb 10, 2025
Publication Date 2025-04
Deposit Date Mar 21, 2025
Journal Journal of Visual Communication and Image Representation
Print ISSN 1047-3203
Publisher Elsevier
Peer Reviewed Peer Reviewed
Volume 108
Article Number 104409
DOI https://doi.org/10.1016/j.jvcir.2025.104409