AUTHOR=Li Yanli, Wang Congyi, Wang Huan
TITLE=Toward accurate hand mesh estimation via masked image modeling
JOURNAL=Frontiers in Physics
VOLUME=Volume 12 - 2024
YEAR=2025
URL=https://www.frontiersin.org/journals/physics/articles/10.3389/fphy.2024.1515842
DOI=10.3389/fphy.2024.1515842
ISSN=2296-424X
ABSTRACT=Introduction: With an enormous number of hand images generated over time, leveraging unlabeled images for pose estimation is an emerging yet challenging topic. While some semi-supervised and self-supervised methods have emerged, they are constrained by their reliance on high-quality keypoint detection models or complicated network architectures. Methods: We propose a novel self-supervised pretraining strategy for 3D hand mesh regression. Our approach integrates a multi-granularity strategy with pseudo-keypoint alignment in a teacher–student framework, employing self-distillation and masked image modeling for comprehensive representation learning. We pair this with a robust pose estimation baseline, combining a standard vision transformer backbone with a pyramidal mesh alignment feedback head. Results: Extensive experiments demonstrate HandMIM’s competitive performance across diverse datasets, notably achieving an 8.00 mm Procrustes alignment vertex-point-error on the challenging HO3Dv2 test set, which features severe hand occlusions, surpassing many specially optimized architectures.
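The Procrustes-aligned vertex error quoted in the abstract can be illustrated with a minimal sketch. It assumes the common definition of the metric: the predicted mesh is fitted to the ground truth with the best similarity transform (scale, rotation, translation), and the mean per-vertex Euclidean distance is then reported. The function names `procrustes_align` and `pa_mpvpe` are illustrative and not taken from the paper, and the authors' evaluation code may differ in detail.

```python
import numpy as np


def procrustes_align(pred, gt):
    """Fit pred (N, 3) to gt (N, 3) with the least-squares similarity transform."""
    mu_pred, mu_gt = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_pred, gt - mu_gt  # remove translation

    # Optimal rotation from the SVD of the 3x3 cross-covariance (Kabsch/Umeyama).
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = -1.0 if np.linalg.det(vt.T @ u.T) < 0 else 1.0
    diag = np.array([1.0, 1.0, d])
    rot = vt.T @ np.diag(diag) @ u.T

    # Optimal isotropic scale for the centred point sets.
    scale = (s * diag).sum() / (p ** 2).sum()
    return scale * p @ rot.T + mu_gt


def pa_mpvpe(pred, gt):
    """Mean per-vertex distance after Procrustes alignment, in the input units."""
    aligned = procrustes_align(pred, gt)
    return np.linalg.norm(aligned - gt, axis=1).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(100, 3))                    # stand-in for mesh vertices
    pred = gt + rng.normal(scale=0.01, size=gt.shape)  # noisy "prediction"
    print(pa_mpvpe(pred, gt))
```

Because the alignment removes global scale, rotation, and translation, the metric measures only the error in articulated shape. If vertices are stored in metres, multiplying the result by 1000 gives a value in millimetres, the unit used in the abstract.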