
ORIGINAL RESEARCH article

Front. Comput. Sci.

Sec. Computer Vision

Volume 7 - 2025 | doi: 10.3389/fcomp.2025.1644044

Efficient Rotation Invariance in Deep Neural Networks through Artificial Mental Rotation

Provisionally accepted
  • 1Zurich University of Applied Sciences, Winterthur, Switzerland
  • 2AI Initiative, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

The final, formatted version of the article will be published soon.

Humans and animals recognize objects irrespective of the beholder's point of view, which may drastically change their appearance. Artificial pattern recognizers also strive for this, e.g., through translational invariance in convolutional neural networks (CNNs). However, both CNNs and vision transformers (ViTs) perform poorly on rotated inputs. Here we present AMR (artificial mental rotation), a method for handling in-plane rotations that focuses on large datasets and architectural flexibility; our simple AMR implementation works with all common CNN and ViT architectures. We test it on randomly rotated versions of ImageNet, Stanford Cars, and Oxford Pet. With a top-1 accuracy (averaged across datasets and architectures) of 0.743, AMR outperforms rotational data augmentation (average top-1 accuracy of 0.626) by 19%. We also easily transfer a trained AMR module to a downstream task, improving the performance of a pre-trained semantic segmentation model on rotated COCO from 32.7 to 55.2 IoU.
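The core idea described above, de-rotating an input to a canonical orientation before passing it to an unchanged downstream network, can be sketched in a few lines. The sketch below is a hypothetical toy illustration, not the authors' implementation: it restricts rotations to 90° multiples and uses a made-up `upright_score` function standing in for a learned AMR module that rates how canonically oriented an image looks.

```python
import numpy as np

def derotate(image, upright_score, angles=(0, 90, 180, 270)):
    """Hypothetical AMR-style step: try candidate in-plane rotations and
    keep the one the scorer rates as most 'upright', so the downstream
    classifier only ever sees canonically oriented inputs."""
    candidates = [np.rot90(image, k=a // 90) for a in angles]
    scores = [upright_score(c) for c in candidates]
    return candidates[int(np.argmax(scores))]

def toy_score(img):
    # Stand-in for a learned module: prefers images whose mass
    # lies in the top half (our toy notion of "upright").
    h = img.shape[0]
    return img[: h // 2].sum() - img[h // 2 :].sum()

img = np.zeros((4, 4))
img[0] = 1.0                      # bright stripe on top = "upright"
rotated = np.rot90(img, k=2)      # simulate a 180-degree-rotated input
restored = derotate(rotated, toy_score)
```

In the paper's setting, the scorer/de-rotator would be a trained neural module handling arbitrary angles, and the restored image would be fed to a frozen, pre-trained classifier or segmentation model, which is what makes the module easy to transfer across tasks.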

Keywords: computer vision, mental rotation, CNN, transformer, in-plane rotations, bio-inspired, neural network architecture

Received: 09 Jun 2025; Accepted: 13 Aug 2025.

Copyright: © 2025 Tuggener, Stadelmann and Schmidhuber. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Lukas Tuggener, Zurich University of Applied Sciences, Winterthur, Switzerland

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.