
ORIGINAL RESEARCH article

Front. Remote Sens.

Sec. Image Analysis and Classification

Volume 6 - 2025 | doi: 10.3389/frsen.2025.1683696

This article is part of the Research Topic: Machine Learning for Advanced Remote Sensing: From Theory to Applications and Societal Impact

ResFormer: An Efficient Transformer Framework for Scalable Semantic Segmentation of Remote Sensing Images

Provisionally accepted
Yanglin Ou1,2, Xiqi Wang3*
  • 1Huizhou Technician Institute, Huizhou, China
  • 2Huazhong University of Science and Technology, Wuhan, China
  • 3Shandong Jianzhu University, Jinan, China

The final, formatted version of the article will be published soon.

The translation of machine learning theory into operational remote sensing applications that deliver measurable societal value remains a paramount challenge. This endeavor requires models that are not only accurate but also scalable, reliable, and directly applicable to real-world problems such as climate resilience and sustainable urban development. While Convolutional Neural Networks (CNNs) have been foundational, their limited receptive fields often fail to capture the global context essential for interpreting complex scenes. Vision Transformers, with their global self-attention mechanism, offer a powerful alternative but typically incur prohibitive computational costs. To address these challenges, this paper introduces ResFormer, a novel architecture designed to bridge the gap between algorithmic innovation and demonstrable public good. Specifically, we propose a novel linear-complexity Transformer block integrated with residual connections, which drastically reduces the computational overhead from quadratic to linear complexity without sacrificing global context modeling. This efficiency enables the processing of high-resolution remote sensing imagery on commodity hardware. On the large-scale UAVid urban-scene dataset, ResFormer achieves a mean Intersection-over-Union (mIoU) of 68.7%, and on the ISPRS Potsdam dataset, it attains 85.9% mIoU. By holistically addressing scalability, reliability, and impact, ResFormer serves as a reproducible exemplar that moves the field toward machine learning systems that generate trustworthy and actionable knowledge for the public good. The implementation will be made publicly available to foster further research and application.
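For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean Intersection-over-Union is commonly computed from predicted and reference label maps. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `mean_iou`, the flat label-list inputs, and the choice to skip classes absent from both maps are all hypothetical, and the exact protocol used for UAVid and ISPRS Potsdam (for example, which classes are included) is not specified in this abstract.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the reference.

    pred, target: equal-length sequences of integer class labels, one per pixel.
    Assumption: classes absent from both sequences are skipped, not counted as 0.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where both maps agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labelled as class c in the reference

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy example the per-class IoU values are 1/3, 2/3, and 1/2, so `score` is approximately 0.5. Benchmark protocols may differ from this sketch in which classes they include and in whether counts are accumulated per image or over the whole test set.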

Keywords: image segmentation, UAV, Transformer, scalability, reliability

Received: 11 Aug 2025; Accepted: 08 Oct 2025.

Copyright: © 2025 Ou and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Xiqi Wang, wangxiqi24@sdjzu.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.