ORIGINAL RESEARCH article

Front. Comput. Sci.

Sec. Computer Vision

Volume 7 - 2025 | doi: 10.3389/fcomp.2025.1658556

Optimized Encoder-based Transformers for Improved Local and Global Integration in Railway Image Classification

Provisionally accepted
Lilan Li, Xuemei Zhan, Tiantian Wu, Hua Ma*
  • Zhengzhou Railway Vocational and Technical College, Zhengzhou, China

The final, formatted version of the article will be published soon.

Railway image classification (RIC) is a critical application in railway infrastructure monitoring. It involves analyzing hyperspectral datasets whose spatial-spectral relationships are complex and specific to railway environments. Transformer-based methods for RIC, however, struggle with local feature extraction and training efficiency. To address these challenges, we introduce the Pure Transformer Network (PTN), a fully Transformer-based framework tailored to RIC tasks. Our approach improves the integration of local and global information in railway images by combining a Patch Embedding Transformer (PET) module, which employs an "unfold + attention + fold" mechanism, with a Transformer module that incorporates relative attention. The PET module uses attention to emulate convolutional operations, which yields adaptive receptive fields for the varying spatial patterns in railway infrastructure and avoids the constraints of fixed convolutional kernels. Additionally, we propose a Memory Efficient Algorithm that reduces training time by 35% while preserving accuracy. Evaluations on four hyperspectral railway image datasets show that PTN achieves higher accuracy than existing CNN- and Transformer-based baselines.
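
To make the "unfold + attention + fold" idea concrete, the following is a minimal illustrative sketch in plain Python, not the PET module itself. It assumes a stride-1 window, uses the centre pixel as the query, and omits the learned query/key/value projections, multiple heads, and relative attention that an actual implementation would likely include. The function names (`unfold`, `attend`, `fold`, `local_attention`) are hypothetical and chosen only for this example.

```python
import math

def unfold(x, k):
    """Collect, for every pixel of an H x W x C map, its k x k neighbourhood.

    Assumption: stride 1, and positions outside the image are simply skipped
    rather than zero-padded. Returns a list of (centre_vector, neighbours).
    """
    h, w = len(x), len(x[0])
    r = k // 2
    windows = []
    for i in range(h):
        for j in range(w):
            neigh = [x[y][z]
                     for y in range(max(0, i - r), min(h, i + r + 1))
                     for z in range(max(0, j - r), min(w, j + r + 1))]
            windows.append((x[i][j], neigh))
    return windows

def attend(query, tokens):
    """Scaled dot-product attention with one query over the window tokens.

    Assumption: keys and values are the raw tokens (no learned projections).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    top = max(scores)                          # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]        # data-dependent "kernel"
    return [sum(wt * tok[c] for wt, tok in zip(weights, tokens))
            for c in range(len(query))]

def fold(vectors, h, w):
    """Arrange the per-pixel outputs back into an H x W x C map."""
    return [vectors[i * w:(i + 1) * w] for i in range(h)]

def local_attention(x, k=3):
    """Unfold, attend inside each window, then fold back to the input shape."""
    h, w = len(x), len(x[0])
    out = [attend(centre, neigh) for centre, neigh in unfold(x, k)]
    return fold(out, h, w)
```

In this sketch the attention weights play the role of a convolution kernel whose values depend on the content of each window, which is one way to read the "adaptive receptive field" described above. For example, calling `local_attention` on a 3 x 4 map with 2 channels returns a map of the same 3 x 4 x 2 shape.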

Keywords: efficient Transformer, local feature, global feature, optimization, railway image classification

Received: 05 Aug 2025; Accepted: 17 Oct 2025.

Copyright: © 2025 Li, Zhan, Wu and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Hua Ma, mahua11352@outlook.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.