Crack Detection in Structural Images Using a Hybrid Swin Transformer and Enhanced Features Representation Block

N, Anusha; ANBARASI L, JANI

doi:10.3389/frai.2025.1655091

ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Machine Learning and Artificial Intelligence

This article is part of the Research Topic Deep Learning for Computer Vision and Measurement Systems.

Provisionally accepted

Anusha N¹*

JANI ANBARASI L²

¹Department of IoT, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
²School of Computer Science and Engineering, Vellore Institute of Technology - Chennai Campus, Chennai, India

The final, formatted version of the article will be published soon.

This paper presents a crack detection framework built on a hybrid model that integrates the Swin Transformer with an Enhanced Features Representation Block (EFRB) to detect cracks in images precisely. The Swin Transformer, which captures long-range dependencies and processes complex images efficiently, forms the backbone of the feature extraction process. The EFRB improves spatial granularity through depthwise convolutions, which process spatial features independently in each channel, and pointwise convolutions, which improve channel representation. The model uses residual connections to mitigate the vanishing gradient problem in deeper networks. The training process is optimized using population-based feature selection, resulting in robust performance. The dataset is split into 80% for training and 20% for testing, and the network is trained with a learning rate of 1e-3, a batch size of 16, and 30 epochs. Evaluation results show that the model achieves an accuracy of 98%, with precision, recall, and F1-score of 0.97, 0.99, and 0.98, respectively, for crack detection. These results demonstrate the effectiveness of the proposed architecture for real-world crack detection in structural monitoring.
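The abstract describes the EFRB only at a high level. Depthwise convolutions filter each channel on its own, pointwise (1×1) convolutions recombine the channels, and a residual connection carries the input forward. The NumPy sketch here illustrates that general pattern. It is not the authors' EFRB. The layer order (depthwise, ReLU, pointwise, identity shortcut), the odd kernel size with zero "same" padding, and the function names `depthwise_conv2d`, `pointwise_conv2d` and `efrb_like_block` are hypothetical choices made for illustration.

```python
import numpy as np


def depthwise_conv2d(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Apply one k x k filter per channel (stride 1, zero "same" padding).

    x: (C, H, W) feature map; kernels: (C, k, k) with k odd (assumption).
    Computes cross-correlation, as deep-learning "convolutions" do.
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            # Each channel is weighted only by its own kernel: no cross-channel mixing.
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out


def pointwise_conv2d(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """1 x 1 convolution that mixes channels at every pixel. weights: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weights, x)


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def efrb_like_block(x: np.ndarray, dw_kernels: np.ndarray, pw_weights: np.ndarray) -> np.ndarray:
    """Hypothetical EFRB-style block: depthwise -> ReLU -> pointwise, plus identity shortcut.

    pw_weights must be square (C, C) so the residual addition is shape-compatible.
    """
    y = relu(depthwise_conv2d(x, dw_kernels))
    y = pointwise_conv2d(y, pw_weights)
    return x + y  # residual connection: the input passes through unchanged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, k = 8, 32, 32, 3
    x = rng.standard_normal((C, H, W))
    out = efrb_like_block(x, 0.1 * rng.standard_normal((C, k, k)), 0.1 * rng.standard_normal((C, C)))
    print(out.shape)  # (8, 32, 32)
    # Weight count, biases ignored: separable pair (C*k*k + C*C) vs standard k x k conv (C*C*k*k)
    print(C * k * k + C * C, C * C * k * k)  # 136 576
```

The second print shows why this factorization is attractive: with 8 channels and a 3×3 kernel, the depthwise plus pointwise pair needs 136 weights versus 576 for a standard convolution. The identity shortcut also means that if the learned branch contributes nothing, the block reduces to passing its input through, which is the usual argument for why residual connections ease the training of deeper networks.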

Keywords: Swin Transformer, crack detection, convolutional neural network, residual network.

Received: 27 Jun 2025; Accepted: 23 Sep 2025.

Copyright: © 2025 N and ANBARASI L. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Anusha N

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.