ORIGINAL RESEARCH article

Front. Oncol.

Sec. Radiation Oncology

Volume 15 - 2025 | doi: 10.3389/fonc.2025.1640685

This article is part of the Research Topic: Innovative Approaches in Precision Radiation Oncology

Innovative Patient-Specific Delivered-Dose Prediction for Volumetric Modulated Arc Therapy Using Lightweight Swin-Transformer

Provisionally accepted
  • 1 The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
  • 2 Jiangxi Cancer Hospital, Nanchang, China

The final, formatted version of the article will be published soon.

Background: Volumetric modulated arc therapy (VMAT) requires rigorous pre-treatment patient-specific quality assurance (PSQA) to ensure dosimetric accuracy, yet conventional manual verification is time-consuming and labor-intensive in clinical workflows. Although deep learning (DL) models have advanced PSQA by automating the prediction of QA metrics, existing approaches based on convolutional neural networks struggle to reconcile local feature extraction with global contextual awareness. This study aims to develop a novel lightweight DL framework that combines hierarchical spatial feature learning with computational efficiency to improve VMAT-delivered dose (VTDose) prediction.

Methods: We propose a hybrid architecture, termed "STQA", featuring a novel hierarchical fusion framework that combines shifted-window self-attention with adaptive local-global feature interaction. Specifically, the strategic replacement of Swin-Transformer blocks with ResNet residual modules in deep layers, coupled with depthwise separable attention mechanisms, enables a 40% parameter reduction while preserving spatial resolution. The model was trained on multimodal inputs and evaluated against state-of-the-art methods using the structural similarity index (SSIM), mean absolute error (MAE), root mean square error (RMSE), and gamma passing rate (GPR).

Results: Visual evaluation of VTDose and discrepancy maps across axial, coronal, and sagittal planes demonstrated enhanced fidelity of STQA to the ground truth (GT). Quantitative analysis revealed superior performance of STQA across all evaluation metrics: SSIM = 0.978, MAE = 0.163, and RMSE = 0.416. GPR analysis confirmed clinical applicability, with STQA achieving 95.43% ± 3.41% agreement with GT (94.63% ± 2.84%).

Conclusions: STQA establishes a paradigm for efficient and accurate VTDose prediction. Its lightweight design, validated through multi-site clinical data, addresses critical limitations in current DL-based PSQA, offering a clinically viable solution to enhance radiotherapy PSQA workflows.
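
For readers less familiar with the evaluation metrics named in the Methods, the following is a minimal Python sketch of how MAE, RMSE, and a gamma passing rate can be computed between a predicted and a reference dose profile. It is an illustration only and not the authors' implementation: the function names, the one-dimensional profile representation, the 3%/3 mm criteria, and the 10% low-dose cutoff are all assumptions, since the abstract does not state the criteria used. The gamma calculation is a simplified global version evaluated on grid points without interpolation.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error between two equally sized dose samples."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean square error between two equally sized dose samples."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def gamma_passing_rate_1d(
    pred: Sequence[float],
    ref: Sequence[float],
    spacing_mm: float,
    dose_tol: float = 0.03,         # assumed 3% global dose criterion
    dist_tol_mm: float = 3.0,       # assumed 3 mm distance-to-agreement
    low_dose_cutoff: float = 0.10,  # assumed: skip points below 10% of max
) -> float:
    """Simplified global gamma passing rate (percent) on a 1D profile.

    For each reference point above the low-dose cutoff, gamma is the
    minimum over all predicted grid points of
    sqrt((dose diff / dose criterion)^2 + (distance / distance criterion)^2).
    A point passes when gamma <= 1. No sub-grid interpolation is done.
    """
    if len(pred) != len(ref) or not ref:
        raise ValueError("inputs must be non-empty and the same length")
    d_max = max(ref)
    if d_max <= 0:
        raise ValueError("reference dose must contain a positive value")
    dose_crit = dose_tol * d_max
    evaluated = 0
    passed = 0
    for i, r in enumerate(ref):
        if r < low_dose_cutoff * d_max:
            continue
        evaluated += 1
        # Squared gamma; comparing against 1.0 avoids an unneeded sqrt.
        gamma_sq = min(
            ((p - r) / dose_crit) ** 2
            + (((j - i) * spacing_mm) / dist_tol_mm) ** 2
            for j, p in enumerate(pred)
        )
        if gamma_sq <= 1.0:
            passed += 1
    return 100.0 * passed / evaluated if evaluated else float("nan")
```

In practice these metrics would be computed over three-dimensional dose grids, and gamma analysis is usually done with a validated tool; the sketch only shows the arithmetic behind the reported quantities.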

Keywords: deep learning, Swin-Transformer, volumetric modulated arc therapy, pre-treatment patient-specific quality assurance, multimodal

Received: 04 Jun 2025; Accepted: 03 Sep 2025.

Copyright: © 2025 Zhou, Gong, Jian and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Yun Zhang, Jiangxi Cancer Hospital, Nanchang, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.