AUTHOR=Tang Songyuan, Bicer Tekin, Fezzaa Kamel, Clark Samuel
TITLE=A SWIN-based vision transformer for high-fidelity and high-speed imaging experiments at light sources
JOURNAL=Frontiers in High Performance Computing
VOLUME=Volume 3 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/high-performance-computing/articles/10.3389/fhpcp.2025.1537080
DOI=10.3389/fhpcp.2025.1537080
ISSN=2813-7337
ABSTRACT=Introduction: High-speed x-ray imaging experiments at synchrotron radiation facilities enable the acquisition of spatiotemporal measurements reaching millions of frames per second. Such high acquisition rates are often prone to noisy measurements, while slower (but less noisy) rates risk missing scientifically significant phenomena.
Methods: We develop a Shifted Window (SWIN)-based vision transformer to reconstruct high-resolution x-ray image sequences with high fidelity at a high frame rate, and we evaluate the underlying algorithmic framework on a high-performance computing (HPC) system. We characterize model parameters that affect training scalability, reconstruction quality, and inference running time, including the batch size, the number of input frames, their composition in terms of low- and high-resolution frames, and the model size and architecture.
Results: With three consecutive low-resolution (LR) frames and two high-resolution (HR) frames, differing in spatial and temporal resolution by factors of 4 and 20, respectively, the proposed algorithm achieved average peak signal-to-noise ratios of 37.40 dB and 35.60 dB.
Discussion: Further, the model was trained on the Argonne Leadership Computing Facility's Polaris HPC system using 40 Nvidia A100 GPUs, reducing the end-to-end training time by roughly 10× compared with training on beamline-local computing resources.
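
The abstract reports reconstruction quality as peak signal-to-noise ratio (PSNR) over inputs built from three low-resolution and two high-resolution frames, with a 4× spatial factor between them. The snippet below is a minimal, illustrative sketch, not the authors' code: it shows how such a mixed LR/HR input window might be assembled and how PSNR is computed. The frame counts and 4× factor come from the abstract; the use of PyTorch, the function names, and the tensor shapes are assumptions made for illustration only.

```python
# Minimal sketch (assumed PyTorch; not the authors' implementation) of
# assembling a 3-LR + 2-HR input window and scoring a reconstruction with PSNR.
import torch
import torch.nn.functional as F

SPATIAL_FACTOR = 4  # LR frames are 4x coarser spatially (per the abstract)

def build_input_window(lr_frames: torch.Tensor, hr_frames: torch.Tensor) -> torch.Tensor:
    """Stack three LR frames (upsampled to HR size) with two HR frames.

    lr_frames: (3, H//4, W//4) low-resolution, high-frame-rate frames
    hr_frames: (2, H, W) high-resolution, low-frame-rate frames
    Returns a (5, H, W) tensor that a multi-frame model could consume.
    """
    lr_up = F.interpolate(lr_frames.unsqueeze(0), scale_factor=SPATIAL_FACTOR,
                          mode="bicubic", align_corners=False).squeeze(0)
    return torch.cat([lr_up, hr_frames], dim=0)

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))

# Toy usage with random data, only to show the shapes involved.
lr = torch.rand(3, 64, 64)    # three LR frames
hr = torch.rand(2, 256, 256)  # two HR frames
window = build_input_window(lr, hr)  # (5, 256, 256)
print(window.shape, psnr(torch.rand(256, 256), torch.rand(256, 256)))
```

In this sketch the LR frames are bicubically upsampled before stacking so that all five frames share one spatial grid; how the actual model ingests and fuses the LR and HR frames is described in the article itself.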