ORIGINAL RESEARCH article
Front. High Perform. Comput.
Sec. Big Data and AI
Volume 3 - 2025 | doi: 10.3389/fhpcp.2025.1537080
This article is part of the Research Topic: AI/ML-Enhanced High-Performance Computing Techniques and Runtime Systems for Scientific Image and Dataset Analysis.
A SWIN-based Vision Transformer for High-fidelity and High-speed Imaging Experiments at Light Sources
Provisionally accepted
Argonne National Laboratory (DOE), Lemont, United States
High-speed x-ray imaging experiments at synchrotron radiation facilities enable spatiotemporal measurements at up to millions of frames per second. Such high acquisition rates, however, tend to produce noisy measurements, while slower (but less noisy) acquisition risks missing scientifically significant transient phenomena. We develop a Shifted Window (SWIN)-based vision transformer to reconstruct high-resolution x-ray image sequences with high fidelity and at a high frame rate, and we evaluate the underlying algorithmic framework on a high-performance computing (HPC) system. We characterize the model parameters that affect training scalability, reconstruction quality, and inference time, including the batch size, the number of input frames, the composition of the input in terms of low- and high-resolution frames, and the model size and architecture. Given three consecutive low-resolution (LR) frames and two high-resolution (HR) frames, with the LR and HR streams differing in spatial and temporal resolution by factors of 4 and 20, respectively, the proposed algorithm achieved average peak signal-to-noise ratios of 37.40 dB and 35.60 dB. Further, the model was trained on the Argonne Leadership Computing Facility's Polaris HPC system using 40 NVIDIA A100 GPUs, reducing end-to-end training time by roughly 10× compared with training on beamline-local computing resources.
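The reported fidelity metric is the peak signal-to-noise ratio (PSNR). For reference, below is a minimal sketch of the standard PSNR computation, 10·log10(data_range² / MSE), for reconstructed frames against their high-resolution ground truth; the PyTorch framework, tensor shapes, and [0, 1] intensity normalization are illustrative assumptions rather than details taken from the article.

```python
import torch

def psnr(recon: torch.Tensor, target: torch.Tensor, data_range: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    mse = torch.mean((recon - target) ** 2)
    return 10.0 * torch.log10(data_range ** 2 / mse)

# Hypothetical example: a batch of 8 single-channel 512x512 frames,
# intensities normalized to [0, 1] (hence data_range=1.0).
recon = torch.rand(8, 1, 512, 512)  # stand-in for reconstructed HR frames
truth = torch.rand(8, 1, 512, 512)  # stand-in for ground-truth HR frames
print(f"average PSNR: {psnr(recon, truth).item():.2f} dB")
```

Likewise, the roughly 10× end-to-end speedup comes from data-parallel training across 40 GPUs. A sketch of such a setup, assuming PyTorch DistributedDataParallel with one process per GPU (the abstract does not name the training framework):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with e.g. `torchrun --nproc_per_node=4 train.py`; torchrun sets LOCAL_RANK.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Stand-in module; the actual model is a SWIN-based vision transformer.
model = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1).cuda()
model = DDP(model, device_ids=[local_rank])  # all-reduces gradients across ranks
```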
Keywords: high-speed imaging, spatiotemporal fusion, vision transformer, distributed training, full-field x-ray radiography
Received: 29 Nov 2024; Accepted: 24 Apr 2025.
Copyright: © 2025 Tang, Bicer, Fezzaa and Clark. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Songyuan Tang, Argonne National Laboratory (DOE), Lemont, United States
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.