ORIGINAL RESEARCH article

Front. High Perform. Comput.

Sec. Big Data and AI

Volume 3 - 2025 | doi: 10.3389/fhpcp.2025.1520151

This article is part of the Research Topic: Recent Trends and Advances for Energy Efficient HPC Systems

FPGA-Accelerated SpeckleNN with SNL for Real-time X-ray Single-Particle Imaging

Provisionally accepted
  • SLAC National Accelerator Laboratory, Stanford University, Menlo Park, United States

The final, formatted version of the article will be published soon.

We present the implementation of a specialized version of our previously published unified embedding model, SpeckleNN, for real-time speckle pattern classification in X-ray Single-Particle Imaging (SPI), using the SLAC Neural Network Library (SNL) on an FPGA platform. This hardware realization turns SpeckleNN from a prototype model into a practical edge solution, optimized for running inference near the detector at high-throughput X-ray free-electron laser (XFEL) facilities such as the Linac Coherent Light Source (LCLS).
To address the resource constraints inherent in FPGAs, we developed a more specialized version of SpeckleNN. The original model, which was designed for broader classification across multiple biological samples, comprised approximately 5.6 million parameters. The new implementation reduces the parameter count to 64.6K (a 98.8% reduction) while maintaining the model's essential functionality for real-time operation, achieving an accuracy of 90%. Furthermore, we compressed the latent space from 128 to 50 dimensions. This implementation was demonstrated on the KCU1500 FPGA board, utilizing 71% of available DSPs, 75% of LUTs, and 48% of FFs, with an average power consumption of 9.4 W according to the Vivado post-implementation report. The FPGA performed inference on a single image with a latency of 45.015 microseconds at a 200 MHz clock rate.
In comparison, running the same inference on an NVIDIA A100 GPU resulted in an average power consumption of approximately 73 W and an image processing latency of around 400 microseconds. Our FPGA-accelerated version of SpeckleNN therefore achieved an 8.9x speedup and a 7.8x reduction in power consumption compared to the GPU implementation.
Key advancements include model specialization and dynamic weight loading through SNL, which eliminates the need for time-consuming re-synthesis of the FPGA design and allows fast, continuous deployment of models (re)trained online. These innovations enable real-time adaptive classification and efficient vetoing of speckle patterns, making SpeckleNN better suited for deployment in XFEL facilities. This implementation has the potential to significantly accelerate SPI experiments and enhance adaptability to evolving experimental conditions.
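
The headline ratios quoted in the abstract can be re-derived from the reported figures. The short Python sketch below does only that arithmetic; it is not part of the authors' code, and the constant and function names are illustrative. The last two entries (energy per inference) are not reported in the abstract: they assume that energy can be approximated as average power multiplied by single-image latency, which may not hold for either device in practice.

```python
# Figures quoted in the abstract (GPU values are stated as approximate).
ORIGINAL_PARAMS = 5.6e6     # original SpeckleNN, "approximately 5.6 million"
FPGA_PARAMS = 64.6e3        # specialized model
FPGA_LATENCY_S = 45.015e-6  # single-image latency on the KCU1500
GPU_LATENCY_S = 400e-6      # "around 400 microseconds" on the A100
FPGA_POWER_W = 9.4          # Vivado post-implementation report
GPU_POWER_W = 73.0          # "approximately 73W"
CLOCK_HZ = 200e6            # FPGA clock rate


def derived_metrics():
    """Recompute the ratios stated in the abstract from its raw numbers."""
    return {
        "param_reduction_pct": 100 * (1 - FPGA_PARAMS / ORIGINAL_PARAMS),
        "speedup": GPU_LATENCY_S / FPGA_LATENCY_S,
        "power_reduction": GPU_POWER_W / FPGA_POWER_W,
        # Latency expressed in clock cycles at the stated clock rate.
        "latency_cycles": FPGA_LATENCY_S * CLOCK_HZ,
        # ASSUMPTION (not in the abstract): energy = average power * latency.
        "fpga_energy_uj": FPGA_POWER_W * FPGA_LATENCY_S * 1e6,
        "gpu_energy_uj": GPU_POWER_W * GPU_LATENCY_S * 1e6,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.3f}")
```

Rounded to one decimal place, the speedup and power-reduction values agree with the 8.9x and 7.8x stated above, and the quoted latency corresponds to about 9003 clock cycles at 200 MHz.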

Keywords: FPGA, machine learning, XFEL, SPI, SpeckleNN, GPU, LCLS

Received: 30 Oct 2024; Accepted: 19 May 2025.

Copyright: © 2025 Dave, Wang, Russell, Thayer and Herbst. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Abhilasha Dave, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, United States

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.