AUTHOR=Dave Abhilasha , Wang Cong , Russell James , Herbst Ryan , Thayer Jana 

TITLE=FPGA-accelerated SpeckleNN with SNL for real-time X-ray single-particle imaging

JOURNAL=Frontiers in High Performance Computing

VOLUME=Volume 3 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/high-performance-computing/articles/10.3389/fhpcp.2025.1520151

DOI=10.3389/fhpcp.2025.1520151

ISSN=2813-7337

ABSTRACT=We present the implementation of a specialized version of our previously published unified embedding model, SpeckleNN, for real-time speckle pattern classification in X-ray Single-Particle Imaging (SPI), using the SLAC Neural Network Library (SNL) on an FPGA platform. This hardware realization transitions SpeckleNN from a prototype model into a practical edge solution, optimized for running inference near the detector at high-throughput X-ray free-electron laser (XFEL) facilities such as the Linac Coherent Light Source (LCLS). To address the resource constraints inherent in FPGAs, we developed a more specialized version of SpeckleNN. The original model, which was designed for broader classification across multiple biological samples, comprised ~5.6 million parameters. The new implementation reduces the parameter count to 64.6K (a 98.8% reduction) while maintaining the model's essential functionality for real-time operation, achieving an accuracy of 90%. Furthermore, we compressed the latent space from 128 to 50 dimensions. This implementation was demonstrated on the KCU1500 FPGA board, utilizing 71% of available DSPs, 75% of LUTs, and 48% of FFs, with an average power consumption of 9.4 W according to the Vivado post-implementation report. The FPGA performed inference on a single image with a latency of 45.015 microseconds at a 200 MHz clock rate. In comparison, running the same inference on an NVIDIA A100 GPU resulted in an average power consumption of ~73 W and an image processing latency of around 400 microseconds. Our FPGA-accelerated version of SpeckleNN demonstrated significant improvements, achieving an 8.9× speedup and a 7.8× reduction in power consumption compared to the GPU implementation. Key advancements include model specialization and dynamic weight loading through SNL; the latter eliminates the need for time-consuming FPGA design re-synthesis, allowing fast and continuous deployment of models (re)trained online.
These innovations enable real-time adaptive classification and efficient vetoing of speckle patterns, making SpeckleNN better suited for deployment at XFEL facilities. This implementation has the potential to significantly accelerate SPI experiments and enhance adaptability to evolving experimental conditions.
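The headline ratios in the abstract follow directly from the stated figures. The short Python sketch below reproduces them. Its only inputs are the numbers quoted above (parameter counts, latencies, clock rate, power). The clock-cycle count is simple arithmetic added for illustration and is not a figure reported by the authors. The function name `summarize` and its dictionary keys are hypothetical.

```python
"""Back-of-envelope check of the ratios quoted in the SpeckleNN/SNL abstract.

Inputs are the figures stated in the abstract; everything else is derived
by plain arithmetic (names here are illustrative, not from the paper).
"""

ORIGINAL_PARAMS = 5.6e6      # ~5.6 million parameters (original SpeckleNN)
SPECIALIZED_PARAMS = 64.6e3  # 64.6K parameters (FPGA-specialized model)

FPGA_LATENCY_S = 45.015e-6   # single-image latency on the KCU1500
FPGA_CLOCK_HZ = 200e6        # 200 MHz clock rate
FPGA_POWER_W = 9.4           # Vivado post-implementation estimate

GPU_LATENCY_S = 400e-6       # ~400 microseconds on an NVIDIA A100
GPU_POWER_W = 73.0           # ~73 W average


def summarize() -> dict:
    """Return the derived ratios as unrounded floats."""
    return {
        # Percentage of parameters removed by specialization
        "param_reduction_pct": 100.0 * (1.0 - SPECIALIZED_PARAMS / ORIGINAL_PARAMS),
        # Latency ratio GPU / FPGA
        "speedup": GPU_LATENCY_S / FPGA_LATENCY_S,
        # Power ratio GPU / FPGA
        "power_reduction": GPU_POWER_W / FPGA_POWER_W,
        # Clock cycles spent per image at 200 MHz (derived, not reported)
        "clock_cycles": FPGA_LATENCY_S * FPGA_CLOCK_HZ,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.1f}")
```

Rounded to one decimal place, the first three values match the 98.8%, 8.9×, and 7.8× quoted in the abstract. The cycle count shows that the 45.015 µs latency corresponds to roughly 9,003 cycles at 200 MHz. Note that the FPGA power is a Vivado estimate, so the power ratio compares figures obtained by different methods.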