<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
      <channel>
        <title>Frontiers in High Performance Computing | Big Data and AI section | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/high-performance-computing/sections/big-data-and-ai</link>
        <description>RSS Feed for Big Data and AI section in the Frontiers in High Performance Computing journal | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator, version 1</generator>
        <pubDate>Sat, 02 May 2026 07:18:03 GMT</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2026.1778471</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2026.1778471</link>
        <title><![CDATA[Scalable foundation models for numerical simulations on HPC platforms]]></title>
        <pubDate>Thu, 26 Mar 2026 00:00:00 GMT</pubDate>
        <category>Opinion</category>
        <author>Dali Wang</author><author>Qian Gong</author><author>Zirui Liu</author><author>Xiao Wang</author><author>Qinglei Cao</author><author>Scott Klasky</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1638924</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1638924</link>
        <title><![CDATA[Improving I/O phase predictions in FTIO using hybrid wavelet-Fourier analysis]]></title>
        <pubDate>Wed, 04 Feb 2026 00:00:00 GMT</pubDate>
        <category>Brief Research Report</category>
        <author>Ahmad Tarraf</author><author>Felix Wolf</author>
        <description><![CDATA[With the growing complexity of I/O software stacks and the rise of data-intensive workloads, optimizing I/O performance is essential for enhancing overall system performance on HPC clusters. While many sophisticated I/O management approaches exist that try to alleviate I/O contention, they often rely on models that predict the future I/O behavior of applications. Yet, these models are often created from past execution runs and can be error-prone due to I/O variability. In this work, we propose an enhancement to an existing tool that leverages frequency-based techniques to characterize I/O phases. We explore methods to improve prediction accuracy by incorporating multiple frequency components. Furthermore, by coupling the wavelet transformation with the Fourier transformation, we enhance the precision of our predictions while maintaining a compact and efficient behavioral characterization. We demonstrate our approach using a deep learning benchmark executed on a production cluster.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1520151</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1520151</link>
        <title><![CDATA[FPGA-accelerated SpeckleNN with SNL for real-time X-ray single-particle imaging]]></title>
        <pubDate>Wed, 18 Jun 2025 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Abhilasha Dave</author><author>Cong Wang</author><author>James Russell</author><author>Ryan Herbst</author><author>Jana Thayer</author>
        <description><![CDATA[We present the implementation of a specialized version of our previously published unified embedding model, SpeckleNN, for real-time speckle pattern classification in X-ray Single-Particle Imaging (SPI), using the SLAC Neural Network Library (SNL) on an FPGA platform. This hardware realization transitions SpeckleNN from a prototypic model into a practical edge solution, optimized for running inference near the detector in high-throughput X-ray free-electron laser (XFEL) facilities, such as those found at the Linac Coherent Light Source (LCLS). To address the resource constraints inherent in FPGAs, we developed a more specialized version of SpeckleNN. The original model, which was designed for broader classification across multiple biological samples, comprised ~5.6 million parameters. The new implementation, while reducing the parameter count to 64.6K (a 98.8% reduction), focuses on maintaining the model's essential functionality for real-time operation, achieving an accuracy of 90%. Furthermore, we compressed the latent space from 128 to 50 dimensions. This implementation was demonstrated on the KCU1500 FPGA board, utilizing 71% of available DSPs, 75% of LUTs, and 48% of FFs, with an average power consumption of 9.4 W according to the Vivado post-implementation report. The FPGA performed inference on a single image with a latency of 45.015 microseconds at a 200 MHz clock rate. In comparison, running the same inference on an NVIDIA A100 GPU resulted in an average power consumption of ~73 W and an image processing latency of around 400 microseconds. Our FPGA-accelerated version of SpeckleNN demonstrated significant improvements, achieving an 8.9× speedup and a 7.8× reduction in power consumption compared to the GPU implementation. Key advancements include model specialization and dynamic weight loading through SNL, which eliminates the need for time-consuming FPGA design re-synthesis, allowing fast and continuous deployment of models (re)trained online. These innovations enable real-time adaptive classification and efficient vetoing of speckle patterns, making SpeckleNN more suited for deployment in XFEL facilities. This implementation has the potential to significantly accelerate SPI experiments and enhance adaptability to evolving experimental conditions.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1550855</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1550855</link>
        <title><![CDATA[Resilient execution of distributed X-ray image analysis workflows]]></title>
        <pubDate>Fri, 06 Jun 2025 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Hai Duc Nguyen</author><author>Tekin Bicer</author><author>Bogdan Nicolae</author><author>Rajkumar Kettimuthu</author><author>E. A. Huerta</author><author>Ian T. Foster</author>
        <description><![CDATA[Long-running scientific workflows, such as tomographic data analysis pipelines, are prone to a variety of failures, including hardware and network disruptions, as well as software errors. These failures can substantially degrade performance and increase turnaround times, particularly in large-scale, geographically distributed, and time-sensitive environments like synchrotron radiation facilities. In this work, we propose and evaluate resilience strategies aimed at mitigating the impact of failures in tomographic reconstruction workflows. Specifically, we introduce an asynchronous, non-blocking checkpointing mechanism and a dynamic load redistribution technique with lazy recovery, designed to enhance workflow reliability and minimize failure-induced overheads. These approaches facilitate progress preservation, balanced load distribution, and efficient recovery in error-prone environments. To evaluate their effectiveness, we implement a 3D tomographic reconstruction pipeline and deploy it across Argonne's leadership computing infrastructure and synchrotron facilities. Our results demonstrate that the proposed resilience techniques significantly reduce failure impact by up to 500× while maintaining negligible overhead (<3%).]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1537080</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1537080</link>
        <title><![CDATA[A SWIN-based vision transformer for high-fidelity and high-speed imaging experiments at light sources]]></title>
        <pubDate>Fri, 30 May 2025 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Songyuan Tang</author><author>Tekin Bicer</author><author>Kamel Fezzaa</author><author>Samuel Clark</author>
        <description><![CDATA[Introduction: High-speed x-ray imaging experiments at synchrotron radiation facilities enable the acquisition of spatiotemporal measurements, reaching millions of frames per second. These high data acquisition rates are often prone to noisy measurements, or in the case of slower (but less noisy) rates, the loss of scientifically significant phenomena. Methods: We develop a Shifted Window (SWIN)-based vision transformer to reconstruct high-resolution x-ray image sequences with high fidelity and at a high frame rate and evaluate the underlying algorithmic framework on a high-performance computing (HPC) system. We characterize model parameters that could affect the training scalability, quality of the reconstruction, and running time during the model inference stage, such as the batch size, number of input frames to the model, their composition in terms of low- and high-resolution frames, and the model size and architecture. Results: With 3 subsequent low-resolution (LR) frames and another 2 high-resolution (HR) frames differing in the spatial and temporal resolutions by factors of 4 and 20, respectively, the proposed algorithm achieved average peak signal-to-noise ratios of 37.40 dB and 35.60 dB. Discussion: Further, the model was trained on the Argonne Leadership Computing Facility's Polaris HPC system using 40 Nvidia A100 GPUs, speeding up the end-to-end training time by about 10× compared to the training with beamline-local computing resources.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1536501</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1536501</link>
        <title><![CDATA[A definition and taxonomy of digital twins: case studies with machine learning and scientific applications]]></title>
        <pubDate>Thu, 13 Mar 2025 00:00:00 GMT</pubDate>
        <category>Review</category>
        <author>Adam Weingram</author><author>Carolyn Cui</author><author>Stephanie Lin</author><author>Samuel Munoz</author><author>Toby Jacob</author><author>Joshua Viers</author><author>Xiaoyi Lu</author>
        <description><![CDATA[As next-generation scientific instruments and simulations generate ever larger datasets, there is a growing need for high-performance computing (HPC) techniques that can provide timely and accurate analysis. With artificial intelligence (AI) and hardware breakthroughs at the forefront in recent years, interest in using this technology to perform decision-making tasks with continuously evolving real-world datasets has increased. Digital twinning is one method in which virtual replicas of real-world objects are modeled, updated, and interpreted to perform such tasks. However, the interface between AI techniques, digital twins (DT), and HPC technologies has yet to be thoroughly investigated despite the natural synergies between them. This paper explores the interface between digital twins, scientific computing, and machine learning (ML) by presenting a consistent definition for the digital twin, performing a systematic analysis of the literature to build a taxonomy of ML-enhanced digital twins, and discussing case studies from various scientific domains. We identify several promising future research directions, including hybrid assimilation frameworks and physics-informed techniques for improved accuracy. Through this comprehensive analysis, we aim to highlight both the current state-of-the-art and critical paths forward in this rapidly evolving field.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1536471</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1536471</link>
        <title><![CDATA[End-to-end deep learning pipeline for real-time Bragg peak segmentation: from training to large-scale deployment]]></title>
        <pubDate>Wed, 12 Mar 2025 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Cong Wang</author><author>Valerio Mariani</author><author>Frédéric Poitevin</author><author>Matthew Avaylon</author><author>Jana Thayer</author>
        <description><![CDATA[X-ray crystallography reconstruction, which transforms discrete X-ray diffraction patterns into three-dimensional molecular structures, relies critically on accurate Bragg peak finding for structure determination. As X-ray free electron laser (XFEL) facilities advance toward MHz data rates (1 million images per second), traditional peak finding algorithms that require manual parameter tuning or exhaustive grid searches across multiple experiments become increasingly impractical. While deep learning approaches offer promising solutions, their deployment in high-throughput environments presents significant challenges in automated dataset labeling, model scalability, edge deployment efficiency, and distributed inference capabilities. We present an end-to-end deep learning pipeline with three key components: (1) a data engine that combines traditional algorithms with our peak matching algorithm to generate high-quality training data at scale, (2) a modular architecture that scales from a few million to hundreds of millions of parameters, enabling us to train large expert-level models offline while deploying smaller, distilled models at the edge, and (3) a decoupled producer-consumer architecture that separates the specialized data source layer from model inference, enabling flexible deployment across diverse computing environments. Using this integrated approach, our pipeline achieves accuracy comparable to traditional methods tuned by human experts while eliminating the need for experiment-specific parameter tuning. Although current throughput requires optimization for MHz facilities, our system's scalable architecture and demonstrated model compression capabilities provide a foundation for future high-throughput XFEL deployments.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1520207</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1520207</link>
        <title><![CDATA[Energy-aware operation of HPC systems in Germany]]></title>
        <pubDate>Wed, 19 Feb 2025 00:00:00 GMT</pubDate>
        <category>Review</category>
        <author>Estela Suarez</author><author>Hendryk Bockelmann</author><author>Norbert Eicker</author><author>Jan Eitzinger</author><author>Salem El Sayed</author><author>Thomas Fieseler</author><author>Martin Frank</author><author>Peter Frech</author><author>Pay Giesselmann</author><author>Daniel Hackenberg</author><author>Georg Hager</author><author>Andreas Herten</author><author>Thomas Ilsche</author><author>Bastian Koller</author><author>Erwin Laure</author><author>Cristina Manzano</author><author>Sebastian Oeste</author><author>Michael Ott</author><author>Klaus Reuter</author><author>Ralf Schneider</author><author>Kay Thust</author><author>Benedikt von St. Vieth</author>
        <description><![CDATA[High Performance Computing (HPC) systems are among the most energy-intensive scientific facilities, with electric power consumption reaching and often exceeding 20 megawatts per installation. Unlike other major scientific infrastructures such as particle accelerators or high-intensity light sources, of which only a few exist worldwide, supercomputers are continuously increasing in both number and size. Even if every new system generation is more energy efficient than the previous one, the overall growth in size of the HPC infrastructure, driven by a rising demand for computational capacity across all scientific disciplines, and especially by Artificial Intelligence (AI) workloads, rapidly drives up the energy demand. This challenge is particularly significant for HPC centers in Germany, where high electricity costs, stringent national energy policies, and a strong commitment to environmental sustainability are key factors. This paper describes various state-of-the-art strategies and innovations employed to enhance the energy efficiency of HPC systems within the national context. Case studies from leading German HPC facilities illustrate the implementation of novel heterogeneous hardware architectures, advanced monitoring infrastructures, high-temperature cooling solutions, energy-aware scheduling, and dynamic power management, among other optimizations. By reviewing best practices and ongoing research, this paper aims to share valuable insight with the global HPC community, motivating the pursuit of more sustainable and energy-efficient HPC architectures and operations.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1547340</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1547340</link>
        <title><![CDATA[Hyperspectral segmentation of plants in fabricated ecosystems]]></title>
        <pubDate>Mon, 17 Feb 2025 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Petrus H. Zwart</author><author>Peter Andeer</author><author>Trent R. Northen</author>
        <description><![CDATA[Hyperspectral imaging provides a powerful tool for analyzing above-ground plant characteristics in fabricated ecosystems, offering rich spectral information across diverse wavelengths. This study presents an efficient workflow for hyperspectral data segmentation and subsequent data analytics, minimizing the need for user annotation through the use of ensembles of sparse mixed-scale convolutional neural networks. The segmentation process leverages the diversity of ensembles to achieve high accuracy with minimal labeled data, reducing labor-intensive annotation efforts. To further enhance robustness, we incorporate image alignment techniques to address spatial variability in the dataset. Downstream analysis focuses on using the segmented regions to process the spectral data, enabling monitoring of plant health. This approach provides a scalable solution for spectral segmentation and facilitates actionable insights into plant conditions in complex, controlled environments. Our results demonstrate the utility of combining advanced machine learning techniques with hyperspectral analytics for high-throughput plant monitoring.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2024.1384619</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2024.1384619</link>
        <title><![CDATA[Supercharging distributed computing environments for high-performance data engineering]]></title>
        <pubDate>Fri, 12 Jul 2024 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Niranda Perera</author><author>Arup Kumar Sarker</author><author>Kaiying Shan</author><author>Alex Fetea</author><author>Supun Kamburugamuve</author><author>Thejaka Amila Kanewala</author><author>Chathura Widanage</author><author>Mills Staylor</author><author>Tianle Zhong</author><author>Vibhatha Abeykoon</author><author>Gregor von Laszewski</author><author>Geoffrey Fox</author>
        <description><![CDATA[The data engineering and data science community has embraced the idea of using Python and R dataframes for regular applications. Driven by the big data revolution and artificial intelligence, these frameworks are now ever more important for processing terabytes of data. Such workloads can easily exceed the capabilities of a single machine, yet dataframes owe much of their popularity to their convenience and to high-level, optimizable abstractions for manipulating data; it is therefore essential to design scalable dataframe solutions. There have been multiple efforts to tackle this problem, the most notable being the dataframe systems developed on distributed computing environments such as Dask and Ray. Even though Dask and Ray's distributed computing features look very promising, we perceive that Dask Dataframes and Ray Datasets still have room for optimization. In this paper, we present CylonFlow, an alternative distributed dataframe execution methodology that enables state-of-the-art performance and scalability on the same Dask and Ray infrastructure (supercharging them!). To achieve this, we integrate the high-performance dataframe system Cylon, which was originally based on an entirely different execution paradigm, into Dask and Ray. Our experiments show that on a pipeline of dataframe operators, CylonFlow achieves 30× better distributed performance than Dask Dataframes. Interestingly, it also enables superior sequential performance by leveraging the native C++ execution of Cylon. We believe the performance of Cylon in conjunction with CylonFlow extends beyond the data engineering domain and can be used to consolidate high-performance computing and distributed computing ecosystems.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2023.1233877</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2023.1233877</link>
        <title><![CDATA[Opportunities for enhancing MLCommons efforts while leveraging insights from educational MLCommons earthquake benchmarks efforts]]></title>
        <pubDate>Mon, 23 Oct 2023 00:00:00 GMT</pubDate>
        <category>Original Research</category>
        <author>Gregor von Laszewski</author><author>J. P. Fleischer</author><author>Robert Knuuti</author><author>Geoffrey C. Fox</author><author>Jake Kolessar</author><author>Thomas S. Butler</author><author>Judy Fox</author>
        <description><![CDATA[MLCommons is an effort to develop and improve the artificial intelligence (AI) ecosystem through benchmarks, public data sets, and research. It consists of members from start-ups, leading companies, academics, and non-profits from around the world. The goal is to make machine learning better for everyone. Educational institutions provide valuable opportunities for engagement and can help broaden participation. In this article, we identify numerous insights obtained from different viewpoints as part of efforts to utilize high-performance computing (HPC) big data systems in existing education while developing and conducting science benchmarks for earthquake prediction. As this activity was conducted across multiple educational efforts, we examine whether and how it is possible to make such efforts available on a wider scale. This includes the integration of sophisticated benchmarks into courses and research activities at universities, exposing students and researchers to topics that are otherwise typically not sufficiently covered in current course curricula, as we witnessed from our practical experience across multiple organizations. As such, we outline the many lessons we learned throughout these efforts, culminating in the need for benchmark carpentry for scientists using advanced computational resources. The article also presents the analysis of an earthquake prediction code benchmark, focusing on the accuracy of the results and not only on the runtime; notably, this benchmark was created as a result of our lessons learned. Energy traces were produced throughout these benchmarks, which are vital to analyzing the power expenditure within HPC environments. Additionally, one of the insights is that, given the short duration of the project and limited student availability, the activity was only possible by utilizing a benchmark runtime pipeline and by developing software that automatically generates jobs from permutations of hyperparameters. It integrates a templated job management framework for executing tasks and experiments based on hyperparameters while leveraging hybrid compute resources available at different institutions. The software is part of a collection called cloudmesh, with its newly developed components cloudmesh-ee (experiment executor) and cloudmesh-cc (compute coordinator).]]></description>
      </item>
      </channel>
    </rss>