<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
      <channel>
        <title>Frontiers in High Performance Computing | HPC Applications section | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/high-performance-computing/sections/hpc-applications</link>
        <description>RSS Feed for HPC Applications section in the Frontiers in High Performance Computing journal | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator,version:1</generator>
        <pubDate>Sat, 04 Apr 2026 14:35:17 +0000</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1638203</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1638203</link>
        <title><![CDATA[Toward a persistent event-streaming system for high-performance computing applications]]></title>
        <pubDate>Wed, 17 Sep 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Matthieu Dorier</author>
        <author>Amal Gueroudji</author>
        <author>Valérie Hayot-Sasson</author>
        <author>Hai Duc Nguyen</author>
        <author>Seth Ockerman</author>
        <author>Renan Souza</author>
        <author>Tekin Bicer</author>
        <author>Haochen Pan</author>
        <author>Philip Carns</author>
        <author>Kyle Chard</author>
        <author>Ryan Chard</author>
        <author>Maxime Gonthier</author>
        <author>Eliu Huerta</author>
        <author>Ben Lenard</author>
        <author>Bogdan Nicolae</author>
        <author>Parth Patel</author>
        <author>Justin Wozniak</author>
        <author>Ian Foster</author>
        <author>Nageswara S. Rao</author>
        <author>Robert B. Ross</author>
        <description><![CDATA[High-performance computing (HPC) applications have traditionally relied on parallel file systems and file transfer services to manage data movement and storage. Alternative approaches have been proposed that use direct communications between application components, trading persistence and fault tolerance for speed. Event-driven architectures, as popularized in enterprise contexts, present a compelling middle ground, avoiding the performance cost and API constraints of parallel file systems while retaining persistence and offering impedance matching between application components. However, adapting streaming frameworks to HPC workloads requires addressing challenges unique to HPC systems. This paper investigates the potential for a streaming framework designed for HPC infrastructures and use cases. We introduce Mofka, a persistent event-streaming framework designed specifically for HPC environments. Mofka combines the capabilities of a traditional streaming service with optimizations tailored to the HPC context, such as support for massively multicore nodes, efficient scaling for large producer-consumer workflows, RDMA-enabled high-performance network communications, specialized network fabrics with multiple links per node, and efficient handling of large scientific data payloads. Built using the Mochi suite of HPC data service components, Mofka provides a lightweight, modular, and high-performance solution for persistent streaming in HPC systems. We present the architecture of Mofka and evaluate its performance against Kafka and Redpanda using benchmarks on diverse platforms, including Argonne's Polaris and Oak Ridge's Frontier supercomputers, showing up to 8× improvement in throughput in some scenarios. We then demonstrate its utility in several real-world applications: a tomographic reconstruction pipeline, a workflow for the discovery of metal-organic frameworks for carbon capture, and the instrumentation of Dask workflows for provenance tracking and performance analysis.]]></description>
      </item>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fhpcp.2024.1458674</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fhpcp.2024.1458674</link>
        <title><![CDATA[Addressing GPU memory limitations for Graph Neural Networks in High-Energy Physics applications]]></title>
        <pubDate>Wed, 18 Sep 2024 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Claire Songhyun Lee</author>
        <author>V. Hewes</author>
        <author>Giuseppe Cerati</author>
        <author>Kewei Wang</author>
        <author>Adam Aurisano</author>
        <author>Ankit Agrawal</author>
        <author>Alok Choudhary</author>
        <author>Wei-Keng Liao</author>
        <description><![CDATA[Introduction: Reconstructing low-level particle tracks in neutrino physics can address some of the most fundamental questions about the universe. However, processing petabytes of raw data using deep learning techniques poses a challenging problem in the field of High Energy Physics (HEP). In the Exa.TrkX Project, an illustrative HEP application, preprocessed simulation data is fed into a state-of-the-art Graph Neural Network (GNN) model, accelerated by GPUs. However, limited GPU memory often leads to Out-of-Memory (OOM) exceptions during training, due to the large size of models and datasets. This problem is exacerbated when deploying models on High-Performance Computing (HPC) systems designed for large-scale applications.
Methods: We observe a high workload imbalance during GNN model training, caused by the irregular sizes of input graph samples in HEP datasets, which contributes to OOM exceptions. We aim to scale GNNs on HPC systems by prioritizing workload balance across graph inputs while maintaining model accuracy. This paper introduces diverse balancing strategies aimed at decreasing the maximum GPU memory footprint and avoiding OOM exceptions across various datasets.
Results: Our experiments show memory reductions of up to 32.14% compared to the baseline. We also demonstrate that the proposed strategies avoid OOM exceptions in practice. Additionally, we create a distributed multi-GPU implementation using these samplers to demonstrate the scalability of these techniques on the HEP dataset.
Discussion: By assessing the performance of these strategies as data-loading samplers across multiple datasets, we can gauge their effectiveness in both single-GPU and distributed environments. Our experiments, conducted on datasets of varying sizes and across multiple GPUs, broaden the applicability of our work to various GNN applications that handle input datasets with irregular graph sizes.]]></description>
      </item>
      </channel>
    </rss>