Scalable AI/ML on High-Performance Computing Platforms

About this Research Topic

Submission deadlines

Manuscript Summary Submission Deadline: 30 March 2026
Manuscript Submission Deadline: 30 September 2026

This Research Topic is currently accepting articles.

Background

As Artificial Intelligence and Machine Learning (AI/ML) revolutionize science, industry, and society, the need to train and deploy ever-larger models on massive datasets has pushed high-performance computing (HPC) platforms to the forefront of AI-driven discovery. The intersection of AI/ML and HPC enables breakthroughs in fields ranging from natural language processing and computer vision to climate modeling and drug discovery. However, scaling these complex AI workloads on HPC systems introduces technical challenges in algorithmic design, data management, resource utilization, and benchmark evaluation.

Central topics include the development and deployment of scalable AI systems capable of training state-of-the-art deep learning models—including large language models (LLMs) and generative AI architectures—across thousands of compute nodes and accelerators. Efficient algorithms and software frameworks for distributed deep learning, model parallelism, and asynchronous training are crucial for harnessing the raw computational power of modern HPC environments. The convergence of stochastic optimization, AI, and emerging hybrid techniques, as well as the incorporation of quantum computing concepts into AI workflows, further broadens the research scope.
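As one concrete illustration of the distributed-training frameworks in scope, the sketch below shows synchronous data-parallel training with PyTorch's DistributedDataParallel over NCCL. It is a minimal sketch, not a definitive implementation: the model, data, and hyperparameters are placeholders, and a torchrun-style launcher (which sets RANK, LOCAL_RANK, and WORLD_SIZE) is assumed.

    # Minimal data-parallel training sketch (illustrative only): one process
    # per GPU, gradients averaged across ranks via NCCL all-reduce.
    # Assumes launch via torchrun; model, data, and loss are placeholders.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        dist.init_process_group(backend="nccl")       # join the process group
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # placeholder model
        model = DDP(model, device_ids=[local_rank])   # wraps gradient all-reduce
        opt = torch.optim.SGD(model.parameters(), lr=1e-3)

        for _ in range(10):                           # toy training loop
            x = torch.randn(32, 1024, device="cuda")  # placeholder batch
            loss = model(x).square().mean()           # dummy objective
            opt.zero_grad()
            loss.backward()                           # DDP overlaps all-reduce with backprop
            opt.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

At HPC scale, this data-parallel pattern is typically combined with model parallelism (tensor or pipeline) once a single model no longer fits on one accelerator.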

Benchmarking and performance modeling are essential for quantifying system effectiveness, guiding hardware-software co-design, and driving innovation in scalable AI/ML infrastructures. At the same time, integrating HPC resources for AI/ML applications requires careful attention to data movement, memory bandwidth, and fault resilience—especially as AI-powered simulations and analytics become integral to scientific discovery.

This Research Topic invites investigations into the technical frontiers where AI/ML and HPC meet. We aim to foster knowledge exchange on foundations, systems, and applications that push the boundaries of scalable intelligence, enable rapid exploration with large and generative models, and lay the groundwork for future AI/HPC convergence—including emerging paradigms such as quantum-enhanced AI.

We welcome original research articles, reviews, perspectives, and case studies on scalable AI/ML in HPC contexts, including but not limited to:
- Architectures and frameworks for scalable AI and deep learning on HPC platforms
- Techniques for distributed training of large models, LLMs, and generative AI
- Model and data parallelism, asynchronous optimization, and hybrid HPC/AI approaches
- Performance analysis, benchmarking, and workload characterization for AI on HPC systems
- Stochastic and AI-hybrid methods leveraging HPC for scientific or engineering applications
- Integration of quantum computing concepts with AI/ML and LLM workloads on HPC
- Workflows and pipelines for end-to-end scalable AI on scientific supercomputers
- Large-scale generative models and their applications in science and engineering
- Resource management, fault tolerance, and efficient data handling for AI/ML at scale

Submissions that bridge theoretical advances with real-world large-scale deployments, address system-level challenges, or present benchmark-driven insights into current and future AI/HPC integration are especially encouraged.


Article types and fees

This Research Topic accepts the following article types, unless otherwise specified in the Research Topic description:

  • Brief Research Report
  • Community Case Study
  • Conceptual Analysis
  • Data Report
  • Editorial
  • FAIR² Data
  • FAIR² DATA Direct Submission
  • Hypothesis and Theory
  • Methods

Articles accepted for publication by our external editors following rigorous peer review incur a publishing fee, charged to authors, institutions, or funders.

Keywords: Scalable AI/ML, Distributed Deep Learning, Benchmarking and Performance Analysis, Quantum-Enhanced AI

Important note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.


Manuscripts can be submitted to this Research Topic via the main journal or any other participating journal.
