As Artificial Intelligence and Machine Learning (AI/ML) revolutionize science, industry, and society, the need to train and deploy ever-larger models on massive datasets has pushed high-performance computing (HPC) platforms to the forefront of AI-driven discovery. The intersection of AI/ML and HPC enables breakthroughs in fields ranging from natural language processing and computer vision to climate modeling and drug discovery. However, scaling these complex AI workloads on HPC systems introduces technical challenges in algorithmic design, data management, resource utilization, and benchmark evaluation.
Central topics include the development and deployment of scalable AI systems capable of training state-of-the-art deep learning models—including large language models (LLMs) and generative AI architectures—across thousands of compute nodes and accelerators. Efficient algorithms and software frameworks for distributed deep learning, model parallelism, and asynchronous training are crucial for harnessing the raw computational power of modern HPC environments. The convergence of stochastic optimization, AI, and emerging hybrid techniques, as well as the incorporation of quantum computing concepts into AI workflows, further broadens the research scope.
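As a concrete illustration of the distributed training techniques in scope, the following minimal sketch uses PyTorch's DistributedDataParallel for data-parallel training. It is illustrative only: the linear model, synthetic batches, hyperparameters, and torchrun launch are placeholder assumptions, not a prescribed setup.

```python
# Minimal data-parallel training sketch with PyTorch DistributedDataParallel.
# Illustrative only: model, data, and hyperparameters are placeholders.
# Assumed launch: torchrun --nproc_per_node=4 ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; a real workload would build an LLM or other network.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(100):
        # Synthetic batch; in practice each rank reads its own data shard.
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()   # gradients are all-reduced across ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Scaling such a loop from a handful of GPUs to thousands of nodes is precisely where the algorithmic and systems questions this Research Topic targets arise: communication scheduling, parallelism strategy, and asynchrony all shape achievable throughput.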
Benchmarking and performance modeling are essential for quantifying system effectiveness, guiding hardware-software co-design, and driving innovation in scalable AI/ML infrastructures. At the same time, integrating HPC resources for AI/ML applications requires careful attention to data movement, memory bandwidth, and fault resilience—especially as AI-powered simulations and analytics become integral to scientific discovery.
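To make the benchmarking concern concrete, a minimal sketch of the kind of microbenchmark that underpins performance modeling is shown below. The matrix size, iteration count, and GEMM workload are arbitrary assumptions chosen for illustration.

```python
# Toy GPU throughput benchmark: times repeated matrix multiplies and reports
# achieved TFLOP/s. Sizes and iteration counts are arbitrary placeholders.
import time
import torch

def benchmark_matmul(n: int = 8192, iters: int = 50) -> float:
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    torch.cuda.synchronize()        # drain pending work before timing
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()        # wait for all kernels to finish
    elapsed = time.perf_counter() - start
    flops = 2 * n**3 * iters        # multiply-add count for an n x n GEMM
    return flops / elapsed / 1e12   # achieved TFLOP/s

if __name__ == "__main__":
    print(f"{benchmark_matmul():.1f} TFLOP/s")
```

Comparing such measured numbers against hardware peak is the simplest form of the roofline-style analysis that benchmark-driven studies in this Topic might extend to full AI workloads.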
This Research Topic invites investigations into the technical frontiers where AI/ML and HPC meet. We aim to foster knowledge exchange on foundations, systems, and applications that push the boundaries of scalable intelligence, enable rapid exploration with large and generative models, and lay the groundwork for future AI/HPC convergence—including emerging paradigms such as quantum-enhanced AI.
We welcome original research articles, reviews, perspectives, and case studies on scalable AI/ML in HPC contexts, including but not limited to:
- Architectures and frameworks for scalable AI and deep learning on HPC platforms
- Techniques for distributed training of large models, LLMs, and generative AI
- Model and data parallelism, asynchronous optimization, and hybrid HPC/AI approaches
- Performance analysis, benchmarking, and workload characterization for AI on HPC systems
- Stochastic and AI-hybrid methods leveraging HPC for scientific or engineering applications
- Integration of quantum computing concepts with AI/ML and LLM workloads on HPC
- Workflows and pipelines for end-to-end scalable AI on scientific supercomputers
- Large-scale generative models and their applications in science and engineering
- Resource management, fault tolerance, and efficient data handling for AI/ML at scale
Submissions that bridge theoretical advances with real-world large-scale deployments, address system-level challenges, or present benchmark-driven insights into current and future AI/HPC integration are especially encouraged.
Article types and fees
This Research Topic accepts the following article types, unless otherwise specified in the Research Topic description:
Brief Research Report
Community Case Study
Conceptual Analysis
Data Report
Editorial
FAIR² Data
FAIR² DATA Direct Submission
Hypothesis and Theory
Methods
Mini Review
Opinion
Original Research
Perspective
Review
Systematic Review
Technology and Code
Articles that are accepted for publication by our external editors following rigorous peer review incur a publishing fee charged to authors, institutions, or funders.
Keywords: Scalable AI/ML, Distributed Deep Learning, Benchmarking and Performance Analysis, Quantum-Enhanced AI
Important note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.