The need for adoption of neural HPC (NeuHPC) in space sciences

A major challenge facing scientists using conventional approaches for solving PDEs is the simulation of extreme multi-scale problems. While exascale computing will enable simulations of larger systems, the extreme multiscale nature of many problems requires new techniques. Deep learning techniques have disrupted several domains, such as computer vision, language (e.g., ChatGPT), and computational biology, leading to breakthrough advances. Similarly, the adaptation of these techniques for scientific computing has led to a new and rapidly advancing branch of High-Performance Computing (HPC), which we call neural-HPC (NeuHPC). Proof-of-concept studies in domains such as computational fluid dynamics and material science have demonstrated advantages in both efficiency and accuracy compared to conventional solvers. However, NeuHPC is yet to be embraced in plasma simulations. This is partly due to a general lack of awareness of NeuHPC in the space physics community, as well as the fact that most plasma physicists do not have training in artificial intelligence and cannot easily adapt these new techniques to their problems. As we explain below, there is a solution to this. We consider NeuHPC a critical paradigm for knowledge discovery in space sciences and urgently advocate for its adoption by both researchers and funding agencies. Here, we provide an overview of NeuHPC and specific ways that it can overcome existing computational challenges, and propose a roadmap for future directions.


Introduction
Over the years there have been many techniques trumpeted as having great disruptive potential, which were eventually found to have muted applicability. It is rare that a technology comes along that is truly disruptive and is adopted across wide areas of science and engineering. Modern artificial intelligence (AI) is a rare technology where those claims are not overblown. In what follows, we will use the terms "machine learning" and "artificial intelligence" interchangeably.
One of the authors (HK) was an early advocate of the use of AI and computer vision in space sciences with applications in event detection/classification (e.g., Karimabadi et al., 2009), knowledge discovery in simulations and in-situ visualization (e.g., Karimabadi et al., 2011a; Karimabadi et al., 2011c; 2012; 2013a), and derivation of equations from data (Karimabadi et al., 2007). The impetus for this effort was the vision that as our ability to generate data continues to grow exponentially, data-driven science would become an indispensable field of scientific knowledge discovery. This vision has since come to pass, but the rate and scale at which this has happened have exceeded all expectations.
Despite the promising results and utility of those early works, including applications of simple neural nets to spacecraft data (e.g., Newell et al., 1991; Boberg et al., 2000), the techniques were not widely adopted. At the time, the field of AI was in a nascent stage in which artificial neural networks (ANNs) had been largely abandoned in favor of "lighter weight" techniques such as support vector machines. These algorithms had limited learning capacity and relied heavily on hand-engineered features, requiring a top-down agent to act as a "God outside the machine" to tell the models which attributes of the world to focus on, rather than allowing the algorithms to learn what is and is not relevant bottom-up, from the data and the model's objective function. Another factor that limited their utility was their lack of universality. One had to devise special algorithms for problems in computer vision, speech, and audio, among others.
Everything changed in 2012 when AlexNet, a GPU-implemented convolutional neural network (CNN), won ImageNet's image classification competition by a wide margin. This seemingly overnight success was built upon seven decades of slowly evolving research in deep learning (see the Supplementary Material for a definition of deep learning). The field had to wait for the accessibility of large data sets and the development of GPUs, widely available and relatively inexpensive devices with a special kind of massively parallel computational power, before its potential could be realized.
Since AlexNet, advances in AI have fueled adoption of neural algorithms across a myriad of industries and sciences. The first applications of AI in space sciences have been in analysis of spacecraft data (e.g., Camporeale, 2019; Breuillard et al., 2020; Li et al., 2020; Hu et al., 2022) where off-the-shelf AI techniques can be readily applied. However, application of AI in NeuHPC offers a greater opportunity with the potential to qualitatively change the field. The remainder of this article focuses on NeuHPC.
Partial differential equations (PDEs) often lead to extreme multiscale behavior which makes the resolution of all scales in one simulation impossible. While exascale computing will enable simulations of larger systems (e.g., Xiao et al., 2021; Ji et al., 2022), the extreme multiscale nature of many problems in space sciences requires new techniques. In the global magnetosphere, the spatial and temporal scales are separated by a factor of $10^7$, putting it beyond the conventional techniques even at exascale. Also, round-off error in time-stepped solvers is severely limiting. Further, exascale simulations present other challenges, from knowledge discovery in the massive datasets to efficient checkpointing and data management. We consider AI as a core technology and its adoption as critical for meaningful advancement in scientific computing. This belief is based on unique features of neural nets and the rapid and promising advancements of their use in scientific computation. Table 1 summarizes key features of ANNs that make them especially suitable for overcoming the current HPC challenges by enabling new capabilities not possible with conventional approaches. While in-depth discussion of each topic is beyond the scope of this paper, relevant references are provided for the interested reader to learn more.
First, automatic differentiation (see the Supplementary Material for more details) enables accurate computation of derivatives of arbitrary order (spatial and temporal) to working precision. This mesh-free operation, resulting in mesh-invariant solutions, is advantageous over numerical differentiation methods (e.g., finite differencing), which suffer from discretization error, with cost and error increasing for higher derivatives. As an example, one can solve the heat equation $\partial u/\partial t = \Delta u$ where the function $u$ is represented as a neural net and the spatial and temporal derivatives are calculated using the chain rule.
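The contrast between differentiation by the chain rule and finite differencing can be illustrated with a minimal sketch in plain Python. It assumes a hypothetical one-hidden-layer network `tiny_net` with fixed, arbitrary weights, and it uses a small forward-mode dual-number class (`Dual`, written here for illustration only) in place of the automatic differentiation machinery of a deep learning framework.

```python
import math

class Dual:
    """Number a + b*eps with eps**2 = 0; the eps part carries the derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule for the derivative part.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def tanh(x):
    t = math.tanh(x.val)
    return Dual(t, (1.0 - t * t) * x.der)  # chain rule through tanh

def tiny_net(x):
    # Hypothetical network: two tanh units with illustrative weights.
    return 0.7 * tanh(2.0 * x + 0.3) + (-1.2) * tanh(-1.5 * x + 0.1)

def exact_derivative(x):
    # Hand-derived reference for d(tiny_net)/dx.
    return (0.7 * 2.0 * (1.0 - math.tanh(2.0 * x + 0.3) ** 2)
            + 1.2 * 1.5 * (1.0 - math.tanh(-1.5 * x + 0.1) ** 2))

x0, h = 0.4, 1e-3
ad = tiny_net(Dual(x0, 1.0)).der  # derivative propagated by the chain rule
fd = (tiny_net(Dual(x0 + h)).val - tiny_net(Dual(x0 - h)).val) / (2.0 * h)

ad_error = abs(ad - exact_derivative(x0))
fd_error = abs(fd - exact_derivative(x0))
print(f"autodiff error: {ad_error:.1e}, finite-difference error: {fd_error:.1e}")
```

The chain-rule derivative agrees with the reference to round-off, whereas the central difference carries a truncation error that depends on the step `h`.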
Second, the universal approximation theorem (Hornik et al., 1989) implies that ANNs can accurately approximate any function. In contrast to fixed-shaped approximators that have no internal parameters (e.g., polynomials), neural networks consist of parameterized functions, allowing them to take on a variety of different shapes.
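As a rough numerical illustration of this point (not a proof of the theorem), the sketch below fits a single-hidden-layer tanh network to a smooth oscillatory target. Everything in it is an assumption made for brevity: the target function, the weight ranges, and in particular the shortcut of drawing the hidden weights at random and solving only the output layer by least squares, which is not how networks are normally trained.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 400)[:, None]
# Illustrative target: a damped oscillation on [-1, 1].
target = np.sin(3.0 * np.pi * x[:, 0]) * np.exp(-x[:, 0] ** 2)

def fit_single_hidden_layer(n_hidden):
    """Return the max abs error on the sample points for n_hidden tanh units."""
    W = rng.uniform(-10.0, 10.0, size=(1, n_hidden))  # random hidden weights
    b = rng.uniform(-10.0, 10.0, size=n_hidden)       # random hidden biases
    H = np.tanh(x @ W + b)                            # hidden-layer outputs
    H = np.hstack([H, np.ones((len(x), 1))])          # column for output bias
    coef, *_ = np.linalg.lstsq(H, target, rcond=None) # solve output layer only
    return np.max(np.abs(H @ coef - target))

err_small = fit_single_hidden_layer(5)
err_large = fit_single_hidden_layer(200)
print(f"5 units: {err_small:.2e}, 200 units: {err_large:.2e}")
```

The error is measured on the fitting points only, so it shows the growth in representational capacity with width rather than generalization.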
Less well known but equally important is the universal approximation theorem for operators (Chen and Chen, 1995), which states that a neural net with a single hidden layer can accurately approximate any non-linear continuous operator (Lu et al., 2021). The operator can be explicit, such as derivatives (e.g., Laplacian) or integrals (e.g., Laplace transform), or implicit, such as solution operators of a PDE. This offers a unique capability where the network can learn the solution to an entire family of PDEs rather than an instance of a PDE, as in the conventional approaches. Once the model is trained, inference to obtain solutions for different parameters of the PDE is very fast. This can lead to orders of magnitude speedup and enables efficient exploration of the solution space and ensemble modeling, which may be prohibitively expensive otherwise.
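Schematically, the operator approximation used in DeepONet (Lu et al., 2021) takes the form below. The notation is assumed here for illustration: $G$ is the target operator, $u$ an input function sampled at $m$ fixed sensor points $x_1, \dots, x_m$, $y$ a point at which the output function is evaluated, and $b_k$ and $t_k$ the outputs of the so-called branch and trunk networks, with $p$ terms in the sum.

```latex
G(u)(y) \;\approx\; \sum_{k=1}^{p} b_k\big(u(x_1), \dots, u(x_m)\big)\, t_k(y)
```

Because $y$ enters only through the trunk outputs $t_k$, the trained model can be queried at any output location, and a new input $u$ requires only a forward pass rather than a new solve.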
These capabilities open the door to zero-shot learning, i.e., the operator can be trained on a lower resolution and evaluated at a higher resolution, without seeing any higher resolution data. To this end, Li et al. (2021a) developed the first network (FNO) with zero-shot learning that successfully learns the resolution-invariant solution operator for the family of Navier-Stokes equations in the turbulent regime. This feature of transferring the solution between meshes works well in both the spatial and temporal domains. We refer the reader to Kim et al. (2021) for a discussion of the differences between super-resolution reconstruction for paired and unpaired data. Another useful feature of AI-based solvers is transfer learning. For example, Li et al. (2021b) used a pre-trained model on the Kolmogorov flow to transfer it to different Reynolds numbers.
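Why an operator parameterized in Fourier space can be evaluated on a grid it was never defined on can be seen in a small NumPy sketch. This is not an FNO: it is a single linear spectral layer, and its mode weights are set analytically to the heat-equation propagator $e^{-\nu k^2 t}$ as a stand-in for learned weights. The function names, the diffusivity `nu`, the mode cutoff `n_modes`, and the initial condition are all assumptions chosen for illustration.

```python
import numpy as np

def spectral_heat_operator(u0, t, nu=1.0, n_modes=8):
    """Map u(x, 0) to u(x, t) on a periodic [0, 2*pi) grid of any size."""
    n = len(u0)
    # norm="forward" puts the 1/n factor on the forward transform, so the
    # Fourier coefficients do not depend on the grid size n.
    u_hat = np.fft.rfft(u0, norm="forward")
    k = np.arange(len(u_hat))
    weights = np.exp(-nu * k**2 * t)   # analytic stand-in for learned weights
    weights[n_modes:] = 0.0            # act only on the lowest n_modes modes
    return np.fft.irfft(u_hat * weights, n=n, norm="forward")

def initial_condition(n):
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return x, np.sin(x) + 0.5 * np.cos(3.0 * x)

t = 0.1
x_lo, u_lo = initial_condition(64)    # coarse grid
x_hi, u_hi = initial_condition(256)   # fine grid, same operator weights

out_lo = spectral_heat_operator(u_lo, t)
out_hi = spectral_heat_operator(u_hi, t)
exact_hi = (np.exp(-t) * np.sin(x_hi)
            + 0.5 * np.exp(-9.0 * t) * np.cos(3.0 * x_hi))

mismatch = np.max(np.abs(out_hi[::4] - out_lo))  # compare on shared points
error_hi = np.max(np.abs(out_hi - exact_hi))
print(f"coarse/fine mismatch: {mismatch:.1e}, fine-grid error: {error_hi:.1e}")
```

The same mode weights give consistent answers at both resolutions because they act on Fourier coefficients rather than on grid values; an FNO stacks several such layers with learned complex weights and pointwise nonlinearities.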
A wide variety of solutions have been proposed to leverage ANNs in computations across domains such as CFD, material science, and weather forecasting. A detailed review is beyond the scope of the present work. Our goal is simply to bring awareness to promising advances in NeuHPC and provide a starting point for further exploration. Although our focus is NeuHPC, techniques such as system identification can also be applied to spacecraft data either in isolation or in combination with simulation data.

Proof of concepts and beyond

Quantitative data analysis
We demonstrate the utility of AI for analysis of simulation data by addressing the challenging problem of automated detection and measurement of scales of individual current sheets formed in plasma turbulence. Previous works, limited to two snapshots of MHD simulations, were based on a phenomenological approach, utilizing insights on MHD physics (e.g., Uritsky et al., 2010; Zhdankin et al., 2013). We time-boxed ourselves to 2 days to see whether we could significantly reduce time-to-solution using existing AI techniques. We used the magnitude of current density (507 time slices) from simulations of Karimabadi et al. (2013b). Figure 1 shows the results for one time slice, where lengths of only a few current sheets are displayed. Visual comparison with the raw image of the current sheets shows generally good agreement and demonstrates the utility of AI. Details including the code and videos of results over 507 slices are provided in the Supplementary Material.

Derivation of equations and operators from data
Deriving closed-form, compact and understandable analytical equations from data is at the core of scientific discovery. In the following, we provide an overview of the recent ML techniques aimed at turning machine models into scientific knowledge. Such knowledge discovery can come in different forms: a) derivation of an algebraic equation (e.g., the law of gravity), b) derivation of an ODE or PDE (e.g., the diffusion equation), and c) derivation of the unknown parameters of a known equation (the so-called inverse problem).

Algebraic equations
Symbolic regression is an ML technique that searches the space of mathematical expressions to find the model that best fits the data. The goal is to strike a balance between model accuracy and model complexity. The common benchmark to compare the efficacy of different models is the Symbolic Regression database (https://space.mit.edu/home/tegmark/aifeynman.html), which contains 120 symbolic regression mysteries and answers. Most (100) of these are equations from the Feynman Lectures on Physics.
Symbolic regression has been commonly carried out using genetic programming and evolutionary algorithms, and there are several open source and commercially available libraries such as Eureqa (Schmidt and Lipson, 2009). Their main drawback is that, due to the combinatorial nature of the problem, genetic programming does not scale well to high-dimensional systems. In contrast, ANNs are highly efficient at learning in high-dimensional space, and this has led to a flurry of activity in their adaptation to address the combinatorial challenge of symbolic regression. The black-box nature of neural nets seems at first to be at odds with the goals of symbolic regression. Various approaches differ in how they overcome this issue and have been of two general varieties. In one approach, neural nets are used as an aid to reduce the search space of genetic programming techniques (e.g., Cranmer et al., 2020; Petersen et al., 2020). In AI-Feynman, the neural nets are used to find hidden simplicity such as symmetry in the data. Using this approach, they were able to derive all 100 of the Feynman equations versus 71 using previous techniques. The second class of solutions adapts the architecture of the neural nets for symbolic regression. The two key modifications required are to give the ANN access to a vocabulary of functions/primitives and to impose sparsity to reduce model complexity while maintaining high accuracy.
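The accuracy-versus-complexity trade-off can be made concrete with a deliberately tiny search. The sketch below is neither genetic programming nor any of the neural approaches cited above: it exhaustively scores power-law monomials against synthetic data generated from Newton's law of gravity with the constant set to 1. The exponent range, the complexity measure (sum of absolute exponents), and the penalty weight `1e-3` are assumptions chosen only for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
m1, m2, r = rng.uniform(1.0, 5.0, size=(3, 200))
F = m1 * m2 / r**2          # synthetic "measurements", G set to 1

variables = {"m1": m1, "m2": m2, "r": r}
exponents = [-2, -1, 0, 1, 2]

best = None
# Candidate models: m1**a * m2**b * r**c for all exponent triples (a, b, c).
for powers in itertools.product(exponents, repeat=len(variables)):
    pred = np.ones_like(F)
    for p, values in zip(powers, variables.values()):
        pred = pred * values**p
    mse = np.mean((pred - F) ** 2)              # accuracy term
    complexity = sum(abs(p) for p in powers)    # crude complexity measure
    score = mse + 1e-3 * complexity             # accuracy/complexity balance
    if best is None or score < best[0]:
        best = (score, dict(zip(variables, powers)))

best_powers = best[1]
print(best_powers)  # {'m1': 1, 'm2': 1, 'r': -2}
```

With three variables and five exponents the search space has only 125 candidates; the combinatorial growth of this space with more variables and richer primitives is what motivates the genetic and neural methods discussed in the text.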
Martius and Lampert (2016) proposed a simple feedforward ANN where standard activation functions are replaced with symbolic building blocks corresponding to functions common in science and engineering. These activation functions are analogous to the primitive functions in symbolic regression. Sahoo et al. (2018) extended the work to include division. In the Supplementary Material, we construct another type of ANN which, unlike standard ANNs, has a variety of synapses and cell body types. We show that it can derive the law of gravity from data. Another approach involves adaptation of language models/transformers to the symbolic regression problem. Kamienny et al. (2022) developed an end-to-end transformer-based model that uses both symbolic tokens for the operators and variables, and numeric tokens for the constants. It shows a significant jump in accuracy compared to previous ANN-based approaches, with inference several orders of magnitude faster than state-of-the-art genetic programming.

Unknown PDEs
In cases where the underlying PDEs are not known, scientists want i) accurate solvers that generalize well, ii) fast solvers that would be faster than traditional solvers in test cases where the PDE is known, and iii) accurate symbolic extraction. Studies with their prime focus on symbolic extraction follow similar approaches as those for algebraic equations (see below). However, there are innovative breakthroughs in the development of solvers that address objectives i)-ii). This is accomplished through approaches that learn PDE solution operators. These include DeepONet (Lu et al., 2021) and FNO (Li et al., 2021a), both of which are open source. See the latter for additional references and a useful literature review. Li et al. (2021a) showed successful experiments on Burgers' equation, Darcy flow, and the Navier-Stokes equations and achieved up to three orders of magnitude in speedup compared to traditional PDE solvers. Another important proof point and real-world application for FNO came from its adaptation for weather forecasting (FourCastNet) by Pathak et al. (2022). In a head-to-head comparison with a state-of-the-art forecasting system (IFS), FourCastNet was found to have accuracy generally comparable to that of IFS but with higher accuracy for small-scale variables, including precipitation. In addition, FourCastNet can generate forecasts (less than 2 s for a week-long forecast) orders of magnitude faster than IFS. This enables creation of fast large-ensemble forecasts which are out of reach of traditional techniques.
While DeepONet and FNO were not focused on symbolic extraction, one can always add symbolic extraction to the models. The basic ideas for discovery of PDEs from data in symbolic form are similar to those for algebraic equations and can be cast into three categories. One category (e.g., sparse identification of non-linear dynamics (SINDy)) consists of construction of a candidate library of partial derivatives which is then used by a sparse regression technique to obtain a parsimonious model (Rudy et al., 2017; Champion et al., 2019). In the case of PDEs, neural nets offer the added advantage of accurate differentiation. As a result, a second category of solutions combines neural nets with genetic algorithms, where the derivatives are calculated by neural nets and genetic algorithms are used for search (Desai and Strachan, 2021). A third class is purely neural net based and includes the use of symbolic networks (Long et al., 2019).
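The first category can be sketched in a few lines of NumPy using sequentially thresholded least squares, the sparse regression commonly associated with SINDy. The example below is an illustration under stated assumptions, not the implementation of the cited works: the data come from a known analytic solution of the heat equation with diffusivity `nu = 0.5`, the candidate library and threshold are arbitrary choices, and the derivatives are supplied analytically as a stand-in for the accurate derivatives that a neural net with automatic differentiation would provide.

```python
import numpy as np

nu = 0.5
x, t = np.meshgrid(np.linspace(0.0, 2.0 * np.pi, 64),
                   np.linspace(0.0, 1.0, 40), indexing="ij")

# Two decaying Fourier modes: an exact solution of u_t = nu * u_xx.
a = np.exp(-nu * t) * np.sin(x)
b = 0.5 * np.exp(-4.0 * nu * t) * np.sin(2.0 * x)
u = a + b
u_t = -nu * a - 4.0 * nu * b
u_x = np.exp(-nu * t) * np.cos(x) + np.exp(-4.0 * nu * t) * np.cos(2.0 * x)
u_xx = -a - 4.0 * b

# Candidate library of terms that might appear on the right-hand side.
names = ["1", "u", "u_x", "u_xx", "u*u_x", "u^2"]
library = np.column_stack([np.ones(u.size), u.ravel(), u_x.ravel(),
                           u_xx.ravel(), (u * u_x).ravel(), (u**2).ravel()])
target = u_t.ravel()

def stlsq(theta, y, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    coef = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        coef[keep] = np.linalg.lstsq(theta[:, keep], y, rcond=None)[0]
    return coef

coef = stlsq(library, target)
model = {name: c for name, c in zip(names, coef) if c != 0.0}
print("u_t =", " + ".join(f"{c:.3f}*{name}" for name, c in model.items()))
```

Only the `u_xx` term survives the thresholding, with a coefficient equal to the diffusivity used to generate the data. With noisy data or finite-difference derivatives the recovered coefficients would be approximate and the threshold would need tuning.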

Solutions when the form of the PDE is known
Here, we discuss three classes of AI-based approaches for cases where the form of the PDE is known. As mentioned earlier, the so-called inverse problem is not discussed here (see Camporeale et al., 2022 for an application in space physics).

AI solvers
Conventional solvers (e.g., FDM) discretize the domain into a grid and advance the simulation using a time-stepped methodology or a discrete-event-based time advance (e.g., Omelchenko and Karimabadi, 2022). The so-called Physics-Informed Neural Network (PINN)-type methods (Raissi et al., 2019; Jagtap and Karniadakis, 2020) overcome discretization issues of conventional solvers by taking advantage of auto-differentiation to compute exact, mesh-free derivatives. They also offer several advantages over other deep learning approaches. PINN requires less training data since the underlying equation is already known. In addition, prior knowledge of the physical/conservation laws enables their incorporation into the neural network design, which in turn reduces the space of admissible solutions.
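For the heat equation mentioned earlier, a generic PINN-type training objective can be written as below. The notation is assumed here for illustration and is not taken from the cited papers: $u_\theta$ is the network with parameters $\theta$, $(x_i, t_i)$ are $N_r$ collocation points in the interior of the domain, $(x_j, t_j)$ are $N_b$ points where initial or boundary data $g$ are prescribed, and $\lambda$ is a weight balancing the two terms.

```latex
\mathcal{L}(\theta) =
\frac{1}{N_r}\sum_{i=1}^{N_r}
\left| \frac{\partial u_\theta}{\partial t}(x_i, t_i) - \Delta u_\theta(x_i, t_i) \right|^2
+ \frac{\lambda}{N_b}\sum_{j=1}^{N_b}
\left| u_\theta(x_j, t_j) - g(x_j, t_j) \right|^2
```

The first term penalizes the PDE residual, with the derivatives of $u_\theta$ obtained by automatic differentiation rather than on a mesh; the second term ties the solution to the known initial and boundary data, which is why comparatively little labeled data is needed.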
A notable study is that of Li et al. (2021b), who combined operator learning (FNO) with function optimization (PINN). This integrated technique (PINO) outperforms previous ML methods including both PINN and FNO, while retaining the significant speedup of FNO compared to instance-based solvers. In the challenging problem of long temporal transient flow of the Navier-Stokes equations, where the solution builds up from near-zero velocity to a velocity where the system reaches an ergodic state, PINO produces accurate results while retaining a 400x speedup compared to the GPU-based pseudo-spectral solver.

Closure models
A common approach to dealing with extreme multi-scale solutions of PDEs is to use subgrid closure models. Given the utility of neural networks for extracting equations from data, there has been significant work, especially in the CFD domain, on their use for development of closure models (Kurz and Beck, 2022 and references therein). Here we refer the reader to several review articles on this topic (e.g., Taghizadeh et al., 2020; Sofos et al., 2022).

Error correction
Another approach has been to use AI to correct errors at each time step in under-resolved simulations (Kochkov et al., 2021 and references therein). This approach requires training a coarse-resolution solver against high-resolution ground-truth simulations. Promising results were obtained for the Navier-Stokes equations by Kochkov et al. (2021). Results were as accurate as baseline solvers with 8-10x finer resolution in each spatial dimension, resulting in 40-80x computational speedups. The model exhibited good stability over long simulations and showed surprisingly good generalization to Reynolds numbers outside the range of flows on which it was trained.

Discussion and proposed roadmap for NeuHPC in space physics
We advocate for the following changes: a) make funding NeuHPC a priority; b) adapt the funding to the pace of AI developments, which means a short leash on grants and a strong focus on results measured by well-established benchmarks (see examples of benchmarks below); and c) promote interdisciplinary collaboration with AI experts in industry and academia to overcome the fact that most plasma physicists do not have deep expertise in AI.
Given AI's prowess in predictions, we suggest as a starting point proof-of-concept (POC) studies focused on video prediction and error correction:
• Video prediction: Apply off-the-shelf spatio-temporal deep learning models for video prediction (e.g., U-net, ResNet) to simulations. This would create a benchmark for comparison with follow-up studies using PDE-centric AI approaches like PINO or FNO. We suggest 2D hybrid simulations (e.g., KHI) where many training cases can be generated for videos of current density and mixing (see Supplementary Material), among others.
• Grid error correction: Assess the viability of error correction in a coarse-grid hybrid simulation against an equivalent high-resolution simulation. DES hybrid (Omelchenko and Karimabadi, 2022) is particularly useful since it remains stable even when the grid scale is significantly larger than the ion inertial length.
• PIC noise error correction: Since the noise level goes down only as the inverse square root of the number of particles, an AI-based correction would enable running a simulation with a low number of particles (e.g., 5 particles/cell) while reproducing the results of a simulation with a much higher number of particles/cell (e.g., 500), which would be a major breakthrough.
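The scaling behind the PIC noise item can be checked with a toy Monte Carlo estimate. The sketch below is not a PIC code and involves no AI correction; it only places particles uniformly at random into cells and measures the relative fluctuation of the per-cell count. The function name `density_noise` and the number of cells are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def density_noise(particles_per_cell, n_cells=4000):
    """Relative std of per-cell counts for uniformly random particle positions."""
    n_total = particles_per_cell * n_cells
    cell_index = rng.integers(0, n_cells, size=n_total)  # cell of each particle
    counts = np.bincount(cell_index, minlength=n_cells)
    return counts.std() / counts.mean()

noise_5 = density_noise(5)       # expected near 1/sqrt(5)
noise_500 = density_noise(500)   # expected near 1/sqrt(500)
ratio = noise_5 / noise_500
print(f"noise at 5/cell: {noise_5:.3f}, at 500/cell: {noise_500:.4f}, "
      f"ratio: {ratio:.1f}")
```

Increasing the particle count by a factor of 100 reduces the noise by only about a factor of 10, which is the cost that a learned correction at 5 particles/cell would aim to avoid.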
Other POCs of interest that target multi-scale problems include:
• Closure models: Explore derivation of closure models for the island coalescence problem (Karimabadi et al., 2011b).
• PDE derivation: Explore derivation of an equation that describes the temporal evolution of the island coalescence problem.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Author contributions
HK conceived of the idea for the paper and wrote the first draft. JW wrote the codes and produced the results for the two POCs in the Supplementary Material. All authors discussed and edited the final version of the manuscript.

Frontiers in Astronomy and Space Sciences frontiersin.org

Funding
Two of the authors (HK and JW) did not receive any external funding for their work. DAR is supported by NASA's Heliophysics Digital Resource Library.