<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">669097</article-id>
<article-id pub-id-type="doi">10.3389/fdata.2021.669097</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>The Old and the New: Can Physics-Informed Deep-Learning Replace Traditional Linear Solvers?</article-title>
<alt-title alt-title-type="left-running-head">Markidis</alt-title>
<alt-title alt-title-type="right-running-head">Can PINNs Replace Traditional Solvers?</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Markidis</surname>
<given-names>Stefano</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/533275/overview"/>
</contrib>
</contrib-group>
<aff>KTH Royal Institute of Technology, <addr-line>Stockholm</addr-line>, <country>Sweden</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/552047/overview">Javier Garcia-Blas</ext-link>, Universidad Carlos III de Madrid, Spain</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/555028/overview">Nathan Hodas</ext-link>, Pacific Northwest National Laboratory (DOE), United&#x20;States</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1246046/overview">Changqing Luo</ext-link>, Virginia Commonwealth University, United&#x20;States</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Stefano Markidis, <email>markidis@kth.se</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Data Mining and Management, a section of the journal Frontiers in Big&#x20;Data</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>19</day>
<month>11</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>4</volume>
<elocation-id>669097</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>02</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>07</day>
<month>10</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Markidis.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Markidis</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Physics-Informed Neural Networks (PINNs) are neural networks that encode the problem governing equations, such as Partial Differential Equations (PDEs), as a part of the network itself. PINNs have emerged as an essential new tool for solving various challenging problems, including the solution of linear systems arising from PDEs, a task for which several traditional methods exist. In this work, we first evaluate the potential of PINNs as linear solvers in the case of the Poisson equation, an omnipresent equation in scientific computing. We characterize PINN linear solvers in terms of accuracy and performance under different network configurations (depth, activation functions, input data set distribution). We highlight the critical role of transfer learning. Our results show that the low-frequency components of the solution converge quickly as an effect of the F-principle. In contrast, an accurate solution of the high frequencies requires an exceedingly long time. To address this limitation, we propose integrating PINNs into traditional linear solvers. We show that this integration leads to new solvers that are on par with high-performance solvers, such as the PETSc conjugate gradient linear solver, in terms of both performance and accuracy. Overall, while accuracy and computational performance are still limiting factors for the direct use of PINN linear solvers, hybrid strategies combining old, traditional linear solver approaches with new, emerging deep-learning techniques are among the most promising paths toward a new class of linear solvers.</p>
</abstract>
<kwd-group>
<kwd>physics-informed deep-learning</kwd>
<kwd>PINN</kwd>
<kwd>scientific computing</kwd>
<kwd>Poisson solvers</kwd>
<kwd>deep-learning</kwd>
</kwd-group>
<contract-sponsor id="cn001">European Commission<named-content content-type="fundref-id">10.13039/501100000780</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Deep Learning (DL) has revolutionized classification, pattern recognition, and regression tasks in various application areas, such as image and speech recognition, recommendation systems, natural language processing, drug discovery, medical imaging, bioinformatics, and fraud detection, to name a few (<xref ref-type="bibr" rid="B10">Goodfellow et&#x20;al., 2016</xref>). However, scientific applications solving linear and non-linear equations with demanding accuracy and computational performance requirements have not been a focus of DL. Only recently has a new class of DL networks, called <italic>Physics-Informed Neural Networks</italic> (PINNs), emerged as a very promising DL method for solving scientific computing problems (<xref ref-type="bibr" rid="B39">Raissi et&#x20;al., 2019</xref>, <xref ref-type="bibr" rid="B37">2017a</xref>,<xref ref-type="bibr" rid="B38">b</xref>; <xref ref-type="bibr" rid="B6">Eivazi et&#x20;al., 2021</xref>). In fact, PINNs are specifically designed to integrate scientific computing equations, such as Ordinary Differential Equations (ODEs), Partial Differential Equations (PDEs), and non-linear and integro-differential equations (<xref ref-type="bibr" rid="B32">Pang et&#x20;al., 2019</xref>), into the DL network training. In this work, we focus on the application of PINNs to a traditional scientific computing problem: the solution of a linear system arising from the discretization of a PDE. We solve the linear system arising from the Poisson equation, one of the most common PDEs, whose solution still requires non-negligible time with traditional approaches. We evaluate the maturity, in terms of accuracy and performance, of PINN linear solvers, either as a replacement for traditional approaches or deployed in combination with conventional methods, such as the multigrid and Gauss-Seidel methods (<xref ref-type="bibr" rid="B36">Quarteroni et&#x20;al., 2010</xref>).</p>
<p>PINNs are deep-learning networks that, after training (solving an optimization problem to minimize a residual function), output an approximated solution of one or more differential equations, given an input point in the integration domain (called a collocation point). Before PINNs, earlier efforts explored solving PDEs with constrained neural networks (<xref ref-type="bibr" rid="B35">Psichogios and Ungar, 1992</xref>; <xref ref-type="bibr" rid="B21">Lagaris et&#x20;al., 1998</xref>). The major innovation of PINNs is the introduction of a <italic>residual</italic> network that encodes the governing physics equations, takes the output of a deep-learning network (called the <italic>surrogate</italic>), and calculates a residual value (a loss function in DL terminology). The inclusion of a <italic>residual</italic> network bears some resemblance to the iterative Krylov linear solvers used in scientific applications. The fundamental difference is that PINNs calculate differential operators on graphs using automatic differentiation (<xref ref-type="bibr" rid="B4">Baydin et&#x20;al., 2018</xref>), while traditional scientific approaches are based on numerical schemes for differentiation. As noted in previous works (<xref ref-type="bibr" rid="B39">Raissi et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B27">Mishra and Molinaro, 2020a</xref>), automatic differentiation is the main strength of PINNs because operators on the residual network can be elegantly and efficiently formulated with it. An important point is that the PINN&#x2019;s <italic>residual</italic> network should not be confused with the popular network architecture also called <italic>Residual</italic> network, or <italic>ResNet</italic> for short, whose name derives from the use of skip connections or residual connections (<xref ref-type="bibr" rid="B10">Goodfellow et&#x20;al., 2016</xref>) rather than from calculating a residual as in PINNs.</p>
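To make the role of automatic differentiation concrete, the following minimal sketch (our illustration, not the paper's implementation; DL frameworks use reverse-mode autodiff on computation graphs) shows forward-mode automatic differentiation with dual numbers: derivatives propagate exactly through each elementary operation by the chain rule, with no finite-difference truncation error.

```python
# Forward-mode automatic differentiation with dual numbers:
# each operation propagates (value, derivative) exactly via the chain rule.
# Illustrative sketch only; DL frameworks use reverse-mode autodiff on graphs.
import math

class Dual:
    """Number a + b*eps with eps**2 = 0; the eps part carries the derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule, applied mechanically at every node
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(z):
    # chain rule for an elementary function: (sin z)' = cos(z) * z'
    return Dual(math.sin(z.val), math.cos(z.val) * z.dot)

def d(f, x):
    """Exact derivative of f at x (no finite-difference truncation error)."""
    return f(Dual(x, 1.0)).dot

# d/dx [x^2 sin(x)] = 2 x sin(x) + x^2 cos(x)
g = lambda x: x * x * sin(x)
print(d(g, 1.0))
```

The same mechanism, applied to the operations of the residual network graph, yields exact derivatives of the surrogate output with respect to its inputs.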
<p>The basic formulation of PINN training does not require labeled data, e.g., results from other simulations or experimental data, and is unsupervised: PINNs only require the evaluation of the residual function (<xref ref-type="bibr" rid="B28">Mishra and Molinaro, 2020b</xref>). Providing simulation or experimental data for training the network in a supervised manner is also possible and is necessary for data assimilation (<xref ref-type="bibr" rid="B40">Raissi et&#x20;al., 2020</xref>), inverse problems (<xref ref-type="bibr" rid="B27">Mishra and Molinaro, 2020a</xref>), super-resolution (<xref ref-type="bibr" rid="B48">Wang C. et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B7">Esmaeilzadeh et&#x20;al., 2020</xref>), and discrete PINNs (<xref ref-type="bibr" rid="B39">Raissi et&#x20;al., 2019</xref>). The supervised approach is often used for solving ill-defined problems when, for instance, we lack boundary conditions or an Equation of State (EoS) to close a system of equations (for instance, an EoS for the fluid equations (<xref ref-type="bibr" rid="B52">Zhu and Muller, 2020</xref>)). In this study, we focus only on basic PINNs, as we are interested in solving PDEs without relying on other simulations to assist the DL network training. A common case in scientific applications is that we solve the same PDE with different source terms at each time step. For instance, in addition to other computational kernels, Molecular Dynamics (MD) codes and semi-implicit fluid and plasma codes, such as GROMACS (<xref ref-type="bibr" rid="B46">Van Der Spoel et&#x20;al., 2005</xref>), Nek5000 (<xref ref-type="bibr" rid="B34">Paul F. Fischer and Kerkemeier, 2008</xref>), and iPIC3D (<xref ref-type="bibr" rid="B25">Markidis et&#x20;al., 2010</xref>, <xref ref-type="bibr" rid="B26">2020</xref>), solve the Poisson equation for the electrostatic and pressure solvers (<xref ref-type="bibr" rid="B31">Offermans et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B2">Aguilar and Markidis, 2021</xref>) and for divergence-cleaning operations at each&#x20;cycle.</p>
<p>Once a PINN is trained, inference with the trained PINN can replace traditional numerical solvers in scientific computing. In this so-called <italic>inference</italic> or <italic>prediction</italic> step, the input includes independent variables such as the simulation time step and positions in the simulation domain. The output is the solution of the governing equations at the time and position specified by the input. Therefore, PINNs are a <italic>gridless</italic> method, because any point in the domain can be taken as input without requiring the definition of a mesh. Moreover, the trained PINN network can be used to predict values on simulation grids of different resolutions without retraining. For this reason, the computational cost does not scale with the number of grid points as in many traditional computational methods. PINNs borrow concepts from popular methods in traditional scientific computing, including Newton-Krylov solvers (<xref ref-type="bibr" rid="B17">Kelley, 1995</xref>), finite element methods (FEM) (<xref ref-type="bibr" rid="B42">Rao, 2017</xref>), and Monte Carlo techniques (<xref ref-type="bibr" rid="B43">Rubinstein and Kroese, 2016</xref>). Like Newton-Krylov solvers, PINN training is driven by the objective of minimizing the residual function and employs Newton methods during the optimization process. Similarly to FEM, PINNs use interpolating (non-linear) basis functions, called <italic>activation functions</italic> (<xref ref-type="bibr" rid="B41">Ramachandran et&#x20;al., 2017</xref>) in the neural network field. Like Monte Carlo and quasi-Monte Carlo methods, PINNs integrate the governing equations using a random or a low-discrepancy sequence, such as the Sobol sequence (<xref ref-type="bibr" rid="B44">Sobo&#x13a;, 1990</xref>), for the collocation points used during the evaluation of the residual function.</p>
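As an illustration of low-discrepancy collocation sampling, the sketch below generates points in the unit square with a pure-Python Halton sequence, a simpler relative of the Sobol sequence mentioned above (the choice of Halton here is ours, for brevity; it is not necessarily the sequence used in any particular PINN implementation).

```python
# Low-discrepancy collocation points via a pure-Python 2D Halton sequence
# (bases 2 and 3). Helper names are ours; illustrative only.
def van_der_corput(i, base):
    """The i-th element of the van der Corput sequence in the given base."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, remainder = divmod(i, base)
        denom *= base
        x += remainder / denom
    return x

def halton_2d(n):
    """n collocation points that fill the unit square more evenly than random."""
    return [(van_der_corput(i, 2), van_der_corput(i, 3)) for i in range(1, n + 1)]

pts = halton_2d(8)
print(pts[0])   # first point: (0.5, 1/3)
```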
<p>The motivation of this work is twofold. First, we evaluate the potential of deploying PINNs for solving linear systems, such as the one arising from the Poisson equation. We focus on solving the Poisson equation, a generalization of the Laplace equation, and an omnipresent equation in scientific computing. Traditionally, Poisson solvers are based on linear solvers, such as the Conjugate Gradient (CG) or Fast Fourier Transform (FFT). These approaches may require a large number of iterations before convergence and are computationally expensive as the fastest methods scale as <inline-formula id="inf1">
<mml:math id="m1">
<mml:mi mathvariant="script">O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>log</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, where <italic>N</italic>
<sub>
<italic>g</italic>
</sub> is the number of grid points in the simulation domain. The second goal of this work is to propose a new class of linear solvers combining new emerging DL approaches with old traditional linear solvers, such as multigrid and iterative solvers.</p>
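The scaling above corresponds to spectral Poisson solvers such as the following numpy sketch, which solves the Poisson equation with periodic boundary conditions via the FFT (the paper's test problem uses Dirichlet conditions, for which a sine transform plays the same role; this is an illustrative baseline, not the paper's solver).

```python
# Spectral Poisson solver, O(Ng log Ng): solve del^2 u = f with periodic
# boundaries via the FFT. Illustrative baseline; a Dirichlet problem
# would use a sine transform in the same way.
import numpy as np

def poisson_fft_periodic(f, length=1.0):
    n = f.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)   # wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                    # avoid dividing by zero for the mean mode
    u_hat = -np.fft.fft2(f) / k2      # del^2 -> -|k|^2 in Fourier space
    u_hat[0, 0] = 0.0                 # pin the (arbitrary) zero-mean mode
    return np.real(np.fft.ifft2(u_hat))

# Manufactured solution: u = sin(2 pi x) sin(2 pi y)  =>  f = -8 pi^2 u
n = 64
x = np.linspace(0.0, 1.0, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
u_exact = np.sin(2 * np.pi * X) * np.sin(2 * np.pi * Y)
u = poisson_fft_periodic(-8.0 * np.pi**2 * u_exact)
print(np.max(np.abs(u - u_exact)))    # spectrally accurate
```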
<p>In this work, we show that the accuracy and convergence of PINN solvers can be tuned by setting up an appropriate configuration of depth, layer size, and activation functions, and by leveraging transfer learning. We find that fully-connected surrogate/approximator networks with more than three layers produce similar performance results in the first thousand training epochs. The choice of activation function is critical for PINN performance: depending on the <italic>smoothness</italic> of the source term, different activation functions provide considerably different accuracy and convergence. Transfer learning in PINNs allows us to initialize the network with the results of another training run solving the same PDE with a different source term (<xref ref-type="bibr" rid="B50">Weiss et&#x20;al., 2016</xref>). The use of transfer learning considerably speeds up the training of the network. In terms of accuracy and computational performance, a naive replacement of traditional numerical approaches with the direct use of PINNs is still not competitive with traditional solvers and codes, such as the CG implementations in HPC packages (<xref ref-type="bibr" rid="B3">Balay et&#x20;al., 2019</xref>).</p>
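The effect of transfer learning can be illustrated with a deliberately simplified analogy: warm-starting an iterative least-squares fit with the parameters obtained for a nearby source term, much as a PINN trained at one time step can initialize training at the next. The problem and numbers below are synthetic stand-ins, not the paper's experiments.

```python
# Transfer learning as warm-starting, in a simplified analogy: gradient
# descent on a least-squares "training" problem needs far fewer iterations
# when initialized from the parameters fitted to a nearby source term.
# Synthetic stand-in, not the paper's PINN experiments.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))            # fixed "features"

def fit(b, w0, lr=1e-3, tol=1e-6, max_iter=100_000):
    """Gradient descent on 0.5 * ||A w - b||^2 until the gradient is small."""
    w = w0.copy()
    for it in range(max_iter):
        grad = A.T @ (A @ w - b)
        if np.linalg.norm(grad) < tol:
            return w, it
        w -= lr * grad
    return w, max_iter

b1 = A @ rng.normal(size=20)              # "previous time step" problem
b2 = b1 + 0.01 * rng.normal(size=100)     # slightly different source term
w1, _ = fit(b1, np.zeros(20))             # train on the first problem
_, it_cold = fit(b2, np.zeros(20))        # second problem, from scratch
_, it_warm = fit(b2, w1)                  # second problem, transferred weights
print(it_cold, it_warm)                   # warm start converges in fewer steps
```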
<p>To address the limitations of the direct use of PINNs, we combine PINN linear solvers with traditional approaches such as the multigrid and Gauss-Seidel methods (<xref ref-type="bibr" rid="B45">Trottenberg et&#x20;al., 2000</xref>; <xref ref-type="bibr" rid="B36">Quarteroni et&#x20;al., 2010</xref>). The DL linear solver is used to solve the linear system on a coarse grid, and the solution is then refined on finer grids using the multigrid V-cycle and Gauss-Seidel solver iterations. This approach allows us to exploit the DL network&#x2019;s fast convergence on the low-frequency components of the solution while relying on Gauss-Seidel iterations to solve the high-frequency components accurately. We show that the integration of DL techniques into traditional linear solvers leads to solvers that are on par with high-performance solvers, such as the PETSc conjugate gradient linear solver, both in terms of performance and accuracy.</p>
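A deliberately simplified 1D sketch of this hybrid strategy follows: a coarse-grid solve (a direct solve stands in here for the PINN inference) captures the low frequencies, and Gauss-Seidel sweeps on the fine grid remove the remaining high-frequency error. All sizes and sweep counts are illustrative, not the paper's configuration.

```python
# Simplified 1D sketch of the hybrid solver: coarse-grid solve (stand-in
# for PINN inference) + prolongation + fine-grid Gauss-Seidel smoothing.
import numpy as np

def gauss_seidel(u, f, h, sweeps):
    """In-place Gauss-Seidel for u'' = f with Dirichlet boundaries."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] - h * h * f[i])
    return u

def direct_coarse(fc):
    """Stand-in for the PINN coarse solve: direct solve of u'' = fc."""
    m = len(fc) - 1
    hc = 1.0 / m
    A = (np.diag(-2.0 * np.ones(m - 1)) + np.diag(np.ones(m - 2), 1)
         + np.diag(np.ones(m - 2), -1)) / hc**2
    u = np.zeros(m + 1)
    u[1:-1] = np.linalg.solve(A, fc[1:-1])
    return u

def hybrid_solve(f, coarse_solver, sweeps=50):
    n = len(f) - 1
    u_coarse = coarse_solver(f[::2])                # low frequencies, coarse grid
    u = np.interp(np.linspace(0.0, 1.0, n + 1),     # prolongation back to
                  np.linspace(0.0, 1.0, n // 2 + 1), u_coarse)  # the fine grid
    return gauss_seidel(u, f, 1.0 / n, sweeps)      # high-frequency clean-up

n = 64
x = np.linspace(0.0, 1.0, n + 1)
u_exact = np.sin(np.pi * x)
u = hybrid_solve(-np.pi**2 * u_exact, direct_coarse)   # u'' = -pi^2 sin(pi x)
print(np.max(np.abs(u - u_exact)))
```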
<p>The paper is organized as follows. We first introduce the governing equations and background information about the PINN architecture, and showcase the use of PINNs to solve the 2D Poisson equation. <xref ref-type="sec" rid="s3">Section 3</xref> presents a characterization of PINN linear solver performance when varying the network size, activation functions, and data set distribution, and highlights the critical importance of leveraging transfer learning. We present the design of a Poisson solver integrating emerging DL techniques into the V-cycle of the multigrid method and analyze its error and computational performance in <xref ref-type="sec" rid="s5">Section 5</xref>. Finally, we summarize this study and outline challenges and next steps for future work in <xref ref-type="sec" rid="s6">Section&#x20;6</xref>.</p>
</sec>
<sec id="s2">
<title>2 The New: Physics-Informed Linear Solvers</title>
<p>The goal of PINNs is to approximate the solution of a system of one or more differential, possibly non-linear, equations by explicitly encoding the differential equation formulation in the neural network. Without loss of generality, a PINN solves the non-linear equation:<disp-formula id="e1">
<mml:math id="m2">
<mml:msub>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="normal">&#x3a9;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:math>
<label>(1)</label>
</disp-formula>where <italic>u</italic> is the solution of the system, <italic>u</italic>
<sub>
<italic>t</italic>
</sub> is its derivative with respect to time <italic>t</italic> in the period [0, T], <inline-formula id="inf2">
<mml:math id="m3">
<mml:mi mathvariant="script">N</mml:mi>
</mml:math>
</inline-formula> is a non-linear differential operator, <italic>x</italic> is an independent, possibly multi-dimensional variable, defined over the domain &#x3a9;. As a main reference equation to solve, we consider the Poisson equation in a unit square domain and Dirichlet boundary conditions throughout this paper:<disp-formula id="e2">
<mml:math id="m4">
<mml:msup>
<mml:mrow>
<mml:mo>&#x2207;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mi>u</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2208;</mml:mo>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#xd7;</mml:mo>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(2)</label>
</disp-formula>
</p>
<p>While this problem is linear in nature and PINNs can handle non-linear problems, we focus on the Poisson equation because it is one of the most frequently solved PDEs in scientific applications. The Poisson equation, an example of an elliptic PDE, arises in several different fields, from electrostatic problems in plasma and MD codes, to potential flow and pressure solvers in Computational Fluid Dynamics (CFD), to structural mechanics problems. Elliptic problems are one of the Achilles&#x2019; heels of scientific applications (<xref ref-type="bibr" rid="B29">Morton and Mayers, 2005</xref>). While relatively fast and straightforward (albeit subject to numerical constraints) computational methods exist for solving hyperbolic and parabolic problems, e.g., explicit differencing, the solution of elliptic problems traditionally requires linear solvers, such as Krylov solvers (CG or GMRES) or the FFT. Typically, in scientific applications, the simulation progresses through several time steps, at each of which a Poisson equation is solved with the same boundary conditions and a different source term <italic>f</italic>(<italic>x</italic>, <italic>y</italic>) (typically not considerably different from the source term of the previous time step).</p>
<p>In its basic formulation, a PINN combines two networks: an <italic>approximator</italic> or <italic>surrogate</italic> network and a residual network (see <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>) (<xref ref-type="bibr" rid="B39">Raissi et&#x20;al., 2019</xref>). The approximator/surrogate network undergoes training and, once trained, provides a solution <inline-formula id="inf3">
<mml:math id="m5">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> at a given input point (<italic>x</italic>, <italic>y</italic>), called <italic>collocation point</italic>, in the simulation domain. The residual network encodes the governing equations and it is the distinctive feature of PINNs. The residual network is not trained and its only function is to provide the approximator/surrogate network with the residual (<italic>loss</italic> function in DL terminology):<disp-formula id="e3">
<mml:math id="m6">
<mml:mi>r</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo>&#x2207;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(3)</label>
</disp-formula>
</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>A PINN to solve a Poisson problem <inline-formula id="inf4">
<mml:math id="m7">
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x2202;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mi>u</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x2202;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mi>u</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> with associated Dirichlet boundary conditions. PINN consists of two basic interconnected networks. The first network (red vertices) provides a surrogate or approximation of the problem solution <italic>u</italic>. The network takes as input a point in the problem domain (<italic>x</italic>, <italic>y</italic>) and provides an approximate solution <inline-formula id="inf5">
<mml:math id="m8">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>. This network&#x2019;s weights and biases are trainable. The second network (blue vertices) takes the approximate solution from the first network and calculates the residual, which is used as the loss function to train the first network. The residual network encodes the governing equations, boundary conditions, and initial conditions (not included in the plot, as the Poisson problem does not require initial conditions).</p>
</caption>
<graphic xlink:href="fdata-04-669097-g001.tif"/>
</fig>
<p>Unlike traditional methods, which often rely on finite-difference approximations, the derivatives on the residual network graph, e.g., <inline-formula id="inf6">
<mml:math id="m9">
<mml:msup>
<mml:mrow>
<mml:mo>&#x2207;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> in <xref ref-type="disp-formula" rid="e3">Eq. (3)</xref>, are calculated using so-called <italic>automatic differentiation</italic>, or autodiff, which applies the chain rule (<xref ref-type="bibr" rid="B4">Baydin et&#x20;al., 2018</xref>) to the operations defined on the network nodes. In the solution of the Poisson equation, the Laplacian operator is expressed as two successive first-order derivatives of <inline-formula id="inf7">
<mml:math id="m10">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> in the <italic>x</italic> and <italic>y</italic> directions and their summation (see the blue network nodes in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>).</p>
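The computation of the Laplacian by successive exact derivatives can be sketched with truncated Taylor "jets" that carry a value together with its first two derivatives with respect to one chosen variable (our illustrative stand-in for the graph-based autodiff of a DL framework).

```python
# The residual network's Laplacian as exact derivatives, here with
# truncated Taylor "jets" (value, first and second derivative w.r.t.
# one chosen variable). Illustrative stand-in for graph autodiff.
import math

class Jet:
    def __init__(self, f, d1=0.0, d2=0.0):
        self.f, self.d1, self.d2 = f, d1, d2
    def __add__(self, other):
        other = other if isinstance(other, Jet) else Jet(other)
        return Jet(self.f + other.f, self.d1 + other.d1, self.d2 + other.d2)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Jet) else Jet(other)
        # Leibniz rule up to second order: (ab)'' = a''b + 2a'b' + ab''
        return Jet(self.f * other.f,
                   self.d1 * other.f + self.f * other.d1,
                   self.d2 * other.f + 2.0 * self.d1 * other.d1
                   + self.f * other.d2)
    __rmul__ = __mul__

def sin(z):
    # chain rule: (sin z)'' = cos(z) z'' - sin(z) (z')^2
    return Jet(math.sin(z.f),
               math.cos(z.f) * z.d1,
               math.cos(z.f) * z.d2 - math.sin(z.f) * z.d1**2)

def laplacian(u, x, y):
    u_xx = u(Jet(x, 1.0), Jet(y)).d2   # differentiate twice in x, y fixed
    u_yy = u(Jet(x), Jet(y, 1.0)).d2   # differentiate twice in y, x fixed
    return u_xx + u_yy

# For u = sin(pi x) sin(pi y), the Laplacian is -2 pi^2 u
u = lambda x, y: sin(math.pi * x) * sin(math.pi * y)
r = laplacian(u, 0.3, 0.7)
print(r)
```

Subtracting the source term f(x, y) from this value gives the residual of Eq. (3) at a collocation point, exactly as the blue nodes of Figure 1 do.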
<p>In the inference/prediction phase, only the surrogate network is used to calculate the solution to the problem (remember that the residual network is only used in the training process to calculate the residual).</p>
<p>The approximator/surrogate network is a feedforward neural network (<xref ref-type="bibr" rid="B10">Goodfellow et&#x20;al., 2016</xref>): it processes an input <italic>x</italic> through <italic>l</italic> layers of units (also called <italic>neurons</italic>). The approximator/surrogate network is expressed as a composition of affine-linear maps (<italic>Z</italic>) between units and scalar non-linear activation functions (<italic>a</italic>) within the units:<disp-formula id="e4">
<mml:math id="m11">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo mathvariant="italic">&#x00B0;</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="italic">a</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo mathvariant="italic">&#x00B0;</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo mathvariant="italic">&#x00B0;</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="italic">a</mml:mi>
<mml:mo>&#x2026;</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo mathvariant="italic">&#x00B0;</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mspace width="0.17em"/>
<mml:mi mathvariant="italic">a</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo mathvariant="italic">&#x00B0;</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mi>Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo mathvariant="italic">&#x00B0;</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="italic">a</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo mathvariant="italic">&#x00B0;</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mi>Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(4)</label>
</disp-formula>
</p>
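Eq. (4) can be read directly as code: a minimal numpy sketch of the surrogate as alternating affine maps and a pointwise tanh activation (the layer sizes and random initialization below are illustrative, not the paper's exact configuration).

```python
# Eq. (4) as code: the surrogate is alternating affine maps Z_l and a
# pointwise activation a (tanh here). Layer sizes and the Xavier-style
# initialization are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
sizes = [2, 50, 50, 50, 1]        # input (x, y) -> three hidden layers -> u
params = [(rng.normal(size=(m, n)) * np.sqrt(2.0 / (m + n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def surrogate(xy, params):
    z = xy
    for W, b in params[:-1]:
        z = np.tanh(W @ z + b)    # a(Z_l(z)): affine map, then activation
    W, b = params[-1]
    return (W @ z + b)[0]         # final affine map, no activation on output

print(surrogate(np.array([0.5, 0.5]), params))
```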
<p>In DL, the most widely used activation functions are the Rectified Linear Unit (ReLU), tanh, swish, sine, and sigmoid functions; see (<xref ref-type="bibr" rid="B41">Ramachandran et&#x20;al., 2017</xref>) for an overview. As shown by (<xref ref-type="bibr" rid="B28">Mishra and Molinaro, 2020b</xref>), PINNs require sufficiently smooth activation functions. PINNs with ReLU and other non-smooth activation functions, such as ELU and SELU (Exponential and Scaled Exponential Linear Units), are not &#x201c;consistent/convergent&#x201d; methods: in the limit of an infinite training dataset, the solution of a well-trained PINN with ReLU-like activation functions does not converge to the exact solution (<xref ref-type="bibr" rid="B27">Mishra and Molinaro, 2020a</xref>). This theoretical result is also confirmed by our experiments with ReLU-like activation functions. For this reason, we do not use ReLU-like activation functions in PINNs.</p>
<p>The affine maps <italic>Z</italic> are characterized by the weights and biases of the approximator/surrogate network:<disp-formula id="e5">
<mml:math id="m12">
<mml:msub>
<mml:mrow>
<mml:mi>Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:math>
<label>(5)</label>
</disp-formula>where <italic>W</italic>
<sub>
<italic>l</italic>
</sub> is the <italic>weight</italic> matrix for layer <italic>l</italic> and <italic>b</italic><sub><italic>l</italic></sub> is the <italic>bias</italic> vector. In PINNs, the weight values are initialized using the <italic>Xavier</italic> procedure (also called <italic>Glorot</italic>, after the inventor&#x2019;s surname) (<xref ref-type="bibr" rid="B20">Kumar, 2017</xref>).</p>
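A minimal sketch of the Xavier/Glorot initialization (uniform variant) follows: weights are drawn with variance 2/(fan_in + fan_out), which keeps the activation variance roughly constant across layers. The layer sizes are illustrative.

```python
# Xavier/Glorot initialization (uniform variant): draw weights from
# U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), giving weight variance
# 2 / (fan_in + fan_out). Sizes are illustrative.
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_out, fan_in))

rng = np.random.default_rng(0)
W = xavier_uniform(50, 50, rng)
print(W.var())    # close to 2 / (50 + 50) = 0.02
```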
<p>Typically, the PINN approximator/surrogate networks are fully connected networks consisting of 4&#x2013;6 hidden layers (H) and 50&#x2013;100 units per layer, similar to the network in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>. There are also successful experiments using convolutional and recurrent layers (<xref ref-type="bibr" rid="B30">Nascimento and Viana, 2019</xref>; <xref ref-type="bibr" rid="B9">Gao et&#x20;al., 2020</xref>), but the vast majority of existing PINNs rely on fully-connected layers. In this work, we focus on studying the performance of fully-connected&#x20;PINNs.</p>
<p>The residual network is responsible for encoding the equation to solve and for providing the loss function to the approximator network for the optimization process. In PINNs, we minimize the Mean Squared Error (MSE) of the residual (<xref ref-type="disp-formula" rid="e3">Eq. (3)</xref>):<disp-formula id="e6">
<mml:math id="m13">
<mml:mi>M</mml:mi>
<mml:mi>S</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2211;</mml:mo>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
</mml:math>
<label>(6)</label>
</disp-formula>where <inline-formula id="inf8">
<mml:math id="m14">
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is the number of collocation points. In PINNs, the collocation points constitute the training dataset. Note that <italic>MSE</italic>
<sub>
<italic>r</italic>
</sub> depends on the size of the training dataset (<inline-formula id="inf9">
<mml:math id="m15">
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>), i.e., the number of collocation points. In practice, a larger number of collocation points leads to an increased MSE value. <italic>MSE</italic>
<sub>
<italic>r</italic>
</sub> also depends on the distribution of the collocation points. The three most used dataset distributions are: uniform (the dataset is uniformly spaced on the simulation domain, as on a uniform grid), pseudo-random (collocation points are sampled using a pseudo-random number generator), and Sobol (collocation points are drawn from the Sobol low-discrepancy sequence). Typically, the default training distribution for PINNs is Sobol, as in quasi-Monte Carlo methods.</p>
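The three collocation-point distributions can be sketched with NumPy and SciPy's quasi-Monte Carlo module; the point count below is illustrative, not the dataset size used in the experiments.

```python
import numpy as np
from scipy.stats import qmc

n = 1024  # number of collocation points (a power of 2, as Sobol prefers)

# uniform: points on a regular grid over the unit square
side = int(np.sqrt(n))
gx, gy = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
uniform_pts = np.column_stack([gx.ravel(), gy.ravel()])

# pseudo-random: points drawn from a pseudo-random number generator
pseudo_pts = np.random.default_rng(0).random((n, 2))

# Sobol: points drawn from the Sobol low-discrepancy sequence
sobol_pts = qmc.Sobol(d=2, scramble=False).random(n)
```

Each of the three arrays can serve directly as the training dataset of collocation points in the unit square.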
<p>Recently, several PINN architectures have been proposed. PINN variants differ in how the residual network is defined. For instance, fPINN (fractional PINN) is a PINN with a residual network capable of calculating residuals of governing equations that include fractional calculus operators (<xref ref-type="bibr" rid="B32">Pang et&#x20;al., 2019</xref>). fPINN combines automatic differentiation with numerical discretization for the fractional operators in the residual network. fPINN extends PINN to solve integral and differential-integral equations. Another important PINN is vPINN (variational PINN): it includes a residual network that incorporates the variational form of the problem into the loss function (<xref ref-type="bibr" rid="B19">Kharazmi et&#x20;al., 2019</xref>) and an additional shallow network using trial functions, with polynomials and trigonometric functions as test functions. A major advantage with respect to basic PINNs is that, by analytically integrating the variational form by parts, we can reduce the order of the differential operators represented by the neural networks, speeding up the training and increasing the PINN accuracy. hp-VPINN is an extension of vPINN that allows hp-refinement via domain decomposition as h-refinement and projection onto a space of high-order polynomials as p-refinement (<xref ref-type="bibr" rid="B18">Kharazmi et&#x20;al., 2020</xref>). In this work, we use the original residual network shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>.</p>
<p>In the training phase, an optimization process targeting the residual minimization determines the weights and biases of the surrogate network. Typically, we use two optimizers in succession: first the Adam optimizer and then a Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimizer (<xref ref-type="bibr" rid="B8">Fletcher, 2013</xref>). BFGS uses the Hessian matrix (the curvature in a highly dimensional space) to calculate the optimization direction and provides more accurate results. However, if used directly, without a preliminary Adam stage, it can rapidly converge to a local minimum (of the residual) without escaping it. For this reason, the Adam optimizer is used first to avoid local minima, and then the solution is refined by BFGS. We note that the BFGS variant typically used in PINNs is L-BFGS-B: L-BFGS is a limited-memory version of BFGS to handle problems with many variables, such as DL problems, while BFGS-B is a variant of BFGS for bound-constrained optimization problems. In our work, we tested several optimizers, including the Newton and Powell methods, and found that L-BFGS-B provides by far the highest accuracy and the fastest convergence in all our test problems. L-BFGS-B is currently the most critical technology for PINNs.</p>
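The two-stage optimization can be illustrated on a toy least-squares loss standing in for the residual MSE. The hand-written Adam loop and the SciPy L-BFGS-B call below are a sketch of the procedure, not the actual PINN training code; the problem, sizes, and epoch count are illustrative, while the learning rate 0.001 matches the setup used in this paper.

```python
import numpy as np
from scipy.optimize import minimize

# toy stand-in for the residual loss: fit w so that A @ w matches y
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
y = A @ np.array([1.0, -2.0, 0.5, 3.0, -1.0])

def loss_and_grad(w):
    r = A @ w - y
    return (r @ r) / len(y), (2.0 / len(y)) * (A.T @ r)

# stage 1: Adam (learning rate 0.001) explores and avoids bad minima
w = np.zeros(5)
m, v = np.zeros(5), np.zeros(5)
beta1, beta2, eps, lr = 0.9, 0.999, 1e-8, 0.001
for t in range(1, 2001):
    _, g = loss_and_grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    w -= lr * (m / (1 - beta1**t)) / (np.sqrt(v / (1 - beta2**t)) + eps)

# stage 2: L-BFGS-B refines the solution found by Adam
res = minimize(loss_and_grad, w, jac=True, method="L-BFGS-B")
```

The same pattern (a first-order stage followed by a quasi-Newton refinement) is what the PINN frameworks apply to the network weights and biases.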
<p>An <italic>epoch</italic> comprises all the optimizer iterations needed to cover the whole dataset. In PINNs, typically, thousands of epochs are required to achieve accurate results. By nature, PINNs are under-fitted: the network is not complex enough to accurately capture the relationships between the collocation points and the solution. Therefore, an extensive dataset increase improves the PINN performance; however, the computational cost grows with the data set&#x20;size.</p>
<p>One crucial point related to PINNs is whether a neural network can approximate simultaneously and uniformly the solution function and its partial derivatives. Ref. (<xref ref-type="bibr" rid="B22">Lu et&#x20;al., 2019</xref>) shows that feed-forward neural nets with enough neurons can achieve this task. A formal analysis of the errors in PINNs is presented in Refs. (<xref ref-type="bibr" rid="B22">Lu et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B28">Mishra and Molinaro, 2020b</xref>).</p>
<p>An important fact determining the convergence behavior of DL networks and PINN linear solvers is the Frequency-principle (F-principle): <italic>DNNs often fit target functions from low to high frequencies during the training process</italic> (<xref ref-type="bibr" rid="B51">Xu et&#x20;al., 2019</xref>). The F-principle implies that in PINNs the low-frequency/large-scale features of the solution emerge first, while it takes several training epochs to recover the high-frequency/small-scale features.</p>
<p>Despite the recent introduction of PINNs, several PINN frameworks for PDE solutions already exist. All the major PINN frameworks are written in Python and rely either on TensorFlow (<xref ref-type="bibr" rid="B1">Abadi et&#x20;al., 2016</xref>) or PyTorch (<xref ref-type="bibr" rid="B33">Paszke et&#x20;al., 2019</xref>) to express the neural network architecture and exploit the auto-differentiation used in the residual network. Together with TensorFlow, SciPy (<xref ref-type="bibr" rid="B47">Virtanen et&#x20;al., 2020</xref>) is often used to provide high-order optimizers such as L-BFGS-B. Two valuable PINN Domain-Specific Languages (DSL) are DeepXDE (<xref ref-type="bibr" rid="B22">Lu et&#x20;al., 2019</xref>) and sciANN (<xref ref-type="bibr" rid="B13">Haghighat and Juanes, 2020</xref>). DeepXDE is a highly customizable framework with TensorFlow 1 and 2 backends, and it supports basic and fractional PINNs in complex geometries. sciANN is a DSL based on and similar to Keras (<xref ref-type="bibr" rid="B12">Gulli and Pal, 2017</xref>). In this work, we use the DeepXDE&#x20;DSL.</p>
<sec id="s2-1">
<title>2.1 An Example: Solving the 2D Poisson Equation With PINN</title>
<p>To showcase how PINNs work and provide a baseline performance in terms of accuracy and computational cost, we solve a Poisson problem in the unit square domain with a source term <italic>f</italic>(<italic>x</italic>, <italic>y</italic>) that is smooth, i.e., differentiable, and contains four increasing frequencies:<disp-formula id="e7">
<mml:math id="m16">
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mn>2</mml:mn>
<mml:mi>k</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>sin</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mi>sin</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(7)</label>
</disp-formula>
</p>
<p>We choose such a source term because it has a simple solution and shows the F-principle&#x2019;s impact on the convergence of the PINN to the numerical solution: we expect the lower frequency component, i.e., <italic>k</italic>&#x20;&#x3d; 1, to converge faster than the higher frequency components present in the solution (<italic>k</italic>&#x20;&#x3d; 2, 3,&#x20;4).</p>
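For reference, the source term of Eq. 7 can be evaluated with a few lines of NumPy (an illustrative sketch):

```python
import numpy as np

def source(x, y):
    # Eq. 7: f(x, y) = 1/4 * sum_{k=1..4} (-1)^(k+1) * 2k * sin(k*pi*x) * sin(k*pi*y)
    f = 0.0
    for k in range(1, 5):
        f = f + (-1.0) ** (k + 1) * 2 * k * np.sin(k * np.pi * x) * np.sin(k * np.pi * y)
    return f / 4.0

# at the domain center the k = 2, 4 terms vanish, so f(0.5, 0.5) = (2 + 6) / 4 = 2
center = source(0.5, 0.5)
```

Sampling this function at the collocation points provides the right-hand side that the residual network encodes.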
<p>We use a fully-connected four-layer PINN with a tanh activation function for the approximator/surrogate network, for demonstration purposes and without loss of generality. The input layer consists of two neurons (the <italic>x</italic> and <italic>y</italic> coordinates of one collocation point), while each hidden layer comprises 50 neurons and the output layer one neuron. The weights of the network are initialized with the Xavier method. As a reminder, the approximator/surrogate network&#x2019;s output is the approximate solution to our problem. The residual network is a graph encoding the Poisson equation and source term, and it provides the loss function (<xref ref-type="disp-formula" rid="e6">Eq. 6</xref>) to drive the approximator/surrogate network&#x2019;s optimization. Each collocation point within the problem domain is drawn from the Sobol sequence. The training data set consists of 128&#x20;&#xd7; 128 collocation points in the domain and an additional 4,000 collocation points on the boundary, for a total of 20,384 points. We train the approximator/surrogate network for 10,000 epochs of the Adam optimizer with a learning rate <italic>&#x3bb;</italic> equal to 0.001 (the magnitude of the optimizer step along the direction that minimizes the residual), followed by 13,000 epochs of the L-BFGS-B optimizer. We use the DeepXDE DSL for our PINN implementation.</p>
<p>
<xref ref-type="fig" rid="F2">Figure&#x20;2</xref> shows the Poisson equation&#x2019;s approximate solution with the source term of <xref ref-type="disp-formula" rid="e7">Eq. (7)</xref> at different epochs, the training error, and the error of the PINN solution after the training is completed. The top panels of <xref ref-type="fig" rid="F2">Figure&#x20;2</xref> present the contour plot of the approximator/surrogate solution on a 128&#x20;&#xd7; 128 uniform grid after 500, 5,000 and 23,000 epochs. To determine the solution at each epoch, we take the approximator/surrogate network and perform inference/prediction using the points of the 128&#x20;&#xd7; 128 uniform grid. By analyzing the approximate solutions&#x2019; evolution (top panels of <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>), it is clear that the PINN first resolves the low-frequency component present in the solution: a yellow band appears along the diagonal of the plot while local peaks (small islands in the contour plot) are not resolved. As the training progresses, localized peaks associated with the source term&#x2019;s high frequencies appear and are resolved. The bottom right panel of <xref ref-type="fig" rid="F2">Figure&#x20;2</xref> shows a contour plot of the error after the training is completed. The maximum pointwise error is approximately 5E-3. We note that a large part of the error is located in the proximity of the boundaries. This issue results from the <italic>vanishing-gradient</italic> problem (<xref ref-type="bibr" rid="B49">Wang S. et&#x20;al., 2020</xref>): unbalanced gradients back-propagate during the model training. This issue is similar to the numerical <italic>stiffness</italic> problem when using traditional numerical approaches. One effective technique to mitigate the <italic>vanishing-gradient</italic> problem is to employ locally (to the layer or the node) adaptive activation functions (<xref ref-type="bibr" rid="B15">Jagtap et&#x20;al., 2020</xref>). 
Additional techniques for mitigating the <italic>vanishing-gradient</italic> problem are the usage of ReLU activation functions and batch normalization.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>The top panels show the solution of the Poisson equation at different epochs using a PINN. The bottom panel shows the training error for an initial training with Adam&#x2019;s optimizer (10,000 epochs), followed by L-BFGS-B (13,000 epochs). The plot also includes the total time for training the PINN on a dual-core Intel i5 processor. The right bottom subplot presents the error of the final solution compared to the exact solution.</p>
</caption>
<graphic xlink:href="fdata-04-669097-g002.tif"/>
</fig>
<p>The bottom panel of <xref ref-type="fig" rid="F2">Figure&#x20;2</xref> shows the training error&#x2019;s evolution calculated with <xref ref-type="disp-formula" rid="e6">Eq. (6)</xref>. In this case, the initial error is approximately 1.08E2 and decreases to 2.79E-5&#xa0;at the end of the training. The initial error mainly depends on the training data set size: small input data sets reduce the training error, but this does not translate to higher accuracy in the solution of the problem. However, the training error is a reasonable metric when comparing the PINN performance for the same data set&#x20;size.</p>
<p>By analyzing the evolution of the training error, it is clear that the Adam optimizer training error stabilizes in the range of approximately 5E-3&#x2013;1E-2 after 2,000 epochs, and we do not observe any evident improvement after 2,000 epochs of Adam optimization. The L-BFGS-B optimizer drives the error from 5E-3&#x2013;1E-2 down to 2.79E-5 and is responsible for the major decrease of the training error. However, we recall that L-BFGS-B is not used at the beginning of the training, as it can converge quickly to a wrong solution (a local minimum of the optimization problem).</p>
<p>To provide an idea of the PINN training&#x2019;s overall computational cost, we also report the total time for training the PINN in this basic non-optimized configuration on a dual-core Intel i5 2.9&#xa0;GHz CPU. The total training execution time is 6,380 seconds, corresponding to approximately 1.8&#xa0;h. For comparison, the solution of the same problem with a uniform grid of size 128&#x20;&#xd7; 128 on the same system with the petsc4py CG solver (<xref ref-type="bibr" rid="B5">Dalcin et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B3">Balay et&#x20;al., 2019</xref>) requires 92.28 seconds to converge to double-precision machine epsilon. The direct usage of a basic PINN to solve the Poisson problem is therefore of limited interest for scientific applications, given the computational cost and the relatively low accuracy. In the next sections, we investigate which factors impact the PINN performance and accuracy, and we design a PINN-based solver with performance comparable to state-of-the-art linear solvers such as petsc4py.</p>
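For readers who want to reproduce the traditional baseline without a PETSc installation, an analogous conjugate gradient solve of the five-point finite-difference Poisson system can be sketched with SciPy's sparse module. This is a stand-in for the petsc4py run reported above, not the code used for the timing, and the right-hand side here is only the lowest mode of Eq. 7.

```python
import numpy as np
from scipy.sparse import diags, identity, kron
from scipy.sparse.linalg import cg

n = 128                               # interior grid points per dimension
h = 1.0 / (n + 1)

# five-point Laplacian on the unit square with Dirichlet boundaries
T = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (kron(identity(n), T) + kron(T, identity(n))) / h**2

# a smooth right-hand side: the k = 1 mode of Eq. 7
pts = np.linspace(h, 1.0 - h, n)
X, Y = np.meshgrid(pts, pts)
b = (0.5 * np.sin(np.pi * X) * np.sin(np.pi * Y)).ravel()

u, info = cg(A, b)                    # info == 0 signals convergence
```

A Krylov solver of this kind reaches machine-precision residuals in seconds on the same grid, which is the gap the rest of the paper tries to close.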
</sec>
</sec>
<sec id="s3">
<title>3 Characterizing PINNs as Linear Solvers</title>
<p>To characterize the PINN performance for solving the Poisson equation, we perform several parametric studies varying the approximator/surrogate network size, activation functions, and training data size and distribution. We also investigate the performance enhancement achieved by using the transfer learning technique, i.e., initializing the network with the weights obtained by solving the Poisson equation with a different source term (<xref ref-type="bibr" rid="B50">Weiss et&#x20;al., 2016</xref>). During our experiments, we found that two relatively different configurations of the network are required depending on whether the source term of the Poisson equation is smooth or non-smooth, i.e., non-differentiable. For this reason, we choose two main use cases to showcase the impact of the different parameters. For the smooth source term case, we take the source term from <xref ref-type="disp-formula" rid="e7">Eq. (7)</xref> (the example we showcased in the previous section). For the non-smooth source term case, we take a source term that is zero everywhere except for the points enclosed in the circle centered in (0.5, 0.5) with radius 0.2:<disp-formula id="e8">
<mml:math id="m17">
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mspace width="0.28em"/>
<mml:mtext>for</mml:mtext>
<mml:mspace width="0.28em"/>
<mml:msqrt>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>0.5</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>0.5</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msqrt>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>0.2</mml:mn>
<mml:mo>.</mml:mo>
</mml:math>
<label>(8)</label>
</disp-formula>
</p>
<p>As the baseline configuration, we adopt the same configuration described in the previous section: a fully-connected network with four hidden layers of 50 units and a tanh activation function. The data set consists of 128&#x20;&#xd7; 128 collocation points in the domain and 4,000 points on the boundary. Differently from the previous configuration, we reduce the training epochs to 2,000 for the Adam optimizer (the training error does not decrease after 2,000 epochs) and 5,000 for the L-BFGS-B optimizer.</p>
<p>The first experiment we perform evaluates the impact of the network size (depth and units per layer) on the training error. To understand the impact of the surrogate neural network depth, we perform training with layers of 50 neurons and one (1H), two (2H), three (3H), four (4H), five (5H) and six (6H) hidden layers (H stands for hidden layer). We present the evolution of the training error in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>. By analyzing this figure, it is clear that shallow networks consisting of one or two hidden layers do not perform well, and the PINN stops learning after a few thousand epochs. Even one layer with a large number of units, e.g., one hidden layer with 640 units (see the magenta line in the right panel of <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>), does not lead to better performance, demonstrating that depth is more important than breadth in PINNs. Deeper networks with more than three layers lead to lower final training errors and improved learning. However, we find that the final training error saturates for PINNs with more than six hidden layers (results not shown here) for the two test cases. An important aspect for the deployment of PINNs in scientific applications is that PINNs with four and more hidden layers have comparable performance in the first 500 epochs of the Adam and L-BFGS-B optimizers. Taking into account that the PINN computational cost increases with the number of layers and that realistically only a few hundred epochs are viable for a PINN to be competitive with HPC solvers, PINNs with four hidden layers provide the best trade-off in terms of accuracy and computational performance.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Training error for different fully-connected PINN depths: one (1H), two (2H), three (3H), four (4H), five (5H) and six (6H) hidden layers with 50 neurons each. We also consider the training error for PINNs with six hidden layers and 10-20-40-80-160-320 and 320-160-80-40-20-10 units per hidden layer, respectively.</p>
</caption>
<graphic xlink:href="fdata-04-669097-g003.tif"/>
</fig>
<p>For the six hidden layers case, we also check the importance of having a large/small number of units at the beginning/end of the network: we consider the performance of PINNs with six hidden layers and 10-20-40-80-160-320 and 320-160-80-40-20-10 units per hidden layer, respectively. We find that having a large number of units at the beginning of the network and a small number of units at the end is detrimental to the PINN performance (a six hidden layer network in this configuration has the same performance as a five hidden layer PINN). Instead, having a small number of units at the beginning of the network and a large number of units at the end is beneficial to the PINN. This observation hints that the initial hidden layers might be responsible for encoding the low-frequency components (fewer points are needed to represent low-frequency signals), while the following hidden layers are responsible for representing the higher-frequency components (several points are needed to represent high-frequency signals). However, more experiments are needed to confirm this hypothesis.</p>
<p>The most impactful parameter for achieving a low training error is the activation function. This fact is expected, as activation functions are nothing else than non-linear interpolation functions (similar to nodal functions in FEM): some interpolation functions might be a better fit for representing the different source terms. For instance, sigmoid functions are a good fit for representing non-differentiable source terms exhibiting discontinuities. On the contrary, a smooth tanh activation function can closely represent smooth functions.</p>
<p>We investigate the impact of different activation functions and show the evolution of the training errors in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>. Together with traditional activation functions, we also consider the Locally Adaptive Activation Functions (LAAF): with this technique, a scalable parameter is introduced in each layer separately and then optimized with a variant of the stochastic gradient descent algorithm (<xref ref-type="bibr" rid="B15">Jagtap et&#x20;al., 2020</xref>). The LAAF are provided in the DeepXDE DSL. We investigate LAAF with a factor of 5 (LAAF-5) and of 10 (LAAF-10) for the tanh, swish and sigmoid cases. The LAAF usage is critical to mitigate the <italic>vanishing-gradient</italic> problem.</p>
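As an illustrative sketch of the layer-wise adaptive idea, the activation &#x3c3;(z) is replaced by &#x3c3;(n&#xb7;a&#xb7;z), where n is the fixed factor (5 or 10 above) and a is a trainable per-layer slope; in our reading of the technique, a is typically initialized so that n&#xb7;a&#x20;&#x3d; 1. This is a sketch of the idea, not the DeepXDE implementation.

```python
import numpy as np

def laaf_tanh(z, a, n=5):
    # layer-wise adaptive tanh: the slope a is trained together with
    # the weights and biases; n is a fixed, user-chosen scale factor
    return np.tanh(n * a * z)

z = np.linspace(-2.0, 2.0, 5)
out = laaf_tanh(z, a=1.0 / 5)   # with n * a = 1 this reduces to plain tanh
```

During training, the optimizer can increase a to sharpen the activation where the solution needs steeper gradients.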
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Training error for different activation functions. The two test cases show rather different performance: the best activation function for smooth source term case is tanh, while it is sigmoid for the non-smooth source term case. Local (to the layer) adaptive activation functions provide a reduction of the training&#x20;error.</p>
</caption>
<graphic xlink:href="fdata-04-669097-g004.tif"/>
</fig>
<p>The activation function&#x2019;s different impact for the two test cases (smooth and non-smooth source terms) is clear when analyzing the results presented in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>. In the smooth source term case, the best activation function is the locally (to the layer) adaptive tanh activation function with factor 5 (LAAF5 - tanh). In the case of the non-smooth source term, the sigmoid activation function outperforms all the other activation functions. In particular, in this case, the best activation function is the locally (to the layer) adaptive sigmoid activation function with factor 10 (LAAF10 - sigmoid).</p>
<p>As we mentioned in <xref ref-type="sec" rid="s2-1">Section 2.1</xref>, the data size impacts the training errors. Large data sets increase the PINN accuracy but have larger training errors than training with small data sets, because of the error definition (see <xref ref-type="disp-formula" rid="e6">Eq. (6)</xref>). For this reason, the training error should be compared only between trainings using the same training data set size. We investigate the impact of three different input data sizes (1,200 points in the domain and 200 on the boundary; 64&#x20;&#xd7;&#x20;64 points in the domain and 2,000 on the boundary; 128&#x20;&#xd7; 128 points in the domain and 4,000 on the boundary) with three collocation point distributions (uniform, pseudo-random, and Sobol sequence) for the non-smooth source term. We show the results in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Training error for different data set sizes (1,200 points in the domain and 200 on the boundary; 64&#x20;&#xd7; 64 points in the domain and 2,000 on the boundary; 128&#x20;&#xd7; 128 points in the domain and 4,000 on the boundary) and different distributions (uniform, pseudo-random and Sobol).</p>
</caption>
<graphic xlink:href="fdata-04-669097-g005.tif"/>
</fig>
<p>In general, we find that the collocation point distribution does not have a considerable impact on the training error for large data sets: the Sobol and pseudo-random distributions have a slightly better performance than the uniform distribution. For small data sets, the pseudo-random distribution results in lower training errors.</p>
<p>We also study the impact of having a <italic>restart</italic> procedure: we first train the PINN with a small data set (1,200 points in the domain and 200 on the boundary) for 4,500 epochs and then re-train the same network with a large data set (128&#x20;&#xd7; 128 points in the domain and 4,000 on the boundary) for 2,500 epochs (see the magenta lines and the grey box in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>). Such a restart capability would lead to large computational savings. However, the results show that re-training with a large data set does not decrease the error and results in the highest training&#x20;error.</p>
</sec>
<sec id="s4">
<title>4 The Importance of Transfer Learning</title>
<p>In this study, we found that the usage of the transfer learning technique is critical for training PINNs with a reduced number of epochs and computational cost. The transfer learning technique consists of first training a network that solves the Poisson equation with a different source term. We can then initialize the PINN for the problem we intend to solve with the weights and biases of the first, fully trained network. In this way, the first PINN <italic>transfers</italic> the learned information about the encoding to the second PINN. To show the advantage of transfer learning in PINNs, we solve two additional test cases with smooth and non-smooth source terms. For the test case with the smooth source term, we solve the Poisson equation with source term <italic>f</italic>(<italic>x</italic>, <italic>y</italic>) &#x3d; 10(<italic>x</italic>(<italic>x</italic>&#x20;&#x2212; 1) &#x2b; <italic>y</italic>(<italic>y</italic>&#x20;&#x2212; 1)) &#x2212; 2&#x2009;sin(<italic>&#x3c0;x</italic>)&#x2009;sin(<italic>&#x3c0;y</italic>) &#x2b; 5&#x2009;sin(2<italic>&#x3c0;x</italic>)&#x2009;sin(2<italic>&#x3c0;y</italic>).</p>
<p>We initialize the network with the weights obtained during the training with <xref ref-type="disp-formula" rid="e7">Eq. (7)</xref> as a source term. One of the major advantages of transfer learning is that we can start the L-BFGS-B optimizer after very few Adam epochs (empirically, we found that 10 Adam epochs ensure that L-BFGS-B will avoid local minima). L-BFGS-B has faster convergence than the Adam optimizer, and therefore the training is quicker. When not using transfer learning, we train the PINN with 2,000 epochs of the Adam optimizer, followed by 5,000 epochs of L-BFGS-B. When using transfer learning, we perform 10 epochs of the Adam optimizer, followed by 6,955&#xa0;L-BFGS-B epochs.</p>
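The warm-start mechanics can be illustrated with a toy one-dimensional fit, where a tiny tanh network trained on one right-hand side provides the initial parameters for a related one. The functions, network size, and optimizer settings below are illustrative, not the paper's setup.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64)

def net(p, x):
    # tiny 1-10-1 tanh network; p packs W1 (10), b1 (10), W2 (10), b2 (1)
    W1, b1, W2, b2 = p[:10], p[10:20], p[20:30], p[30]
    return np.tanh(np.outer(x, W1) + b1) @ W2 + b2

def mse(p, target):
    r = net(p, x) - target
    return np.mean(r**2)

# train a "source" network on one right-hand side ...
src_target = np.sin(2 * np.pi * x)
p0 = rng.normal(0.0, 0.5, 31)                       # random (cold) start
src = minimize(mse, p0, args=(src_target,), method="L-BFGS-B")

# ... then warm-start a related problem from the trained weights
new_target = np.sin(2 * np.pi * x) + 0.5 * np.sin(4 * np.pi * x)
warm = minimize(mse, src.x, args=(new_target,), method="L-BFGS-B")
```

Because the warm start already encodes the shared low-frequency content, the second optimization begins from a much lower loss than a random initialization would.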
<p>The black lines in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref> show a comparison of the training error for a network initialized with the Xavier weight initialization, i.e., without transfer learning (&#x2212;. black line), and with transfer learning (&#x2212;&#x2b; black line). In this case, the usage of transfer learning gains two orders of magnitude of improvement in the training error in less than 1,000 epochs.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Training error with and without transfer learning for the smooth and non-smooth source test&#x20;cases.</p>
</caption>
<graphic xlink:href="fdata-04-669097-g006.tif"/>
</fig>
<p>For the test case with a non-smooth source term, we introduce an additional test case solving the Poisson equation with a source term that is zero everywhere except in a circle with radius 0.1 centered at the <italic>x</italic> and <italic>y</italic> coordinates (0.7, 0.7):<disp-formula id="e9">
<mml:math id="m18">
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>10</mml:mn>
<mml:mspace width="0.28em"/>
<mml:mtext>for</mml:mtext>
<mml:mspace width="0.28em"/>
<mml:msqrt>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>0.7</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>0.7</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msqrt>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>0.1</mml:mn>
<mml:mo>.</mml:mo>
</mml:math>
<label>(9)</label>
</disp-formula>
</p>
<p>For transfer learning, we use the PINN weights obtained by training the network to solve the Poisson equation with the source term of <xref ref-type="disp-formula" rid="e8">Eq. (8)</xref>. The blue lines in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref> show the training error for this test case with and without transfer learning. As in the smooth source term case, the usage of transfer learning rapidly decreases the training&#x20;error.</p>
<p>We note that the use of transfer learning leads to an initial (less than 200&#xa0;L-BFGS-B epochs) <italic>super-convergence</italic> to a relatively low training error. For this reason, transfer learning is necessary to make PINNs competitive with other solvers used in scientific computing.</p>
<p>The major challenge in using transfer learning is determining which pre-trained PINN to use. In simulation codes solving the same equation with a different source term at each time step, an obvious choice is a PINN that solves the governing equations with the source term at one of the time steps. For other cases, we found that PINNs solving problems with source terms containing high-frequency components (possibly more than one component) are suitable for transfer learning in general situations. We also found that PINNs solving problems with only one low-frequency component as the source term are not beneficial for transfer learning: their performance is equivalent to the case without transfer learning.</p>
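The transfer-learning step amounts to initializing the new network's parameters from a pre-trained checkpoint instead of from Xavier initialization. A framework-agnostic sketch of this choice (all names are ours; in practice, the weights are stored and restored through TensorFlow checkpoint/restart files, as described later in Section 5):

```python
import numpy as np

def xavier_init(layer_sizes, seed=0):
    """Xavier (Glorot) uniform initialization for a fully-connected net."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params

def init_with_transfer(layer_sizes, checkpoint=None):
    """Reuse a pre-trained checkpoint when its shapes match the new
    network (same architecture); otherwise fall back to Xavier."""
    expected = [(i, o) for i, o in zip(layer_sizes[:-1], layer_sizes[1:])]
    if checkpoint is not None and [W.shape for W, _ in checkpoint] == expected:
        return [(W.copy(), b.copy()) for W, b in checkpoint]
    return xavier_init(layer_sizes)
```

The shape check reflects the requirement that the same architecture (and, in the text above, the same governing equations and boundary conditions) is used for the pre-trained and the new network.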
</sec>
<sec id="s5">
<title>5 The Old and the New: Integrating PINNs Into Traditional Linear Solvers</title>
<p>In <xref ref-type="sec" rid="s2-1">Section 2.1</xref>, we observed that direct usage of PINN to solve the Poisson equation is still limited by the large number of epochs required to achieve an acceptable precision. One possibility to improve the performance of PINN is to combine PINN with traditional iterative solvers such as the Jacobi, Gauss-Seidel and multigrid solvers (<xref ref-type="bibr" rid="B36">Quarteroni et&#x20;al., 2010</xref>).</p>
<p>PINN solvers&#x2019; advantage is their quick convergence to the solution&#x2019;s low-frequency components. However, convergence to high-frequency features is slow and requires an increasing number of training iterations/epochs. This is a result of the F-principle. Because of this, PINNs are of limited use when the application requires highly accurate solutions. As suggested in Ref. (<xref ref-type="bibr" rid="B51">Xu et&#x20;al., 2019</xref>), in such cases the most viable option is to combine PINN solvers with traditional solvers that converge rapidly to the solution&#x2019;s high-frequency components (but converge slowly for the low-frequency components). Such methods introduce a computational grid and compute the differential operators with a finite-difference scheme. In this work, we choose the Gauss-Seidel method as it exhibits a higher convergence rate than the Jacobi method. Each Gauss-Seidel iteration for solving the Poisson equation (<xref ref-type="disp-formula" rid="e2">Eq. (2)</xref>) is:<disp-formula id="e10">
<mml:math id="m19">
<mml:msubsup>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>4</mml:mn>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>x</mml:mi>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>y</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:math>
<label>(10)</label>
</disp-formula>where <italic>i</italic> and <italic>j</italic> are the cell indices, &#x394;<italic>x</italic> and &#x394;<italic>y</italic> are the grid cell sizes in the <italic>x</italic> and <italic>y</italic> directions, and <italic>n</italic> is the iteration number. Usually, the Gauss-Seidel method stops iterating when &#x2016;<italic>u</italic><sup><italic>n</italic>&#x2b;1</sup> &#x2212; <italic>u</italic><sup><italic>n</italic></sup>&#x2016;<sub>2</sub> &#x2264; <italic>&#x3b4;</italic>, where &#x2016;&#x2026;&#x2016;<sub>2</sub> is the Euclidean norm and <italic>&#x3b4;</italic> is the tolerance, chosen as an arbitrarily small&#x20;value.</p>
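A minimal NumPy sketch of the Gauss-Seidel iteration of Eq. 10 with the <italic>&#x3b4;</italic> stopping criterion (function and parameter names are ours; it assumes a uniform grid and homogeneous Dirichlet boundary conditions):

```python
import numpy as np

def gauss_seidel(f, dx, dy, delta=1e-6, max_iter=10_000):
    """Solve the discretized Poisson equation u_xx + u_yy = f (Eq. 10)
    with homogeneous Dirichlet boundary conditions on a uniform grid."""
    u = np.zeros_like(f)
    for _ in range(max_iter):
        u_old = u.copy()
        # In-place sweep: the already-updated neighbors (i-1, j) and
        # (i, j-1) are used immediately, as in Eq. 10.
        for i in range(1, u.shape[0] - 1):
            for j in range(1, u.shape[1] - 1):
                u[i, j] = 0.25 * (u[i + 1, j] + u[i - 1, j]
                                  + u[i, j + 1] + u[i, j - 1]
                                  - dx * dy * f[i, j])
        if np.linalg.norm(u - u_old) <= delta:  # stop when ||u^{n+1} - u^n||_2 <= delta
            break
    return u
```

A production version would vectorize the sweep (e.g., red-black ordering) or move it to compiled code, as done with Cython later in this section.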
<p>Both the Jacobi and Gauss-Seidel methods show fast convergence for small-scale features: this is because the update of an unknown value involves only the values at neighboring points (the stencil defined by the discretization of the differential operator). Between two iterations, information can propagate only to neighboring&#x20;cells.</p>
<p>In this work, we combine traditional approaches with new emerging DL methods, as shown in <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>. Overall, the new solver consists of three phases. First, we use the DL PINN solver to calculate the solution on a coarse grid. In the second phase, we refine the solution with Gauss-Seidel iterations on the coarse grid until a stopping criterion is satisfied. The third phase is a multigrid V-cycle: we linearly interpolate (or <italic>prolongate</italic> in the multigrid terminology) to finer grids and perform a Gauss-Seidel iteration on each finer grid. In fact, several multigrid strategies with different levels of sophistication can be sought. However, in this work, we focus on a very simple multigrid approach based on the Gauss-Seidel method and linear interpolation across grids. The crucial point is that we train a PINN to calculate the solution of the problem on the coarse grid, replacing the multigrid <italic>restriction</italic> (or <italic>injection</italic>) steps in just one&#x20;phase.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>The hybrid solver relies on the DL linear solver to determine the solution on a coarse grid, which is refined through a multigrid V-cycle performing Gauss-Seidel iterations on finer&#x20;grids.</p>
</caption>
<graphic xlink:href="fdata-04-669097-g007.tif"/>
</fig>
<p>
<xref ref-type="fig" rid="F8">Figure&#x20;8</xref> shows a more detailed diagram of a hybrid multigrid solver combining a DL solver, which calculates the solution on a coarse grid, with a Gauss-Seidel solver, which refines the solution and interpolates to finer grids. Because the DL solver converges quickly to the low-frequency coarse-grained components of the solution, while the high-frequency small-scale components are not accurately resolved, we perform the training in single-precision floating point. This speeds up training on GPUs (not used in this work), where the number of single-precision floating-point units (FPUs) is higher than on&#x20;CPUs.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Structure of the hybrid multigrid solver combining the DL and Gauss-Seidel solvers. Pre-trained networks are pre-computed and used to initialize the DL network. Two main parameters, <italic>ftol</italic> and <italic>&#x3b4;</italic>, determine the accuracy and the performance of the hybrid solver.</p>
</caption>
<graphic xlink:href="fdata-04-669097-g008.tif"/>
</fig>
<p>The hybrid DL solver comprises six basic steps, represented in <xref ref-type="fig" rid="F8">Figure&#x20;8</xref>:<list list-type="simple">
<list-item>
<p>1) Initialize the network weights and biases - We load the network structure from disk and initialize the network. To accelerate convergence, we rely on transfer learning: we train a network to solve a similar problem and use its weights to initialize the network. It is important that the same governing equations, boundary conditions, and architecture are used. The weights and biases are in single-precision floating point. The time to complete this step is negligible with respect to the total time of the hybrid solver.</p>
</list-item>
<list-item>
<p>2) Train with Adam Optimizer (10 Epochs) - We run the Adam optimizer for only a small number of epochs to prevent the subsequent L-BFGS-B optimizer from converging quickly to a wrong solution (a local minimum). By running several tests, we found empirically that only 10 Adam epochs are needed for this purpose. The time to complete this step is typically negligible.</p>
</list-item>
<list-item>
<p>3) Train with L-BFGS-B Optimizer - We run the training with the L-BFGS-B optimizer. The stopping criterion is determined by the <italic>ftol</italic> parameter: the training stops when (<italic>r</italic><sub><italic>k</italic></sub> &#x2212; <italic>r</italic><sub><italic>k</italic>&#x2b;1</sub>)/max(&#x7c;<italic>r</italic><sub><italic>k</italic></sub>&#x7c;, &#x7c;<italic>r</italic><sub><italic>k</italic>&#x2b;1</sub>&#x7c;, 1) &#x2264; <italic>ftol</italic>, where <italic>k</italic> is the optimizer iteration and <italic>r</italic> is the value of the function to be optimized (in our case, the residual function). The L-BFGS-B training typically dominates the execution time of the hybrid solver. To compete with traditional approaches for solving the Poisson equation, we set a maximum of 1,000&#x20;epochs.</p>
</list-item>
<list-item>
<p>4) The DL solver is obtained at the end of the training process - The solver can infer the solution at given collocation points or save the network for future transfer-learning tasks, e.g., when a simulation repeats the solution of the Poisson equation at different time&#x20;steps.</p>
</list-item>
<list-item>
<p>5) The Approximator/Surrogate Network is used to calculate the solution on the coarse grid of the multigrid solver - We calculate the solution of our problem on the coarse grid of the multigrid solver. This operation is carried out with single-precision floating-point numbers, since high accuracy is not needed in this step. The result is then cast to double precision for the successive Gauss-Seidel solver. The inference computational time is typically negligible compared to the total execution&#x20;time.</p>
</list-item>
<list-item>
<p>6) Refine the solution with the Gauss-Seidel Method on the coarse grid and interpolate to finer grids - We first perform Gauss-Seidel iterations to refine the solution on the coarse grid. This solution refinement is critical to remove the vanishing-gradient problem at the boundary. The Gauss-Seidel iteration on the coarse grid stops when &#x2016;<italic>u</italic><sup><italic>n</italic>&#x2b;1</sup> &#x2212; <italic>u</italic><sup><italic>n</italic></sup>&#x2016;<sub>2</sub> &#x2264; <italic>&#x3b4;</italic>, where <italic>n</italic> is the iteration number. After the Gauss-Seidel method stops on the coarse grid, a linear interpolation to finer grids and one Gauss-Seidel iteration per grid are computed. As an example, to solve the problem on a 512&#x20;&#xd7; 512 grid, we perform the following steps:</p>
</list-item>
<list-item>
<p>1) use the DL solver to calculate the solution on 64&#x20;&#xd7; 64 grid;</p>
</list-item>
<list-item>
<p>2) refine the solution with the Gauss-Seidel method on the 64&#x20;&#xd7; 64 grid until convergence is reached;</p>
</list-item>
<list-item>
<p>3) carry out a linear interpolation to the 128&#x20;&#xd7; 128 grid;</p>
</list-item>
<list-item>
<p>4) perform a Gauss-Seidel iteration on the 128&#x20;&#xd7; 128 grid;</p>
</list-item>
<list-item>
<p>5) carry out a linear interpolation to the 256&#x20;&#xd7; 256&#x20;grid;</p>
</list-item>
<list-item>
<p>6) perform a Gauss-Seidel iteration on the 256&#x20;&#xd7; 256&#x20;grid;</p>
</list-item>
<list-item>
<p>7) carry out a linear interpolation to the 512&#x20;&#xd7; 512&#x20;grid;</p>
</list-item>
<list-item>
<p>8) perform a final Gauss-Seidel iteration on the 512&#x20;&#xd7; 512&#x20;grid.</p>
</list-item>
</list>
</p>
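The steps above can be sketched as follows (a schematic NumPy implementation under our own naming, for nodal grids of size 2<sup>k</sup> &#x2b; 1 on the unit square; the coarse-grid input to <monospace>hybrid_solve</monospace> stands in for the DL solver output, and <monospace>gauss_seidel_sweep</monospace> performs one iteration of Eq. 10):

```python
import numpy as np

def gauss_seidel_sweep(u, f, h, n_sweeps=1):
    """Perform n_sweeps in-place Gauss-Seidel iterations (Eq. 10)."""
    for _ in range(n_sweeps):
        for i in range(1, u.shape[0] - 1):
            for j in range(1, u.shape[1] - 1):
                u[i, j] = 0.25 * (u[i + 1, j] + u[i - 1, j]
                                  + u[i, j + 1] + u[i, j - 1]
                                  - h * h * f[i, j])
    return u

def prolongate(u):
    """Linear interpolation from an n x n nodal grid to (2n-1) x (2n-1)."""
    n = u.shape[0]
    fine = np.zeros((2 * n - 1, 2 * n - 1))
    fine[::2, ::2] = u                                     # inject coarse values
    fine[1::2, ::2] = 0.5 * (fine[:-2:2, ::2] + fine[2::2, ::2])  # odd rows
    fine[:, 1::2] = 0.5 * (fine[:, :-2:2] + fine[:, 2::2])        # odd columns
    return fine

def hybrid_solve(u_coarse, f_of, h_coarse, n_levels=3):
    """Refine a coarse-grid solution (e.g., from the DL solver) by
    repeated prolongation plus one Gauss-Seidel sweep per level."""
    u, h = u_coarse, h_coarse
    for _ in range(n_levels):
        u, h = prolongate(u), h / 2
        x = np.linspace(0, 1, u.shape[0])
        X, Y = np.meshgrid(x, x, indexing="ij")
        u = gauss_seidel_sweep(u, f_of(X, Y), h)
    return u
```

With three levels, this mirrors the 64&#x20;&#xd7; 64 &#x2192; 128&#x20;&#xd7; 128 &#x2192; 256&#x20;&#xd7; 256 &#x2192; 512&#x20;&#xd7; 512 progression described in the list (up to the nodal-grid size convention of the sketch).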
<p>The interpolation and Gauss-Seidel iterations correspond to the V-cycle in the multigrid method, as shown in <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>.</p>
<p>We test the modified hybrid solver against the same problem shown in <xref ref-type="sec" rid="s2-1">Section 2.1</xref>: we solve the Poisson equation with the source term of <xref ref-type="disp-formula" rid="e7">Eq. (7)</xref>. Leveraging the knowledge gained in the characterization study of <xref ref-type="sec" rid="s3">Section 3</xref>, we use a fully-connected neural network with four hidden layers of 50 neurons each. To optimize the convergence for solving the Poisson equation with a smooth source term, we rely on LAAF-5 tanh activation functions: these activation functions provided the best performance in our characterization study. For the transfer learning, we pre-train a network for 2,000 Adam optimizer epochs and 5,000&#xa0;L-BFGS-B optimizer epochs to solve a Poisson equation with a source term equal to &#x2212;2&#x2009;sin(<italic>&#x3c0;x</italic>)&#x2009;sin(<italic>&#x3c0;y</italic>) &#x2212; 72&#x2009;sin(6<italic>&#x3c0;x</italic>)&#x2009;sin(6<italic>&#x3c0;y</italic>). We use an input data set consisting of 100&#x20;&#xd7; 100 points in the integration domain and 2,000 points on the boundaries for the DL solver. We use the Sobol sequence as the training data set distribution. The network weights and biases for transfer learning are saved as checkpoint/restart files in TensorFlow.</p>
<p>For the first test, we employ a 512&#x20;&#xd7; 512 grid with a 64&#x20;&#xd7; 64 coarse grid, <italic>ftol</italic> equal to 1E-4 and <italic>&#x3b4;</italic> equal to 1E-6. We then test the hybrid multigrid solver on a 1,024&#x20;&#xd7; 1,024 grid with a 128&#x20;&#xd7; 128 coarse grid, <italic>ftol</italic> equal to 1E-4 and two values for <italic>&#x3b4;</italic>: 1E-5 and 1E-4. <xref ref-type="fig" rid="F9">Figure&#x20;9</xref> shows a contour plot of the error (<inline-formula id="inf10">
<mml:math id="m20">
<mml:mi>u</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>) for these three configurations. The error norm in the three configurations is 0.11, 0.19, and 0.86, respectively. The maximum error for the hybrid multigrid solver is on the order of 1E-4, less than the error we obtained after extensive training of a basic PINN (approximately 1E-3, see the bottom right panel of <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>). For comparison, the error norm for the PETSc CG on a 1,024&#x20;&#xd7; 1,024 grid with <italic>rtol</italic> (the convergence tolerance relative to the initial residual norm) equal to 1E-2 and 1E-3 is 5.7E-5 and 5.5E-6, respectively.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Hybrid multigrid solver final error (<inline-formula id="inf11">
<mml:math id="m21">
<mml:mi>u</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mo>&#x303;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>) using three different setups: 1&#x2013;512&#x20;&#xd7; 512 grid with a 64&#x20;&#xd7; 64 coarse grid, <italic>ftol</italic> equal to 1E-4 and <italic>&#x3b4;</italic> equal to 1E-6; 2 and 3&#x2013;1,024&#x20;&#xd7; 1,024 grid with a 128&#x20;&#xd7; 128 coarse grid, <italic>ftol</italic> equal to 1E-4 and <italic>&#x3b4;</italic> equal to 1E-5 and 1E-4, respectively.</p>
</caption>
<graphic xlink:href="fdata-04-669097-g009.tif"/>
</fig>
<p>Having shown that the hybrid multigrid solver provides more accurate results than direct PINN usage, we focus on its computational performance. The performance tests are carried out on a 2.9&#xa0;GHz Dual-Core Intel Core i5 with 16&#xa0;GB 2133&#xa0;MHz LPDDR3, using macOS Catalina 10.15.7. We use Python 3.7.9, TensorFlow 2.4.0, SciPy 1.5.4 and the DeepXDE DSL. The Gauss-Seidel iteration is implemented in Cython (<xref ref-type="bibr" rid="B11">Gorelick and Ozsvald, 2020</xref>) to improve the performance and avoid time-consuming loops in Python. For comparison, we also solve the problem using only the Gauss-Seidel method on the coarse grid, and using the petsc4py CG solver. The PETSc version is 3.14.2. We repeat the tests five times and report the arithmetic average of the execution times. We do not report error bars as the standard deviation is less than 5% of the average value. <xref ref-type="fig" rid="F10">Figure&#x20;10</xref> shows the execution time together with the number of epochs and iterations for the three different configurations.</p>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>Execution time, number of epochs and iterations for the hybrid multigrid DL-GS solver, and a comparison with the performance of a multigrid using only GS and of petsc4py CG, varying the resolution and solver stopping criteria. The hybrid multigrid DL-GS solver is faster than the other approaches for problems using larger coarse grids, e.g., 128&#x20;&#xd7; 128 coarse grids.</p>
</caption>
<graphic xlink:href="fdata-04-669097-g010.tif"/>
</fig>
<p>The most important result is that by using an optimized configuration, transfer learning, and integrating DL technologies into traditional approaches, we can now solve the Poisson equation with acceptable precision with a reduced number of training iterations. This reduction in the number of training epochs translates into completing the problem presented in <xref ref-type="sec" rid="s2-1">Section 2.1</xref> in a few minutes instead of hours (see <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>) on the Intel i5 system. While the execution time depends on the specific hardware platform and implementation, the number of training epochs and GS iterations on the coarse grid (reported on top of the histogram bars in <xref ref-type="fig" rid="F10">Figure&#x20;10</xref>) does not. Overall, we found that 133 epochs are needed for the L-BFGS-B optimizer to reach an <italic>ftol</italic> equal to 1E-4.</p>
<p>
<xref ref-type="fig" rid="F10">Figure&#x20;10</xref> histograms also show the breakdown between the time spent in the DL and Gauss-Seidel solvers used in the multigrid V-cycle. Note that the execution time for the DL solver is approximately the same for calculating the values on the two coarse grids (64&#x20;&#xd7; 64 and 128&#x20;&#xd7; 128): this is because PINNs are <italic>gridless</italic> methods, and only the negligible inference computational cost differs. For comparison, we show the performance of the Gauss-Seidel solver on the coarse grid (orange bars) and of the petsc4py CG solver (yellow bars) with different <italic>rtol</italic> values. When the coarse grid is small, e.g., 64&#x20;&#xd7; 64, the cost of training the DL solver is higher than that of a basic method such as Gauss-Seidel: using the Gauss-Seidel method on the coarse grid is faster than using the DL solver. However, for larger coarse grids, e.g., 128&#x20;&#xd7; 128, the hybrid multigrid solver is the fastest. Overall, the performance of the hybrid solver is competitive with state-of-the-art linear solvers. We note that none of the methods and codes have been optimized or compared at the same accuracy (the stopping criteria are defined differently for different solvers), so the performance results provide an indication of the potential of the hybrid solver without providing absolute performance values.</p>
</sec>
<sec id="s6">
<title>6 Discussion and Conclusion</title>
<p>This paper presented a study evaluating the potential of new emerging DL technologies to replace or accelerate old traditional approaches for solving the Poisson equation. We showed that directly replacing traditional methods with PINNs results in limited accuracy and long training periods. Setting up an appropriate configuration of depth, activation functions, and input data set distribution, and leveraging transfer learning, can effectively optimize the PINN solver. However, PINNs are still far from competing with HPC solvers, such as the PETSc CG. In summary, PINNs in their current state cannot yet replace traditional approaches.</p>
<p>However, while the direct usage of PINNs in scientific applications is still far from meeting computational performance and accuracy requirements, hybrid strategies integrating PINNs with traditional approaches, such as multigrid and Gauss-Seidel methods, are the most promising option for developing a new class of solvers for scientific applications. We showed the first performance results of such hybrid approaches, on par with other state-of-the-art solver implementations, such as PETSc. While we applied PINNs to solve the linear problems resulting from discretizing the Poisson equation, a very promising research area is the use of PINNs exploiting non-linear activation functions to solve non-linear systems.</p>
<p>When considering the potential of new emerging heterogeneous hardware, PINNs could benefit from the usage of GPUs, which are the workhorse for DL workloads. It is likely that, with the usage of GPUs, hybrid solvers can outperform state-of-the-art HPC solvers. However, PINN DSL frameworks currently rely on the SciPy CPU implementation of the key PINN optimizer, L-BFGS-B, and a GPU implementation is not available in SciPy. The new TensorFlow Probability framework<xref ref-type="fn" rid="fn1">
<sup>1</sup>
</xref> provides a BFGS optimizer that can be used on GPUs. We note that Nvidia introduced the new SimNet framework for the PINN training inference on Nvidia GPUs (<xref ref-type="bibr" rid="B14">Hennigh et&#x20;al., 2021</xref>). Another interesting research direction is investigating the role and impact of the low and mixed-precision calculations to train the approximator network. The usage of low-precision formats would allow us to use tensorial computational units, such as tensor cores in Nvidia GPUs (<xref ref-type="bibr" rid="B24">Markidis et&#x20;al., 2018</xref>) and Google TPUs (<xref ref-type="bibr" rid="B16">Jouppi et&#x20;al., 2017</xref>), boosting the DL training performance.</p>
<p>From the algorithmic point of view, a line of research we would like to pursue is a better and more elegant integration of DL approaches into traditional solvers. One possibility is to extend the seminal work on discrete PINNs (<xref ref-type="bibr" rid="B39">Raissi et&#x20;al., 2019</xref>) combining Runge-Kutta solvers and PINNs for ODE solutions: a similar approach could be sought to encode information about the discretization points into the PINN. However, currently, this approach is supervised and requires the availability of simulation data. In addition, the development of specific network architectures for solving specific PDEs is a promising area of research. A limitation of this work is that we considered only fully-connected networks as surrogate network architectures. For solving the Poisson equation and elliptic problems in general, the usage of convolutional networks with large and dilated kernels is likely to provide better performance than fully-connected DL networks in learning the non-local relationships that are a signature of elliptic problems (<xref ref-type="bibr" rid="B23">Luna and Blaschke, 2020</xref>).</p>
<p>The major challenge is integrating these new classes of hybrid DL and traditional approaches, developed in Python, into large scientific codes and libraries, often written in Fortran and C/C&#x2b;&#x2b;. One possibility is to bypass the Python interface of major DL frameworks and use their C&#x2b;&#x2b; runtime directly. However, this task is complex. An easier path for the software integration of DL solvers into legacy HPC applications is highly needed.</p>
<p>Despite all these challenges and difficulties ahead, this paper shows that the integration of new PINNs DL approaches into <italic>old</italic> traditional HPC approaches for scientific applications will play an essential role in the development of next-generation solvers for linear systems arising from differential equations.</p>
</sec>
</body>
<back>
<sec id="s7">
<title>Data Availability Statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>The author confirms being the sole contributor of this work and has approved it for publication.</p>
</sec>
<sec id="s9">
<title>Funding</title>
<p>Funding for the work is received from the European Commission H2020 program, Grant Agreement No. 801039 (EPiGRAM-HS).</p>
</sec>
<sec sec-type="COI-statement" id="s10">
<title>Conflict of Interest</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn id="fn1">
<label>1</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://www.tensorflow.org/probability">https://www.tensorflow.org/probability</ext-link>.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Abadi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Barham</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Dean</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). &#x201c;<article-title>Tensorflow: A System for Large-Scale Machine Learning</article-title>,&#x201d; in <conf-name>12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16)</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>265</fpage>&#x2013;<lpage>283</lpage>. </citation>
</ref>
<ref id="B2">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Aguilar</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Markidis</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2021</year>). &#x201c;<article-title>A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations</article-title>,&#x201d; in <source>2021 IEEE International Conference on Cluster Computing (CLUSTER)</source>. <publisher-loc>Portland, OR</publisher-loc>: <publisher-name>IEEE</publisher-name>, <fpage>692</fpage>&#x2013;<lpage>697</lpage>. </citation>
</ref>
<ref id="B3">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Balay</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Abhyankar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Adams</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Brune</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Buschelman</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <source>Petsc Users Manual</source>. <publisher-loc>Argonne, IL</publisher-loc>: <publisher-name>Argonne National Laboratory</publisher-name>. </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Baydin</surname>
<given-names>A. G.</given-names>
</name>
<name>
<surname>Pearlmutter</surname>
<given-names>B. A.</given-names>
</name>
<name>
<surname>Radul</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Siskind</surname>
<given-names>J.&#x20;M.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Automatic Differentiation in Machine Learning: a Survey</article-title>. <source>J.&#x20;machine Learn. Res.</source> <volume>18</volume>, <fpage>1</fpage>. </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dalcin</surname>
<given-names>L. D.</given-names>
</name>
<name>
<surname>Paz</surname>
<given-names>R. R.</given-names>
</name>
<name>
<surname>Kler</surname>
<given-names>P. A.</given-names>
</name>
<name>
<surname>Cosimo</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Parallel Distributed Computing Using python</article-title>. <source>Adv. Water Resour.</source> <volume>34</volume>, <fpage>1124</fpage>&#x2013;<lpage>1139</lpage>. <pub-id pub-id-type="doi">10.1016/j.advwatres.2011.04.013</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Eivazi</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Tahani</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Schlatter</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Vinuesa</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2021</year>). <source>Physics-informed Neural Networks for Solving reynolds-averaged Navier-Stokes Equations</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:2107.10711</publisher-name>. </citation>
</ref>
<ref id="B7">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Esmaeilzadeh</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Azizzadenesheli</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Kashinath</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Mustafa</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tchelepi</surname>
<given-names>H. A.</given-names>
</name>
<name>
<surname>Marcus</surname>
<given-names>P.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). &#x201c;<article-title>Meshfreeflownet: A Physics-Constrained Deep Continuous Space-Time Super-resolution Framework</article-title>,&#x201d; in <conf-name>SC20: International Conference for High Performance Computing, Networking, Storage and Analysis</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>15</lpage>. </citation>
</ref>
<ref id="B8">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Fletcher</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2013</year>). <source>Practical Methods of Optimization</source>. <publisher-name>John Wiley &#x26; Sons</publisher-name>. </citation>
</ref>
<ref id="B9">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gao</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.-X.</given-names>
</name>
</person-group> (<year>2020</year>). <source>PhyGeoNet: Physics-Informed Geometry-Adaptive Convolutional Neural Networks for Solving Parametric PDEs on Irregular Domain</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:2004.13145</publisher-name>. </citation>
</ref>
<ref id="B10">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Goodfellow</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Courville</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Deep Learning, Vol. 1</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>. </citation>
</ref>
<ref id="B11">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gorelick</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ozsvald</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2020</year>). <source>High Performance Python: Practical Performant Programming for Humans</source>. <publisher-loc>Sebastopol, CA</publisher-loc>: <publisher-name>O&#x2019;Reilly Media</publisher-name>. </citation>
</ref>
<ref id="B12">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gulli</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Pal</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2017</year>). <source>Deep Learning with Keras</source>. <publisher-loc>Birmingham, UK</publisher-loc>: <publisher-name>Packt Publishing Ltd</publisher-name>. </citation>
</ref>
<ref id="B13">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Haghighat</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Juanes</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2020</year>). <source>SciANN: A Keras Wrapper for Scientific Computations and Physics-Informed Deep Learning Using Artificial Neural Networks</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:2005.08803</publisher-name>. </citation>
</ref>
<ref id="B14">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hennigh</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Narasimhan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Nabian</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Subramaniam</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tangsali</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>Z.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). &#x201c;<article-title>NVIDIA SimNet: An AI-Accelerated Multi-Physics Simulation Framework</article-title>,&#x201d; in <conf-name>International Conference on Computational Science</conf-name> (<publisher-name>Springer</publisher-name>), <fpage>447</fpage>&#x2013;<lpage>461</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-77977-1_36</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jagtap</surname>
<given-names>A. D.</given-names>
</name>
<name>
<surname>Kawaguchi</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Karniadakis</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Locally Adaptive Activation Functions with Slope Recovery for Deep and Physics-Informed Neural Networks</article-title>. <source>Proc. R. Soc. A.</source> <volume>476</volume>, <fpage>20200334</fpage>. <pub-id pub-id-type="doi">10.1098/rspa.2020.0334</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Jouppi</surname>
<given-names>N. P.</given-names>
</name>
<name>
<surname>Young</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Patil</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Patterson</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Agrawal</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Bajwa</surname>
<given-names>R.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). &#x201c;<article-title>In-datacenter Performance Analysis of a Tensor Processing Unit</article-title>,&#x201d; in <conf-name>Proceedings of the 44th Annual International Symposium on Computer Architecture</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>12</lpage>. </citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kelley</surname>
<given-names>C. T.</given-names>
</name>
</person-group> (<year>1995</year>). <source>Iterative Methods for Linear and Nonlinear Equations</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>SIAM</publisher-name>. </citation>
</ref>
<ref id="B18">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kharazmi</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Karniadakis</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2020</year>). <source>hp-VPINNs: Variational Physics-Informed Neural Networks with Domain Decomposition</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:2003.05385</publisher-name>. </citation>
</ref>
<ref id="B19">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kharazmi</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Karniadakis</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Variational Physics-Informed Neural Networks for Solving Partial Differential Equations</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:1912.00873</publisher-name>. </citation>
</ref>
<ref id="B20">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kumar</surname>
<given-names>S. K.</given-names>
</name>
</person-group> (<year>2017</year>). <source>On Weight Initialization in Deep Neural Networks</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:1704.08863</publisher-name>. </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lagaris</surname>
<given-names>I. E.</given-names>
</name>
<name>
<surname>Likas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fotiadis</surname>
<given-names>D. I.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Artificial Neural Networks for Solving Ordinary and Partial Differential Equations</article-title>. <source>IEEE Trans. Neural Netw.</source> <volume>9</volume>, <fpage>987</fpage>&#x2013;<lpage>1000</lpage>. <pub-id pub-id-type="doi">10.1109/72.712178</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Meng</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Mao</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Karniadakis</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2019</year>). <source>DeepXDE: A Deep Learning Library for Solving Differential Equations</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:1907.04502</publisher-name>. </citation>
</ref>
<ref id="B23">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Luna</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Blaschke</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2020</year>). <source>Accelerating GMRES with Deep Learning in Real-Time</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>Supercomputing Posters</publisher-name>. </citation>
</ref>
<ref id="B24">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Markidis</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chien</surname>
<given-names>S. W. D.</given-names>
</name>
<name>
<surname>Laure</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>I. B.</given-names>
</name>
<name>
<surname>Vetter</surname>
<given-names>J.&#x20;S.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>NVIDIA Tensor Core Programmability, Performance &#x26; Precision</article-title>,&#x201d; in <conf-name>2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>522</fpage>&#x2013;<lpage>531</lpage>. <pub-id pub-id-type="doi">10.1109/ipdpsw.2018.00091</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Markidis</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lapenta</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Uddin</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Multi-Scale Simulations of Plasma with iPIC3D</article-title>. <source>Math. Comput. Simul.</source> <volume>80</volume>, <fpage>1509</fpage>&#x2013;<lpage>1519</lpage>. <pub-id pub-id-type="doi">10.1016/j.matcom.2009.08.038</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Markidis</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Podobas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Jongsuebchoke</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Bengtsson</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Herman</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>Automatic Particle Trajectory Classification in Plasma Simulations</article-title>,&#x201d; in <conf-name>2020 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) and Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S)</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>64</fpage>&#x2013;<lpage>71</lpage>. <pub-id pub-id-type="doi">10.1109/mlhpcai4s51975.2020.00014</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Mishra</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Molinaro</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2020a</year>). <source>Estimates on the Generalization Error of Physics Informed Neural Networks (PINNs) for Approximating PDEs</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv e-prints, arXiv:2006.16144</publisher-name>. </citation>
</ref>
<ref id="B28">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Mishra</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Molinaro</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2020b</year>). <source>Estimates on the Generalization Error of Physics Informed Neural Networks (PINNs) for Approximating PDEs II: A Class of Inverse Problems</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:2007.01138</publisher-name>. </citation>
</ref>
<ref id="B29">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Morton</surname>
<given-names>K. W.</given-names>
</name>
<name>
<surname>Mayers</surname>
<given-names>D. F.</given-names>
</name>
</person-group> (<year>2005</year>). <source>Numerical Solution of Partial Differential Equations: An Introduction</source>. <publisher-name>Cambridge University Press</publisher-name>. </citation>
</ref>
<ref id="B30">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Nascimento</surname>
<given-names>R. G.</given-names>
</name>
<name>
<surname>Viana</surname>
<given-names>F. A.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Fleet Prognosis with Physics-Informed Recurrent Neural Networks</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:1901.05512</publisher-name>. </citation>
</ref>
<ref id="B31">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Offermans</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Marin</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Schanen</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gong</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Fischer</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Schlatter</surname>
<given-names>P.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). &#x201c;<article-title>On the Strong Scaling of the Spectral Element Solver Nek5000 on Petascale Systems</article-title>,&#x201d; in <conf-name>Proceedings of the Exascale Applications and Software Conference</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1145/2938615.2938617</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pang</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Karniadakis</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>fPINNs: Fractional Physics-Informed Neural Networks</article-title>. <source>SIAM J.&#x20;Sci. Comput.</source> <volume>41</volume>, <fpage>A2603</fpage>&#x2013;<lpage>A2626</lpage>. <pub-id pub-id-type="doi">10.1137/18m1229845</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Paszke</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gross</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Massa</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Lerer</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Bradbury</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chanan</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <source>PyTorch: An Imperative Style, High-Performance Deep Learning Library</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:1912.01703</publisher-name>. </citation>
</ref>
<ref id="B34">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Fischer</surname>
<given-names>P. F.</given-names>
</name>
<name>
<surname>Lottes</surname>
<given-names>J. W.</given-names>
</name>
<name>
<surname>Kerkemeier</surname>
<given-names>S. G.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Nek5000 Web Page</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://nek5000.mcs.anl.gov">http://nek5000.mcs.anl.gov</ext-link>
</comment> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Psichogios</surname>
<given-names>D. C.</given-names>
</name>
<name>
<surname>Ungar</surname>
<given-names>L. H.</given-names>
</name>
</person-group> (<year>1992</year>). <article-title>A Hybrid Neural Network-First Principles Approach to Process Modeling</article-title>. <source>Aiche J.</source> <volume>38</volume>, <fpage>1499</fpage>&#x2013;<lpage>1511</lpage>. <pub-id pub-id-type="doi">10.1002/aic.690381003</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Quarteroni</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sacco</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Saleri</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2010</year>). <source>Numerical Mathematics, Vol. 37</source>. <publisher-name>Springer Science &#x26; Business Media</publisher-name>. </citation>
</ref>
<ref id="B37">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Raissi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Perdikaris</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Karniadakis</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2017a</year>). <source>Physics Informed Deep Learning (Part I): Data-Driven Solutions of Nonlinear Partial Differential Equations</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:1711.10561</publisher-name>. </citation>
</ref>
<ref id="B38">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Raissi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Perdikaris</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Karniadakis</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2017b</year>). <source>Physics Informed Deep Learning (Part II): Data-Driven Discovery of Nonlinear Partial Differential Equations</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:1711.10566</publisher-name>. </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Raissi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Perdikaris</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Karniadakis</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Physics-informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations</article-title>. <source>J.&#x20;Comput. Phys.</source> <volume>378</volume>, <fpage>686</fpage>&#x2013;<lpage>707</lpage>. <pub-id pub-id-type="doi">10.1016/j.jcp.2018.10.045</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Raissi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Yazdani</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Karniadakis</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Hidden Fluid Mechanics: Learning Velocity and Pressure fields from Flow Visualizations</article-title>. <source>Science</source> <volume>367</volume>, <fpage>1026</fpage>&#x2013;<lpage>1030</lpage>. <pub-id pub-id-type="doi">10.1126/science.aaw4741</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ramachandran</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Zoph</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>Q. V.</given-names>
</name>
</person-group> (<year>2017</year>). <source>Searching for Activation Functions</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:1710.05941</publisher-name>. </citation>
</ref>
<ref id="B42">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Rao</surname>
<given-names>S. S.</given-names>
</name>
</person-group> (<year>2017</year>). <source>The Finite Element Method in Engineering</source>. <publisher-name>Butterworth-Heinemann</publisher-name>. </citation>
</ref>
<ref id="B43">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Rubinstein</surname>
<given-names>R. Y.</given-names>
</name>
<name>
<surname>Kroese</surname>
<given-names>D. P.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Simulation and the Monte Carlo Method, Vol. 10</source>. <publisher-name>John Wiley &#x26; Sons</publisher-name>. </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sobol&#x2019;</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>1990</year>). <article-title>Quasi-Monte Carlo Methods</article-title>. <source>Prog. Nucl. Energy</source> <volume>24</volume>, <fpage>55</fpage>&#x2013;<lpage>61</lpage>. </citation>
</ref>
<ref id="B45">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Trottenberg</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Oosterlee</surname>
<given-names>C. W.</given-names>
</name>
<name>
<surname>Schuller</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2000</year>). <source>Multigrid</source>. <publisher-name>Elsevier</publisher-name>. </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Van Der Spoel</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Lindahl</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Hess</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Groenhof</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Mark</surname>
<given-names>A. E.</given-names>
</name>
<name>
<surname>Berendsen</surname>
<given-names>H. J.&#x20;C.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>GROMACS: Fast, Flexible, and Free</article-title>. <source>J.&#x20;Comput. Chem.</source> <volume>26</volume>, <fpage>1701</fpage>&#x2013;<lpage>1718</lpage>. <pub-id pub-id-type="doi">10.1002/jcc.20291</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Virtanen</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Gommers</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Oliphant</surname>
<given-names>T. E.</given-names>
</name>
<name>
<surname>Haberland</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Reddy</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Cournapeau</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python</article-title>. <source>Nat. Methods</source> <volume>17</volume>, <fpage>261</fpage>&#x2013;<lpage>272</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-019-0686-2</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bentivegna</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Klein</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Elmegreen</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2020a</year>). <source>Physics-Informed Neural Network Super-Resolution for Advection-Diffusion Models</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:2011.02519</publisher-name>. </citation>
</ref>
<ref id="B49">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Teng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Perdikaris</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2020b</year>). <source>Understanding and Mitigating Gradient Pathologies in Physics-Informed Neural Networks</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:2001.04536</publisher-name>. </citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weiss</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Khoshgoftaar</surname>
<given-names>T. M.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>A Survey of Transfer Learning</article-title>. <source>J.&#x20;Big Data</source> <volume>3</volume>, <fpage>1</fpage>&#x2013;<lpage>40</lpage>. <pub-id pub-id-type="doi">10.1186/s40537-016-0043-6</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>Z.-Q. J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks</source>. <publisher-loc>New York City, NY</publisher-loc>: <publisher-name>arXiv preprint arXiv:1901.06523</publisher-name>. </citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>M&#xfc;ller</surname>
<given-names>E. A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Generating a Machine-Learned Equation of State for Fluid Properties</article-title>. <source>J.&#x20;Phys. Chem. B</source> <volume>124</volume>, <fpage>8628</fpage>&#x2013;<lpage>8639</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jpcb.0c05806</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>