Explainable Artificial Intelligence for Mechanics: Physics-Explaining Neural Networks for Constitutive Models

(Artificial) neural networks have become increasingly popular in mechanics and materials sciences to accelerate computations with model order reduction techniques and as universal models for a wide variety of materials. However, the major disadvantage of neural networks remains: their numerous parameters are challenging to interpret and explain. Thus, neural networks are often labeled as black boxes, and their results often elude human interpretation. The new and active field of physics-informed neural networks attempts to mitigate this disadvantage by designing deep neural networks on the basis of mechanical knowledge. By using this a priori knowledge, deeper and more complex neural networks became feasible, since the mechanical assumptions can be explained. However, the internal reasoning and explanation of neural network parameters remain mysterious. Complementary to the physics-informed approach, we propose a first step towards a physics-explaining approach, which interprets neural networks trained on mechanical data a posteriori. This proof-of-concept explainable artificial intelligence approach aims at elucidating the black box of neural networks and their high-dimensional representations. Therein, the principal component analysis decorrelates the distributed representations in cell states of RNNs and allows the comparison to known and fundamental functions. The novel approach is supported by a systematic hyperparameter search strategy that identifies the best neural network architectures and training parameters. The findings of three case studies on fundamental constitutive models (hyperelasticity, elastoplasticity, and viscoelasticity) imply that the proposed strategy can help identify numerical and analytical closed-form solutions to characterize new materials.

supervised learning. Due to the large variety of layers and cells, neural networks are highly modular and successful for many applications (LeCun et al., 2015).
In mechanics and materials sciences, machine learning algorithms accelerate the development of new materials (Bock et al., 2019), provide model order reduction techniques, and enable data-driven constitutive models. Likewise, machine and deep learning algorithms have become an active field of research in the related domain of tribology (Argatov, 2019; Argatov and Chai, 2021). As one of the first works in mechanics, Ghaboussi et al. (1991) proposed a unified constitutive model with shallow neural networks that learn from experimental data. Further extensions reduced the required number of experimental samples, adjusted the hidden layer dimensionality during training (Ghaboussi and Sidarta, 1998), and approximated the stiffness matrix (Hashash et al., 2004). Theocaris and Panagiotopoulos (1995) used dense neural networks to model kinematic hardening and to identify parameters for the failure mode of anisotropic materials (Theocaris et al., 1997). Shin and Pande (2000) and Javadi et al. (2003); Javadi and Rezania (2009) proposed "intelligent finite elements": neural network constitutive models that were applied to soils under cyclic loading and tunneling processes. Using context neurons in Recurrent Neural Networks (RNNs), Oeser and Freitag (2009), Graf et al. (2012), and Freitag et al. (2013) modeled elastoplastic and viscoelastic materials with fuzzy parameters. Bessa et al. (2017) proposed a data-driven analysis framework for materials with uncertain material parameters. For cantilever beams, Sadeghi and Lotfan (2017) used neural networks for nonlinear system identification and parameter estimation, while Koeppe et al. (2016) used dense neural networks to predict the displacement response. Extensions to continuum models resulted in linear "intelligent meta element" models of a cantilever beam (Koeppe et al., 2018a). To bridge the gap between atomistic and continuum mechanics, Teichert et al.
(2019) trained integrable deep neural networks on atomistic scale models and successively approximated free energy functions. Stoffel et al. (2018) used dense neural networks to fit material data from high-velocity shock-wave tube experiments. Heider et al. (2020) investigated the frame invariance of graph-based neural networks predicting anisotropic elastoplastic material behavior. Huang et al. (2020) combined the proper orthogonal decomposition, manual history variables, and dense neural networks for hyperelasticity and plasticity. For materials applications, various data science methods have been successfully applied to simulation data and synthesized microstructures to analyze, characterize, and quantify them (e.g., Zhao et al. (2020); Altschuh et al. (2017)).
Despite the efforts to combine artificial intelligence and mechanics, the main disadvantage of neural networks remains: the learned parameters in black-box neural networks are challenging to interpret and explain (Breiman, 2001). Their high-level representations in deeper layers often elude human interpretation: what the neural network understood and how the individual parameter values can be explained remains incomprehensible. Coupled with insufficient data and limited computation capacities in past efforts, engineers and scientists mistrusted neural networks in favor of simpler models, whose fewer parameters could be easily interpreted and explained. Occam's razor, the well-known problem-solving principle by William of Ockham (ca. 1347), became an almost dogmatic imperative: the simpler model with fewer parameters must be chosen if two models are equally accurate. However, with the advent of deep learning, the concept of simple models has become ambiguous. The shared representations common in dense neural networks require potentially infinite numbers of parameters to model arbitrary functions (Hornik et al., 1989). To handle the large dimensionality of mechanical spatio-temporal data, extensive parameter sharing, utilized by recursions (Freitag et al., 2013; Freitag et al., 2017; Koeppe et al., 2017; Koeppe et al., 2018b; Koeppe et al., 2019) and convolutions (Koeppe et al., 2020a; Koeppe et al., 2020b; Wu P. et al., 2020), introduces assumptions based on prior knowledge and user-defined parameters, i.e., hyperparameters, to reduce the number of trainable parameters.
By deriving the prior assumptions on the deep learning algorithm from mechanical knowledge, the recent trend in computational mechanics, enhanced by neural networks, aims towards physics-informed neural networks. Lagaris et al. (1998) and Aarts and van der Veer (2001) imposed boundary and initial conditions on dense neural networks, which were trained to solve unconstrained partial differential equations by differentiating the neural network graphs. In Ramuhalli et al. (2005), the authors designed their neural network by embedding a finite element model, which used the neural network's hidden layer to predict stiffness components and the output layer to predict the external force. Baymani et al. (2010) split the Stokes problem into three Poisson problems, approximated by three neural networks with superimposed boundary condition terms. In Rudd et al. (2014), a constrained backpropagation approach ensures that the boundary conditions are satisfied at each training step. The physics-informed neural network approach, as proposed by Raissi et al. (2019), employed two neural networks that contributed to the objective function. One physics-informed neural network mimicked the balance equation, while the second neural network approximated the potential function and pressure field. Yang and Perdikaris (2019) employed adversarial training to improve the performance of physics-informed neural networks. In Kissas et al. (2020), physics-informed neural networks model arterial blood pressure with magnetic resonance imaging data. Using the FEM to inspire a novel neural network architecture, Koeppe et al. (2020a) developed a deep convolutional recurrent neural network architecture that maps Dirichlet boundary constraints to force response and discretized field quantities in intelligent meta elements. Yao et al. (2020) used physics-informed deep convolutional neural networks to predict mechanical responses.
Thus, using physics-informed approaches, neural networks with more parameters became feasible, whose architecture could be designed and explained using prior mechanical knowledge. However, the internal reasoning of the neural networks remained challenging to understand.
Complementary to the physics-informed approach, this work constitutes a new proof-of-concept search approach, radically different from the aforementioned design approach. Inspired by the explainable Artificial Intelligence (AI) research field (Bach et al., 2015; Arras et al., 2017; Alber et al., 2018; Montavon et al., 2018; Montavon et al., 2019), this novel approach elucidates the black box of neural networks to provide physics-explaining neural networks that complement the existing physics-informed neural networks. As a result, more powerful neural networks may be employed with confidence, since both the architectural design and the learned parameters can be explained.
The objective of this work is to efficiently and systematically search neural network architectures for fundamental one-dimensional constitutive models and to explain the trained neural networks. For this explanation, we propose a novel explainable AI method that uses the principal component analysis to interpret the cell states in RNNs. For the search, we define a wide hyperparameter search domain, implement an efficient search algorithm, and apply it to hyperelastic, viscoelastic, and elastoplastic constitutive models. After the search, the best neural network architectures are trained until they achieve low errors, and their generalization capabilities are tested. Finally, the developed explainable AI approach compares the temporal behavior of RNN cell states to known solutions to the fundamental physical problem, thereby demonstrating that neural networks without prior physical assumptions have the potential to inform mechanical research.
To the best of the authors' knowledge, this work constitutes a first proof-of-concept study proposing a dedicated explainable AI strategy for neural networks in mechanics, as well as a novel approach within the general field of explainable AI. Unique to RNNs, which are popular in mechanics (Graf et al., 2010; Freitag et al., 2011; Freitag et al., 2017; Cao et al., 2016; Koeppe et al., 2018b; Koeppe et al., 2019; Wu L. et al., 2020), this work complements popular strategies for classification problems that investigate kernels in convolutional neural networks or investigate the data flow through neural networks, such as layer-wise relevance propagation (Bach et al., 2015; Montavon et al., 2019). Moreover, our new approach complements unsupervised approaches to find parsimonious and interpretable models for hyperelastic material behavior (Flaschel et al., 2021), which are one of the most recent representatives of the principle of simplicity. Finally, the systematic hyperparameter search strategy offers an alternative to self-designing neural networks, which have successfully modeled anisotropic and elastoplastic constitutive behavior (Fuchs et al., 2021; Heider et al., 2021).
In Section 2.1, we briefly review the necessary preliminaries for RNN training as the foundation for the following sections. Section 2.2 details the systematic hyperparameter search strategy and explainable AI approach for data-driven constitutive models. For fundamental constitutive models, Section 3 demonstrates the developed approaches in three case studies. Finally, Section 4 discusses the approach and compares the results to related publications.

Training Artificial Neural Networks
Dense feedforward neural networks are the fundamental building block of neural networks. They represent nonlinear parametric mappings NN from inputs x to predictions ŷ with L consecutive layers and trainable parameters θ:

ŷ = NN(x; θ).    (1)

For dense feedforward neural networks, the parameters θ include the layer weights W^(l) and biases b^(l). Each layer l applies a linear transformation to the layer inputs x^(l), before a nonlinear activation function f^(l) returns the layer activations a^(l):

a^(l) = f^(l)(W^(l) x^(l) + b^(l)).    (2)

Assembling and vectorizing these consecutive layers from x ≡ a^(0) to ŷ ≡ a^(L) enables fast computations on Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs).
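The layer mapping above can be sketched in a few lines of numpy; the layer sizes, the ReLU activation, and the random parameter values are purely illustrative assumptions:

```python
import numpy as np

def dense_layer(x, W, b, f):
    """One dense layer: linear transformation followed by an activation f."""
    return f(W @ x + b)

def relu(z):
    return np.maximum(0.0, z)

def feedforward(x, params, f=relu):
    """Chain L dense layers; the last layer stays linear for regression."""
    a = x
    for l, (W, b) in enumerate(params):
        a = dense_layer(a, W, b, f if l < len(params) - 1 else (lambda z: z))
    return a

# Hypothetical two-layer network mapping R^2 -> R^1
rng = np.random.default_rng(0)
params = [(rng.standard_normal((4, 2)), np.zeros(4)),
          (rng.standard_normal((1, 4)), np.zeros(1))]
y_hat = feedforward(np.array([0.5, -1.0]), params)
```

Keeping the last layer linear lets the prediction cover the entire range of real values, as required for stresses and strains.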
Using supervised learning, neural network training identifies optimal parameters θ_opt, which minimize the difference between desired outputs and neural network predictions. From a wide variety of input-target samples (x, y), the task of predicting physical responses ŷ ≈ y can be reformulated as a machine learning problem, where the prediction represents the expectation of the data probability distribution p_data(y | x):

ŷ = E[p_data(y | x)].    (3)

Assuming a single-peak distribution, e.g., a normal distribution p_data(y | x) ≈ N(y | x), the maximum likelihood estimation principle (cf. Goodfellow et al. (2016)) yields a compatible loss function L(y, ŷ) for the prediction of physical values, such as the Mean Squared Error (MSE) averaged over the batch size M and the number of features F, i.e., L(y, ŷ) := (MF)^(−1) ‖y − ŷ‖². Using this loss function to compute scalar errors ϵ, the contributions of each parameter to the error can be backpropagated to compute gradients and train the neural network with gradient descent. To ensure generalization to unknown samples, the datasets used are split (usually randomly) into training, validation, and test sets. Gradient descent with backpropagation is only performed on the training set, while the validation set safeguards against overfitting of the parameters θ. The test set safeguards against overfitting of the chosen hyperparameter values γ, which include, e.g., the dimensionalities of each layer dim (a^(l)) and the number of layers L.
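The MSE loss and a plain gradient descent step can be illustrated on a linear model, where the gradient is available in closed form; the data, learning rate, and iteration count are illustrative assumptions standing in for backpropagation through a full network:

```python
import numpy as np

def mse_loss(y, y_hat):
    """Mean squared error averaged over batch size M and features F."""
    M, F = y.shape
    return np.sum((y - y_hat) ** 2) / (M * F)

# Gradient descent on a linear model y_hat = x @ w; the gradient is derived
# by hand here, standing in for backpropagation
rng = np.random.default_rng(1)
x = rng.standard_normal((64, 3))
w_true = np.array([[1.0], [-2.0], [0.5]])
y = x @ w_true

w = np.zeros((3, 1))
lr = 0.1
for _ in range(200):
    y_hat = x @ w
    grad = -2.0 / y.size * x.T @ (y - y_hat)  # dL/dw for the MSE
    w -= lr * grad
# w approaches w_true and the training loss approaches zero
```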
Recurrent neural networks (Rumelhart et al., 1986) introduce time-delayed recursions to feedforward neural networks (Goodfellow et al., 2016). Thus, RNNs use parameter sharing to reuse parameters efficiently for sequential data-driven problems. Such sequential data x_t may be sorted according to the rate-dependent real time or a pseudo time. One straightforward implementation of an RNN cell introduces a single tensor-valued recursion, as given by

x̃_{t+1} = [x_{t+1}ᵀ, a_tᵀ]ᵀ,    (4)
a_{t+1} = f^(1)(W^(1) x̃_{t+1} + b^(1)),    (5)
ŷ_{t+1} = f^(2)(W^(2) a_{t+1} + b^(2)).    (6)

In Figure 1, the recurrent cell is depicted as a graph of two conventional dense layers and a time-delayed recursion of the previous activation a_t. Subsequently embedded into a larger sequence tensor X = [x_1 … x_T], RNNs can be trained with backpropagation through time (Rumelhart et al., 1986).
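The single tensor-valued recursion above can be sketched as follows (the second, output layer is omitted for brevity); the hidden dimensionality and the tanh activation are illustrative assumptions:

```python
import numpy as np

def rnn_step(x_t, a_prev, W, b):
    """One simple RNN step: concatenate the input with the previous
    activation, then apply a dense layer with tanh activation."""
    x_tilde = np.concatenate([x_t, a_prev])
    return np.tanh(W @ x_tilde + b)

def rnn_forward(X, W, b, n_units):
    """Unroll the recursion over a sequence X = [x_1 ... x_T]."""
    a = np.zeros(n_units)
    activations = []
    for x_t in X:
        a = rnn_step(x_t, a, W, b)
        activations.append(a)
    return np.stack(activations)

# Hypothetical cell: 1 input feature, 3 hidden units, T = 5 time steps
rng = np.random.default_rng(2)
W = 0.1 * rng.standard_normal((3, 1 + 3))
A = rnn_forward(rng.standard_normal((5, 1)), W, np.zeros(3), 3)
```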
However, simple RNNs (Eq. 4 to Eq. 6) suffer from unstable gradients and fading memory (Hochreiter and Schmidhuber, 1997; Greff et al., 2015), which motivated the development of RNNs with gates. These gating layers control the data flow, thereby stabilizing the gradients and mitigating the fading memory. The Long Short-Term Memory (LSTM) cell (Figure 2) (Hochreiter and Schmidhuber, 1997; Gers et al., 2000) and the Gated Recurrent Unit (GRU) are the most common gated RNNs and demonstrated comparable performances (Chung et al., 2014). As in Eq. 4, the previous activation a_t is concatenated with the input x_{t+1}:

x̃_{t+1} = [x_{t+1}ᵀ, a_tᵀ]ᵀ.    (7)

The gates process the adjusted input x̃_{t+1} using dense layers and apply a sigmoid activation σ, which yields

g_i = σ(W_i x̃_{t+1} + b_i),    (8)
g_f = σ(W_f x̃_{t+1} + b_f),    (9)
g_o = σ(W_o x̃_{t+1} + b_o).    (10)

Therein, g_i is the input gate activation, g_f the forget gate activation, and g_o the output gate activation, whose coefficient values are bounded between zero and one. Similar to Boolean masks, these gates control the data flow in the LSTM cell by

c̃_{t+1} = tanh(W_c x̃_{t+1} + b_c),    (11)
a_{t+1} = g_o ⊙ tanh(c_{t+1}),    (12)
c_{t+1} = g_i ⊙ c̃_{t+1} + g_f ⊙ c_t,    (13)

where the element-wise product ⊙ allows the gate tensors g_i, g_f, and g_o to control the incoming data flow into the cell c_{t+1}, to forget information from selected entries, and to output selected information as a_{t+1}. Despite being the foundation for the RNN reasoning and long-term memory, the cell states are often regarded as black boxes.
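The gate equations above translate directly into code; the weight shapes and the small random initialization are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, params):
    """One LSTM step: three sigmoid gates mask the data flow into the cell,
    within the cell, and out of the cell."""
    x_tilde = np.concatenate([x_t, a_prev])               # adjusted input
    g_i = sigmoid(params["Wi"] @ x_tilde + params["bi"])  # input gate
    g_f = sigmoid(params["Wf"] @ x_tilde + params["bf"])  # forget gate
    g_o = sigmoid(params["Wo"] @ x_tilde + params["bo"])  # output gate
    c_cand = np.tanh(params["Wc"] @ x_tilde + params["bc"])
    c = g_f * c_prev + g_i * c_cand   # element-wise gated cell update
    a = g_o * np.tanh(c)              # gated output activation
    return a, c

# Hypothetical cell with 2 input features and 4 units
rng = np.random.default_rng(3)
n_in, n_u = 2, 4
params = {k: 0.1 * rng.standard_normal((n_u, n_in + n_u))
          for k in ("Wi", "Wf", "Wo", "Wc")}
params.update({k: np.zeros(n_u) for k in ("bi", "bf", "bo", "bc")})
a, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_u), np.zeros(n_u), params)
```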

A Selection of Fundamental Constitutive Models
The systematic hyperparameter search and explainable AI approach will be demonstrated on three representative constitutive models, briefly reviewed in the following. These well-known models describe material effects, such as finite, lasting, and rate-dependent deformations. Since the ground truths for these one-dimensional models are known, the resulting neural network architectures from the hyperparameter search can be evaluated. Furthermore, for history-dependent material behaviors, the explainable AI approach can investigate the RNNs that best describe the constitutive models.
First, a hyperelastic constitutive model challenges neural networks to model finite strains and deal with the singular behavior of stretches that approach zero. The one-dimensional Neo-Hooke model (cf. Holzapfel (2000)) is regarded as one of the most straightforward nonlinear elastic material models. Therein, a single parameter μ describes the Cauchy stress τ as a function of the stretch λ:

τ = μ (λ² − λ⁻¹).    (14)

Thus, the Neo-Hooke model offers the highest possible contrast to distributed neural network representations. In Figure 3, the corresponding rheological model (Figure 3A) and a loading and unloading cycle (Figure 3B) are depicted. The neural network approximation is challenged by stretch values near the origin, where the stress response exhibits singular behavior.

FIGURE 1 | Visualization of a recurrent cell. The recurrent cell consists of two feedforward layers (light violet squares). A time-delayed recursion (black square) loops the activation of the first or last layer back as additional input to the cell.

FIGURE 2 | Visualization of an LSTM cell, which consists of multiple feedforward layers (light violet). Three gates, g_i, g_f, and g_o, control the data flow into the cell, within the cell, and out of the cell. Two recursions (black rectangles) ensure stable gradients across many time steps.

Second, viscoelasticity represents the archetype of fading-memory material behavior (Truesdell and Noll, 2004), which poses a short-term temporal regression problem to the neural networks. Figure 4A introduces the Poynting-Thomson or standard linear viscoelastic solid model as an example of rate-dependent inelastic material behavior. As the elementary example for generalized Maxwell models, which use multiple parallel Maxwell branches to cover more decades in the frequency range, it exhibits the same principal relaxation and creep behavior (Markert, 2005), as depicted in Figures 4B-D.
The additive decomposition of the stress σ and the evolution equations for each Maxwell branch i, as reviewed, e.g., in the textbook of Holzapfel (2000), yield

σ = E ε + Σᵢ σᵢ,    (16)
σ̇ᵢ + σᵢ/τᵢ = Eᵢ ε̇.    (17)

For each branch i, σᵢ represents the stress, τᵢ the relaxation time, and Eᵢ the modulus. The strain ε is shared by all Maxwell branches and the elastic spring with modulus E. For I = 1, Eqs 16, 17 yield

σ = E ε + σ₁,    (18)
σ̇₁ + σ₁/τ₁ = E₁ ε̇.    (19)

Many known excitations, e.g., unit steps σ(t) = σ₀ H(t) or ε(t) = ε₀ H(t), yield closed-form solutions described by the relaxation function R(t) and the creep function C(t) (Simo and Hughes, 1998):

R(t) = E + E₁ e^(−t/τ₁),    (20)
C(t) = 1/E − (1/E − 1/(E + E₁)) e^(−t E/(τ₁ (E + E₁))).    (21)

For the general case of incremental excitations, numerical integration of Eq. 19 with an implicit backward Euler scheme yields

σ₁,t+Δt = (σ₁,t + E₁ (ε_{t+Δt} − ε_t)) / (1 + Δt/τ₁),    (22)
σ_{t+Δt} = E ε_{t+Δt} + σ₁,t+Δt.    (23)

Therein, σ_t and σ_{t+Δt} represent the stresses, while ε_t and ε_{t+Δt} represent the strains at the current and next increment, respectively. As a dissipative model, the dissipated energy can be described by the area surrounded by the hystereses, which depend on the strain rates (Figure 4B).
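The implicit backward Euler update above can be sketched and checked against the expected relaxation behavior; the material parameters follow the case studies in Section 3, while the time step and loop length are illustrative assumptions:

```python
import numpy as np

def poynting_thomson_step(sigma1_t, eps_t, eps_next, dt, E, E1, tau1):
    """Backward-Euler update of the single Maxwell branch and the total
    stress for the standard linear viscoelastic solid."""
    d_eps = eps_next - eps_t
    sigma1_next = (sigma1_t + E1 * d_eps) / (1.0 + dt / tau1)  # branch stress
    sigma_next = E * eps_next + sigma1_next                    # total stress
    return sigma_next, sigma1_next

# Relaxation test: hold a unit strain step and watch the stress decay
E, E1, tau1 = 1.0, 0.5, 0.1667
dt, sigma1 = 1e-3, E1   # at t = 0+, the branch carries E1 * eps0
sigma_hist = []
for _ in range(5000):
    sigma, sigma1 = poynting_thomson_step(sigma1, 1.0, 1.0, dt, E, E1, tau1)
    sigma_hist.append(sigma)
# the stress relaxes from E + E1 towards the equilibrium modulus E
```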
Finally, elastoplastic models exhibit path-dependent behavior with lasting deformations, i.e., long-term dependency behavior, and include potential discontinuities in the stress response, which need to be learned by the neural network. Numerical implementations of one-dimensional Prandtl-Reuss plasticity (Figure 5A) can be found, e.g., in Simo and Hughes (1998) or de Borst et al. (2012):

σ = E (ε − ε_P),    (24)
f_y = |σ| − σ_Y ≤ 0,    (25)
ε̇_P = γ sign(σ).    (26)

In Eqs 24 to 26, the main history variable is the plastic strain ε_P, whose evolution follows the yield step γ if the yield criterion f_y = 0 is met. In Figure 5B, the characteristic stress-strain curve of one-dimensional Prandtl-Reuss plasticity is shown.

Data-driven constitutive models and intelligent finite elements were among the first applications of neural networks within the FEM (Ghaboussi et al., 1991; Ghaboussi and Sidarta, 1998), since they leverage the flexibility of the FEM to the fullest. For a strain-driven problem, data-driven constitutive models can be defined by

(σ_{t+Δt}, Q_{t+Δt}) = NN(ε_{t+Δt}, Q_t; θ),    (27)

where ε represents the strain and σ represents the stress. The history variables are gathered in Q, which may include both algorithmic history variables, such as the plastic strain ε_P, and recurrent neural network cell states C. As in the conventional FEM, the unified interface of inputs, outputs, and history variables enables a straightforward substitution of different data-driven constitutive models. For example, data-driven constitutive models, e.g., trained on experimental data (Stoffel et al., 2018), can substitute analytically derived constitutive models with trivial implementation effort. However, this flexibility massively increases the choice of conceivable neural network architectures and their defining hyperparameter configurations for data-driven constitutive models. Often, these hyperparameter configurations are chosen by the user or tuned using brute-force search algorithms.
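The elastic-predictor/plastic-corrector (return mapping) step for this model can be sketched as follows; the function and variable names are our own, and the parameter values follow the case study in Section 3:

```python
import numpy as np

def prandtl_reuss_step(eps_next, eps_p_t, E, sigma_y):
    """Return mapping for one-dimensional perfect plasticity,
    cf. Simo and Hughes (1998)."""
    sigma_trial = E * (eps_next - eps_p_t)   # elastic predictor
    f_trial = abs(sigma_trial) - sigma_y     # trial yield criterion
    if f_trial <= 0.0:
        return sigma_trial, eps_p_t          # purely elastic step
    d_gamma = f_trial / E                    # yield step (perfect plasticity)
    eps_p_next = eps_p_t + d_gamma * np.sign(sigma_trial)
    sigma_next = E * (eps_next - eps_p_next)  # lies on the yield surface
    return sigma_next, eps_p_next

# Load beyond yield (E = 1, sigma_Y = 0.6 as in Section 3), then unload
sigma, eps_p = prandtl_reuss_step(1.0, 0.0, 1.0, 0.6)        # plastic loading
sigma_u, eps_p_u = prandtl_reuss_step(0.5, eps_p, 1.0, 0.6)  # elastic unloading
```

During unloading, the plastic strain persists as a lasting deformation, which is exactly the long-term dependency the RNNs must capture.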

Systematic Hyperparameter Search
Systematic hyperparameter search strategies constitute elegant and efficient solutions for finding optimal architectures and hyperparameter configurations. Hyperparameter search algorithms automate the task of identifying advantageous neural network architectures and tuning hyperparameters γ to achieve good performance. Since neural networks are nonlinear function approximators with numerous parameters, the error surface is generally non-convex, high-dimensional, potentially non-smooth, and noisy (Li et al., 2017). Furthermore, hyperparameters are interdependent, and their effects on the model remain unclear and problem-specific (Li et al., 2017), which has popularized brute-force search algorithms, such as grid search (Bergstra and Bengio, 2012). Grid search is the most fundamental search algorithm and seeks to cover the entire hyperparameter search domain: all possible combinations of user-defined hyperparameter values are tested one by one, with a fixed step size. The number and intervals of tested hyperparameter configurations Γ are set arbitrarily by the user, making the approach wasteful, inefficient, and infeasible for large numbers of hyperparameters.
Random search algorithms (Bergstra and Bengio, 2012) yield probabilistic approaches for larger numbers of hyperparameters. With enough trials, random search statistically explores the entire search space by testing a random selection of all possible combinations with varying step sizes. Thus, high-dimensional search spaces are explored faster, which makes random search a widely used search algorithm for hyperparameter-intensive neural networks (Bergstra and Bengio, 2012). Unfortunately, the computational effort of random search remains significant.
Therefore, this work follows approaches that increase the efficiency of the random search algorithm, leveraging the observation that the final performance of a neural network can be approximately assessed after a few epochs n, i.e., iterations that include the entire training dataset. The Successive Halving algorithm (Jamieson and Talwalkar, 2016), for example, trains randomly chosen hyperparameter configurations Γ = {γ^(1), …, γ^(C)} for n epochs. After that, the lowest-performing configurations are discarded, while the best-performing configurations are trained further. Thus, the approach focuses the computational resources on promising hyperparameter configurations.
Unfortunately, the number of epochs to train before discarding low-performing configurations constitutes another hyperparameter, which depends on the mechanical problem: Successive Halving can either explore more configurations C for fewer epochs n or investigate fewer configurations C for more epochs n per decision. Li et al. (2017) solved this exploration issue with the Hyperband search algorithm (Algorithm 1). By gathering groups of configurations in h brackets, Successive Halving can be applied for different numbers of epochs n_h per group. The first bracket, s = h, maximizes exploration and tests many configurations C with Successive Halving, identifying promising positions even for vast search domains and non-convex or non-smooth loss functions. In the last bracket, s = 0, fewer configurations are tested for the full number of epochs, similar to a conventional random search, which is advantageous if the search domain is narrow and the objective function is convex and smooth. Thus, Hyperband combines efficient exploration with the investigation of promising architectures and does not introduce additional hyperparameters to be tuned.
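The inner Successive Halving loop that Hyperband repeats per bracket can be sketched as follows; the toy objective, the budget, and the halving rate η = 3 are illustrative assumptions:

```python
import random

def successive_halving(configs, train_eval, max_epochs, eta=3):
    """Successive Halving: train all configurations for a small epoch budget,
    keep the best 1/eta fraction, and repeat with eta times the budget.
    `train_eval(config, n_epochs)` returns a validation error (lower is better)."""
    epochs = max(1, max_epochs // len(configs))
    while len(configs) > 1:
        scored = sorted(configs, key=lambda c: train_eval(c, epochs))
        configs = scored[: max(1, len(configs) // eta)]  # discard the rest
        epochs = min(max_epochs, epochs * eta)
    return configs[0]

# Toy search: the "validation error" is a noisy function of the learning rate,
# and the noise shrinks as the number of training epochs grows
random.seed(0)

def train_eval(config, n_epochs):
    return (config["lr"] - 0.1) ** 2 + random.gauss(0.0, 0.001 / n_epochs)

configs = [{"lr": 10 ** random.uniform(-4, 0)} for _ in range(27)]
best = successive_halving(configs, train_eval, max_epochs=81)
```

With 27 configurations and η = 3, the loop evaluates 27, 9, and 3 configurations at increasing budgets before returning a single winner.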
In this work, we combine Hyperband with aggressive early stopping, which discards solutions that are unlikely to improve further by monitoring moving averages of the validation errors. Independent of Hyperband, this additional logic further increases the efficiency of the search strategy for data-driven constitutive models. Thus, this fully automated strategy identifies optimal neural network architectures for the given training and validation data without user interaction. The resulting hyperparameter configurations represent a 'natural' choice for the given problem, which merits further investigation into why a specific neural network was chosen.
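The moving-average early stopping logic can be sketched as a small monitor class; the window, patience, and tolerance values are illustrative assumptions, not the values used in the experiments:

```python
class EarlyStopping:
    """Aggressive early stopping: discard a run when the moving average of
    validation errors stops improving for `patience` epochs."""

    def __init__(self, window=5, patience=10, min_delta=1e-4):
        self.window, self.patience, self.min_delta = window, patience, min_delta
        self.history, self.best, self.stale = [], float("inf"), 0

    def should_stop(self, val_error):
        self.history.append(val_error)
        if len(self.history) < self.window:
            return False
        avg = sum(self.history[-self.window:]) / self.window
        if avg < self.best - self.min_delta:
            self.best, self.stale = avg, 0   # still improving
        else:
            self.stale += 1                  # plateau detected
        return self.stale >= self.patience

# A run whose validation error plateaus is stopped early
stopper = EarlyStopping()
stopped_at = None
errors = [1.0 / (1 + e) for e in range(10)] + [0.1] * 50  # improvement, then plateau
for epoch, err in enumerate(errors):
    if stopper.should_stop(err):
        stopped_at = epoch
        break
```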

The Novel Explainable Artificial Intelligence Approach
If the problem allows, the systematic approach outlined above generally identifies and trains an efficient neural network that achieves accurate results and generalizes to unknown data. However, in the past, the proven approximation capabilities of neural networks (Hornik et al., 1989) were often shunned because the seemingly magical black-box results could not be explained. Given only finite dataset sizes, any machine learning algorithm may learn spurious or accidental correlations. Since neural networks naturally develop distributed high-dimensional representations in deep layers, the 'reasoning' of neural networks is notoriously difficult to verify. Thus, a trained neural network may generalize to validation and test datasets but base its decision-making on unphysical observations and biases in the dataset. The motivation of explainable AI is to unmask such 'Clever Hans' predictors 1 .
Many explainable neural network approaches, such as Layer-wise Relevance Propagation (LRP) (Bach et al., 2015; Arras et al., 2017; Montavon et al., 2018), use the neural network graph to trace the activation back to its origin. In particular, such approaches are attractive for classification, because they can explain individual class labels, i.e., from single binary values to multiple inputs. For positive binary values, the activations can be traced back straightforwardly, explaining why the neural network chose the class associated with the binary value. However, multivariate regression problems pose a different challenge: since the outputs are continuous, the neural network 'reasoning' must be interpretable over the full output ranges, including the origin. In particular, in balance equations, zero-valued residuals are at least as important as non-zero residuals and follow the same physical governing equations. Thus, for mechanical regression tasks, different explainable AI methods are necessary, which focus on the evolution of the high-dimensional representations. As a new explainable AI approach using mechanical domain knowledge, our proposed approach focuses on the temporal evolution of mechanical quantities. Since time-variant problems in mechanics are often modeled by training RNNs on time-variant and path-dependent mechanical data (Freitag et al., 2011; Cao et al., 2016; Koeppe et al., 2019; Wu L. et al., 2020), we propose to use the Principal Component Analysis (PCA) to investigate recurrent cell states, e.g., in LSTM and GRU cells. Therefore, we interpret the time-variant cell states as statistical variables and use the PCA to identify the major variance directions in the distributed representations. With the original evolution equations and history variables known, major principal components can be compared with the known temporal evolution.
If the cell states resemble the mechanical evolution equations of the algorithmic history variables, the neural network correctly understood the fundamental mechanical problem. For future materials, neural networks can thus be trained on new material test data, and the material can be possibly characterized by comparing the cell state principal components to known fundamental evolution equations.
The PCA (Pearson, 1901) constitutes an unsupervised learning algorithm that decorrelates data (Figure 6) (Goodfellow et al., 2016). As a linear transformation, the PCA identifies an orthonormal basis by maximizing the variance in the directions of the principal components. In the field of model order reduction and data analysis, the PCA is often used to compute proper orthogonal decompositions, which eliminate undesirable frequencies from mechanical problems and reduce the model dimensionality to achieve computational speed-up (cf. 2017). In this explainable AI approach, the PCA is not used to accelerate computations, but to analyze and explain neural network behavior.
ALGORITHM 1 | Hyperband search (Li et al., 2017).

To compute the PCA on a dataset with correlated features (Figure 6A), the dataset of M samples x_m is collected in a dataset tensor X and centered feature-wise to X̃:

X̃ = X − (1/M) Σ_{m=1}^{M} x_m.    (29)

Applied to the centered dataset tensor X̃, the Singular Value Decomposition (SVD),

X̃ = U Σ Wᵀ,    (30)
1 Apparently, the horse 'Clever Hans' (1895-1916) could count and calculate, but, in fact, interpreted the expressions and gestures of the audience to find the correct answers.
decomposes X̃ into the left-singular vectors U, the singular-value diagonal matrix Σ, and the right-singular vectors W. Conventionally, most SVD algorithms sort the singular vectors and singular values by the magnitude of the latter, where the highest singular value in Σ corresponds to the highest variance (i.e., frequency) in the dataset. Feature-wise contraction of the centered dataset X̃ with W yields the decoupled data tensor X̂:

X̂ = X̃ W.    (31)
The reciprocal square root of the singular values, Σ^(−1/2), scales the decoupled dataset down to unit variance:

X̂′ = X̂ Σ^(−1/2) = X̃ W Σ^(−1/2).    (32)

To use the PCA on the LSTM cell states c_t from Eq. 13, the data tensor C of sequence length T is assembled:

C = [c_1 … c_T]ᵀ.    (33)

Equations 29 to 32 then yield

C̄ = C̃ W Σ^(−1/2),    (34)

where C̃ denotes the feature-wise centered cell-state tensor and W and Σ result from its SVD. The columns C̄_t (t = 1 … T) of the decoupled state tensor C̄ describe the temporal behavior of the cell in the principal axes. The associated singular value Σ_f, divided by the sum of all singular values, quantifies the relative importance of the corresponding principal component, i.e., how much each principal component explains of the cell state variance. Often, most of the variance can be explained using the first three principal components, c_I, c_II, and c_III, which describe the memory response of the majority of the cell units. To investigate the ability to explain the mechanical problem, the neural network's major memory cell responses, c_I, c_II, and c_III, are compared to the algorithmic history variables q. This comparison demonstrates whether the neural network understood temporal mechanical problems in line with the physically observed evolution laws.
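The complete procedure, from feature-wise centering over the SVD to the relative importance of each principal component, can be sketched for a synthetic cell-state history with one dominant relaxation mode; the mode, noise level, and dimensions are illustrative assumptions:

```python
import numpy as np

def pca_cell_states(C):
    """PCA of cell states via SVD: center the (T, n_units) state tensor
    feature-wise, decompose, project onto the principal axes, and return
    the singular-value share of each principal component."""
    C_centered = C - C.mean(axis=0)                 # feature-wise centering
    U, S, Wt = np.linalg.svd(C_centered, full_matrices=False)
    C_decoupled = C_centered @ Wt.T                 # project onto principal axes
    importance = S / S.sum()                        # relative importance
    return C_decoupled, importance

# Hypothetical cell-state history: 100 time steps of a 16-unit cell in which
# most units follow one exponential relaxation mode plus small noise
rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 100)
mode = np.exp(-5.0 * t)
C = np.outer(mode, rng.standard_normal(16)) + 0.01 * rng.standard_normal((100, 16))
C_dec, imp = pca_cell_states(C)
# imp[0] dominates, and C_dec[:, 0] can be compared to known evolution equations
```

For the viscoelastic case study, the first principal component would be compared against relaxation-type evolutions such as the branch stress σ₁.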
Note that the ability to generalize, i.e., to achieve a reproducible and equally accurate result on unknown data, as outlined in Section 2.2.2, remains independent of the ability to base the result on correct physical assumptions, as described in this subsection. To achieve generalizable results, systematic hyperparameter tuning is thus the necessary prerequisite for the explainable AI approach.

RESULTS
The following three case studies demonstrate the systematic hyperparameter search strategy and the new explainable AI approach. First, the systematic hyperparameter search strategy is demonstrated in a case study for an intelligent nonlinear elastic constitutive model. Thereafter, the two remaining case studies combine the systematic hyperparameter search strategy with explainable AI in order to interpret inelastic, time-variant constitutive behavior.
For all data-driven constitutive models, the same data-generation strategy provides training, validation, and test data. For a sequence length of T = 10 000 and Ω = 5 phases of loading and unloading, we sample control values from a random normal distribution N(μ = 0, σ = 1). Between those control values, a variety of ramping functions, e.g., linear, quadratic, or sinusoidal, interpolate the intermediate values to assemble stress- or strain-controlled loading sequences. The ramping functions are selected to cover a variety of constitutive responses based on the investigated constitutive models. Thereafter, the numerical implementation of each reference constitutive model generates M_total = 10 000 samples, which are split randomly 70 %–15 %–15 % into training, validation, and test sets.
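The data-generation strategy can be sketched as follows (a minimal illustration with a shortened sequence length and a reduced sample count; all names are assumptions, not the published implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

def make_sequence(T=1000, omega=5):
    """One loading sequence: random control values from N(mu=0, sigma=1),
    connected by randomly chosen ramping functions."""
    controls = rng.normal(0.0, 1.0, size=omega + 1)
    steps = T // omega
    ramps = {
        "linear": lambda s: s,
        "quadratic": lambda s: s**2,
        "sinusoidal": lambda s: 0.5 * (1.0 - np.cos(np.pi * s)),
    }
    phases = []
    for a, b in zip(controls[:-1], controls[1:]):
        s = np.linspace(0.0, 1.0, steps, endpoint=False)
        ramp = ramps[rng.choice(list(ramps))]
        phases.append(a + (b - a) * ramp(s))    # interpolate between controls
    return np.concatenate(phases)

sequence = make_sequence()

# random 70 %-15 %-15 % split of the sample indices
M_total = 1000
idx = rng.permutation(M_total)
train_idx, val_idx, test_idx = np.split(idx, [int(0.70 * M_total), int(0.85 * M_total)])
```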
Three constitutive models are selected to demonstrate fundamental mechanical material behavior, including finite deformations, long-term temporal behavior, and rate dependency. To enhance interpretability, each one-dimensional constitutive model uses dimensionless and purely academic parameter values. Using preprocessing strategies, such as normalization and augmenting the input with explicit model parameters, models with arbitrary material parameter values can be created (Koeppe et al., 2018b). First, an incompressible Neo-Hooke constitutive model (μ = 1/3) computes the nonlinear stress response to stretch-controlled loading. Due to the hyperelastic problem, all loading and unloading phases ω = 1, …, Ω use linear ramps. The investigated time-distributed dense neural network architectures receive the stretch λ as input and learn to predict the Cauchy stress σ. Second, a perfect Prandtl-Reuss elastoplasticity model (E = 1, σ_Y = 0.6) is subjected to strain-controlled loading. The phase interpolation functions are sampled randomly from linear, quadratic, square-root, exponential, sine, and half-sine ramping functions. The investigated RNN architectures use the strain ε to return the stress σ and the plastic strain ε_P. Finally, a Poynting-Thomson constitutive model, defined by E = 1.0, E_1 = 0.5, and τ_1 = 0.1667, is integrated in time (T = 1) with an implicit backward Euler scheme. The stress-controlled phases ω = 1, …, Ω include linear ramps and constant phases to investigate creep behavior. The RNN architectures process the stress σ and return the strain ε and the viscous branch stress σ_1. For all neural networks, a final time-distributed dense layer of matching output size applies a linear transformation to cover the entire range of real values.
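Hedged sketches of the three one-dimensional reference models might look as follows (the uniaxial Neo-Hooke stress expression, the return-mapping form, and in particular the rheological arrangement assumed for the Poynting-Thomson element are plausible textbook choices, not necessarily the exact implementation used here):

```python
import numpy as np

def neo_hooke_uniaxial(lam, mu=1/3):
    """Incompressible Neo-Hooke: uniaxial Cauchy stress for a stretch lam."""
    return mu * (lam**2 - 1.0 / lam)

def prandtl_reuss(strain, E=1.0, sig_y=0.6):
    """Perfect Prandtl-Reuss elastoplasticity in 1D via return mapping."""
    sig, eps_p = np.empty_like(strain), np.empty_like(strain)
    ep = 0.0
    for i, eps in enumerate(strain):
        trial = E * (eps - ep)                  # elastic trial stress
        if abs(trial) > sig_y:                  # yield: project back
            ep += np.sign(trial) * (abs(trial) - sig_y) / E
            trial = np.sign(trial) * sig_y
        sig[i], eps_p[i] = trial, ep
    return sig, eps_p

def poynting_thomson(stress, dt, E=1.0, E1=0.5, tau1=0.1667):
    """Poynting-Thomson model (assumed arrangement: spring E in series with
    a Kelvin-Voigt element E1 || dashpot, tau1 = eta1/E1), stress-controlled,
    integrated with an implicit backward Euler scheme."""
    eta1 = E1 * tau1
    eps, sig1 = np.empty_like(stress), np.empty_like(stress)
    eps_k = 0.0
    for i, s in enumerate(stress):
        # backward Euler for E1*eps_k + eta1*d(eps_k)/dt = s
        eps_k = (eps_k + dt * s / eta1) / (1.0 + dt / tau1)
        eps[i] = s / E + eps_k                  # series spring + Kelvin strain
        sig1[i] = s - E1 * eps_k                # viscous (dashpot) branch stress
    return eps, sig1
```

Under a constant stress, the Kelvin strain creeps towards σ/E_1, so the total strain approaches σ/E + σ/E_1, the expected long-term compliance of this arrangement.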
For the neural network architecture and the hyperparameters, the Hyperband algorithm (Li et al., 2017) systematically explores the high-dimensional search domain. During the training of each configuration, the Adam algorithm (Kingma and Ba, 2014) minimizes the MSE on the training set (N = 51 epochs) and reports the validation loss to evaluate the configuration within Hyperband. To avoid artificially penalizing specific weight values, neither L1 nor L2 regularization is used, i.e., the unconstrained MSE is optimized. In the Hyperband brackets, each step performed by the Successive Halving algorithm eliminates the worst configurations by a factor of η = 3.7. Table 1 describes the search domains for the recurrent and dense neural network architectures. The design variables, i.e., the layer width d^(l), the neural network depth L, the base learning rate α, and the batch size M, vary during all search trials. Moreover, for dense neural networks (hyperelastic material behavior), multiple activation functions are investigated by the algorithm, whereas for recurrent neural networks (viscoelastic and elastoplastic material behavior), the cell type of the recurrent cells becomes an additional design variable. To limit the potentially infinite search space, the layer widths are selected to be identical for all layers, and the design variables are varied in heuristically defined steps and ranges. After the search, we train the best configuration, as evaluated by the lowest MSE on the validation set, for the full duration of N = 301 epochs. The resulting parameter values θ_opt are used to evaluate the test set to compute the test errors.
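The Successive Halving elimination inside a Hyperband bracket can be illustrated with a minimal sketch (here with the common η = 3 and a toy objective instead of actual neural network training; all names are assumptions):

```python
import random

def successive_halving(configs, train_eval, budget0=1, eta=3):
    """One Hyperband-style bracket: evaluate every configuration on a small
    budget, keep the best 1/eta, increase the budget by eta, repeat until a
    single configuration survives. `train_eval(config, budget)` must return
    a validation loss (lower is better)."""
    budget = budget0
    while len(configs) > 1:
        scored = sorted(configs, key=lambda c: train_eval(c, budget))
        configs = scored[: max(1, len(configs) // eta)]  # drop the worst
        budget *= eta                                    # train survivors longer
    return configs[0]

# toy objective: the "validation loss" is the distance to an ideal
# learning rate, shrinking as the training budget grows
random.seed(0)
grid = [{"lr": 10 ** random.uniform(-5, -1)} for _ in range(27)]
best = successive_halving(grid, lambda c, b: abs(c["lr"] - 1e-3) / b)
```

Starting from 27 configurations, the bracket evaluates 27, 9, and 3 candidates at increasing budgets, which is what makes the search affordable compared to training every configuration to convergence.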
For the last two case studies, the novel explainable neural network approach (Section 2.2.3), employing the PCA on the cell states, analyzes the best RNN architectures to explain the intelligent inelastic constitutive models.
The Kadi4Mat (Brandt et al., 2021) data management infrastructure stores, links, and describes the data generated in this publication. Published via the direct integration of Zenodo (Koeppe et al., 2021), the dataset includes the generated reference constitutive model dataset, the associated metadata of the constitutive and neural network models, and the serialized trained neural networks for different hyperparameter configurations at different epochs during training. The data storage with the associated metadata and connections enhances the systematic hyperparameter search strategy with the option for long-term data sustainability, e.g., by reusing previous search results and hyperparameter configurations for future models. Furthermore, the links between the dataset and the serialized neural network models provide a starting point for a bottom-up data ontology for machine learning in mechanics and material sciences. Subsequently formalized, the data ontology will provide semantic rules that describe the relations and workflows inherent to the research data, which will enable additional analysis and explainable AI approaches.

Systematic Investigation of Data-Driven Hyperelastic Constitutive Models
For the hyperelastic problem, the best-performing hyperparameters were a batch size of M = 64 and a base learning rate of α = 3·10^−3. A deep feedforward neural network with 5 hidden layers of width 112 and rectifier activation achieved the best performance. After training (Figure 7) for 301 epochs, the neural network achieves an MSE ε_MSE of 1.28 × 10^−5 on the training, 1.01 × 10^−5 on the validation, and 1.01 × 10^−5 on the test set. The same order of error magnitude on all three datasets indicates that the neural network achieved generalization. Figure 8 visualizes three randomly selected test samples. For the one-dimensional hyperelastic case, the data-driven constitutive model is in perfect agreement with the reference model.

Explaining Data-Driven Elastoplastic Constitutive Models
For the elastoplastic problem, the systematic search strategy identified a best-performing architecture characterized by four LSTM cells with a width of 52 units, followed by a linear time-distributed dense layer with 2 units. For the training (Figure 9), a batch size of M = 32 and a base learning rate of α = 1·10^−3 yielded the best result without overfitting, i.e., an MSE ε_MSE of 5.72 × 10^−5 on the training, 5.11 × 10^−5 on the validation, and 4.38 × 10^−5 on the test set.
To interpret and explain the RNN behavior, a random sample is extracted from the test set and evaluated using the data-driven constitutive model. The explainable AI approach uses the PCA on the concatenated cell states of all recurrent cells and yields the three principal components with the highest singular values. Expressed in the three major principal components and divided by their singular values, the cell states I, II, and III represent the joint response of the LSTM cell states. Figure 10 visualizes and compares the evolution of the stresses, strains, and history variables over the entire loading sequence.
The neural network's output predictions and the reference constitutive model match exactly, i.e., the stresses and plastic strain predictions are accurate. By comparing the plastic strain with the internal cell states, the learned function of the LSTM cells becomes apparent, which governs the neural network's decision-making behavior. Without being trained directly, the cell states' principal components learn to approximate the plastic strain evolution (with a negative sign). Since the PCA chooses the principal directions based on the variance, and the output layer can apply arbitrary weighting, the negative sign of the cell state does not affect the result. The second and third principal components do not contribute to the joint cell state response.
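Because the output layer can absorb sign, scale, and offset, a sign- and scale-invariant comparison between a principal component and a history variable can be sketched via the Pearson correlation (an illustrative check with synthetic data, not the method used in the paper):

```python
import numpy as np

def scale_invariant_match(pc, q):
    """|Pearson correlation| between a principal component pc and a history
    variable q; sign, scale, and offset of the cell state are irrelevant, so
    a value near 1 indicates that the cells encode the history variable."""
    return abs(np.corrcoef(pc, q)[0, 1])

# toy check: a component that is a negated, shifted, scaled plastic strain
t = np.linspace(0.0, 1.0, 100)
eps_p = np.maximum(t - 0.4, 0.0)        # plastic strain evolving after yield
pc1 = -2.0 * eps_p + 0.3                # arbitrary sign, scale, and offset
match = scale_invariant_match(pc1, eps_p)
```

Any affine transformation of the history variable yields |r| = 1, which is why the negative sign of the learned cell state does not affect the interpretation.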

Explaining Data-Driven Viscoelastic Constitutive Models
For the viscoelastic problem, the Hyperband search found the best-performing architecture to be three GRU cells with 120 units, followed by a time-distributed dense layer (2 units and linear activation). For training convergence, a batch size of M = 32 and a base learning rate of α = 1·10^−3 achieved the best results. After training for the full 301 epochs (Figure 11), the neural network achieved a best MSE ε_MSE of 2.05 × 10^−7 on the training, 7.36 × 10^−7 on the validation, and 1.15 × 10^−6 on the test set, indicating generalization. Furthermore, since all samples are generated with unit variance and unit stiffness, the model approximates almost at the machine precision of the element-wise definition of the MSE performance metric (∼O(10^−7) for single-precision floating-point arithmetic). Figure 12 exemplifies the explainable AI strategy on one randomly generated sample that was not used for training or validation. In the top row, the stress is plotted over the increments, which constitutes the input to the reference constitutive and neural network models. The second row compares the strain response of the reference and neural network models. The final row depicts the history variables q. The branch stress σ_1 is computed by both the reference and data-driven constitutive models. Both strain and branch stress match accurately, as indicated by the low test loss. Finally, the PCA on the GRU cell state over time yields the data-driven constitutive history variables, i.e., the three principal components with the highest singular values, which govern the major evolution of the data-driven constitutive model.

FIGURE 7 | Training and validation loss during hyperelastic constitutive model training. Both losses rapidly reduce within the first 20 epochs before oscillating due to variance shift combined with the singular behavior of the stretch near the origin.

FIGURE | The branch history stress compared to the three major principal components of the cell state. Instead of mimicking the history variables used to generate training data, the cell states learned a generic solution for viscoelasticity: a modified exponential function that can be shifted and scaled at will by the output layer.
Instead of approximating the history variable σ_1, used to generate the training data with backward Euler time integration, the first principal component of the cell states approximates a modified exponential function, correctly identifying the exponential behavior of viscoelasticity. Shifting, scaling, and changing the sign of a function are trivial operations for preceding and subsequent neural network layers. Thus, the weights of the RNN and the last dense layer can modify the cell output at will to assemble a solution such as Eq. 21. The second and third principal components do not contribute to the joint cell state response.
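Recovering such a modified exponential from a principal-component trajectory can be sketched with a simple log-linear fit (synthetic data and all names are assumptions; the fit assumes a decaying mode, and the relaxation time matches the model's τ_1 here only by construction):

```python
import numpy as np

def fit_modified_exponential(t, pc):
    """Fit a*exp(-t/tau) + c to a principal-component trajectory: estimate
    the shift c from the tail, then recover a and tau by a log-linear
    least-squares fit on the decaying part."""
    c = pc[-1]                              # tail value approximates the shift
    y = pc - c
    mask = y > 0.05 * y[0]                  # fit only where the decay dominates
    slope, intercept = np.polyfit(t[mask], np.log(y[mask]), 1)
    return np.exp(intercept), -1.0 / slope, c   # a, tau, c

# synthetic first principal component resembling a relaxing cell state
t = np.linspace(0.0, 2.0, 400)
pc1 = 1.3 * np.exp(-t / 0.1667) - 0.2
a, tau, c = fit_modified_exponential(t, pc1)
```

If the recovered relaxation time agrees with a physical time constant such as τ_1, the cell state can be read as a closed-form exponential rather than a numerical history variable.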

Results and Approach
The objectives of this publication were to systematically and automatically identify and train neural networks for data-driven constitutive models of fundamental material behavior and to explain the resulting recurrent neural networks' behavior.
The test errors (2.45 × 10^−6 to 4.38 × 10^−5) approach machine precision for the single-precision floating-point arithmetic used to compute them. The neural network performance surpasses the elastoplastic constitutive models for uniaxial tension and compression in the recent works of Huang et al. (2020) (3.93 × 10^−2) and Alwattar and Mian (2019) (∼1·10^−5, as reported on the training set). For each of the case studies, the results were in the same order of magnitude for all datasets, regardless of whether they were shown to the neural networks during training (training dataset) or unknown to the neural network (validation and test datasets). The oscillations in the loss curves for the viscoelastic and hyperelastic cases can be attributed to the shift in randomly selected samples for each batch, averaged over each epoch, in combination with the stronger continuous nonlinearities of these problems. This suggests that no overfitting occurred and that generalization to unknown data was achieved, despite the considerable number of trained parameters in some of the models identified by the search algorithm.
When using manual hyperparameter tuning, mechanical and physical knowledge, and machine learning expertise, considerably smaller neural networks, with fewer layers and fewer units per layer, are conceivable that could have been trained to achieve similar results. Limiting the capacity of machine learning models is one of the first and most important steps towards robust performance and generalization to unknown data. This limit is governed by the amount and variance of the data, which is considerably harder to generate in abundance for experimental data (cf. Argatov and Chai (2019)). Here, many best-practice applications and textbooks suggest choosing a neural network capacity slightly larger than strictly necessary and utilizing additional regularization schemes for optimal generalization performance (Goodfellow et al., 2016; Argatov and Chai, 2019). However, the objectives of this study were to demonstrate an automated and inductive strategy that requires minimal prior knowledge to automatically identify and train models and to help understand the resulting models, regardless of how many parameters are used. Therefore, the less-than-minimalistic models identified by the hyperparameter search algorithm posed a more difficult challenge for the novel explainable AI approach.
The results were achieved inductively, with minimal user input, through the application of a Hyperband-inspired systematic search strategy on mechanical data. To further improve the results of the neural networks, several approaches are conceivable, most of which are deductively derived from deep learning and mechanical domain knowledge. Ensemble learning and related regularizers, such as dropout (Srivastava et al., 2014), for example, are known to further improve the quality of the results (Goodfellow et al., 2016). Similarly, physics-informed and physics-guided approaches, as proposed by Raissi et al. (2019), Yang and Perdikaris (2019), or Kissas et al. (2020), use mechanical domain knowledge to improve neural network models of physical systems. In contrast, the data-driven constitutive models trained in this work represent an inductive, fundamental, and accurate approach that offers intuition for possible neural network architectures and hyperparameter configurations for higher-dimensional mechanical problems. In combination with the explainable AI approach, the data-driven constitutive models represent a first step towards physics-explaining neural networks.
Even beyond mechanics and materials sciences, applying the PCA to explain neural network cell states constitutes a novel and promising explainable AI method. The PCA investigation proposed in this work can explain the recurrent cell state behavior, despite the considerable size of the investigated architectures compared to the one-dimensional fundamental problems. Due to the quality of the results and the explainability of the neural network, "Clever Hans" predictors can be ruled out. For material behavior where multiple physical effects affect the same observed features (e.g., viscoplasticity or ductile fracture), the global PCA used in this proof-of-concept study could be upgraded to provide local decompositions in combination with clustering algorithms.
As an explainable neural network approach in mechanics, the case studies demonstrate how neural networks can help to explain material behavior. For the elastoplastic problem, the recurrent cell state identified the same history variables used in computing the ground truth. For the viscoelastic problem, the data generation used numerical time integration, but the neural network found a solution similar to the exponential closed-form solution. Therefore, if the ground truth is not known, e.g., when learning from raw experimental data, explainable neural network approaches can possibly identify underlying closed-form solutions. By characterizing new materials or material behavior, e.g., under extreme loading conditions, explainable AI can help guide researchers in mechanics and materials sciences towards new analytic closed-form solutions that elegantly model the materials in the desired ranges.

Concluding Remarks and Outlook
We proposed a step towards physics-explaining neural networks, which inductively complement existing deductive approaches for physics-informed and physics-guided neural networks. To that end, a systematic hyperparameter search strategy was implemented to identify the best neural network architectures and training parameters efficiently. For the analysis of the best neural networks, we proposed a novel explainable AI approach, which uses the PCA to explain the distributed representations in the cell states of RNNs.
The search strategy and explainable AI approach were demonstrated on data-driven constitutive models that learned fundamental material behavior, i.e., one-dimensional hyperelasticity, elastoplasticity, and viscoelasticity. For all case studies, the best neural network architectures achieved test errors on the order of 1 × 10^−5 to 1 × 10^−6. In particular, for hyperelasticity, the test error approached machine precision, despite the singular behavior for stretches approaching zero. For elastoplasticity, the novel explainable AI approach identified that the recurrent cell states learned history variables equivalent to the plastic strain, i.e., the history variables used to generate the original data. Remarkably, for viscoelasticity, the explainable AI approach found that the best-performing neural network architecture used an exponential function as the basis for its decisions instead of the algorithmic history variables used to generate the training data.
These findings imply that systematic hyperparameter search, coupled with explainable AI, can help identify and characterize numerical and analytical closed-form solutions for constitutive models independent of the data origin. Thus, new materials can potentially be characterized with data originating from experiments, using the approach proposed in this work.
Future studies will apply and extend the proposed strategies to more complex material models. Of particular interest are viscoelastic materials subjected to strain rates that cover multiple decades, where conventional numerical models require numerous algorithmic history variables. Eventually, new materials, for which analytic closed-form solutions are as-of-yet unknown and numerical solutions are challenging to implement, can be characterized with developments based on the present work. Finally, applications beyond constitutive models are conceivable. For spatio-temporal problems, e.g., as given in Koeppe et al. (2020a), the explainable AI approach outlined in this work needs to be extended to leverage the spatial structure.
In forthcoming work, it is intended to extend the developed artificial intelligence approach within the Kadi4Mat (Brandt et al., 2021) framework to higher dimensional data containing 2D and 3D spatial plus temporal information to predict microstructure-mechanics correlations. The database used to train the neural network algorithms relies on digital twin data from synchronously conducted experiments and simulations of mechanically loaded polycrystalline and multiphase materials. Based on the training, the AI approach is applied to large-scale micromechanics-microstructure simulations so as to provide new insights, e.g., into mechanically induced nucleation events of new phases and grain variants or into microcrack probabilities. The combination of new AI concepts and advanced high-performance materials simulations shall establish an integral component of the research data infrastructure to enable computational methods for an accelerated design of new materials.

DATA AVAILABILITY STATEMENT
The datasets generated and analyzed for this study can be found in the Zenodo repository https://zenodo.org/record/4699219.

AUTHOR CONTRIBUTIONS
All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.