On-the-fly adaptivity for nonlinear twoscale simulations using artificial neural networks and reduced order modeling

A multi-fidelity surrogate model for highly nonlinear multiscale problems is proposed. It is based on the introduction of two different surrogate models and an adaptive on-the-fly switching. The two concurrent surrogates are built incrementally starting from a moderate set of evaluations of the full order model. Therefrom, a reduced order model (ROM) is generated. Using a hybrid ROM-preconditioned FE solver additional effective stress-strain data is simulated while the number of samples is kept to a moderate level by using a dedicated and physics-guided sampling technique. Machine learning (ML) is subsequently used to build the second surrogate by means of artificial neural networks (ANN). Different ANN architectures are explored and the features used as inputs of the ANN are fine tuned in order to improve the overall quality of the ML model. Additional ML surrogates for the stress errors are generated. Therefore, conservative design guidelines for error surrogates are presented by adapting the loss functions of the ANN training in pure regression or pure classification settings. The error surrogates can be used as quality indicators in order to adaptively select the appropriate---i.e. efficient yet accurate---surrogate. Two strategies for the on-the-fly switching are investigated and a practicable and robust algorithm is proposed that eliminates relevant technical difficulties attributed to model switching. The provided algorithms and ANN design guidelines can easily be adopted for different problem settings and, thereby, they enable generalization of the used machine learning techniques for a wide range of applications. The resulting hybrid surrogate is employed in challenging multilevel FE simulations for a three-phase composite with pseudo-plastic micro-constituents. Numerical examples highlight the performance of the proposed approach.


Contents
1 Introduction In computer-assisted materials design and in the simulation of complex materials with rich microstructure major challenges remain to be solved despite the outstanding advances made in recent years. For example, the discretization of all microstructural features in a monolithic finite element (FE) simulation is unfeasible due to the various length scales involved that range from micrometers up to the meters. These would lead to a ludicrous complexity of the resulting overall model. By accounting for a separation of length scales, the FE 2 ansatz (Feyel, 1999;Miehe, 2002) can lead to some savings over the monolithic approach by replacing the heterogeneous material by microscopic FE problems at the macroscopic integration points, leading to a partial decoupling of microscopic and macroscopic degrees of freedom. Still the number of overall unknowns is prohibitive and calls for massively improved computational efficiency in terms of CPU time, memory savings and information compression. Novel strategies contributing to the vision of a fully connected investigation of materials and aspiring the prediction process-structure-property relationships across multiple length and time scales are, thus, much sought-after, see, e.g., Schmitz and Prahl (2016). Due to the rapid growth of available material and simulation data, data-integrated approaches that exploit information from different sources in order to complement or substitute simulations and experiments are experiencing increased attention, see, e.g., Kalidindi and De Graef (2015), Kalidindi (2015) and Ramakrishna et al. (2018). Due to novel improvements in machine learning and computational resources, a zoo of data-driven methods comprising, e.g., kernel methods, principal component analysis and artificial neural networks, have developed immense momentum over the last years. The successful implementation of these techniques in materials research is an active field. For instance, in Chupakhin et al. (2017) artificial neural networks and finite element computations have been combined in order to predict the influence of plasticity on the residual stress field measured by hole drilling. Principal component analysis of n-point microstructure statistics have shown excellent performance in order to examine microstructure-property relationships, see, e.g., Ç eçen et al. (2014), Gupta et al. (2015) and Altschuh et al. (2017). In Bélisle et al. (2015), several machine learning methods have been considered in the context of molecular dynamics. Liu et al. (2015) show how data mining and machine learning are combined in order to efficiently approximate the elastic localization in voxelized microstructures. Another branch of datadriven materials research exploring the use of convolutional neural networks and deep learning in order to deliver accurate structure-property linkages is currently in heavy development, see, e.g., Ç eçen et al. (2018) and Yang et al. (2018).
While data-driven approaches have their appeal, the structure of the underlying physical problem can be accounted for only in parts. For instance, established balanced laws and thermodynamic principles are hard to be incorporated in the aforementioned methods. Reduced order models for the microscopic problem offer an advantageous compromise between physics-informed modeling and computational efficiency. Purely data-driven surrogates lack accuracy (i) if the amount of training data is limited, (ii) if the validity domain is left or (iii) if the error of the surrogate in respect to the reference solution is to be estimated. In these scenarios, reduced order models following physical principles offer, in general, better accuracy and robustness. For example, in Fritzen and Leuschner (2013) a highly efficient potential based reduced order model has been developed. This ansatz has a natural physical supporting argument, since a reduced basis for the solution field is generated based on snapshot data of FE computations. The approach has been demonstrated to achieve substantial speed-ups and memory savings, see also Fritzen et al. (2014) and Fritzen and Hodapp (2016). Other developments in this field comprise the NTFA (e.g., Michel and Suquet, 2003) and NTFA-TSO (Michel and Suquet, 2016) or hyper-reduced simulations and related schemes (Ryckelynck, 2009;Soldner et al., 2017). In order to improve the incorporation of surrogates obtained from reduced order models a goal-oriented error estimation or quality indication is required. The quantity of interest (QoI) is the effective stress and its accuracy (up to a prescribed tolerance) is essential for reliability of the overall predictions. In Lu et al. (2018), for example, neural networks have been successfully trained to approximate the microscopic nonlinear microscopic electric material law of graphene/polymer nanocomposites, but without error control or model adaptivity in macroscopic simulations. A macroscopic goal-oriented approach combining reduced order modeling and machine learning techniques has been demonstrated in Trehan et al. (2017) for two-dimensional oil-water subsurface flow systems and in Freno and Carlberg (2018) for three-dimensional mechanical problems. The approach considers a reduced order model and an a posteriori correction through machine learning methods. The ansatz shows promising results, but it requires the evaluation of the reduced order model. For multiscale simulations, even the evaluation of a reduced order model may not be always viable due to the large number of needed evaluations. It is, therefore, necessary to seek efficient alternatives incorporating a hierarchy of surrogate models of different computational complexity and different accuracy in the QoI.
The present work aims at the adaptive combination of the physics-informed reduced order model (ROM) of Fritzen and Kunc (2018), for hyperelastic problems, with artificial neural networks (ANNs) in mechanical multiscale FE simulations. Hereby, the ANNs are trained based on FE computation of the full three-dimensional microstructure and material at hand for a set of loading strains (input quantity). The QoI (output quantity) is the effective stress, which the ANNs are trained for. The trained ANNs are then used as a highly efficient constitutive relation surrogate for the nonlinear material at hand. For macroscopic FE computations, the ANN material law surrogate is to be used, if possible, at every integration point for given effective strain. Based on quality indicators, as accuracy or range of validity, the ANN constitutive relation surrogate may be inaccurate or insecure for given strain. The present work, therefore, further considers the error modeling of trained ANNs and discusses guidelines in order to induce conservative properties to the error models, which are calibrated through ANNs in standard regression or classification approaches. Additionally, strategies are proposed for an adaptive ANN-ROM schemes, where the more accurate but expensive ROM is only called at an integration point, if the quality indicator demands it.
The manuscript is organized as follows: In Section 2, two concurrent surrogate models for the QoI obtained by reduced order modeling and from purely data-driven ANNs are described. The twoscale mechanical problem is introduced and the challenges in the goal-oriented error estimation of derived quantities of interest remaining in nonlinear reduced order modeling are detailed. Then, the data generation for the training of the ANNs is illustrated, followed by the guidelines for the material law and error approximation. At the end of the section, adaptive twoscale simulation strategies including on-the-fly model switching are presented. Section 3 offers numerical examples for a three-phase pseudo-plastic material: The ANN is used for the direct surrogation of the QoI. This is adaptively complemented by a more robust and reliable reduced order model based on the concept of quality indicators. Multiscale FE simulations comparing the different multiscale simulation techniques are presented. The manuscript ends with a concluding summary of the results in Section 4. The simulation of microstructured solids with a sufficient separation of length scales is investigated. More precisely, a macroscopic domain Ω ⊂ R 3 with characteristic length L and attached microstructure with characteristic length L µ L is considered. The microstructure is assumed to be ergodic and the existence of a periodic Reference Volume Element (RVE) Ω is assumed. In the following, macroscopic fields are overlined •. The twoscale problem consists of the concurrent solution of the macroscopic boundary value problem (BVP) (P) : div (σ(ε)) = 0 with ε = sym grad(u) + BC (1) and, for each macroscopic point x ∈ Ω, of the solution of the RVE problem (P) : div (σ(ε)) = 0 with ε = sym grad(u) and 1 |Ω| Here u, u denote displacements, ε, ε are infinitesimal strain tensors and σ, σ denote the stress field on the microscopic and macroscopic domain, respectively. The solution of (P) defines the missing constitutive relation for the macroscopic stress σ via the volume average The two BVPs are strongly coupled since the solution u of (P) defines the boundary condition for (P) via ε, while the solution of (P) implicitly provides the missing constitutive equation via (3). A straight-forward yet computationally costly approach to solving the twoscale problem is given in terms of the FE 2 method (Feyel, 1999;Miehe, 2002): Here the microscopic problem is solved at each macroscopic integration point and the effective tangent operator is used in order to allow for Newton-Raphson iterations of the nonlinear macroscopic BVP. In the following, the solution of the microscopic BVP using Finite Elements is considered as the reference solution, i.e. it denotes the Full Order Model (FOM).

Reduced Order Model (ROM)
Given the massive computational demands of the FE 2 technique and the limited availability of computational resources, the use of nowadays established reduced order models (ROM), in order to replace the costly microscopic BVP evaluations, has become an accepted alternative for dissipative and pseudo-plastic hyperelastic materials (Radermacher and Reese, 2016;Fritzen and Kunc, 2018). The reduced basis of dimension N obtained from the snapshot Proper Orthogonal Decomposition (POD, Sirovich, 1987) can be expressed in terms of a matrix U (x), where each column represents a displacement field. The reduced parameterization of the solution is then given in vector notation via (i = 1, . . . , N ) where (A) •i refers to the ith column of the corresponding matrix. Here, the matrix and vector notation of the effective strain ε are used concurrently for convenience. In the following, attention is limited to pseudo-plastic materials, i.e. to strongly nonlinear hyperelastic solids for which the stress σ and the stiffness C are defined as the gradients of a free energy function W (ε) according to Following Fritzen and Kunc (2018) the reduced problem is to find the coefficients ξ ∈ R N solving While the effective stress is obtained from simple volume averaging of σ, the effective tangent stiffness is computed via which follows from straight-forward linearization of (6). The accuracy of the ROM depends on the quality and amount of the snapshots and of the reduced dimension N . It shall be noted that the Galerkin ROM inherits the properties of classical Finite Elements, i.e. the solution is Galerkin orthogonal and, thus, energy optimal. It follows the basic physical principle of energy minimization. From a theoretical perspective this motivates the robustness and accuracy of the ROM even beyond the considered parameters used during the generation of the snapshot data, i.e., the ROM can be considered to generalize.

Goal-oriented error estimation
For the microscopic BVP, using the ROM (or any other approximation of the FOM) naturally introduces an error into the solution of the problem, and into the quantity of interest (QoI). In this work the latter is the effective stress. Hence, in order to enable error control for the macroscopic boundary value problem, it is crucial to estimate the error in the QoI, see, e.g., Larsson and Runesson (2011). For this purpose, we define the error in displacements on the microscale as e(x) = u F (x)−u RN (x), where u F and u RN are the solutions to the microscopic problem (2) using the FOM and N -dimensional ROM, respectively. In view of the reduced kinematics (4), we can parametrize the error in the ROM as where U F denotes the (finite element) shape functions pertinent to the FOM and ξ e denotes the nodal values of the fully resolved error. The FOM and N -dimensional ROM effective stress functions are referred to for clarity asσ F (ε) = effective stress of FOM for given effective strainε , σ RN (ε) = effective stress of N -dimensional ROM for given effective strainε , while the corresponding error is addressed as Now, consider the corresponding FOM residual equation analogous to (6) and the error in the QoI given in (11) in terms of the error in the solution defined in (8). Through linearization, we obtain the error equation and the linearization of the macroscopic stress error respectively. Here r F , J F and K F define the residual, Jacobian and linearized stress error in (6) and (7), with E replaced by E F defining the strains of the finite element shape functions. The, nowadays, standard method of goal-oriented error estimation can be carried out by solving the suitably formulated dual (or adjoint) problem (see, e.g., Oden and Prudhomme, 2001) Finally, (12) and (13) can be combined to yield the result We note that the estimator (14) has, in particular, the following properties: (i) It is restricted to estimating the linearized error contribution, (ii) it requires the assembly of the entire FOM residual and Jacobian, and (iii) it requires the solution of the dual problem using the FOM to formally hold. Even if the linearization error is negligible, the high computational cost involved in assembling the full (FOM) Jacobian and residual of the problem makes this technique unalluring for use in conjunction with highly efficient ROM approximations. Possible approximations of (13) pertain to hierarchical approximations. One could, for instance, solve the dual problem using an enriched ROM, rather than the FOM. However, designing a robust hierarchical scheme requires means of guaranteeing that the enriched basis is sufficient. In view of the discussion above, we shall henceforth consider alternative methods to estimating (and controlling) the error in macroscopic stress from each microscopic problem.

Generation of data
Design of input data / loading directions Based on the Concentric Sampling (CS) approach proposed by Kunc and Fritzen (2018), n d almost uniformly distributed unit vectors / directions d (i) ∈ R 6 (i = 1, . . . , n d ) are generated. Samples along the generated directions are considered with an exponentially growing step width from the origin. This ansatz follows the mechanical reasoning, which expects the constitutive relation to behave strongly nonlinear close to ε = 0, followed by a pronounced change in slope at the onset of plastic yielding and a saturation behavior for increasing load amplitudes. The primal strain datasetD ε is addressed aŝ with the primal strain norm discretization D r and set of directions D d . The definition (15) corresponds to a tensor decomposition into direction and amplitude. For many materials the volume changes are rather small compared to isochoric deformations. This effect is particularly pronounced for (pseudo-)plastic materials. In order to sample the strain space in a problem specific manner, a rescaling of the strains defined in (15) may be convenient. The present work solely rescales the spherical part (sph) of each primal strain (i.e. the dilatation), while the deviatoric part (dev) remains unchanged. The actual strain dataset is described by The number of strain samples #(D ε ) is given by the product of number of the directions #(D d ) and the number of amplitudes per direction #(D r ).
Generation of output data For the training of the artificial neural networks (ANNs), training (T), validation (V) and random (Monte Carlo -MC) datasets, referred to as D T ε , D V ε and D MC ε , respectively, are generated. The latter are not obtained using CS, but using a uniformly random set of directions in strain space. They are mainly used for unbiased testing of the surrogate independent of the proximity to the training and validation set. The output of interest in the present work is, primarily, the effective stress, but also some error measures for the derived surrogates, which will be defined in the following sections.
Technically, the process of generating the data samples is challenging. In order to obtain reliable data, the FOM and the ROM must be evaluated thousands of times in order to obtain the needed data. Each sample consists of an effective strainε and the related effective stressσ. In order to boost the performance of the simulations, a ROM-preconditioned solver for the FOM has been developed: First, an accurate (i.e. high-dimensional) ROM is solved for each load path. Then the FEM is accelerated by taking the ROM solution as initial guess for the nodal displacements during the first increment and, during the subsequent load steps, by taking the ROM displacement increment as initial guess for the FEM displacement adjustment. This not only brings the initial guess close to the final solution but it also leads to an accurate global stiffness matrix that can be combined with Quasi-Newton techniques. Overall, this approach provides significant computational improvements over a naive FE based data generation. Further, it is noteworthy that the high-dimensional ROM solution can be used to derive a hierarchy of lower-dimensional ROM solutions needing virtually no additional Newton-Raphson iterations via linearization. More precisely the trailing entries of a ROM solution can be eliminated by making use of the Schur complement which leads to an adjustment of the remaining reduced coefficients. In our tests this downscaling of high quality ROM solutions to N -dimensional ROMs proved an efficient tool.

Surrogate model for the effective stress
Feature design For the successful training of ANNs the normalization of the input and output data and the design of appropriate inputs (usually referred to as features) through linear or nonlinear transformations is essential. Compared to image data and convolutional neural networks, which usually take advantage of the intrinsic connection of image data and convolution, the present input data (strain data) is low-dimensional and necessarily requires sensible mechanical guidance during feature design. From a pure data-driven perspective, general batch normalization can greatly improve the prediction quality of a network. But in the present problem setting the input and output data have a clear physical nature. Therefore, based on mechanical reasoning, the consideration of the dependency of the material law on the spherical (ε • ) and deviatoric (ε ) degrees of freedom of the strain offers a material theoretic starting point. This linear transformation is addressed as Additionally, the deviatoric part of the strain can be split into its norm and direction After either of these transformations, a corresponding normalization is performed in order to prepare the strain features for the subsequent evaluation through the ANN: For T sd1 , each component of the vector T sd1 (ε) is shifted and then divided by its corresponding mean and standard deviation over the training dataset D T ε , i.e. component-wise shifting and scaling are applied. For T sd2 , the first component (i.e. the volumetric strain) is scaled according to the standard procedure while the deviatoric strain amplitude is divided by its peak value and the deviatoric direction remains unchanged. In the following the shifted and scaled inputs are referred to as Architecture of the neural network Artificial neural networks consisting of L > 1 layers are considered. For each layer l = 1, . . . , L consisting of n [l] neurons the inputs complemented by n [0] = D. The weights and biases of the ANN are parameters, which need to be calibrated with training data by solving a unconstrained optimization problem. The choice of activation functions is an abstract parameter that can heavily influence the quality of the surrogate. Its selection depends on the intuition of the user, complemented by thorough testing in terms of architecture sweeps. In the present context, the differentiability of the stress surrogate is aspired, as it allows for a computation of the tangent stiffness at low computational expense through automatic differentiation. This requirement naturally favours smooth activation functions. Our ANN implementation is based on Python3 (v3.4.3) using Google's TensorFlow library (v1.12.0), which offers automatic differentiation capabilities. For architecture tests the following activation functions have been used: • the softplus function (SP) a(x) = log(1 + exp(x)) • and the hyperbolic tangent (TANH) a(x) = tanh(x).
The identity function (Id) allows to pass unaltered input, such that a linear combination of the activation functions of the previous layer is returned. This is particularly desired in the last layer, in order to obtained an optimized linear combination of nonlinear functions as final output y = x [L] of the ANN. The evaluation of a single input strain through the whole ANN is addressed by the composition of all layers Loss function The training of the ANN requires an objective function that provides an error respecting the nature of the outputs. In the context of ANNs, the objective function is referred to as loss function. Similar to the inputs, the outputs should also be scaled using an invertible transformation Here, the same transformations T sd1 and T sd2 as for the inputs are considered for T σ during architecture testing. The evaluation of the ANN is analogously abbreviated as In this work, the mean squared error (MSE) is chosen as the loss function This is to emphasize that error is computed on the rescaled dataset and not on the actual stresses. The MSE (23) is then optimized with respect to the ANN parameters, i.e., the weights and biases are identified starting from a random initialization. The ANN output is then obtained through an inverse transformation The quality of the ANN during training is checked, not with respect to the training dataset, but with the validation dataset D V ε via the mean relative norm error (MRNE) In addition to that, the mean coefficient of determination R 2 σ of the effective stress is evaluated The coefficient of determination is strictly positive and bounded by one which is attained if and only if the surrogate coincides with the reference for all queries.

Surrogate model for the error in the quantity of interest
Error regression and classification In this section, we are interested in the calibration of ANNs taking strain data as input and delivering quantitative and qualitative error estimates for the stress. On the one hand, for a given strain, it might be of interest to predict the error of stress surrogate against the FOM stress. On the other hand, it might not be of particular interest to know the exact error value, but rather to know if the error is acceptable, i.e. if it is smaller than a prescribed tolerance. The quantitative error prediction leads to a classical regression problem, whereas the binarized response gives rise to an ordinary classification problem.
In the error regression problem, for a given modelσ M ∈ {σ RN ,σ ANN } of the effective stress, we are interested in the absolute and relative norm errors For the error classification problem, we consider the indicator function with prescribed absolute and relatives tolerances τ a and τ r , respectively. The outcome of χ M is particularly useful in order to decide on the subsequent treatment: For χ M = 1, the error is considered acceptable and the surrogate can be used, while χ M = 0 should trigger an adaptive refinement. For instance, the classifier χ M can decide if the stress surrogateσ M at a macroscopic integration point is acceptable or whether a more dedicated surrogate is needed. For error regression and classification, the fully connected feed forward ANNs as described by (19) and the same activation functions as in Section 2.3.2 are used. For the binary classification the final ANN layer is regarded as a log-probability with a single neuron. This setup is usually referred to as logits in binary classification.
Loss function One of the desired properties, considering possible safety requirements in the error regression and classification, is to obtain if not accurate, then at least conservative results. In order to achieve a conservative behavior, for the error regression problem we consider the function which changes the slope for negative input values to α. The function φ α can be used to penalize underestimation of the error (for α > 1) when applied to the scalar argument of the MSE for the true error e and its ANN surrogateẽ The MSE α is considered as the loss function for error regression, where α acts as a penalty parameter. The corresponding R 2 value and the relative conservative amount (RCA) over the validation dataset are used to assess the quality of the prediction. For the error classification, due to the binary nature of (28), the last layer of the ANN is defined as the composition of a standard sigmoid function and a shifted step function, i.e., The loss function for classification chosen in this work is the weighted binary cross entropy Herein, false positive predictions dominate the cross entropy for w > 1, while 0 < w < 1 puts the focus on false negative classification. We define the overall accuracy of the classifier as the expectation of finding the same response in the true indicator χ and in the surrogateχ C : Further, the accuracy within the bin b ∈ {0, 1} is defined as the conditional probability The reader should note, that ACC 0 is more relevant when seeking conservative estimates. Only if ACC 0 and ACC 1 are close to unity, then the overall classification is robust, while for seemingly good ACC (e.g. around 0.98) the critical ACC 0 could be inappropriate. This effect is particularly important if the surrogate has only few outliers requiring further processing.

General hybrid approach
In order to build a twoscale simulation model relying on the finite element method on the larger scale, the material model must be replaced by the homogenized response of the heterogeneous solid. In Sections 2.1.2 and 2.3.2 the use of ROM and ANN serving as surrogates for the effective stress tensor and the effective tangent stiffness are described in detail. Both surrogates can be combined by introducing an indicator function χ(x) : Ω → {0; 1} which adaptively selects between the rapid and purely data-driven (but less physical) ANN if χ = 1 and the physics-driven ROM for χ = 0. The indicator function represents the binarized confidence in the accuracy of the ANN surrogate. First, a simple ansatz for χ is chosen by setting χ to one if the current strain at the macroscopic position x ∈ Ω falls within the region covered by samples during the training of the ANN. In the present study this is equivalent to the kinematic indicator Here, ε 0 = max(D r ) is the peak amplitude used during Concentric Sampling and · W denotes a weighted norm that transforms elements of D ε defined via (16) back into normalized directions: The use of the ROM outside of the training domain is motivated by its reluctance to energy minimization, i.e. by preserving the key physical characteristics of the full order model while restricted to a relevant subspace of the solution manifold. A second indicator can be obtained by evaluating the accuracy of the ANN. Therefore, a binary classifierχ C : Sym(R 3×3 ) → {0, 1} is employed following the procedure outlined in Section 2.3.3. The indicator function is then replaced by the classifier: χ(x) =χ C (ε(x)).

Technical issues related to on-the-fly model switching
At first, the concept of the indicator function χ marking the confidence region for the ANN and employing the ROM elsewhere sounds straight-forward. However, this simple approach does not work in practice as the two concurrent surrogates do not provide continuous approximations of the stresses. This can be illustrated by letting C ⊆ Sym(R 3×3 ) denote the confidence region of the ANN in strain space. On the boundary ∂C of the confidence region there is a hard transition between the two surrogates which induces a stress jump on the boundary ∂C, leading to a non-smooth material response. When switching between ANN and ROM on-the-fly, i.e., when deciding for each query adaptively which surrogate should be evaluated, convergence of the macroscopic problem is disrupted, rendering the straight-forward implementation of a quality indicator guided adaptive procedure infeasible. In order to resolve this issue two algorithms are described below.
Staggered hybrid ANN/ROM algorithm The first approach consists of a staggered procedure, where the ANN is used as the only stress surrogate in a first run of the twoscale simulation (see Algorithm 1). Thereby, a first overall response is gathered. This is followed by a second run, in which the subset of all integration points having seen a zero quality indicator during any of the load steps of the first run are enforced to use the ROM surrogate. This set is then kept constant, i.e. switching from ANN to ROM is one way. This procedure enables the use of the ANN solution as an initial guess for the subsequent hybrid run which leads to low iteration counts and improved performance. During the second run, the difference of the ANN and the ROM can be evaluated to provide valuable post-processing data in order to better understand the quantitative impact of the model modifications, see also examples in Section 3.3.2. Two major disadvantages of this approach are (i) the irreversibility of the ROM activation which can lead to substantial computational costs and (ii) the possible failure during the first run, if the ANN surrogate becomes nonconvergent. The latter can, e. g., occur if the local magnitude of ε on the macroscale falls way outside of range of the training data.
Adaptive on-the-fly ANN/ROM algorithm A second on-the-fly model selection procedure, solving both of the aforementioned issues, is described in Algorithm 2: It re-initializes the quality indicator in favor of the ANN at the beginning of each load increment. During the subsequent nonlinear Newton-Raphson iterations of the same increment, the indicator is updated in a monotonic way, i.e. switching from ANN to ROM is allowed but not vice verse (see line 5 in Algorithm 2). The computational efficiency can be improved by substituting only part of the equilibrium iteration by the ROM. of the two phases. The inclusion is assumed linear elastic with properties mimicking a ceramic inclusion made of SiC (E = 400 GPa, ν = 0.2), see Figure 1. The volume fractions of the two plastic layers are 46.73% each and the one of the inclusion is 6.54%. The material was designed to induce a directional dependency of the effective material behavior (see right plot in Figure 1 for an example). This feature makes the identification of the unknown homogenized response more challenging and, thereby, a benchmark problem for the developed methodology is designed.

Effective stress surrogate
The strain space is sampled as described in Sect. . . , L−1}, yields that none of the activation functions (RELU, SP, TANH) show a remarkable advantage over the other, even for as large number of epochs as 10,000 with whole batch training for a learning rate of 0.001 using an ADAM optimizer. However, the feature design of input (effective strain) and output data (effective stress of the FOM) has a major influence. Hereby, the most successful combination is identified to be the use of the spherical-deviatoric transformation T sd1 for the input as well as for the output. The transformation T sd2 did not show major advantages in the final objective function values. Based on the initial architecture testing, the softplus function (SP) has been chosen to power further investigations, due to its monotonic and differentiability properties in regard of an expected monotonic stress behavior and need for tangent operators for future FE multiscale computations. In Tab. 1, different architectures are tabulated, showing the performance of each ANN. Based on the MRNE and R 2 σ values for the validation dataset (and the corresponding values MRNE MC and R 2 σMC evaluating the MC dataset), the ANN1 comprised of six layers with five softplus hidden layers and 128 neurons per hidden layer is chosen for the final evaluation. In Fig. 2 the prediction of ANN1 for the von Mises effective stressσ vM is depicted for six directions of the training and validation datasets, showing a good agreement with the FOM data. Hereby, the reader should take into account, that the ANNs have been trained with strain data up to a norm of 0.04 in the primal strain setD T ε (corresponding to the last data point for each loading direction in Fig. 2). The behavior of the ANN1 beyond this norm value was expected to tend to keep increasing due to the properties of the softplus function. However, due to the tendency of the ANN to increasingly overestimate the stresses and the artificial stiffening at load amplitudes beyond the training data, ANN1 is not expected to deliver accurate results beyond an effective strain norm of approximately 0.04 in respect to the primal strain setD T ε .

Error surrogates
For the error regression and classification, it is first necessary to gain an overview regarding the quality of the N -dimensional ROMs and of the best of the trained ANN effective stress surrogatesσ ANN1 of the previous section. In Fig. 3 the cumulative distribution function of the absolute norm error e M a (ANE) and of the relative norm error e M r (RNE) for the validation set D V ε are shown for ROMs of different dimensions N and forσ ANN1 . It is clearly visible that for increasing ROM dimension, the accuracy of the ROM improves for both, the ANE and RNE. This is expected, since the higher the ROM dimension, the richer the underlying function space, i.e. the distance to the solution manifold of the full order model decreases. It should be noted that the ANN effective stress modelσ ANN1 performs well against ROM16 and ROM24. The ROM32 yields a mean ANE of 1.019 MPa and a mean RNE of 0.007. It is from now on assumed that the accuracy of the ROM32 suffices for future multiscale FE simulations, i.e. an a priori quality assessment is made.
We first demonstrate the error regression in terms solely of the N -dimensional ROMs for the corresponding ANE and RNE. These error measures could be used for an adaptive selection of a ROM after having access to its estimated errors. An architecture test for ANNs with number of layers L ∈ {3, . . . , 6}, neurons per hidden layer n [l] ∈ {16, 32, 64}, up to 10,000 epochs and whole batch training is performed. A selection of the trained ANNs is tabulated in Tab. 2.
The ANNsẽ R16|1/2 a , tabulated in Tab. 2, are depicted in Fig. 4. The influence of the penalty parameter α,   (29), can be seen in Fig. 4 (left plot), where it becomes visible that the larger amount of points are found on the upper side of the diagonal, i.e., the predicted error is larger than the error in the validation set. This is reflected in the relative conservative amount (RCA), see Tab. 2. The usual trade-off is that increasing α yields conservative behavior (i.e., a higher RCA), but reduces the accuracy in terms of R 2 e . Analog behavior is observed for e r , as tabulated in Tab. 2 (bottom half). Large values of α yield reduced R 2 e values, due to the dilemma of balancing a reduction of the loss function, while preserving conservative behavior.
The error classification is conducted for the absolute and relative tolerances τ a = 2MPa and τ r = 0.02, respectively. Architecture testing for L ∈ {3, . . . , 6} and n [l] ∈ {16, 32, 64} for the hidden layers yield varying quality of results depending on the weight w on the false positive. Depending on the number of positive and negative outcomes, the weights should be adapted. For the architecture testing of this work, the ratio is considered. If the number of negative outcomes in the training data #(D T ε : χ(ε) = 0) is higher than the positives, then w 0 > 1 holds. The consideration of w = w 0 in the binary cross entropy partly equilibrates the influence of the false positive (i.e. classified accurate but violating the tolerance) and false negative (i.e. classified inaccurate but within tolerance). But it may also overly bias the cross entropy during training, yielding poor accuracy in one bin. Therefore, w is sampled between unity and w 0 in four evenly spaced steps during architecture testing. A selection of trained ANNs is tabulated in Tab. 3. Classification ANNs with acceptable accuracies with respect to the validation dataset are obtained for the 16-, 24-, and even for the 32-dimensional ROM. These ANNs, denoted asχ R16/24/32 in Tab. 3, offer, in principle, the opportunity for an adaptive ROM scheme, in which for a given effective strain the lowest-dimensional but still acceptable ROM can be automatically identified for the chosen tolerances. In addition to the error classification of different ROMs, an attempt to classify the quality of the ANN labeledσ ANN1 in Tab. 1 is made for the same tolerances, see last row in Tab. 3. The surrogateσ ANN1 has already intrinsic information of the training dataset, due to its optimization in respect to this dataset. In order to avoid an over-calibration, the training, validation and Monte Carlo datasets have been concatenated, randomly reordered and split into new training and validation datasets containing 90% and 10% of the data, respectively. The classifier forσ ANN1 is trained on these new datasests. An extensive architecture test is performed with the same parameters as for the ROMs. The classifier forσ ANN1 , denoted asχ ANN1 in Tab. 3, reaches acceptable accuracies, but notable lower than the ones achieved for the ROM classifiers. In retrospective, a justification for the lower performance of the classifierχ ANN1 is found in the higher regularity of the ROM solution that is matching the behavior of the full order model. This is explained by the ROM inheriting the mathematical structure and the physical principles of the FOM.

Twoscale problem
The presented hybrid methods introduced in Algorithms 1 and 2 are used in actual three-dimensional twoscale simulations. The results are compared to FE 2R simulations (in the spirit of Fritzen and Hodapp, 2016) in which the reduced order model is used as a stress surrogate in all points of a macroscopic structure which is considered as a reference based on the high accuracy of the ROM with 32 modes (see Figure 3, Section 3.2.2).
The macroscopic problem, see Figure 5, is borrowed from (Fritzen and Hodapp, 2016) and three different mesh densities are considered on the macroscopic level: M1 (1,734 elements/13,872 int. points), M2 (6,318 elements/50,512 int. points) and M3 (53,790 elements/430,320 int. points). All three models consist of trilinear hexahedral elements with selectively reduced integration (i.e., B-bar elements are used). The loading in terms of a 2% stretch of the specimen is applied in 10 equally spaced increments up to 0.2% followed by nine increments of 0.2% amplitude each in order to better cover the transition between elastic and elasto-plastic behavior for Algorithm 2. The reference solution and the computations using Algorithm 1 are computed only for the larger increments, i.e. 10 increments of 0.2% each.

Staggered adaptive procedure cf. Algorithm 1
First the staggered procedure introduced in Algorithm 1 is used. It is found that the first run that is relying on the ANN surrogate only achieves excellent runtimes when evaluating the ANN on graphics cards (here: one Nvidia GTX Titan Black), leading to runtimes of approximately 15 seconds for one evaluation of the surrogate at each of the 430,320 integration points of the finest mesh M3. It shall be noted that this includes a major execution overhead * . A general dilemma of twoscale simulations that was observed for the FE 2R method by Fritzen and Hodapp (2016) is also present here: Local outliers of the strain field attain magnitudes that quickly exceed the range of the inputs used during training of the ANN stress surrogate. The number of outliers becomes more relevant for the finer discretizations. This reveals a major short-coming of Algorithm 1: While the simulation for mesh level M1 terminated cleanly in roughly 3h wall-clock time with most of the computing time being spent during the hybrid ANN/ROM phase, M2 did not converge for loadings larger than 1.2% due to locally excessive strains that lead to spurious stress response of the ANN. The finer mesh M3 fails to converge beyond 0.8% of overall stretch. Additionally, the ANN version failed to improve the accuracy beyond a certain limit, i.e. it failed to achieve quadratic convergence starting beyond a critical load amplitude. In Figure 6 a comparison of the tension force of the ANN-only run (lines) and of the subsequent hybrid run (symbols) is shown. During the hybrid run the number of integration points evaluating the ROM are determined from the quality indicator at the end of the last load step of the first run. For M1 this amounts to 960 out of 13,872 integration points (6.92%) * * . These numbers illustrate that the ROM must be evaluated approximately 42,000 times for M1 (44 Newton iterations were needed in total) which leads to a substantial computational effort. Surprisingly, the ANN-only run and the hybrid run are hard to distinguish from the overall force-stretch plots, i.e. the ANN appears to yield good accuracy for this test. This is supported by the rather small absolute errors in the effective stress tensor, see Figure 6 (right) for the final load and mesh M1. In summary, the staggered procedure can exclusively be used if the peak strain in the macroscopic problem is sufficiently low due to the aforementioned convergence issues. Then the solution can be expected to give accurate predictions.
In view of the number of quadrature points marked for use of the more reliable ROM, the adaptive scheme shows a steady increase when using the kinematic indicator χ K marking points outside of the training range as not trustworthy for the ANN.

Single pass on-the-fly adaptive algorithm
The crucial ingredient of the on-the-fly adaptive scheme, described in Algorithm 2, is the irreversible update of the quality indicator during each load increment. Thereby, alternating model selection is prevented. All three macroscopic models, M1, M2 and M3, converged without any issues. The resulting macroscopic tension force of all three is compared in Figure 7, where the hybrid curve from Figure 6 and the FE 2R curve for the ROM featuring 32 modes are also shown for M1. It is observed from Figure 7 (right) that all algorithms yield virtually identical results. Closer inspection reveals, however, that the FE 2R and adaptive algorithm have nearly indistinguishable slopes (despite a negligible shift), whereas the ANN model is slightly curved, i.e. it shows a qualitative difference towards the reference solution which gets more pronounced at increasing load amplitude.
The adaptive algorithm has the advantage that the number of macroscopic integration points that require evaluation of the ROM depends only on the current state. For the considered proportional loading, and when using the kinematic indicator χ K , the relative amount of integration points grows monotonically with increasing  Figure 8. In order to investigate the practical usefulness of the ANN classifier, a comparison of the on-the-fly adaptive simulation using the kinematic quality indicator and the same simulation supplemented by the ANN classifier discussed in Section 3.2.2 is considered. As expected, the solid yet not overly satisfying accuracy of the ANN classifier (see Table 3) induces a large number of additional ROM evaluations, thereby increasing the computation time considerably (approx. by a factor of 7), see Figure 9. Notably, the ANN classifier adds a considerable amount of points at the left and right constriction and close to the holes. In order to assess the relevance of these additional points, the macroscopic tensile force was investigated: It varies less than 0.3%, except in the very first load step with a difference around 0.6%.

Concluding summary
A multi-fidelity approach for generating surrogate models of the effective stress tensor for the use in twoscale simulations is developed in Section 2.1. At first, a ROM is derived from data gathered during full field simulations. The estimation of the error in the effective stress tensor (representing the QoI) of the ROM is discussed from a theoretical perspective in Section 2.2. The mathematical structure of the error estimate reveals, that the ROM error estimation produces computational cost that is almost equivalent or even beyond that needed to solve a more dedicated ROM, thereby making it hard to justify such estimates when in the need for computational efficiency.
In our view this dilemma can only be resolved by finding alternative surrogates with low computational complexity but moderate to good accuracy complemented by adaptive strategies for local model refinement that employ costly computational methods only when needed. In this regard, ANNs are seen as promising candidates for the calibration of surrogate models for the effective stress and for classification that can trigger adaptive refinement. In Section 2.3, the layout and the theoretical background of ANNs are discussed, together with different feature designs for the inputs and outputs based on the mechanical nature of the strain and stress. For the calibration of the stress surrogate, the mean squared error is used as loss function, while the quality of the trained ANN is checked on the validation dataset with the mean coefficient of determination and the mean relative norm error. In the case of error regression, a penalized mean squared error is proposed, which allows the conservative calibration of trained ANNs. For the error classification based on prescribed tolerances, the weighted cross entropy is used in order to allow for a better focus on the more important warning case, if the warning case density is low. Based on the proposed models, the core contribution of the present work constitutes two modeladaptive algorithms which encompass convergence issues encountered in the naive implementation of on-the-fly adaptive surrogate selection, see Section 2.4. The first staggered algorithm is based on a two run approach, in which the first run is conducted solely with the ANN effective stress surrogate and flags points evaluated outside of the strain training region, such that only these points are evaluated with the high-accuracy ROM in a second run. The second algorithm offers a more flexible on-the-fly model-adaptive approach by allowing the re-initialization of the ANN at the beginning of each load increment. Numerical examples of the illustrated approaches are presented in Section 3 for a three-phase pseudo-plastic material with microstructure. First, ANNs are trained in order to approximate the effective stress. The surrogate of choice,σ ANN1 , achieved a mean relative norm error of 0.0189 and a mean coefficient of determination of 0.9995 and yields an accurate tangent stiffness, due to its formulation on the automatic differentiation capabilities of the TensorFlow library. The accuracy of the ANN stress surrogate is found to range between ROMs of dimension 24 and 32, respectively (see Section 3.2). The ansatz for error regression of ROMs of different dimension is presented, showing the possibilities for a calibration of a conservative ROM error estimator. In view of subsequent twoscale simulations with adaptive model selection, error classification is carried out for ROMs of different dimension and for the trained ANN stress surrogate. The achieved accuracies for the ROMs are higher than for the ANN stress surrogate, which indicates that the physics-informed ROM still shows a clearer pattern than the trained ANN stress surrogate, due to its inherited mathematical structure and underlying physical principles.
The trained ANNs are then used in twoscale mechanical FE simulations, based on the two developed algorithms of Section 2.4. The staggered algorithm produces sensible results but has two limitations: First, the number of macroscopic quadrature points marked for correction grows irreversibly. Second, the ANN surrogate must be sufficiently robust and of-at least-moderate accuracy in a prohibitive part of the strain space. This requirement stems from the fact that local strain outliers lead to queries that are way outside of the usual training range of the ANN. This effect is found to be more pronounced when the macroscopic mesh density is increased which further complicates the robust surrogate construction using purely data-driven methods in general, see Section 3.3.2. The second algorithm offers a true on-the-fly adaptivity in which the ANN surrogate can be recovered, e.g., during unloading. It is observed in Section 3.3.3 that this second algorithm offers the fastest convergence among the considered twoscale simulations being approximately 5-10 times faster than the staggered algorithm and around 20 times in comparison to the fully coupled FE 2R algorithm using the ROM with 32 modes for all stress predictions. The adaptive on-the-fly model of the second algorithm offers, therefore, an attractive approach which combines a low number of ROM evaluations with good convergence.
The final test using the additional error classifier for the ANN stress surrogate introduced a high number of additional negative outcomes (i.e., ANN error greater than tolerances), considerably increasing the number of integration points requiring the ROM. This was expected due to the low accuracies achieved during the training of the classifier, more specifically, due to the low accuracy for the positive outcome ACC 1 and corresponding high amount of positive outcomes reflected by w 0 , see Tab. 3. On the one hand, this last approach offers a conservative twoscale scheme relying on the robust ROM. On the other hand, the approach loses computational efficiency, which leaves the error classification for rapid multiscale problems still an open issue. Future improvements should, therefore, focus on further theoretical or hybrid error estimators and an improved error classification for the effective stress. The latter could benefit from datasets spanning larger portions of the strain space. The authors are convinced that the ambitious goal of reliable twoscale simulations can only be achieved by fusing data-driven methods together with dedicated theories. For example, the reader should consider that at least two quality indicators are indeed necessary for a reliable model-adaptive ansatz. One quality indicator should address the trained ANN stress surrogate over the sampled strain (input) space and one quality indicator should definitely return a warning if the input leaves the training region. This is quintessential, since the range of macroscopic strain is not a priori known and the behavior of the purely data-driven ANN outside the training region may endanger the convergence and quality of the multiscale simulation. The results of the present work support the usability of ANNs in computational materials science, although further research ideally addressing the combination of theory and data is urgently required.

Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author Contributions
FF conducted the finite element and reduced order model computations on the microscale and the twoscale simulations based on his self-developed code and wrote the corresponding sections. MF implemented, trained the artificial neural networks and wrote the corresponding sections. FL conducted together with FF preliminary work on theoretical error estimates solely based on the reduced order model, which yielded the theoretical insights for the computational expenses and the necessity for alternative approaches. FL wrote the corresponding section for the theoretical error estimation.