Efficient Exploration of Microstructure-Property Spaces via Active Learning

Morand, Lukas; Link, Norbert; Iraki, Tarek; Dornheim, Johannes; Helm, Dirk

doi:10.3389/fmats.2021.824441

ORIGINAL RESEARCH article

Front. Mater., 14 February 2022

Sec. Computational Materials Science

Volume 8 - 2021 | https://doi.org/10.3389/fmats.2021.824441

This article is part of the Research Topic "Virtual Materials Design".

Lukas Morand¹*

Norbert Link²

Tarek Iraki²

Johannes Dornheim²,³

Dirk Helm¹

¹Fraunhofer Institute for Mechanics of Materials IWM, Freiburg, Germany
²Intelligent Systems Research Group ISRG, Karlsruhe University of Applied Sciences, Karlsruhe, Germany
³Institute for Applied Materials—Computational Materials Sciences IAM-CMS, Karlsruhe Institute of Technology, Karlsruhe, Germany

In materials design, supervised learning plays an important role in the optimization and inverse modeling of microstructure-property relations. To successfully apply supervised learning models, it is essential to train them on suitable data. Here, suitable means that the data covers the microstructure and property space sufficiently and, especially for optimization and inverse modeling, that the property space is explored broadly. For virtual materials design, data is typically generated by numerical simulations, which implies that data pairs can be sampled on demand at arbitrary locations in microstructure space. However, exploring the space of properties remains challenging. To tackle this problem, interactive learning techniques known as active learning can be applied. The present work is the first to investigate the applicability of the active learning strategy query-by-committee for efficient property space exploration. Furthermore, an extension to active learning strategies is described that prevents the exploration of regions with properties out of scope (i.e., properties that are not physically meaningful or not reachable by manufacturing processes).

1 Introduction

1.1 Motivation

With regard to natural learning processes, Cohn et al. stated that “in many situations, the learner’s most powerful tool is its ability to act, to gather data, and to influence the world it is trying to understand” (Cohn et al., 1996). One attempt to transfer this ability to methods in the field of machine learning is called active learning. Active learning describes an interactive learning process in which machine learning models improve with experience and training (Settles, 2012). It therefore contrasts with the commonly used approach of gathering data a priori and learning from it afterwards. In fact, active learning couples sampling and training, which typically results in an efficient and broad exploration of the data space.

In materials design, the exploration of data spaces is particularly important. For a designer, it is essential to know all possible microstructure configurations and reachable properties of a material in order to increase the performance of a workpiece. How to delineate these configurations and properties is described, for example, in the microstructure sensitive design (MSD) approach (Adams et al., 2001). Two necessary steps for MSD are 1) to determine the design space yielding a hull of possible microstructures and 2) to calculate the corresponding properties defining a so-called property closure (Fullwood et al., 2010). For complex high-dimensional microstructure representations, however, exploring the design space is challenging. It is even more challenging if gathering data is cumbersome because it is time-consuming (e.g., running complex numerical simulations), ties up manpower (e.g., labeling data manually) or is laborious (e.g., performing experiments).

To the authors’ knowledge, the present work is the first to use the active learning strategy query-by-committee (Burbidge et al., 2007) for generating microstructure-property data in materials science. We show that the active learning strategy can be used to explore microstructure-property spaces efficiently and that the generated data is better suited to train accurate machine learning models than data generated via classical sampling approaches. Moreover, we present an extension to active learning approaches that aims to avoid sampling in regions with properties out of scope. This is necessary because defining bounds for microstructure spaces is not always simple, and active learning techniques may therefore explore regions with properties that are not physically meaningful or not reachable by manufacturing processes.

1.2 Related Work

Active learning techniques can be grouped into three use-case-specific classes, namely stream-based selective sampling, pool-based selective sampling and membership query synthesis (Settles, 2009). The first two are based on either a continuous data stream or an a priori defined pool of data, from which the active learning algorithm can choose data points to be labeled. In membership query synthesis, in contrast, the algorithm is free to choose the locations at which new data points are created. Therefore, membership query synthesis is well suited for virtual data generation on the basis of numerical simulations.

Membership query synthesis goes back to Angluin (1988) for classification problems and to Cohn et al. (1996) for regression problems. The technique presented in Cohn et al. (1996) is called variance reduction. It aims to minimize the output variance of a machine learning model in order to minimize the future generalization error. Alternative approaches for regression problems are maximizing the expected change of a machine learning model when seeing new data (Cai et al., 2013) or committee-based approaches like query-by-committee (Burbidge et al., 2007), where the prediction variance of a committee consisting of multiple separately trained machine learning models is minimized. In the present study, we use the latter approach, as it is straightforward to implement and scales to complex models (i.e., neural networks). Recent research in active learning targets the application of deep learning models, see for example Stark et al. (2015) and Wang et al. (2016) for applications using convolutional neural networks and Zhu and Bento (2017), Sinha et al. (2019) and Mayer and Timofte (2020) for generative models.

Instead of actively sampling data spaces, classical space-filling sampling strategies can be used to generate data without considering the learning task, see Fang et al. (2000), Simpson et al. (2001) and Wang and Shan (2006) for an overview. In the following, we list some of the most popular space-filling sampling strategies. Latin hypercube design (McKay et al., 2000) aims to partition the dimensions of the input space into equidistant slices and places data points such that each slice is covered by one data point. Orthogonal arrays (Owen, 1992) are special matrices that define sampling with the aim to sample data spaces uniformly. In particular, orthogonal arrays can be used to generate uniform Latin hypercubes (Tang, 1993). Furthermore, low-discrepancy sequences can be used to cover spaces uniformly with data points, see for example Niederreiter (1988). Among others, popular sequences are the Hammersley sequence (Hammersley and Handscomb, 1964), Halton sequence (Halton, 1964) and Sobol sequence (Sobol, 1967). In addition to these methods, a common sampling strategy is to randomly draw samples from a uniform distribution. However, all of these approaches suffer from the curse of dimensionality, which means that the effort needed to sufficiently sample data spaces grows exponentially with the number of dimensions.
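The space-filling strategies listed above can be sketched with the quasi-Monte Carlo module of scipy. This is a minimal illustration only: the dimension, the sample count and the bounds are arbitrary choices for the example, not settings taken from this study.

```python
import numpy as np
from scipy.stats import qmc

dim, n = 2, 8  # illustrative values, not taken from the article

# Latin hypercube: each of the n equal-width slices per dimension holds one point.
lhs = qmc.LatinHypercube(d=dim, seed=0).random(n)

# Low-discrepancy sequences (Sobol points are drawn in powers of two).
sobol = qmc.Sobol(d=dim, scramble=True, seed=0).random_base2(m=3)  # 2**3 = 8 points
halton = qmc.Halton(d=dim, scramble=True, seed=0).random(n)

# Plain uniform random sampling for comparison.
uniform = np.random.default_rng(0).random((n, dim))

# Rescale the unit-cube samples to hypothetical parameter bounds.
lower, upper = [5.0, 100.0], [200.0, 400.0]
lhs_scaled = qmc.scale(lhs, lower, upper)
```

All four generators return points in the unit hypercube, so the same rescaling step applies to each of them. None of them takes the learning task into account, which is the contrast to active learning drawn in the text.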

In materials design, the usage of classical sampling strategies is very common. The framework for data-driven analysis of materials presented in Bessa et al. (2017), for example, uses data generation on the basis of space-filling sampling methods, especially the Sobol sequence. In Gupta et al. (2015), dual-phase 2D microstructures are generated by randomly placing particles in a steel matrix. In order to generate spatially resolved dual-phase microstructure volume elements, Liu et al. (2015b) use evenly distributed data of phase volume fractions. Regarding homogenized microstructural features, in Iraki et al. (2021), Latin hypercube design is used to generate a data set of textures for cold rolled steel sheets. Special sampling heuristics have even been developed for generating sets of microstructure features, as in Johnson and Kurniawan (2018).

Adaptive sampling techniques are also used in materials design, albeit in the sense of an optimization that aims to identify microstructures with targeted properties. In Liu et al. (2015a) and Paul et al. (2019), specific machine learning-based optimization approaches are presented that efficiently guide sampling to regions in the space of microstructures where microstructures with desired properties are expected to be located. Further statistics-based approaches exist that use surrogate-based optimization (cf. Forrester and Keane, 2009), see Nikolaev et al. (2016), Balachandran et al. (2016), Lookman et al. (2017) and Lookman et al. (2019). Yet, as these approaches aim to find individual material compositions or microstructures for certain target properties, they are not applicable for sampling microstructure-property spaces broadly.

So far, only a few publications describe the usage of active learning to train a machine learning model while generating microstructure-property data (Jung et al., 2019; Kalidindi, 2019; Castillo et al., 2019). The approaches presented therein are based on variance reduction using Bayesian models, like Gaussian process regression (GPR), cf. Seo et al. (2000). Such Bayesian approaches can have a tremendous advantage when working with experimental measurements. However, the computational complexity of GPR increases cubically with the number of data points. Furthermore, it is worth mentioning the Bayesian approach described in Tran and Wildey (2021) to solve stochastic inverse problems for property-structure linkages. The aim of the approach is to model posterior probabilities for microstructures having desired properties. This is achieved by successively updating an a priori probability distribution with newly sampled microstructure-property data points. These are generated by drawing microstructure samples from the current probability distribution and evaluating their properties. Afterwards, the samples are accepted or rejected depending on a certain criterion. However, this so-called acceptance-rejection sampling can be disadvantageous in terms of sample efficiency, as the decision to accept samples is made only after their properties have been calculated.

1.3 Paper Structure

In Section 2 the concept of membership query synthesis is presented including an extension to avoid sampling in regions out of scope. Additionally, the applied query-by-committee approach is described in detail. In Section 3, three numerical examples are shown to demonstrate the advantage of using active learning to sample microstructure-property spaces. The results are discussed in Section 4. The work is summarized in Section 5 and a brief outlook on the application of active learning in virtual materials design is given.

2 Methods

2.1 Active Learning via Membership Query Synthesis

Membership query synthesis follows an iterative procedure that is sketched schematically in Figure 1, cf. Settles (2012). The procedure starts with an initial data set of input variables $X_{i} \in R^{k}$ and corresponding target variables $Y_{i} \in R^{l}$. The mapping from input space $X$ to output space $Y$ is approximated by a learner:

f : X \to Y, y = f (x), (1)

where $x \in X \subset R^{k}$ and $y \in Y \subset R^{l}$. The learner is realized by one or more supervised learning models. To apply active learning, it is essential that the learner’s prediction quality can be measured. On the basis of such a measure, an optimization is performed with the objective of finding the location X* in the input space $X$ at which the learner’s prediction quality is likely worst. At this location, a new data point is generated in order to improve the learner. To get the corresponding target label Y*, the so-called oracle (in our case a numerical simulation) is queried. The new data tuple (X*, Y*) is added to the data set and the procedure is repeated.
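The iterative procedure can be sketched as a generic loop. The following is a minimal illustration under strong simplifications, not the implementation used in this work: the "learner" merely stores the training inputs, the distance to the nearest training point stands in for a measure of prediction quality, a grid search stands in for the optimizer, and a sine function stands in for the numerical simulation. All function names are hypothetical.

```python
import numpy as np

def membership_query_synthesis(X, Y, fit, uncertainty, propose, oracle, n_queries):
    """Train, locate the least reliable input, query the oracle, repeat."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    for _ in range(n_queries):
        learner = fit(X, Y)                                   # train on the current data
        x_star = propose(lambda x: uncertainty(learner, x))   # maximize the uncertainty
        y_star = oracle(x_star)                               # label the new input
        X = np.vstack([X, np.atleast_2d(x_star)])
        Y = np.vstack([Y, np.atleast_2d(y_star)])
    return X, Y

# Toy stand-ins (assumptions for illustration only).
candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)

def fit(X, Y):
    return X  # the "learner" only remembers where data exists

def uncertainty(learner, x):
    return np.min(np.linalg.norm(learner - x, axis=1))  # distance to nearest sample

def propose(score):
    return candidates[np.argmax([score(x) for x in candidates])]

def oracle(x):
    return np.sin(2.0 * np.pi * x)  # placeholder for a numerical simulation

X0 = np.array([[0.0], [1.0]])
X, Y = membership_query_synthesis(X0, oracle(X0), fit, uncertainty, propose, oracle, n_queries=3)
```

With these stand-ins, the three queries fall into the largest gaps between existing samples. Replacing `fit`, `uncertainty` and `propose` with a committee of regression models, their prediction variance and a global optimizer yields the query-by-committee variant described in Section 2.3.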

FIGURE 1

FIGURE 1. Iterative procedure of membership query synthesis, cf. Settles (2012).

2.2 Avoid Queries in Regions out of Scope

Following the procedure described in Section 2.1, new data points are generated over the whole input space. In many applications, this might be appropriate to improve the learner. However, when the input space bounds cannot be defined adequately, it is probable that the active learning algorithm queries for data in regions that are not of interest for the application case. To avoid sampling in regions out of scope, the original workflow depicted in Figure 1 can be extended with a region filter, cf. Figure 2. The purpose of the region filter is to limit the optimizer to regions in the input space that lead only to output quantities of interest. In order to set up the region filter, the data points X_i get an additional class label c ∈ {0, 1}, which marks whether the output values are of interest or out of scope. The bounds in the output space that determine the class label can be defined by the user.
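Assigning the class labels from user-defined output bounds can be sketched as follows. The encoding (1 for outputs of interest, 0 for outputs out of scope) and the example values are assumptions made for illustration; the text does not fix which label denotes which class.

```python
import numpy as np

def class_labels(Y, lower, upper):
    """Return 1 where every output component lies within [lower, upper], else 0."""
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    inside = np.all((Y >= lower) & (Y <= upper), axis=1)
    return inside.astype(int)

# Hypothetical outputs with two components each and an upper bound of 5.0.
Y = np.array([[1.2, 0.8],
              [6.3, 1.1],
              [2.0, 4.9]])
c = class_labels(Y, lower=0.0, upper=5.0)
```

The resulting labels can then serve as training targets for the region filter.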

FIGURE 2

FIGURE 2. Iterative procedure of membership query synthesis (cf. Settles (2012)) including an extension to avoid sampling in regions out of scope. The class labels of the data points are represented by C_i ∈ {0, 1}.

The region filter is realized by a binary classifier that partitions the input space depending on the class label by learning the mapping function

g : X \to c, c = g (x) . (2)

In fact, the described extension is similar to Bayesian optimization approaches that account for unknown constraints using classification methods, see for example Sacher et al. (2018), Heese et al. (2019) and Tran et al. (2019).

Often, the amount of data that is out of scope is much lower than the amount of data of interest. If this is the case, one-class classification methods can be used as region filter, such as isolation forests (Liu et al., 2008) or one-class support vector machines (Schölkopf et al., 2001). Both are unsupervised learning methods that delimit the region of the input space that is covered by data out of scope. Once trained, they are used to estimate the class membership of unseen data points. For more complex classification problems, this can also be achieved by deep learning approaches (cf. Chalapathy and Chawla, 2019), such as autoencoder neural networks (Hinton and Salakhutdinov, 2006; Sakurada and Yairi, 2014).

2.3 Query-By-Committee for Microstructure-Property Space Exploration

Originally, query-by-committee was introduced by Seung et al. (1992) for classification problems. In this study, the query-by-committee approach following Burbidge et al. (2007) for regression problems is applied. In this approach, the learner is realized by a committee of n regression models (here we use feedforward neural networks). Following the workflow depicted in Figure 1, the committee members are trained on the current data set. However, in this work, each neural network is trained only on a subset of the data (RayChaudhuri and Hamey, 1995). In order to query new data points, the microstructure space is searched for the location at which the committee members disagree the most. Disagreement is defined by the variance of the neural network predictions s² (Krogh and Vedelsby, 1995):

s^{2} (x) = \sum_{η = 1}^{n} {(f_{η} (x) - \bar{f} (x))}^{2}, (3)

where f_η (x) denotes the property prediction of neural network η and $\bar{f} (x)$ denotes the mean over all predictions at location x. The location to query the next data point is determined by

X^{*} = \underset{x}{arg max} (s^{2} (x)) . (4)
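Eqs 3 and 4 can be illustrated with a small committee. In this minimal sketch, cubic polynomial fits stand in for the feedforward neural networks, each member is trained on a random 80% subset of the data, and a grid search stands in for the optimizer; all of these choices, as well as the toy target function, are assumptions made to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_committee(x, y, n_members=5, subset=0.8, degree=3):
    """Fit each committee member on its own random subset of the data."""
    members = []
    for _ in range(n_members):
        idx = rng.choice(len(x), size=int(subset * len(x)), replace=False)
        members.append(np.polynomial.Polynomial.fit(x[idx], y[idx], degree))
    return members

def disagreement(members, x):
    """Eq. 3: summed squared deviation of the member predictions from their mean."""
    preds = np.array([m(x) for m in members])  # shape (n_members, n_points)
    return np.sum((preds - preds.mean(axis=0)) ** 2, axis=0)

# Toy data from a hypothetical one-dimensional oracle.
x_train = rng.uniform(-1.0, 1.0, 30)
y_train = np.sin(3.0 * x_train)

committee = train_committee(x_train, y_train)
grid = np.linspace(-1.0, 1.0, 201)
s2 = disagreement(committee, grid)
x_star = grid[np.argmax(s2)]  # Eq. 4, evaluated on a grid
```

A constant factor such as 1/n in front of the sum would rescale s² but would not change the location of its maximum, so the queried point is unaffected by this choice.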

Admittedly, it is challenging to choose the right number of regression models and to equip them with sufficient complexity (e.g., depth of neural networks). Depending on the mapping to be learned, we suggest assigning a lower complexity to the regression models in the beginning, as the amount of initial data is typically low. With an increasing amount of data, it is possible to increase the complexity of the regression models, which was, however, not done in this study. Regarding the number of regression models in the committee, the similarity of the query-by-committee approach to Bayesian methods like GPR is worth mentioning. GPR can be interpreted as a distribution over functions (Williams and Rasmussen, 2006), which is also the case for the query-by-committee approach when the number of committee members goes to infinity. However, the overall training time of the query-by-committee approach increases linearly with the number of committee members.

In order to extend the query-by-committee approach to avoid sampling in regions out of scope, first, the data points that exceed the predefined output bounds need to be extracted from the current data set. Then, a classifier is trained on these data points in order to delimit a region in microstructure space. This region is excluded from the optimization by adding a soft constraint to Eq. 4:

X^{*} = \underset{x}{arg max} (s^{2} (x) - W ⟨ ρ (x) ⟩), (5)

where ⟨⋅⟩ denotes the Macaulay brackets and ρ(x) denotes the distance of x to the decision boundary that is defined by the classifier. As s²(x) and ρ(x) can be of different magnitudes, the scalar weight factor W is introduced, which needs to be set in order to balance the optimization.

In this work, ρ(x) is determined by an isolation forest classifier. Isolation forest is an outlier detection method that consists of an ensemble of decision trees. Each tree partitions the input space randomly until all training data points are isolated. It is assumed that outliers typically lie in partitions with rather short paths in the decision tree structures. On the basis of the path lengths, an anomaly score (in the range of (0, 1)) can be defined for each observation, see Liu et al. (2008) for details. Therein, it is stated that data points with an anomaly score $< 0.5$ can be regarded as being normal. Consequently, the decision function ρ(x) can be defined by shifting the anomaly score to the range (−0.5, 0.5) and, in this work, by multiplying it by −1. The latter needs to be done because the isolation forest is trained only on microstructures that exceed the predefined property bounds. In this respect, two cases can occur in Eq. 5. If ρ(x) > 0, the objective is penalized such that the optimizer is forced to generate candidate microstructures that likely do not exceed the specified property bounds. Generating such microstructures then leads to ρ(x) ≤ 0, which does not affect the optimization at all.
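The decision function and the Macaulay-bracket penalty can be sketched with the isolation forest from scikit-learn. With `contamination="auto"`, its `decision_function` returns 0.5 minus the anomaly score of Liu et al. (2008), which corresponds to the shifted and sign-flipped score described above. The training data below are hypothetical, the helper names are illustrative, and the default W = 100 is the value reported in Section 3.3.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical inputs whose simulated properties exceeded the property bounds.
X_out = rng.normal(loc=[2.0, 2.0], scale=0.2, size=(200, 2))

# The forest is trained only on the out-of-scope inputs.
forest = IsolationForest(n_estimators=100, contamination="auto", random_state=0).fit(X_out)

def rho(x):
    """Positive inside the region covered by out-of-scope data, negative outside."""
    return forest.decision_function(np.atleast_2d(x))

def macaulay(v):
    """Macaulay brackets: <v> = v for v > 0 and 0 otherwise."""
    return np.maximum(v, 0.0)

def penalty(x, W=100.0):
    """Soft-constraint term W * <rho(x)> that is subtracted in Eq. 5."""
    return W * macaulay(rho(x))

inside = rho([2.0, 2.0])     # centre of the out-of-scope cluster
outside = rho([-5.0, -5.0])  # far away from any out-of-scope data
```

Far away from the out-of-scope data the penalty vanishes, so the search for maximal committee disagreement is unaffected there.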

To solve Eqs 4 and 5, we use the differential evolution algorithm by Storn and Price (1997) as implemented in the Python package scipy (Virtanen et al., 2020). The neural network models and the isolation forest classifier applied in this work are based on the implementations in the Python package scikit-learn (Pedregosa et al., 2011).
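How the soft constraint shifts the query location can be sketched with `scipy.optimize.differential_evolution`. Since scipy minimizes, the negative of the objective is passed. The functions `s2` and `rho` below are simple analytic stand-ins (assumptions for illustration) for the committee disagreement and the classifier decision function; they are not the models used in this work.

```python
import numpy as np
from scipy.optimize import differential_evolution

W = 100.0
bounds = [(0.0, 1.0), (0.0, 1.0)]  # hypothetical two-dimensional input space

def s2(x):
    return x[0] + x[1]   # stand-in disagreement, largest at the corner (1, 1)

def rho(x):
    return x[0] - 0.5    # stand-in decision function, positive for x[0] > 0.5

def negative_objective(x, weight):
    # Negative of Eq. 5; weight = 0 recovers Eq. 4.
    return -(s2(x) - weight * max(rho(x), 0.0))

plain = differential_evolution(negative_objective, bounds, args=(0.0,), seed=0, tol=1e-8)
filtered = differential_evolution(negative_objective, bounds, args=(W,), seed=0, tol=1e-8)
```

Without the penalty, the query lands at the corner (1, 1). With W = 100, it is pushed back to the boundary of the excluded region near x[0] = 0.5, while x[1] stays at its upper bound.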

3 Results

3.1 Toy Example: Dirac Delta Function

First, a simplistic extreme case is analyzed. The data generating process considered here is given by an approximation of the Dirac delta function via a Gaussian distribution

δ (x) = \frac{1}{| α | \sqrt{π}} e^{- {(\frac{x}{α})}^{2}}, (6)

with parameter α = 0.1. A set of 500 data points is generated by randomly drawing samples of x in the range of (−50, 50). Additionally, 500 data points are generated via query-by-committee. For this purpose, a committee of five neural networks is set up, each of which is trained on a random subset of 80% of the current data. The neural networks consist of two layers with five neurons each. To avoid overfitting, early stopping (Prechelt, 1998) and L2-regularization (Krogh and Hertz, 1992) are applied with regularization parameter λ = 0.0001. As activation functions, rectifiers (ReLU) are used. The mean-squared-error loss function between true and predicted δ(x) is applied and optimized using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimizer (Liu and Nocedal, 1989). The approach is initialized with 100 randomly drawn samples.

The resulting 500 data points and their distribution over x are shown in Figure 3. The data points generated by random sampling are distributed almost uniformly over the input space. All data points are located in regions of lower δ(x). The maximum δ(x)-value in the randomly sampled data set equals 0.50620809. In contrast, the query-by-committee approach concentrates on sampling the region close to the peak of the approximated delta function. The maximum δ(x)-value in the data set equals 5.64189535, which is very close to the maximum of Eq. 6: δ(x = 0) = 5.64189584. The generated data is available online, see Morand et al. (2021).
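The reported peak value can be checked numerically. The sketch below assumes the normalization factor 1/(|α|√π) in Eq. 6, which reproduces δ(0) ≈ 5.6419 for α = 0.1 and yields unit area under the curve; the random seed and the integration grid are arbitrary choices.

```python
import numpy as np

def delta(x, alpha=0.1):
    """Gaussian approximation of the Dirac delta function (Eq. 6)."""
    return np.exp(-(x / alpha) ** 2) / (abs(alpha) * np.sqrt(np.pi))

peak = delta(0.0)  # 1 / (0.1 * sqrt(pi))

# The approximation integrates to one (Riemann sum on a fine grid).
grid = np.linspace(-1.0, 1.0, 200001)
area = np.sum(delta(grid)) * (grid[1] - grid[0])

# Uniform random sampling in (-50, 50) rarely lands near the narrow peak.
x_random = np.random.default_rng(0).uniform(-50.0, 50.0, 500)
sampled_max = delta(x_random).max()
```

Because the peak has a width of the order of α = 0.1 in an interval of length 100, the average spacing of 500 uniform samples (0.2) is of the same order as the peak width, which is why random sampling is unlikely to resolve the maximum.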

FIGURE 3

FIGURE 3. The sampled input-output space of the approximated Dirac delta function δ(x) is shown above. 500 samples were generated via random sampling (A) and query-by-committee, labeled as QBC, (B). Below, the normalized Gaussian kernel density estimation is shown for the data sets generated via random sampling (C) and query-by-committee (D).

3.2 Identifying Material Model Parameters

The second example is about inferring material model parameters from given material model responses, which is (like typical materials design problems) an inverse identification problem, cf. Mahnken (2004). To solve it, neural networks can be used to directly learn a mapping from material model responses to model parameters (Yagawa and Okuda, 1996; Huber, 2000). Such an approach is for example applied in Huber and Tsakmakis (1999) to identify constitutive parameters of a finite deformation plasticity model on the basis of spherical indentation tests. As simulating spherical indentation tests is time consuming, the usage of active learning can be beneficial, because it efficiently explores the space of material model parameters and responses. This characteristic can be understood as goal-directed sampling, which is essential for the prediction quality of supervised learning models that are directly trained on inverse relations (Jordan and Rumelhart, 1992).

In this example, we analyze the identification problem described in Morand and Helm (2019), as it requires a special sampling, for which a knowledge-based approach has been developed. The data generating process is defined by the hardening model (cf. Helm, 2006):

H (s_{p}, β, γ) = \frac{γ}{β} (1 - e^{- β s_{p}}), (7)

where β and γ are material dependent parameters and s_p denotes the accumulated plastic strain. For the purpose of this study, the hardening curves are discretized into 20 equidistantly distributed points in s_p ∈ (0.0, 0.2). For sampling, we consider β and $\frac{γ}{β}$ being inside the ranges (5, 200) and (100, 400), respectively. This yields a parameter identification problem as it is illustrated in Figure 4.
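The data-generating process of Eq. 7 can be sketched as follows. It is assumed here that the 20 equidistant points include both interval ends; the function names are illustrative.

```python
import numpy as np

def hardening(s_p, beta, gamma):
    """Hardening model of Eq. 7: H = gamma / beta * (1 - exp(-beta * s_p))."""
    return gamma / beta * (1.0 - np.exp(-beta * s_p))

# 20 equidistant points in accumulated plastic strain (endpoints assumed included).
s_p = np.linspace(0.0, 0.2, 20)

def hardening_curve(beta, gamma_over_beta):
    """Discretized curve for the sampled parameter tuple (beta, gamma / beta)."""
    return hardening(s_p, beta, gamma_over_beta * beta)

slow = hardening_curve(5.0, 100.0)    # small beta: slow saturation
fast = hardening_curve(200.0, 400.0)  # large beta: saturates almost immediately
```

The ratio γ/β is the saturation value of the curve and β controls how quickly it is approached. For large β, most of the 20 points lie on the plateau, so changes in β barely alter the discretized curve, whereas small β-values shape it strongly.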

FIGURE 4

FIGURE 4. Parameter identification problem. A data base of material model parameters and corresponding responses is set up (black dots and curves), which can be used to train a neural network (NN) on the inverse mapping. After training, the neural network is able to identify parameters for given hardening curves (red dashed line and cross).

In total, 2,500 discretized hardening curves $H_{i} \in R^{20}$ are generated by varying β and $\frac{γ}{β}$ using 1) Latin hypercube design, 2) the knowledge-based sampling approach proposed in Morand and Helm (2019) and 3) query-by-committee. The knowledge-based approach from Morand and Helm (2019) is also based on Latin hypercube design. However, the parameter variations in β are manipulated such that the region of lower β-values is sampled more densely (as this region is significant for the shapes of the hardening curves). The configuration of the query-by-committee approach here is the same as in Section 3.1, except for the neural network complexity, which is increased to two hidden layers with 10 and 15 neurons. The initial data set consists of 100 randomly sampled data points.

The resulting sets of parameter tuples (β, $\frac{γ}{β}$) chosen by the three sampling strategies are shown in Figure 5 and are represented in the following by $B_{i} \in R^{2}$. By definition, Latin hypercube design samples the parameter space almost uniformly. In contrast, the query-by-committee approach samples the parameter space in a manner similar to the knowledge-based approach. The region of lower β-values is sampled even more densely and, in contrast to the knowledge-based approach, the bounds of the parameter space are sampled as well. Naturally, the three generated data sets have different effects on the prediction quality of supervised learning models.

FIGURE 5

FIGURE 5. The sampled parameter space of the hardening model described in Eq. 7. 2,500 samples were generated using Latin hypercube design (A), the knowledge-based sampling approach following Morand and Helm (2019) (B) and query-by-committee (C).

In order to show these effects, neural networks are trained on the data sets. As there is no ground truth to test the trained models, the generated data is also used for testing. Training and testing are done for both the forward mapping

f : B \to H, h = f (b), (8)

and the inverse mapping as it is outlined in Figure 4

f^{- 1} : H \to B, b = f^{- 1} (h), (9)

where $b \in B \subset R^{2}$ and $h \in H \subset R^{20}$. Here, $B$ denotes the space of hardening parameters and $H$ the space of discretized hardening curves.

The neural networks that learn the forward mapping consist of two hidden layers with 10 and 15 neurons and for learning the inverse mapping they consist of two hidden layers with 15 and 10 neurons. In both cases, the mean-squared-error loss function between true and predicted output quantity is applied and optimized using the limited-memory BFGS optimizer with L2 regularization of λ = 0.0001. Furthermore, early stopping is applied using a random subset of 10% of the training data for validation. Both networks use ReLU activation functions. To measure the performance of the forward models, the absolute error between the predicted curve H_pred and the true curve H_true is given by

Δ H = \frac{1}{20} \sum | H_{pred} - H_{true} | . (10)

To measure the performance of the inverse models, the curves are reconstructed using the predicted material model parameters (which yields H_recon) and compared with the true curves H_true:

Δ H = \frac{1}{20} \sum | H_{recon} - H_{true} | . (11)
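The error measure of Eq. 11 can be illustrated by reconstructing a curve from predicted parameters. The parameter values below are hypothetical and merely stand in for the output of a trained inverse model; the discretization assumes 20 equidistant points including both interval ends.

```python
import numpy as np

s_p = np.linspace(0.0, 0.2, 20)  # discretization of the accumulated plastic strain

def hardening_curve(beta, gamma_over_beta):
    """Eq. 7 written in terms of the sampled parameters (beta, gamma / beta)."""
    return gamma_over_beta * (1.0 - np.exp(-beta * s_p))

def delta_H(H_a, H_b):
    """Mean absolute deviation over the 20 curve points (Eqs 10 and 11)."""
    return np.mean(np.abs(H_a - H_b))

b_true = (50.0, 250.0)  # true parameter tuple (beta, gamma / beta)
b_pred = (55.0, 245.0)  # hypothetical prediction of an inverse model

H_true = hardening_curve(*b_true)
H_recon = hardening_curve(*b_pred)
error = delta_H(H_recon, H_true)
```

Comparing reconstructed curves instead of the parameters themselves measures how well the predicted parameters reproduce the material response, which is the quantity of interest in the identification problem.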

Training and test runs were performed five times with different random validation splits. The averaged results are listed in Table 1. When tested on the Latin hypercube samples, the neural network trained on the data set generated by query-by-committee achieves a performance similar to that of the neural network trained on the data generated by the knowledge-based approach. When the two are tested on each other’s data, the averaged mean ΔH is rather low for both approaches. However, one can observe that for modeling the forward relation, the neural network trained with the data generated by query-by-committee is slightly better than the one trained with the data generated by the knowledge-based sampling approach, and vice versa for the inverse relation. Both neural networks outperform the neural network trained on the data generated via Latin hypercube design on every test set. The generated data is available online, see Morand et al. (2021).

TABLE 1

TABLE 1. Averaged mean ΔH for the neural networks that are trained and evaluated using the three data sets. The data sets are generated using LHD (Latin hypercube design), KBS (the knowledge-based sampling approach), and QBC (query-by-committee). The best result for each test set is marked in bold.

3.3 Artificial Rolling Texture Generation

As a third example, we analyze the problem of generating microstructure-property data, which is used to learn a forward mapping as a fundamental basis to solve materials design problems in Iraki et al. (2021). The specific materials design problem tackled therein is the identification of crystallographic textures for given desired material properties of DC04 steel sheets, see Figure 6 for an illustration. Basically, this is achieved by combining a machine learning-based model that approximates the mapping from crystallographic textures to properties with an optimization approach. Alternatively, the identification problem can be solved by learning the inverse mapping. In general, solving inverse problems is challenging due to ill-posedness. In this example, the solution of the inverse problem is not guaranteed to exist, and if it exists, it is not guaranteed to be unique (in contrast to the previous example). Here, the existence and uniqueness of a solution depend strongly on the choice of desired properties. If the definition of desired properties is very specific, then it is rather unlikely that a microstructure leading to exactly these properties exists. One way to tackle this problem is to define target property windows (desired properties with tolerances), as is done in Iraki et al. (2021), for example.

FIGURE 6

FIGURE 6. Illustration of the texture identification problem. The space of rolling textures is described by d_j and the space of properties by p_j.

In Iraki et al. (2021), texture generation is based on the rolling texture description model described in Delannay et al. (1999). In this study, the parameter ranges for the texture description model are defined as described in Iraki et al. (2021). To calculate the properties of interest, a crystal plasticity model is used. The model is of Taylor-type and is set up following Kalidindi et al. (1992). For a detailed description of the Taylor-type crystal plasticity model, see Dornheim et al. (2022) and Iraki et al. (2021). Instead of a Taylor-type crystal plasticity model, computationally more expensive full-field models could also be applied here. For the purpose of our study, we use the Taylor-type crystal plasticity model to determine the Young’s moduli E_φ and the Lankford coefficients (r-values) r_φ, both at 0, 45 and 90° to the rolling direction for given crystallographic textures. In the following, the generated properties are represented by $P_{i} \in R^{6}$. The material model parameters are chosen to represent DC04 steel (cf. Iraki et al., 2021). However, using the elastic constants for ferrite from Eghtesad and Knezevic (2020), the Young’s modulus is slightly overestimated by our simulations.

In the following, we compare the generation of 5,000 texture-property data pairs using Latin hypercube design and query-by-committee. As r-values of rolled DC04 sheets typically do not exceed values of 5.0, we additionally apply the extension described in Section 2.2 to suppress generating data in regions leading to r > 5.0 (the factor W that weights the soft constraint in Eq. 5 is set to 100). For the query-by-committee approach, a committee of five neural networks is used with two hidden layers of 24 and 6 neurons. Every committee member is trained on a random subset of 80% of the current data. The mean-squared-error loss function between true and predicted properties is applied and the limited-memory BFGS optimizer is used. Early stopping and L2-regularization with λ = 0.1 are applied. The activation function used is ReLU. For the initial data set, 100 texture-property data points are sampled randomly.

The obtained data sets are depicted as projections in property space in Figure 7 (E₀, E₉₀) and Figure 8 (r₄₅, r₀). The point cloud generated by Latin hypercube design covers a much smaller region in property space than the ones generated by query-by-committee. Furthermore, the point cloud is concentrated at its center. Such a strong concentration cannot be observed in the point clouds generated by query-by-committee. Also, the minimum and maximum values in both E and r that are found by the active learning strategies are more extreme than those found using Latin hypercube design. However, the original query-by-committee sampling approach leads to unrealistically high r-values, cf. Figure 8B. In contrast, Figure 8C shows that this effect can be minimized by applying the extension to query-by-committee presented in Section 2.2. The applied region filter (isolation forest) limits the active learning search space in such a way that textures with high r-values are excluded. Therefore, the number of textures in the data set that lead to unrealistically high r-values decreases dramatically compared to the data set generated without the query-by-committee extension. The data set generated without the extension includes 141 data points with r > 5, whereas the one generated with the extension includes only 11. The generated data is available online, see Morand et al. (2021).

FIGURE 7. 5,000 sampled texture-property data points projected into property space (E₀, E₉₀). Data points are generated on the basis of Latin hypercube design (A), query-by-committee (B) and extended query-by-committee with r ≤ 5.0 (C).

FIGURE 8. 5,000 sampled texture-property data points projected into property space (r₄₅, r₀). The data points are generated on the basis of Latin hypercube design (A), query-by-committee (B) and extended query-by-committee with r ≤ 5.0 (C).

To evaluate the effect of the applied sampling strategies on supervised learning models, we train and test feedforward neural networks on the generated data. For training and testing, we exclude the data points with r > 5.0. In contrast to Section 3.2, only the forward mapping is modeled, as the inverse relation cannot be learned directly using feedforward neural networks due to its non-uniqueness. For the forward mapping, we first approximate the orientation distribution function of the generated textures via symmetric generalized spherical harmonics of degree 12 (Bunge, 2013). The coefficients $D_{i} \in R^{33}$ of this series expansion are used as texture representation (cf. Kalidindi et al., 2004). The feedforward neural networks are supposed to learn the mapping from texture space $D$ to property space $P$

$$f: D \to P, \quad p = f(d), \tag{12}$$

where $p \in P \subset R^{6}$ and $d \in D \subset R^{33}$. Each neural network consists of two hidden layers with 30 and 10 neurons and ReLU activation functions. The mean-squared-error loss function is minimized using the limited-memory BFGS optimizer. L2 regularization with λ = 0.0001 is applied, as well as early stopping using a subset of 10% of the training data for validation.
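The validation split and early stopping mentioned above can be sketched as follows. The patience-based stopping rule and the value of `patience` are assumptions, since the stopping criterion is not specified in the text, and the function names are hypothetical.

```python
import random


def split_train_validation(n_points, validation_fraction=0.1, seed=0):
    """Randomly split indices into a training part and a validation part."""
    rng = random.Random(seed)
    indices = list(range(n_points))
    rng.shuffle(indices)
    n_val = int(round(validation_fraction * n_points))
    return indices[n_val:], indices[:n_val]


def early_stopping_epoch(validation_losses, patience=10):
    """Return the epoch with the lowest validation loss seen before stopping.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs (assumed criterion).
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


train_idx, val_idx = split_train_validation(5000)
# Made-up validation loss curve: the late improvement is never reached.
best = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.9, 0.95, 0.6], patience=3)
```

A different random `seed` per run would give the different random validation splits used for repeated training runs.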

The performance measure for the neural networks used in this example is the mean absolute error for the Young’s moduli

$$\Delta E = \frac{1}{3} \left( |E_{0,\mathrm{pred}} - E_{0,\mathrm{true}}| + |E_{45,\mathrm{pred}} - E_{45,\mathrm{true}}| + |E_{90,\mathrm{pred}} - E_{90,\mathrm{true}}| \right) \tag{13}$$

and for the r-values

$$\Delta r = \frac{1}{3} \left( |r_{0,\mathrm{pred}} - r_{0,\mathrm{true}}| + |r_{45,\mathrm{pred}} - r_{45,\mathrm{true}}| + |r_{90,\mathrm{pred}} - r_{90,\mathrm{true}}| \right). \tag{14}$$
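Both error measures share the same form, so a single helper covers Eqs. 13 and 14. The following is a minimal sketch; the function name and the numerical values are made up for illustration.

```python
def mean_abs_error_over_directions(pred, true):
    """Mean absolute deviation over the 0, 45 and 90 degree directions (Eqs. 13 and 14)."""
    if len(pred) != 3 or len(true) != 3:
        raise ValueError("expected one value per direction (0, 45, 90 degrees)")
    return sum(abs(p - t) for p, t in zip(pred, true)) / 3.0


# Made-up values, ordered as (0, 45, 90) degrees.
delta_E = mean_abs_error_over_directions((210.0, 205.0, 215.0), (208.0, 206.0, 212.0))
delta_r = mean_abs_error_over_directions((1.5, 1.0, 2.0), (1.0, 1.0, 2.5))
```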

Tables 2 and 3 show the results of the trained neural networks when tested on the generated data. Training and test runs were performed five times with different random validation splits. The mean and maximum errors were averaged over the data set. Both tables show that the neural networks trained with data generated by query-by-committee outperform the neural networks trained with data generated by Latin hypercube design. However, the differences in the averaged mean errors are small. In contrast, the differences in the averaged maximum errors are much larger. When tested on the data set generated by Latin hypercube design, both neural networks trained with query-by-committee data achieve similar results.
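One plausible reading of the "average mean" and "average maximum" errors is sketched below: for each of the repeated runs, the mean and the maximum of the per-sample error are computed over the test set, and these two numbers are then averaged over the runs. This interpretation is an assumption, and the error values are made up.

```python
from statistics import mean


def summarize_runs(errors_per_run):
    """Average the per-run mean and per-run maximum error over all runs.

    errors_per_run holds one list of per-sample errors for each run.
    """
    average_mean = mean(mean(run) for run in errors_per_run)
    average_max = mean(max(run) for run in errors_per_run)
    return average_mean, average_max


# Two made-up runs with two test samples each.
avg_mean, avg_max = summarize_runs([[1.0, 3.0], [2.0, 6.0]])
```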

TABLE 2. Average mean ΔE and Δr for the neural networks trained and evaluated on the three generated data sets: LHD (Latin hypercube design), QBC (query-by-committee) and QBC+ (query-by-committee with extension). The best result for each test set is marked in bold.

TABLE 3. Average maximum ΔE and Δr for the neural networks trained and evaluated on the three generated data sets: LHD (Latin hypercube design), QBC (query-by-committee) and QBC+ (query-by-committee with extension). The best result for each test set is marked in bold.

4 Discussion

In Section 3.1, an extreme case is studied to emphasize the advantage of using active learning for the generation of microstructure-property data sets. The peak of the approximated delta function is chosen to be quite steep such that the probability of sampling data points on it by random sampling is rather small. As a result, random sampling covers the peak region of the delta function insufficiently. In contrast, by using query-by-committee, the peak region is explored extensively. As pointed out, even the maximum value in the sampled data set is very close to the maximum value of the approximated delta function. If we imagine the delta function expressing a relation between microstructures and properties, we can easily see the advantage for a designer to gain knowledge about the property peak in order to be able to improve the performance of a workpiece.

In Section 3.2, the query-by-committee approach is compared to a classical Latin hypercube design approach and a knowledge-based sampling approach for generating data of hardening model parameters and responses. Originally, the knowledge-based approach was developed in Morand and Helm (2019) to optimally sample the hardening model’s parameter space by incorporating knowledge about the model’s behavior. The results show that the query-by-committee approach is able to find a similar parameter distribution, but without manually introducing any expert knowledge. The data generated by query-by-committee is equally appropriate for training forward and inverse neural network models, which all outperform the models trained on the data generated by the baseline Latin hypercube design approach. All in all, the results show that by using query-by-committee, sampling can be performed automatically in a goal-directed way without additionally introducing expert knowledge.
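For reference, the Latin hypercube baseline can be sketched in a few lines. This is a generic textbook variant on the unit hypercube, written for illustration; it is not necessarily the implementation used to generate the data in this work.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Draw n_samples points in [0, 1)^n_dims with one point per stratum and dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One random value inside each of the n_samples equally wide strata.
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so that strata are paired randomly across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


samples = latin_hypercube(n_samples=10, n_dims=3)
```

Because every stratum of every dimension is hit exactly once, the design fills the input space evenly but takes no account of how the outputs are distributed, which is the point of comparison with query-by-committee.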

The results from Section 3.3 also show that the query-by-committee approach is more suitable for sampling microstructure-property spaces than classical space-filling sampling strategies. In this example, a space of artificial rolling textures is sampled with the aim of efficiently exploring the space of corresponding properties. A comparison of the spread of the generated property point clouds reveals the additional design possibilities that become available to a designer when the design space is sampled via active learning. However, the original query-by-committee approach explores the texture space in regions that lead to unrealistically high property values. By using the extension to membership query synthesis presented in this paper, sampling in these regions can be suppressed. Some data points are still generated there, but these are necessary for training the binary classifier (region filter). Nevertheless, compared to the classical query-by-committee approach, the number of out-of-scope property data points is much lower and sampling concentrates on the predefined region of interest. Consequently, when training a supervised learning model on the inverse relation (predicting textures for given properties), more extreme properties can be learned on the basis of data generated by query-by-committee.
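The role of the region filter can be illustrated with a deliberately simple stand-in. The filter used in this work is an isolation forest; the nearest-neighbour rule below is swapped in only to keep the sketch dependency-free and is not the authors' method. All names and data points are hypothetical. The sketch also shows why some out-of-scope samples are needed: without them the filter would have nothing to reject.

```python
import math


def make_region_filter(known_inputs, known_in_scope):
    """Build a filter that accepts a candidate if its nearest known input was in scope.

    Stand-in for the isolation forest region filter (assumption for illustration).
    """
    labelled = list(zip(known_inputs, known_in_scope))

    def accept(candidate):
        _, in_scope = min(labelled, key=lambda item: math.dist(item[0], candidate))
        return in_scope

    return accept


# Made-up inputs already simulated, with a flag telling whether the result was in scope.
known_inputs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
known_in_scope = [True, True, True, False]

accept = make_region_filter(known_inputs, known_in_scope)
candidates = [(0.2, 0.1), (2.8, 3.1), (0.9, 0.2)]
allowed = [c for c in candidates if accept(c)]  # candidates kept in the search space
```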

Such a positive effect can also be observed when learning the forward relation. Both neural networks trained on the data generated by query-by-committee outperform the model trained on the data generated by Latin hypercube design. However, no significant differences in the averaged mean absolute error (shown in Table 2) can be seen. This is because most of the data points are located near the center of the point cloud, where all the neural networks are quite accurate. In contrast, the differences in the averaged maximum errors (shown in Table 3) are more pronounced. This is because the data sets sampled by query-by-committee contain more extreme data points than the data set sampled by Latin hypercube design. Furthermore, the neural network trained on the data generated by the extended query-by-committee approach performs worse than the network trained on the data from the original approach. This indicates that the region filter limits the texture space too rigorously and that further adjustment is needed. Nevertheless, the general concept of the active learning extension is demonstrated, as fewer samples were generated in out-of-scope regions than with the original approach.

5 Summary and Outlook

The present paper shows that active learning can be used to efficiently explore microstructure-property spaces. By using the active learning approach query-by-committee, the focus of data generation is automatically shifted to sparse regions and nonlinearities. Two main advantages of active learning in materials design applications follow from this: 1) regions in microstructure space that lead to extreme properties are explored extensively and 2) in contrast to classical space-filling sampling strategies, active learning can be used for goal-directed sampling, which is relevant for training direct inverse machine learning models. Future work is, however, necessary to investigate how the size of the committee, the fraction of the data used to train the committee members, and the complexity of the committee members affect sampling. It is also necessary to benchmark the query-by-committee approach against the Bayesian approaches mentioned in the introduction.

In general, a problem for active learning approaches arises when the input space bounds are not set adequately. Then, regions in microstructure space are explored that lead to properties out of scope. However, sampling in these regions can be suppressed by using the extension presented in this work. Still, one drawback of using active learning remains: in contrast to classical sampling strategies, active learning is time-intensive, as in every active learning cycle one or more machine learning models need to be trained and an optimization additionally has to be performed. Yet, the results of the present paper show that, by using active learning, less data is needed to sufficiently cover microstructure-property spaces than is the case for classical sampling strategies.

Therefore, regarding virtual materials design, the application of active learning techniques is suitable when sample efficiency plays an important role. This is the case, for example, when data is generated using time-intensive numerical simulations, such as those based on spatially resolved full-field microstructures. Active learning can also help to set up multi-fidelity databases by enriching lower-quality data with precisely sampled high-quality simulation data or experimental data. However, incorporating multi-fidelity data and experimental data has not been studied in this work and is part of future research.

Data Availability Statement

The data sets generated for this study can be found in the Fraunhofer Fordatis repository at https://fordatis.fraunhofer.de/handle/fordatis/219, see Morand et al. (2021).

Author Contributions

LM set up the active learning framework, developed the extension to membership query synthesis with a region filter, generated the results and wrote the manuscript. NL had the idea to develop the extension to membership query synthesis. TI, JD, and NL supported in terms of machine learning and helped fundamentally by discussing the approach. DH supported in terms of materials sciences and modeling. All authors contributed to the discussion of the results and to the summary and outlook.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

The authors would like to thank the German Research Foundation (DFG) for funding this work, which was carried out within the research project number 415 804 944: “Taylored Material Properties via Microstructure Optimization: Machine Learning for Modelling and Inversion of Structure-Property-Relationships and the Application to Sheet Metals”.

References

Adams, B. L., Henrie, A., Henrie, B., Lyon, M., Kalidindi, S. R., and Garmestani, H. (2001). Microstructure-Sensitive Design of a Compliant Beam. J. Mech. Phys. Sol. 49, 1639–1663. doi:10.1016/s0022-5096(01)00016-3