<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Mater.</journal-id>
<journal-title>Frontiers in Materials</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Mater.</abbrev-journal-title>
<issn pub-type="epub">2296-8016</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmats.2019.00168</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Materials</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Machine-Learning Informed Representations for Grain Boundary Structures</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Homer</surname> <given-names>Eric R.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/284719/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Hensley</surname> <given-names>Derek M.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/698076/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Rosenbrock</surname> <given-names>Conrad W.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c002"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/749503/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Nguyen</surname> <given-names>Andrew H.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/765773/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Hart</surname> <given-names>Gus L. W.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/731338/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Mechanical Engineering, Brigham Young University</institution>, <addr-line>Provo, UT</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Physics and Astronomy, Brigham Young University</institution>, <addr-line>Provo, UT</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Benjamin Klusemann, Leuphana University, Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Michele Ceriotti, &#x000C9;cole Polytechnique F&#x000E9;d&#x000E9;rale de Lausanne, Switzerland; Robert Horst Mei&#x000DF;ner, Hamburg University of Technology, Germany</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Eric R. Homer <email>eric.homer&#x00040;byu.edu</email></corresp>
<corresp id="c002">Conrad W. Rosenbrock <email>rosenbrockc&#x00040;gmail.com</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Computational Materials Science, a section of the journal Frontiers in Materials</p></fn></author-notes>
<pub-date pub-type="epub">
<day>16</day>
<month>07</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>6</volume>
<elocation-id>168</elocation-id>
<history>
<date date-type="received">
<day>11</day>
<month>02</month>
<year>2019</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>06</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2019 Homer, Hensley, Rosenbrock, Nguyen and Hart.</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Homer, Hensley, Rosenbrock, Nguyen and Hart</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>The atomic structure of grain boundaries (GBs) plays a defining but poorly understood role in the properties they exhibit. Due to the complex nature of these structures, machine learning is a natural tool for extracting meaningful relationships and new physical insight. We apply a new structural representation, called the scattering transform, that uses wavelet-based convolutional neural networks to characterize the complete three-dimensional atomic structure of a GB. Machine learning of GB energy, mobility, and shear coupling using the scattering transform representation is compared and contrasted with learning using a representation based on the smooth overlap of atomic positions (SOAP). While predictions using the scattering transform are not as good as those using SOAP, other factors suggest that the scattering transform may yet play an important role in GB structure learning. These factors include the ability of the scattering transform to learn well on larger datasets, in a process similar to deep learning, as well as its ability to provide physically interpretable information, through an inverse scattering transform, about which aspects of the GB structure contribute to the learning.</p></abstract> <kwd-group>
<kwd>machine learning</kwd>
<kwd>grain boundaries</kwd>
<kwd>atomic structure</kwd>
<kwd>characterization</kwd>
<kwd>SOAP</kwd>
<kwd>scattering transform</kwd>
</kwd-group>
<contract-num rid="cn001">DE-SC0016441</contract-num>
<contract-num rid="cn002">MURI N00014-13-1-0635</contract-num>
<contract-sponsor id="cn001">U.S. Department of Energy<named-content content-type="fundref-id">10.13039/100000015</named-content></contract-sponsor>
<contract-sponsor id="cn002">Office of Naval Research<named-content content-type="fundref-id">10.13039/100000006</named-content></contract-sponsor>
<counts>
<fig-count count="3"/>
<table-count count="2"/>
<equation-count count="6"/>
<ref-count count="60"/>
<page-count count="11"/>
<word-count count="8765"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Grain boundaries (GBs) in crystalline materials are complex structures that can have a significant influence on material properties. The structural complexity derives from the fact that when any two crystals are joined, there are macroscopic and microscopic degrees of freedom that influence their behavior. With a proper understanding of how material properties are influenced by these degrees of freedom, materials engineers could develop materials with enhanced properties. This has been accomplished in a handful of cases using GB engineering (Watanabe et al., <xref ref-type="bibr" rid="B57">2009</xref>; Randle, <xref ref-type="bibr" rid="B46">2010</xref>). Unfortunately, the majority of materials used in society have not benefited from these efforts, as GB engineering primarily focuses on one special type of GB, the twin boundary. Continued efforts in tailoring material properties through GB engineering will require a more complete understanding of GB structure-property relationships.</p>
<p>At the macroscopic level, the structural degrees of freedom are well known and defined by the crystallography of the joined crystals (Frank, <xref ref-type="bibr" rid="B19">1988</xref>; Patala et al., <xref ref-type="bibr" rid="B42">2012</xref>; Patala and Schuh, <xref ref-type="bibr" rid="B43">2013</xref>). At the microscopic level, the structural degrees of freedom are defined by the configuration of the atoms and the macroscopic degrees of freedom can be viewed as constraints (Tadmor and Miller, <xref ref-type="bibr" rid="B55">2011</xref>; Han et al., <xref ref-type="bibr" rid="B25">2016</xref>).</p>
<p>Since material properties are derived from the atom configurations, or microscopic degrees of freedom, more attention must be given to the characterization of atom configurations at GBs. A full description of the microscopic structure is given by the positions of all the atoms, leading to 3<italic>N</italic> positional degrees of freedom for <italic>N</italic> atoms. Due to the challenge of fully defining GB structures through their 3<italic>N</italic> degrees of freedom, a variety of other structural metrics have been defined.</p>
<p>Among the commonly used structural descriptors of GBs are the structural unit model (Frost et al., <xref ref-type="bibr" rid="B20">1982</xref>; Sutton and Vitek, <xref ref-type="bibr" rid="B53">1983</xref>; Balluffi and Bristowe, <xref ref-type="bibr" rid="B2">1984</xref>; Rittner and Seidman, <xref ref-type="bibr" rid="B48">1996</xref>; Tschopp and McDowell, <xref ref-type="bibr" rid="B56">2007</xref>; Spearot, <xref ref-type="bibr" rid="B52">2008</xref>; Han et al., <xref ref-type="bibr" rid="B26">2017</xref>), dislocation arrays (Read and Shockley, <xref ref-type="bibr" rid="B47">1950</xref>; Bishop and Chalmers, <xref ref-type="bibr" rid="B9">1968</xref>; Wolf, <xref ref-type="bibr" rid="B59">1989</xref>; Medlin et al., <xref ref-type="bibr" rid="B38">2001</xref>), and common neighbor analysis (Honeycutt and Andersen, <xref ref-type="bibr" rid="B31">1987</xref>). These methods have unique capabilities and provide intuition, primarily for quasi-2-dimensional GB structures, but have limitations in characterizing fully 3-dimensional GB structures. More recently, a number of other models have emerged to overcome limitations in the common techniques; these include polyhedral template matching (Larsen et al., <xref ref-type="bibr" rid="B34">2016</xref>), Voronoi cell topology (Lazar, <xref ref-type="bibr" rid="B35">2018</xref>), and the polyhedral unit model (Banadaki and Patala, <xref ref-type="bibr" rid="B3">2017</xref>).</p>
<p>As modern machine learning techniques push the limits of scientific discovery, there are several important lessons to learn from the deep learning community. The first is the remarkable discovery that the accuracy of a model can continue increasing, instead of asymptoting, as more data are added. The second is that this discovery required a universally applicable, generalized approach to extracting descriptors (i.e., features) from data using convolutional networks. These lessons should inform our approach to machine learning in materials. Specifically, because algorithms are readily available but data in GB science are limited, the important gap to fill is the creation of universal descriptors that fully characterize the 3-dimensional GB structure.</p>
<p>Rosenbrock et al. (<xref ref-type="bibr" rid="B49">2017</xref>) recently introduced two new descriptors that help address this gap. The first is the application of the Smooth Overlap of Atomic Positions (SOAP) formalism to GBs. Typical applications of SOAP include accurately modeling potential energy surfaces (Szlachta et al., <xref ref-type="bibr" rid="B54">2014</xref>; John and Cs&#x000E1;nyi, <xref ref-type="bibr" rid="B32">2017</xref>; Mocanu et al., <xref ref-type="bibr" rid="B39">2018</xref>) and reactivity (Caro et al., <xref ref-type="bibr" rid="B12">2018</xref>) of molecules (Cisneros et al., <xref ref-type="bibr" rid="B14">2016</xref>) and solids (De et al., <xref ref-type="bibr" rid="B15">2016</xref>; Sosso et al., <xref ref-type="bibr" rid="B51">2018</xref>), pressure, temperature, and composition phase diagrams of materials (Baldock et al., <xref ref-type="bibr" rid="B1">2016</xref>), defects (Dragoni et al., <xref ref-type="bibr" rid="B16">2018</xref>), and dislocations (Maresca et al., <xref ref-type="bibr" rid="B37">2018</xref>). SOAP is also convenient for characterizing GBs because it possesses the following desirable properties: it (i) enables comparison between GBs, (ii) is invariant with respect to structural symmetries, rotations, and permutations, (iii) varies smoothly while accommodating structural perturbations, (iv) is applicable to general, three-dimensional GB structures, and (v) is amenable to automated characterization and discovery of structures. Rosenbrock et al. (<xref ref-type="bibr" rid="B49">2017</xref>) also introduced a new descriptor called the local environment representation, which finds unique sets of local environments that are repeated throughout a set of GBs. In recent work, Priedeman et al. (<xref ref-type="bibr" rid="B45">2018</xref>) used the local environment representation and found that among 494,495 GB atoms, there were only 55 unique local atomic environments, which were repeated in different combinations and arrangements to construct <italic>all</italic> the GBs.</p>
<p>Using these descriptors and their ability to compare environments, Rosenbrock et al. (<xref ref-type="bibr" rid="B49">2017</xref>) applied machine learning to predict both static and dynamic GB properties from the static GB structure. The predictions for the static property of GB energy were the most accurate, which is reasonable considering that GB energy is built up from each atom&#x00027;s contribution to the whole. For the dynamic properties of mobility trend and shear coupling, however, the predictions were less accurate, and it was reasoned that longer-range information about atomic structures was likely required to make better predictions. Since SOAP is a local-environment descriptor, we propose that an alternative descriptor is necessary to characterize the structure at multiple scales. Importantly, this characterization metric must still be automated and satisfy invariance requirements.</p>
<p>We present the scattering transform (ST; Bownik, <xref ref-type="bibr" rid="B10">1997</xref>; Benedetto and Pfander, <xref ref-type="bibr" rid="B7">1998</xref>; Pfander and Benedetto, <xref ref-type="bibr" rid="B44">2002</xref>; Ben&#x000ED;tez et al., <xref ref-type="bibr" rid="B8">2010</xref>; Goh and Lee, <xref ref-type="bibr" rid="B22">2010</xref>; Goh et al., <xref ref-type="bibr" rid="B21">2011</xref>; Lanusse et al., <xref ref-type="bibr" rid="B33">2012</xref>; Mallat, <xref ref-type="bibr" rid="B36">2012</xref>) as a second, universal descriptor for GB systems that includes multi-scale features. We evaluate its ability, as a representation, to learn energy, mobility, and shear coupling from GB structures, and compare the results with those of the published SOAP methodology. We also compare the results with those of a combined SOAP and ST representation. While the results indicate that there is room for improvement, we demonstrate how additional data can improve learning with the ST. Finally, we demonstrate how an inverse ST, using relevance propagation, can identify key features of the GB structure that are useful for the machine-learned predictions.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>2. Materials and Methods</title>
<sec>
<title>2.1. SOAP</title>
<p>To generate the first representation, the averaged SOAP representation, we create a SOAP descriptor (Bart&#x000F3;k et al., <xref ref-type="bibr" rid="B6">2010</xref>; Bart&#x000F3;k et al., <xref ref-type="bibr" rid="B5">2013</xref>) for each atom in the GB. Briefly, the process of calculating the SOAP descriptor starts by placing a Gaussian on each local neighbor of a specified atom <italic>i</italic>,</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mo>=</mml:mo></mml:mtd><mml:mtd><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msup><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>/</mml:mo><mml:mn>2</mml:mn><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mtext>atom</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mtext>cut</mml:mtext></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>f</italic><sub>cut</sub> is a smooth cutoff function that ensures compact support at radius <italic>r</italic><sub>cut</sub>, and <inline-formula><mml:math id="M2"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the vector from atom <italic>i</italic> at <inline-formula><mml:math id="M3"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> to its neighbor <italic>j</italic> at <inline-formula><mml:math id="M4"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. The sum of these Gaussians defines the species-independent neighbor density of atom <italic>i</italic>. To simplify its representation, this neighbor density is expanded in an orthonormal basis,</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>l</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi><mml:mi>l</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mi>Y</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>g</italic><sub><italic>n</italic></sub> form an orthonormal radial basis, <italic>Y</italic><sub><italic>lm</italic></sub> are spherical harmonics, and <italic>c</italic><sub><italic>i, nlm</italic></sub> are the expansion coefficients.</p>
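<p>The following Python sketch makes Equation 1 concrete. It evaluates the neighbor density &#x003C1;<sub><italic>i</italic></sub> of a single atom at arbitrary points, and it is a hedged illustration rather than the <monospace>pycsoap</monospace> implementation. Several choices are assumptions made for illustration: the cosine form of <italic>f</italic><sub>cut</sub>, the default values of &#x003C3;<sub>atom</sub> and <italic>r</italic><sub>cut</sub>, and the hypothetical function names <monospace>f_cut</monospace> and <monospace>neighbor_density</monospace>. The sketch stops short of the basis expansion of Equation 2.</p>
<preformat preformat-type="code">
```python
import numpy as np


def f_cut(r, r_cut):
    """Assumed smooth cosine cutoff: 1 at r = 0, falling to 0 at r >= r_cut."""
    r = np.asarray(r, dtype=float)
    return np.where(r >= r_cut, 0.0, 0.5 * (np.cos(np.pi * r / r_cut) + 1.0))


def neighbor_density(points, r_ij, sigma_atom=0.5, r_cut=5.0):
    """Evaluate rho_i(r) of Equation 1 at each row of `points`.

    points : (P, 3) positions r at which the density is evaluated
    r_ij   : (J, 3) vectors from atom i to each of its neighbors j
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    r_ij = np.asarray(r_ij, dtype=float)
    weights = f_cut(np.linalg.norm(r_ij, axis=1), r_cut)    # f_cut(|r_ij|), shape (J,)
    diff = points[:, None, :] - r_ij[None, :, :]             # r - r_ij, shape (P, J, 3)
    gauss = np.exp(-np.sum(diff**2, axis=2) / (2.0 * sigma_atom**2))
    return gauss @ weights                                   # sum over neighbors j


neighbors = np.array([[2.5, 0.0, 0.0], [0.0, 2.5, 0.0], [0.0, 0.0, 6.0]])
print(neighbor_density([[2.5, 0.0, 0.0]], neighbors))   # third neighbor lies beyond r_cut
```
</preformat>
<p>Here the first neighbor sits exactly at the evaluation point with <italic>f</italic><sub>cut</sub> &#x0003D; 0.5, so the printed density is very close to 0.5. The third neighbor lies beyond <italic>r</italic><sub>cut</sub> and contributes nothing. Because the density is a sum over neighbors, reordering the rows of <monospace>r_ij</monospace> leaves it unchanged.</p>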
<p>The overlap of two different site environments is defined to be:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mo>&#x0222B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msup><mml:mi>r</mml:mi><mml:mo>,</mml:mo></mml:mstyle></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>and is permutationally invariant (because of the sum over the <italic>j</italic> neighbors in &#x003C1;<sub><italic>i</italic></sub> of Equation 1). Rotational invariance is achieved by integrating over all rotations of one of its arguments,</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:mi>K</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mo>&#x0222B;</mml:mo><mml:mi>d</mml:mi><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo stretchy="false">|</mml:mo><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mstyle></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M8"><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> is a 3D rotation operator (an element of SO(3)), and <italic>p</italic> is a small integer, e.g., 2. The value of <italic>p</italic> loosely defines the &#x0201C;multi-bodyness&#x0201D; of the expansion, similar to how the power of a binomial relates to the number of cross-terms in its expansion. For example, (<italic>a</italic> &#x0002B; <italic>b</italic>)<sup>2</sup> &#x0003D; <italic>a</italic><sup>2</sup> &#x0002B; 2<italic>ab</italic> &#x0002B; <italic>b</italic><sup>2</sup>, where the <italic>ab</italic> cross-term shows interaction between <italic>a</italic> and <italic>b</italic>. Thus, <italic>p</italic> &#x0003D; 2 roughly corresponds to 3-body interactions (the central atom and two of its neighbors), and a value of <italic>p</italic> &#x0003D; 4 <italic>roughly</italic> corresponds to 5-body interactions. A more complete description of creating SOAP descriptors from local environments is given elsewhere (Bart&#x000F3;k et al., <xref ref-type="bibr" rid="B5">2013</xref>; Rosenbrock et al., <xref ref-type="bibr" rid="B49">2017</xref>).</p>
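<p>For densities built from Gaussians of equal width, the overlap of Equation 3 has a closed form, because the integral of a product of two Gaussians is a Gaussian in the distance between their centers. The sketch below uses that identity and then estimates the rotational integral of Equation 4 by Monte Carlo sampling of uniformly random rotations from SciPy. Several features are assumptions made for illustration: the normalization of the rotation measure to one, the Monte Carlo approach, and the hypothetical names <monospace>overlap</monospace> and <monospace>rotation_averaged_kernel</monospace>. For <italic>p</italic> &#x0003D; 2, standard SOAP codes instead evaluate this integral analytically from the expansion coefficients of Equation 2.</p>
<preformat preformat-type="code">
```python
import numpy as np
from scipy.spatial.transform import Rotation


def overlap(r_i, w_i, r_k, w_k, sigma_atom=0.5):
    """Closed-form Equation 3 for two densities made of Gaussians of equal width.

    Uses: integral of exp(-|r-a|^2/2s^2) * exp(-|r-b|^2/2s^2) d^3r
          = (pi s^2)^(3/2) * exp(-|a-b|^2 / 4s^2).
    r_i, r_k : (J, 3) neighbor vectors; w_i, w_k : (J,) cutoff weights f_cut(|r_ij|)
    """
    d2 = np.sum((r_i[:, None, :] - r_k[None, :, :]) ** 2, axis=2)
    pair = (np.pi * sigma_atom**2) ** 1.5 * np.exp(-d2 / (4.0 * sigma_atom**2))
    return float(w_i @ pair @ w_k)


def rotation_averaged_kernel(r_i, w_i, r_k, w_k, p=2, n_rot=2000, seed=0, sigma_atom=0.5):
    """Monte Carlo estimate of Equation 4, with the rotation measure normalized to 1."""
    mats = Rotation.random(n_rot, seed).as_matrix()     # (n_rot, 3, 3), uniform on SO(3)
    rotated = np.einsum("nab,jb->nja", mats, r_k)        # apply each R to every neighbor of k
    s = np.array([overlap(r_i, w_i, rk, w_k, sigma_atom) for rk in rotated])
    return float(np.mean(np.abs(s) ** p))


origin, w = np.zeros((1, 3)), np.array([1.0])
print(rotation_averaged_kernel(origin, w, origin, w))   # rotations cannot move the origin
```
</preformat>
<p>For a single Gaussian at the origin, rotations have no effect, so the estimate equals (&#x003C0;&#x003C3;<sub>atom</sub><sup>2</sup>)<sup>3</sup> exactly. For general environments, the estimate carries Monte Carlo noise that decreases as <monospace>n_rot</monospace> grows.</p>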
<p>An efficient implementation of this process is available in the Python-based <monospace>pycsoap</monospace> code<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> (Nguyen and Rosenbrock, to be submitted). Rosenbrock et al. (<xref ref-type="bibr" rid="B50">2018</xref>) discuss selecting the atoms to include in the GB and considerations for tuning parameters.</p>
<p>The difficulty with applying local-environment descriptors directly is that the method produces an <italic>M</italic> &#x000D7; <italic>N</italic> matrix for each GB, where <italic>M</italic> is the number of atoms in the GB, and <italic>N</italic> is the length of each SOAP vector. Machine learning requires a single vector describing each data point in the dataset, which motivates averaging this SOAP matrix over the <italic>M</italic> atoms to produce the averaged SOAP representation, as defined by Rosenbrock et al. (<xref ref-type="bibr" rid="B49">2017</xref>) and De et al. (<xref ref-type="bibr" rid="B15">2016</xref>). This SOAP vector therefore represents the average local atomic environment of all the atoms in the GB. While this representation was referred to as the ASR (for Averaged SOAP Representation) in previous work (Rosenbrock et al., <xref ref-type="bibr" rid="B49">2017</xref>), we simply refer to it here as SOAP. Collecting the averaged SOAP vectors for a collection of GBs produces the feature matrix for machine learning.</p>
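<p>The averaging step can be sketched in a few lines of NumPy. The per-atom SOAP matrices would come from a SOAP code such as <monospace>pycsoap</monospace>. Here they are random placeholders, and the function names are hypothetical. The example shows that GBs with different numbers of atoms <italic>M</italic> all map to vectors of the same length <italic>N</italic>, which is what allows them to be stacked into one feature matrix.</p>
<preformat preformat-type="code">
```python
import numpy as np


def averaged_soap(per_atom_soap):
    """Average an (M, N) matrix of per-atom SOAP vectors over the M GB atoms."""
    return np.asarray(per_atom_soap, dtype=float).mean(axis=0)


def feature_matrix(gb_soap_matrices):
    """Stack one averaged SOAP vector per GB into a (number of GBs, N) feature matrix."""
    return np.vstack([averaged_soap(m) for m in gb_soap_matrices])


# Placeholder per-atom SOAP matrices for two GBs with different atom counts M.
rng = np.random.default_rng(0)
gbs = [rng.random((120, 8)), rng.random((95, 8))]
X = feature_matrix(gbs)
print(X.shape)   # (2, 8)
```
</preformat>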
</sec>
<sec>
<title>2.2. Scattering Transform</title>
<p>The ST is similar to a multi-layer convolutional neural network. However, deep learning approaches typically use discrete convolutions with learned kernel matrices, whereas the ST uses continuous convolutions with predefined wavelet functions. For a time series signal, the Fourier transform gives information about the frequency content of the signal, but not about when in time that content occurs. Wavelets, by contrast, are localized in both time and frequency: a scaling parameter in the wavelet function sets its frequency content and limits its extent in time. The wavelet transform is then executed as a convolution between the scaled and shifted wavelet function and the signal.</p>
<p>The analysis functions for this wavelet transform are defined as:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M9"><mml:mrow><mml:msub><mml:mi>&#x003C8;</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msqrt><mml:mi>a</mml:mi></mml:msqrt></mml:mrow></mml:mfrac><mml:mi>&#x003C8;</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mi>a</mml:mi></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>a</italic> represents the scale (i.e., large values of <italic>a</italic> correspond to &#x0201C;long&#x0201D; basis functions that will identify long-term trends in the signal to be analyzed) and <italic>b</italic> represents a shift. The unscaled wavelet function &#x003C8;(<italic>t</italic>) is usually a bandpass filter. High-frequency basis functions are obtained by going to small scales; therefore, scale is loosely related to the inverse frequency. One can choose shifts and scales to obtain a constant relative bandwidth analysis known as the wavelet transform. To accomplish this, we use a real bandpass filter with zero mean.</p>
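<p>The sketch below illustrates Equation 5 using the Mexican hat (Ricker) wavelet, one common real, zero-mean bandpass mother wavelet. Whether the GB scattering transform uses this particular wavelet is not stated here, so the choice is an assumption, as are the function names. The loop checks numerically that shifting and scaling preserve the zero mean, and that the 1/&#x0221A;<italic>a</italic> prefactor keeps the L<sup>2</sup> norm at one for every scale <italic>a</italic>.</p>
<preformat preformat-type="code">
```python
import numpy as np


def mexican_hat(t):
    """Real, zero-mean bandpass mother wavelet (Ricker / Mexican hat) with unit L2 norm."""
    t = np.asarray(t, dtype=float)
    return 2.0 / (np.sqrt(3.0) * np.pi**0.25) * (1.0 - t**2) * np.exp(-t**2 / 2.0)


def psi_ab(t, a, b):
    """Scaled and shifted analysis function of Equation 5."""
    return mexican_hat((t - b) / a) / np.sqrt(a)


t = np.linspace(-60.0, 60.0, 240001)
dt = t[1] - t[0]
for a in (0.5, 1.0, 4.0):
    w = psi_ab(t, a, b=3.0)
    # The integral stays ~0 (zero mean) and the L2 norm stays ~1 at every scale.
    print(a, round(np.sum(w) * dt, 6), round(np.sum(w**2) * dt, 6))
```
</preformat>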
<p>Then we can define a continuous wavelet transform for an arbitrary function <italic>f</italic>(<italic>t</italic>) as:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>f</mml:mi><mml:mo>*</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x0222B;</mml:mo></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msubsup><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M11"><mml:msubsup><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> represents the complex conjugate of &#x003C8;<sub><italic>a, b</italic></sub>(<italic>t</italic>), which for the real wavelets used here is the wavelet itself, and <italic>R</italic> is the domain of the signal. This is similar to the short-time Fourier transform but with a variable window. Once again, we are measuring the similarity between a function, <italic>f</italic>(<italic>t</italic>), and an elementary function that is shifted and scaled.</p>
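<p>For a sampled signal, Equation 6 can be approximated by a Riemann sum over the sampling grid. The sketch below again assumes the Mexican hat wavelet, which is real, so the complex conjugate in Equation 6 has no effect. The function <monospace>cwt</monospace> is a hypothetical helper written here, not a library routine. For this wavelet and normalization, the response to cos(&#x003C9;<italic>t</italic>) at shift <italic>b</italic> &#x0003D; 0 peaks near <italic>a</italic> &#x0003D; &#x0221A;2.5/&#x003C9;. For the test signal cos(2<italic>t</italic>), the largest response should therefore come from the scale closest to 0.79, which illustrates that scale acts as an inverse frequency.</p>
<preformat preformat-type="code">
```python
import numpy as np


def mexican_hat(t):
    """Real, zero-mean Mexican-hat wavelet with unit L2 norm (assumed choice)."""
    return 2.0 / (np.sqrt(3.0) * np.pi**0.25) * (1.0 - t**2) * np.exp(-t**2 / 2.0)


def cwt(f, t, scales, shifts):
    """Riemann-sum approximation of Equation 6 for a signal f sampled on a uniform grid t.

    The wavelet is real, so its complex conjugate equals the wavelet itself.
    Returns W with W[k, m] = integral of psi_{a_k, b_m}(t) * f(t) dt.
    """
    dt = t[1] - t[0]
    W = np.empty((len(scales), len(shifts)))
    for k, a in enumerate(scales):
        for m, b in enumerate(shifts):
            W[k, m] = np.sum(mexican_hat((t - b) / a) / np.sqrt(a) * f) * dt
    return W


t = np.linspace(-50.0, 50.0, 10001)
f = np.cos(2.0 * t)                       # single frequency, omega = 2
scales = [0.25, 0.5, 0.79, 1.5, 3.0]
W = cwt(f, t, scales, shifts=[0.0])
print(scales[int(np.argmax(np.abs(W[:, 0])))])
```
</preformat>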
<p>For a multi-dimensional signal, a multi-dimensional wavelet can be constructed as the tensor product of wavelets defined in each dimension, i.e., &#x003C8;(<italic>x, y, z</italic>) &#x0003D; &#x003C8;(<italic>x</italic>)&#x003C8;(<italic>y</italic>)&#x003C8;(<italic>z</italic>). Correspondingly, the function of interest changes from <italic>f</italic>(<italic>t</italic>) to <italic>f</italic>(<italic>x, y, z</italic>), and the convolution integral is defined over the full domain of <italic>f</italic>.</p>
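<p>The product construction can be sketched by sampling a 1D wavelet along each axis and taking the outer product of the three samples. Several choices here are illustrative assumptions: the Mexican hat (with its normalization constant omitted), the grid spacing, the truncation at four scale lengths, and the function name. Because each 1D factor has nearly zero mean, the 3D kernel does too, so it responds to density variations rather than to a uniform density.</p>
<preformat preformat-type="code">
```python
import numpy as np


def mexican_hat(t):
    """Real, zero-mean 1D Mexican-hat wavelet (normalization constant omitted)."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)


def separable_wavelet_3d(a, spacing=0.1, half_width=4.0):
    """Sample psi_a(x) * psi_a(y) * psi_a(z) on a cubic grid centered at the origin."""
    n = int(round(half_width * a / spacing))
    x = np.arange(-n, n + 1) * spacing
    w1 = mexican_hat(x / a) / np.sqrt(a)                 # 1D wavelet at scale a
    return np.einsum("i,j,k->ijk", w1, w1, w1)            # product over the three axes


kernel = separable_wavelet_3d(a=1.0)
print(kernel.shape)   # (81, 81, 81)
```
</preformat>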
<p>Applied to GBs, the 3D ST is computed as a sequence of multi-dimensional, multi-scale wavelet transforms, interleaved with nonlinear transforms that take the absolute value of their input signal (i.e., modulus nonlinearities). The process of introducing these nonlinearities is described below.</p>
<p>The general formulation of the ST used here is depicted in <xref ref-type="fig" rid="F1">Figure 1</xref>, where a series of layered convolutions is used to obtain the feature representation. In the first step, and similar to the SOAP formalism, a Gaussian density is placed on the atom positions to obtain the density <italic>f</italic>. When the transform is implemented numerically, some discretization of <italic>f</italic> is inevitable: the continuous signals are sampled at a specified resolution, which is a tunable parameter.</p>
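<p>A minimal sketch of this discretization step follows. It assumes an orthorhombic, fully periodic simulation cell and the minimum-image convention. Real GB cells are often non-periodic along the boundary normal, so these boundary conditions are an assumption, as are the default width <monospace>sigma</monospace> and the function name. The <monospace>resolution</monospace> argument plays the role of the tunable sampling resolution.</p>
<preformat preformat-type="code">
```python
import numpy as np


def gaussian_density_grid(positions, box, resolution=0.25, sigma=0.5):
    """Sample f(r) = sum_j exp(-|r - r_j|^2 / (2 sigma^2)) on a periodic 3D grid.

    positions  : (N, 3) atom coordinates inside an orthorhombic box
    box        : (3,) box lengths
    resolution : target grid spacing (the tunable sampling parameter)
    """
    box = np.asarray(box, dtype=float)
    shape = np.maximum(np.round(box / resolution).astype(int), 1)
    axes = [np.arange(n) * (length / n) for n, length in zip(shape, box)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)   # (nx, ny, nz, 3)
    f = np.zeros(tuple(shape))
    for r in np.asarray(positions, dtype=float):
        d = grid - r
        d -= box * np.round(d / box)                               # minimum-image displacement
        f += np.exp(-np.sum(d**2, axis=-1) / (2.0 * sigma**2))
    return f


f = gaussian_density_grid([[2.0, 2.0, 2.0], [1.0, 3.0, 2.5]], box=[4.0, 4.0, 4.0])
print(f.shape)   # (16, 16, 16)
```
</preformat>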
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Schematic illustrating the scattering transform. The different layers are formed by systematic applications of the wavelet transform, modulus operator, Gaussian blur, and subsampling and scaling. Each of these processes is represented by a different colored arrow. The resulting coefficients are collected into the feature vector used for machine learning.</p></caption>
<graphic xlink:href="fmats-06-00168-g0001.tif"/>
</fig>
<p>In the first layer (0), a Gaussian filter &#x003D5;<sub><italic>J</italic><sub>0</sub></sub>(<italic>f</italic>) at scale <italic>J</italic><sub>0</sub> blurs the density <italic>f</italic>. The coefficients of the blurred density are subsampled, averaged, and stored as part of the ST representation. During subsampling, a discretized vector is sampled at a coarser resolution to form a smaller vector for the final representation.</p>
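<p>Layer 0 can be sketched as a periodic Gaussian blur followed by block averaging, which subsamples and averages in one step. The dyadic convention (blur width 2<sup><italic>J</italic><sub>0</sub></sup> grid points), the periodic boundary treatment, and the function names are assumptions made for illustration.</p>
<preformat preformat-type="code">
```python
import numpy as np

def gaussian_blur_periodic(f, sigma):
    """Periodic Gaussian blur of a 3D grid via FFT; sigma is in grid units."""
    freqs = [np.fft.fftfreq(n) for n in f.shape]
    kx, ky, kz = np.meshgrid(*freqs, indexing="ij")
    k2 = kx ** 2 + ky ** 2 + kz ** 2
    # Fourier transform of a unit-mass Gaussian: unit gain at zero frequency.
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * k2)
    return np.real(np.fft.ifftn(np.fft.fftn(f) * transfer))

def block_average(g, step):
    """Subsample by averaging non-overlapping step x step x step blocks."""
    nx, ny, nz = g.shape
    if nx % step or ny % step or nz % step:
        raise ValueError("grid shape must be divisible by step")
    return g.reshape(nx // step, step, ny // step, step, nz // step, step).mean(axis=(1, 3, 5))

def layer0_coefficients(f, J0, step):
    """Blur at scale 2**J0 (assumed dyadic convention), then subsample and flatten."""
    blurred = gaussian_blur_periodic(f, sigma=2.0 ** J0)
    return block_average(blurred, step).ravel()

# Usage on a random 16 x 16 x 16 density grid.
rng = np.random.default_rng(0)
f = rng.random((16, 16, 16))
blurred = gaussian_blur_periodic(f, sigma=2.0)
coeffs = layer0_coefficients(f, J0=1, step=4)
```
</preformat>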
<p>To obtain the second layer (1), wavelet transforms are applied to <italic>f</italic>: the convolutions <italic>f</italic> &#x0002A; &#x003C8;<sub><italic>j</italic><sub>1</sub>, 0</sub> are computed at various length scales <italic>j</italic><sub>1</sub>, and the modulus (absolute value) of each convolution is then taken. This modulus operation introduces the nonlinearities mentioned earlier. Each modulus is then blurred with a Gaussian filter &#x003D5;<sub><italic>J</italic><sub>1</sub></sub>, this time at scale <italic>J</italic><sub>1</sub>, subsampled, and stored as part of the scattering representation.</p>
<p>To obtain the third layer (2), another wavelet transform is applied, yielding |<italic>f</italic> &#x0002A; &#x003C8;<sub><italic>j</italic><sub>1</sub>, 0</sub>| &#x0002A; &#x003C8;<sub><italic>j</italic><sub>2</sub>, 0</sub> for each length scale <italic>j</italic><sub>2</sub>. As in the previous layers, the modulus of each result is taken, blurred, and subsampled to produce coefficients. As in other convolutional neural networks, this process could continue for more layers. The ability to capture the relevant features depends on the relative scales of the atomic structures and the wavelets employed. Once the wavelet scales have been set, these features are not affected by including more copies of a periodic structure, like those often present in GBs. In this respect, the scattering features do not depend on the system size.</p>
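<p>The three layers can be summarized in a toy 1D scattering network, written in 1D only for brevity (in 3D the FFTs become <monospace>np.fft.fftn</monospace> and the filters become 3D). The Gabor-like band-pass filters, the blur width 2<sup><italic>J</italic></sup>, and the subsampling step are assumptions. All pairs (<italic>j</italic><sub>1</sub>, <italic>j</italic><sub>2</sub>) are kept, following the text, although many scattering implementations keep only <italic>j</italic><sub>2</sub> &gt; <italic>j</italic><sub>1</sub>.</p>
<preformat preformat-type="code">
```python
import numpy as np

def conv_fourier(x, filt_hat):
    """Circular convolution of x with a filter specified by its Fourier transform."""
    return np.fft.ifft(np.fft.fft(x) * filt_hat)

def scattering_1d(f, n_scales=3, J=3, step=8):
    """Toy three-layer (0, 1, 2) scattering transform of a periodic 1D signal."""
    omega = np.fft.fftfreq(f.size)  # frequencies in cycles per sample
    # Gabor-like band-pass filters; each scale j halves the centre frequency and bandwidth.
    # (Not exactly zero-mean; a production wavelet would correct for that.)
    psi_hat = [np.exp(-0.5 * ((omega - 0.25 / 2 ** j) / (0.1 / 2 ** j)) ** 2)
               for j in range(n_scales)]
    # Gaussian low-pass filter of width 2**J samples with unit gain at zero frequency.
    phi_hat = np.exp(-2.0 * (np.pi * 2 ** J * omega) ** 2)

    def blur_and_subsample(u):
        return np.real(conv_fourier(u, phi_hat))[::step]

    layer0 = [blur_and_subsample(f)]
    layer1, layer2 = [], []
    for p1 in psi_hat:
        u1 = np.abs(conv_fourier(f, p1))  # modulus nonlinearity: |f * psi_j1|
        layer1.append(blur_and_subsample(u1))
        for p2 in psi_hat:
            u2 = np.abs(conv_fourier(u1, p2))  # ||f * psi_j1| * psi_j2|
            layer2.append(blur_and_subsample(u2))
    # Concatenate all layers into a single 1 x N feature vector.
    return np.concatenate(layer0 + layer1 + layer2)

# Usage on a random periodic signal of 256 samples.
rng = np.random.default_rng(0)
f = rng.random(256)
features = scattering_1d(f)
```
</preformat>
<p>With 3 wavelet scales and a 256-sample signal subsampled by 8, each channel keeps 32 coefficients and there are 1 + 3 + 9 = 13 channels, giving <italic>N</italic> = 416. This shows how <italic>N</italic> grows with the number of layers, the number of wavelets, and the severity of the subsampling.</p>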
<p>The ST produces a 1 &#x000D7; <italic>N</italic> vector for each GB, where <italic>N</italic> is determined by the ST parameters (i.e., chiefly the number of convolutional layers, the number and scale of the wavelet functions, and the severity of the subsampling). In contrast to SOAP, the ST produces a single vector per GB and thus requires no additional statistical post-processing to produce the feature vector for the GB.</p>
<p>Given the availability of discrete convolutional neural network software that is optimized for both CPU and GPU architectures, it is fair to ask why continuous convolutions justify the extra implementation effort compared to discrete convolutions. Convolutional neural networks in deep learning were developed to handle image learning tasks, which are inherently discrete because images are made of pixels. Physical systems, like the atomistic view of GBs, have smooth transitions that are represented more naturally by spherical harmonics and continuous wavelet functions. While neural network architectures can approximate curved decision boundaries<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>, continuous wavelets are a more natural choice because they lead to a sparser representation (Hirn et al., <xref ref-type="bibr" rid="B28">2015</xref>, <xref ref-type="bibr" rid="B27">2017</xref>; Eickenberg et al., <xref ref-type="bibr" rid="B17">2017</xref>).</p>
</sec>
<sec>
<title>2.3. Grain Boundary Structures and Properties</title>
<p>The SOAP and the ST are both representations that provide a feature matrix that is convenient for machine learning of GB structures. In the present work, we learn on the Olmsted GB database, which is a collection of 388 computed Ni GBs created by Olmsted et al. (<xref ref-type="bibr" rid="B40">2009a</xref>) using the Foiles-Hoyt embedded atom method (EAM) potential (Foiles and Hoyt, <xref ref-type="bibr" rid="B18">2006</xref>).</p>
<p>The GB structures were created following standard methods, in which a fairly comprehensive set of initial atomic configurations is energy-minimized to determine which configuration yields the minimum-energy structure of the GB (Olmsted et al., <xref ref-type="bibr" rid="B40">2009a</xref>). Using these GB structures, a variety of properties can be measured or calculated from simulations; in this work, we focus on the energy, temperature-dependent mobility, and shear coupling of the 388 GBs.</p>
<p>The GB energy is defined as the excess energy relative to the bulk as a result of the irregular structure of the atoms in the GB (Tadmor and Miller, <xref ref-type="bibr" rid="B55">2011</xref>). It is important to note that GB energy is normally defined as a static property of the system measured at <italic>T</italic> &#x0003D; 0 K, and all atomistic structures examined in the machine learning are the <italic>T</italic> &#x0003D; 0 K structures associated with this calculation. The GB energies for the Olmsted GB database are available in the supplemental materials of Olmsted et al. (<xref ref-type="bibr" rid="B40">2009a</xref>). Since the energies for this dataset were calculated using an EAM potential, learning energies serves merely as a benchmark to demonstrate whether a given descriptor captures any physically relevant information useful for machine learning.</p>
<p>Temperature-dependent mobility and shear-coupled GB migration are two dynamic properties related to the behavior of a migrating GB. The mobility of a GB is defined as the proportionality factor relating how fast a GB migrates to the driving force acting on it (Gottstein and Shvindlerman, <xref ref-type="bibr" rid="B23">2010</xref>). The temperature-dependent mobility describes how the mobility changes with temperature. In most cases, GB migration is a thermally activated process, so the mobility increases with increasing temperature. However, in analyzing the temperature-dependent mobility of the GBs in the Olmsted database, Olmsted et al. (<xref ref-type="bibr" rid="B41">2009b</xref>) and Homer et al. (<xref ref-type="bibr" rid="B30">2014</xref>) noticed four broad categories of temperature-dependent mobility: (i) <italic>thermally activated</italic>, (ii) <italic>non-thermally activated</italic>, (iii) <italic>mixed modes</italic>, and (iv) <italic>immobile/unclassifiable</italic>. These categories correspond to whether the mobility follows an Arrhenius relationship with temperature (thermally activated), does not follow an Arrhenius relationship with temperature (non-thermally activated), shows some mixed-mode combination of thermally activated and non-thermally activated behavior, or is immobile or simply unclassifiable.</p>
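<p>An Arrhenius relationship has the form <italic>M</italic>(<italic>T</italic>) = <italic>M</italic><sub>0</sub> exp(&#x02212;<italic>Q</italic>/<italic>k</italic><sub>B</sub><italic>T</italic>), so ln <italic>M</italic> is linear in 1/<italic>T</italic> with slope &#x02212;<italic>Q</italic>/<italic>k</italic><sub>B</sub>. The sketch below fits that line. The screening rule <monospace>looks_thermally_activated</monospace> and its <monospace>min_r2</monospace> threshold are hypothetical and are not the classification criteria used by Homer et al. (2014).</p>
<preformat preformat-type="code">
```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant, eV/K

def arrhenius_fit(T, M):
    """Least-squares fit of ln M = ln M0 - Q / (k_B T).

    Returns (M0, Q in eV, R^2 of the linear fit of ln M versus 1/T).
    """
    x = 1.0 / np.asarray(T, dtype=float)
    y = np.log(np.asarray(M, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return np.exp(intercept), -slope * K_B, r2

def looks_thermally_activated(T, M, min_r2=0.95):
    """Hypothetical screening rule: positive activation energy and a good Arrhenius fit."""
    _, Q, r2 = arrhenius_fit(T, M)
    return bool(Q > 0 and r2 >= min_r2)

# Usage: synthetic Arrhenius data versus a mobility that decreases with temperature.
T = np.linspace(400.0, 1400.0, 6)
M_arrhenius = 2.0 * np.exp(-0.3 / (K_B * T))
M_decreasing = 1000.0 / T
```
</preformat>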
<p>In addition, when GBs migrate, they can also exhibit a coupled shear motion, in which the motion of a GB normal to its surface couples with lateral motion of one of the two crystals (Cahn et al., <xref ref-type="bibr" rid="B11">2006</xref>; Homer et al., <xref ref-type="bibr" rid="B29">2013</xref>). GBs are then classified as either exhibiting shear coupling or not.</p>
</sec>
<sec>
<title>2.4. Machine Learning</title>
<p>The SOAP and ST structure characterizations of the 388 GBs in the Olmsted database are calculated using the methods described above. For the SOAP, the parameters are the radial basis cutoff (<italic>n</italic><sub>max</sub>), the angular basis (spherical harmonic) cutoff (<italic>l</italic><sub>max</sub>), and the radial cutoff (<italic>r</italic><sub>cut</sub>), which are set to 18, 18, and 5.0, respectively, in the present work. For the ST, the parameters are the size of the density discretization grid (<monospace>density</monospace>=0.25), the number of convolutional layers shown in <xref ref-type="fig" rid="F1">Figure 1</xref> (<monospace>Layers</monospace>=2, which also includes Layer 0), a parameter that selects a single spherical harmonic angular function (<monospace>SPH_L</monospace>=4), the number of wavelets at different scales used in each layer (<monospace>n_trans</monospace>=16), and the number of angular augmentations in the azimuthal and polar angles (<monospace>n_angle1</monospace>=16, <monospace>n_angle2</monospace>=16). An angular augmentation duplicates the density function and rotates it to form a new density function, which is also fed through the scattering network. The vectors produced from the rotated density functions are then concatenated to form the final ST vector. For example, with <monospace>n_angle1</monospace> = 16 and <monospace>n_angle2</monospace> = 16, we end up with 16 &#x000D7; 16 = 256 copies of the density function, each of which produces a scattering vector; these are concatenated to produce the final ST vector. This augmentation provides approximate rotational invariance, which is not built into the ST itself.</p>
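<p>The angular augmentation can be sketched as rotating the atom positions on a regular grid of azimuthal and polar angles and concatenating the features of each rotated copy. The rotation convention (azimuth about <italic>z</italic>, then polar tilt about <italic>y</italic>), the angle grids, and the toy featurizer standing in for the full scattering network are all assumptions for illustration.</p>
<preformat preformat-type="code">
```python
import numpy as np

def rotation_matrix(phi, theta):
    """Rotation by azimuth phi about z, followed by a polar tilt theta about y (assumed convention)."""
    cz, sz = np.cos(phi), np.sin(phi)
    cy, sy = np.cos(theta), np.sin(theta)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return Ry @ Rz

def augmented_features(positions, featurize, n_angle1=16, n_angle2=16):
    """Concatenate featurize(...) over n_angle1 x n_angle2 rotated copies of the structure."""
    centred = positions - positions.mean(axis=0)
    phis = np.linspace(0.0, 2.0 * np.pi, n_angle1, endpoint=False)
    thetas = np.linspace(0.0, np.pi, n_angle2, endpoint=False)
    # Row vectors r are rotated as r @ R.T, i.e. R applied to each position.
    feats = [featurize(centred @ rotation_matrix(p, t).T) for p in phis for t in thetas]
    return np.concatenate(feats)

def toy_featurize(r):
    """Hypothetical orientation-dependent featurizer: histogram of x coordinates."""
    hist, _ = np.histogram(r[:, 0], bins=8, range=(-3.0, 3.0))
    return hist.astype(float)

# Usage: 16 x 16 = 256 rotated copies, 8 toy features each.
rng = np.random.default_rng(0)
atoms = rng.normal(size=(20, 3))
vec = augmented_features(atoms, toy_featurize)
```
</preformat>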
<p>With both the SOAP and ST providing feature matrices, we can now apply machine learning to the SOAP, ST, and combined SOAP&#x0002B;ST characterizations of the GBs. The combined SOAP&#x0002B;ST feature vector is created by simply concatenating the SOAP and ST vectors. Gradient boosted decision trees [as implemented in <monospace>xgboost</monospace> (Chen and Guestrin, <xref ref-type="bibr" rid="B13">2016</xref>)] are used to analyze and predict the GB energy, temperature-dependent mobility, and shear coupling.</p>
<p>For the machine learning of the properties, it is important to note that GB energy is a continuous quantity, whereas the temperature-dependent mobility trend and shear coupling are categorical properties; the former is therefore a regression task and the latter are classification tasks. The mobility and shear coupling properties present an imbalanced class problem, where one class contains many more samples than the others. Consequently, the machine learning models favor the larger class to minimize error, which degrades their ability to generalize to new data. For example, in a binary classification problem where 99% of the training data belong to one class and only 1% to the other, a model can reach 99% training accuracy simply by always predicting the first class while learning nothing about the second. To address this issue, we used the Synthetic Minority Over-sampling Technique (SMOTE), a standard approach for imbalanced class problems (Han et al., <xref ref-type="bibr" rid="B24">2005</xref>), as implemented in the <monospace>imblearn</monospace> package, to oversample the minority classes. SMOTE can be conceptualized by imagining line segments connecting each instance of a minority class to its nearest neighbors within that class. The algorithm then creates synthetic instances of the minority class at random positions along these segments and adds them to the data set, thereby balancing the number of samples in each class. This approach could present issues if classes are not separable (e.g., the classes overlap), but even in these cases SMOTE is expected to improve learning over simply using the imbalanced classes.</p>
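<p>The basic SMOTE scheme can be sketched in NumPy as below; in practice the <monospace>imblearn</monospace> equivalent is <monospace>SMOTE(k_neighbors=5).fit_resample(X, y)</monospace> from <monospace>imblearn.over_sampling</monospace>. This sketch is a simplified stand-in, not the package's implementation: it uses brute-force distances, and the function names and default <monospace>k</monospace> are assumptions.</p>
<preformat preformat-type="code">
```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Basic SMOTE: synthesize n_new points for one minority class.

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest neighbours within the same class.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # requires at least two samples in the class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # random position along each segment, in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])

def oversample_to_balance(X, y, k=5, seed=0):
    """Oversample every minority class up to the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    X_parts, y_parts = [X], [y]
    for c, count in zip(classes, counts):
        deficit = counts.max() - count
        if deficit > 0:
            X_parts.append(smote(X[y == c], deficit, k=k, rng=rng))
            y_parts.append(np.full(deficit, c))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Usage: 24 majority samples versus 6 minority samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = np.array([0] * 24 + [1] * 6)
X_bal, y_bal = oversample_to_balance(X, y)
```
</preformat>
<p>Because synthetic points are interpolated between nearby minority samples, they stay inside the region already occupied by that class rather than duplicating points exactly, which is what distinguishes SMOTE from simple random oversampling.</p>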
<p>In addition to using SMOTE to address the class imbalance, we also consider two different splits of the temperature-dependent mobility. In a 4 class split, we use the four categories defined above (Homer et al., <xref ref-type="bibr" rid="B30">2014</xref>). In a 3 class split, we combine the non-thermally activated and mixed-mode categories into a single class, so that the three classes are (i) thermally activated, (ii) mobile but not thermally activated, and (iii) immobile/unclassifiable. The original machine learning on this data by Rosenbrock et al. (<xref ref-type="bibr" rid="B49">2017</xref>) used this same 3 class split.</p>
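<p>The relabeling from the 4 class split to the 3 class split amounts to a simple mapping. The label strings below are hypothetical placeholders for the four categories described above.</p>
<preformat preformat-type="code">
```python
# Hypothetical label strings for the four mobility categories.
FOUR_TO_THREE = {
    "thermally_activated": "thermally_activated",
    "non_thermally_activated": "mobile_not_thermally_activated",
    "mixed": "mobile_not_thermally_activated",
    "immobile_unclassifiable": "immobile_unclassifiable",
}

def to_three_class(labels):
    """Map 4 class mobility labels onto the 3 class split."""
    return [FOUR_TO_THREE[label] for label in labels]

# Usage
print(to_three_class(["mixed", "thermally_activated", "non_thermally_activated"]))
```
</preformat>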
<p>We trained each model with a 50&#x02013;50 train-test split. While decision trees have many tunable hyperparameters, only the number of estimators (the number of trees) was tuned, using a process called Early Stopping (Zhang et al., <xref ref-type="bibr" rid="B60">2005</xref>) with 5-fold cross validation. An ensemble of decision trees is trained by adding trees over successive fitting rounds, with each new tree fitted to reduce a loss function given the trees already in the ensemble. With early stopping, rounds are added only until the validation accuracy has failed to improve for a specified number of consecutive rounds, and the number of trees at the best validation score is kept. In this way, the number of estimators is chosen to minimize the chance of over-fitting.</p>
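<p>The stopping logic can be sketched in plain Python. <monospace>xgboost</monospace> exposes the same idea through its <monospace>early_stopping_rounds</monospace> argument (for example in <monospace>xgboost.cv</monospace> with <monospace>nfold=5</monospace>). The helper names and the example loss curves below are hypothetical.</p>
<preformat preformat-type="code">
```python
def mean_curve(fold_curves):
    """Average per-round validation losses across cross-validation folds."""
    return [sum(values) / len(values) for values in zip(*fold_curves)]

def best_num_rounds(val_losses, patience):
    """Scan rounds in order and stop after `patience` rounds without improvement.

    Returns the 1-based round with the lowest validation loss seen before stopping.
    """
    best_loss, best_round, since_best = float("inf"), 0, 0
    for round_number, loss in enumerate(val_losses, start=1):
        if best_loss > loss:
            best_loss, best_round, since_best = loss, round_number, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_round

# Usage: two hypothetical folds, averaged before choosing the number of trees.
folds = [[5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 2.9],
         [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 2.9]]
print(best_num_rounds(mean_curve(folds), patience=3))
```
</preformat>
<p>Note that a small patience can stop before a later improvement (round 7 above would be reached only with a patience of 4 or more), so the patience itself trades training time against the risk of stopping too early.</p>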
</sec>
</sec>
<sec id="s3">
<title>3. Results and Discussion</title>
<p>A summary of the machine learning results for GB energy, temperature-dependent mobility, and shear coupling obtained with the SOAP, ST, and combined SOAP&#x0002B;ST methods is given in <xref ref-type="table" rid="T1">Table 1</xref>. To provide a reference against which to judge these results, we define a baseline &#x0201C;Random&#x0201D; quantity, as in the earlier SOAP-based study of these GBs (Rosenbrock et al., <xref ref-type="bibr" rid="B49">2017</xref>). For the &#x0201C;Random&#x0201D; column, energies are drawn from a normal distribution with the same mean and standard deviation as the training data and then compared to the actual values in the validation data. For the mobility and shear coupling classification, class labels are drawn at random from the training data and compared against the validation labels.</p>
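<p>The two random baselines can be sketched as follows. The text does not state how the energy baseline is converted into a percentage accuracy, so this sketch only generates the random predictions and computes classification accuracy; the function names are hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np

def random_energy_baseline(E_train, n_predictions, rng=None):
    """Draw 'predictions' from a normal distribution matching the training mean and std."""
    rng = np.random.default_rng(rng)
    return rng.normal(np.mean(E_train), np.std(E_train), size=n_predictions)

def random_class_baseline(y_train, n_predictions, rng=None):
    """Draw class labels at random (with replacement) from the training labels."""
    rng = np.random.default_rng(rng)
    return rng.choice(np.asarray(y_train), size=n_predictions, replace=True)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Usage with hypothetical training data.
E_guess = random_energy_baseline(np.array([1.0, 2.0, 3.0]), 100000, rng=0)
labels = random_class_baseline([1, 1, 1], 10, rng=0)
```
</preformat>
<p>If the training and validation sets share class frequencies <italic>p</italic><sub><italic>c</italic></sub>, the expected accuracy of this random guess is the sum of <italic>p</italic><sub><italic>c</italic></sub><sup>2</sup> over classes; two equally likely classes, for example, give 50%.</p>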
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Machine learning % accuracy of different properties by different techniques.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Property</bold></th>
<th valign="top" align="center"><bold>SOAP</bold></th>
<th valign="top" align="center"><bold>ST</bold></th>
<th valign="top" align="center"><bold>SOAP&#x0002B;ST</bold></th>
<th valign="top" align="center"><bold>Multi-scale SOAP</bold></th>
<th valign="top" align="center"><bold>Random</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">GB Energy</td>
<td valign="top" align="center">95</td>
<td valign="top" align="center">86</td>
<td valign="top" align="center">93</td>
<td valign="top" align="center">95</td>
<td valign="top" align="center">70</td>
</tr>
<tr>
<td valign="top" align="left">Temperature-dependent mobility (3 Class Split)</td>
<td valign="top" align="center">77</td>
<td valign="top" align="center">60</td>
<td valign="top" align="center">69</td>
<td valign="top" align="center">76</td>
<td valign="top" align="center">49</td>
</tr>
<tr>
<td valign="top" align="left">Temperature-dependent mobility (4 Class Split)</td>
<td valign="top" align="center">63</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">61</td>
<td valign="top" align="center">62</td>
<td valign="top" align="center">39</td>
</tr>
<tr>
<td valign="top" align="left">Shear coupling</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">53</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">52</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The ST results for energy and temperature-dependent mobility are clearly better than the random baseline and demonstrate that this new, universal representation is capable of learning certain GB structure-property relationships. However, ST does not perform as well as SOAP, and combining it with SOAP (SOAP&#x0002B;ST) does not improve on SOAP alone. This suggests that ST makes valid predictions, but based on different features of the GB atomic structure.</p>
<p>It is worth noting that the predictions of temperature-dependent mobility are worse for the 4 class split than for the 3 class split. We attribute this to the reduced number of GBs in each class on which to learn and then make predictions, which aggravates the imbalanced class problem. If our attribution is correct, this suggests that even a modest increase in the data available for each class (e.g., by merging the 4 classes into 3 for the same 388 GBs) can have a significant impact on learning and prediction ability.</p>
<p>On its own, the ability to predict GB properties using machine learning has only limited benefits. For example, predicting the energy of the GBs here is merely an exercise, because computing energies from structures is not difficult. Predicting the mobility and shear coupling of a GB, however, is difficult, and these properties have implications for material processing and deformation. Thus, we desire to use machine learning models to highlight new physical processes governing these properties. ST was introduced here because it targets different features of the GB atomic structure than SOAP. It follows that each may highlight different physical processes that contribute to the same structure-property relationship, an assertion that would be borne out by improvements in the machine learning accuracies.</p>
<p>A comparison of the learning curves is provided in <xref ref-type="fig" rid="F2">Figure 2</xref>. The figure shows that SOAP has better training and test accuracies than ST. Furthermore, the current slopes of the learning curves give no indication that ST will overtake SOAP as more data are added. For now, one must conclude that ST learns different information about the GB structures, and that this information is less helpful for accurate property prediction than the information provided by SOAP.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Learning curves for training and testing of GB energy for the averaged SOAP representation (SOAP), Scattering Transform (ST), and combined SOAP&#x0002B;ST descriptor. The mean absolute value of the energy across the GB database is about 1.09 J/m<sup>2</sup>.</p></caption>
<graphic xlink:href="fmats-06-00168-g0002.tif"/>
</fig>
<p>Interestingly, SOAP&#x0002B;ST has the lowest training error while having a slightly worse test error than SOAP alone. This indicates that the information provided by ST improves the training accuracy of the model. Unfortunately, the increase in test error from SOAP alone to SOAP&#x0002B;ST indicates that this additional information does not generalize to accurate property predictions on other GB structures; in other words, SOAP&#x0002B;ST appears to be over-fitting.</p>
<p>To understand and interpret these results, it is helpful to examine the characteristics of the SOAP and ST descriptors. While SOAP is formally complete in its <italic>rotational</italic> invariance (see Equation 4), the ST is formally complete in its <italic>translational</italic> invariance due to its convolution integral in Equation (6). In practice, the rotational invariance for ST is introduced by augmenting the representation with several discretely rotated copies of the data. Thus rotational invariance is only approximate for ST, whereas it is formally exact for SOAP. On the other hand, because ST uses multiple wavelets at different scales, it formally handles multi-scale translational invariance. Translational invariance for the SOAP representation originates in the use of local environments defined relative to a central atom, though the length-scale is limited by the cutoff radius of the SOAP descriptor.</p>
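<p>The exactness of translational invariance for convolution-based features can be checked numerically. The sketch below assumes a periodic 1D signal, integer grid shifts, and full spatial averaging of |<italic>f</italic> &#x0002A; &#x003C8;|; the blurred and subsampled ST coefficients are invariant only up to the blur scale, and rotations have no analogous exact property, which is why the rotated copies described earlier are needed.</p>
<preformat preformat-type="code">
```python
import numpy as np

def averaged_modulus(f, psi):
    """Spatial average of |f * psi| for periodic 1D signals (circular convolution via FFT)."""
    return float(np.mean(np.abs(np.fft.ifft(np.fft.fft(f) * np.fft.fft(psi)))))

# Usage: shifting the signal leaves the averaged coefficient unchanged.
rng = np.random.default_rng(1)
f = rng.random(128)
psi = rng.normal(size=128)
original = averaged_modulus(f, psi)
shifted = averaged_modulus(np.roll(f, 37), psi)
```
</preformat>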
<p>The SOAP representation uses spherical harmonics to capture the angular information in the local environment density function. For this implementation of ST, we used periodic spherical harmonic wavelets to capture the periodicity of the GB structure in the dimensions of the boundary plane. It is likely that this choice of basis introduced some similarity in the features extracted by both SOAP and ST, but SOAP remains a local approach while ST operates at multiple scales.</p>
<p>One could also characterize multiple scales using SOAP by concatenating multiple SOAP vectors with varying cutoff and &#x003C3;<sub>atom</sub> parameters, as has been done in other works (Bart&#x000F3;k et al., <xref ref-type="bibr" rid="B4">2017</xref>; Willatt et al., <xref ref-type="bibr" rid="B58">2018</xref>). At larger radial cutoffs, the surface area of the sphere bounding the local environment grows as <inline-formula><mml:math id="M12"><mml:msubsup><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mtext>cutoff</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula>, so atoms near that surface are spread over a much larger area. If the width of the Gaussian density (&#x003C3;<sub>atom</sub>) placed at each atom remains small, each distant atom subtends only a small solid angle, and the angular resolution of the SOAP expansion cannot distinguish these atom densities well. Increasing the width of the Gaussian at each atom in proportion to the radial cutoff compensates for this geometrical effect so that more distant atoms are still resolved well. However, larger Gaussians placed at neighboring atoms <italic>close</italic> to the central atom wash out structural information. This necessitates including multiple SOAP vectors at different cutoffs and &#x003C3;<sub>atom</sub> values. To demonstrate the effectiveness of this approach, we compare the accuracy of this multi-scale SOAP with the other methods listed in <xref ref-type="table" rid="T1">Table 1</xref>. The multi-scale SOAP performs almost as well as standard SOAP, with slightly worse values for several properties. It therefore also outperforms ST and SOAP&#x0002B;ST for energy and temperature-dependent mobility, though not for shear coupling, where all methods are close to the random baseline.</p>
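<p>The proportional-width rule can be made concrete with a short helper. A Gaussian of width &#x003C3;<sub>atom</sub> at distance <italic>r</italic> from the central atom subtends an angle of roughly &#x003C3;<sub>atom</sub>/<italic>r</italic>, so scaling &#x003C3;<sub>atom</sub> with the cutoff keeps that angular width fixed. The base width of 0.5 and the scale factors are hypothetical; only the base cutoff of 5.0 comes from the parameters stated earlier.</p>
<preformat preformat-type="code">
```python
def multiscale_soap_params(base_rcut, base_sigma, factors):
    """Return (r_cut, sigma_atom) pairs with sigma_atom scaled in proportion to r_cut."""
    ratio = base_sigma / base_rcut
    return [(base_rcut * s, ratio * base_rcut * s) for s in factors]

def angular_width(r, sigma_atom):
    """Approximate angle (radians) subtended by a Gaussian of width sigma_atom at distance r."""
    return sigma_atom / r

# Usage: three hypothetical scales built from a 5.0 cutoff.
params = multiscale_soap_params(5.0, 0.5, [1.0, 2.0, 3.0])
widths = [angular_width(rcut, sigma) for rcut, sigma in params]
```
</preformat>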
<p>While one could conclude from these results that ST does not provide sufficient improvement to the learning to justify its use, we believe there are some reasons to withhold judgment. There are three attributes to the ST that should be considered further. These are (i) data availability, (ii) interpretability, and (iii) overall utility as a structural descriptor.</p>
<p>First, concerning data availability, the ST has the layered architecture of a convolutional neural network, and such networks, which can achieve high predictive accuracy, are frequently trained with tens of thousands of datapoints or more. It is possible that ST simply requires more data to learn GB properties accurately.</p>
<p>One can increase the size of the GB dataset by constructing additional GB structures, which is time-consuming and non-trivial. Alternatively, one can increase the dataset by simulating existing GB structures at finite temperatures, where thermal fluctuations lead to a large number of similar atomic configurations. We employ the latter approach in simulations of a &#x003A3;5 <inline-formula><mml:math id="M14"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mover accent="true"><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover><mml:mn>3</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mover accent="true"><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover><mml:mover accent="true"><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, &#x02329;100&#x0232A; symmetric tilt GB at 100 K over 10 ns, generating 1000 configurations, or snapshots, of that GB. When ST is used to train a model on some of these configurations and test it on the rest, its predictions have a low mean absolute error. For example, with a single GB trained on 250 configurations and tested on the other 750, a mean absolute error of 0.002 J/m<sup>2</sup> is obtained. SOAP trained on the same data, on the other hand, yields a mean absolute error of 0.0015 J/m<sup>2</sup>. Thus, given significantly more data, ST achieves low errors, though still not lower than SOAP in this case.</p>
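<p>The bookkeeping behind this experiment is sketched below: a 10 ns run sampled every 10 ps (the interval given in the Figure 3 caption) yields 1000 snapshots, which are split 250/750 and scored by mean absolute error. The text does not say whether the split was random or contiguous in time, so the random split here is an assumption; because consecutive snapshots are correlated, a time-contiguous split would be a stricter test.</p>
<preformat preformat-type="code">
```python
import numpy as np

def snapshot_times_ps(total_ns=10.0, interval_ps=10.0):
    """Times (ps) at which configurations are saved during an MD run."""
    n_snapshots = int(round(total_ns * 1000.0 / interval_ps))
    return interval_ps * np.arange(1, n_snapshots + 1)

def random_split(n_total, n_train, seed=0):
    """Shuffle snapshot indices and split them into train and test sets."""
    perm = np.random.default_rng(seed).permutation(n_total)
    return perm[:n_train], perm[n_train:]

def mean_absolute_error(y_true, y_pred):
    """Mean absolute error, in the units of the inputs (e.g. J/m^2)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Usage: 1000 snapshots, 250 for training and 750 for testing.
times = snapshot_times_ps()
train_idx, test_idx = random_split(len(times), 250)
```
</preformat>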
<p>The expanded MD dataset demonstrates that ST performs well with additional data. However, such datasets are moving toward the realm of &#x0201C;big data.&#x0201D; For example, if one desires to predict properties for any conceivable GB structure, significantly more data will be needed to train a general ST model.</p>
<p>The second attribute of ST worth discussing is the interpretability of the results, that is, the ability to learn the underlying physics behind the machine learning predictions. Because the ST provides the feature matrix, one can also perform an inverse scattering transform using relevance propagation to understand which aspects of the structure influence the learning. Specific details on the application of relevance propagation to ST are forthcoming (Nguyen, to be submitted). However, <xref ref-type="fig" rid="F3">Figure 3</xref> shows heatmaps generated using relevance propagation for the energy learning task. In <xref ref-type="fig" rid="F3">Figure 3A</xref> we show a relevance propagation heatmap for learning of GB energy using a 50/50 split of the Olmsted database (i.e., the learning task reported in <xref ref-type="table" rid="T1">Table 1</xref>). Contrast that with the relevance propagation heatmap in <xref ref-type="fig" rid="F3">Figure 3B</xref>, where energy was learned from a 500/500 split of the MD configurations noted above. Comparing the two images, <xref ref-type="fig" rid="F3">Figure 3A</xref> highlights a seemingly random selection of atoms that is not consistent with the symmetry of the periodic GB structure. In <xref ref-type="fig" rid="F3">Figure 3B</xref>, the well-known kite structure from the structural unit model is highlighted, even though the model had no a priori knowledge of this structure. Thus, inverse ST relevance propagation heatmaps may allow one to identify the features of the GB structure that correlate with the property of interest. The heatmaps in <xref ref-type="fig" rid="F3">Figure 3</xref> would likely differ for each property even though the GB structure is the same, which could be crucial for identifying the structural features that control different properties.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>(A)</bold> Inverse scattering transform of the &#x003A3;5 <inline-formula><mml:math id="M13"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mover accent="true"><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover><mml:mn>3</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mover accent="true"><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover><mml:mover accent="true"><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> GB. The model was trained using only half of the 388 GBs. <bold>(B)</bold> Inverse scattering transform of the same GB, except that this model was trained using 500 configurations of the same GB. To obtain the configurations, a 10 nanosecond molecular dynamics simulation was performed at 100 K, and configurations were extracted every 10 picoseconds. Both panels are viewed along the [100] tilt axis of the crystals. The units for the inverse scattering transform are arbitrary.</p></caption>
<graphic xlink:href="fmats-06-00168-g0003.tif"/>
</fig>
<p>Furthermore, while Rosenbrock et al. (<xref ref-type="bibr" rid="B49">2017</xref>) demonstrated that a derived form of SOAP, called the local environment representation (LER), provides a way to interpret relevant GB structures, SOAP itself can be difficult to interpret. The multi-scale SOAP, which can provide longer-range structural information, would likely be even more difficult to interpret than SOAP alone. Thus, while ST may not yield the highest prediction accuracy, its interpretability through relevance propagation may render it a useful tool.</p>
<p>The overall utility as a structural descriptor is the third attribute of ST worth considering. To assess it, we compare ST to a range of structural descriptors and their properties.</p>
<p>In <xref ref-type="table" rid="T2">Table 2</xref> we summarize descriptors introduced for characterizing GBs, from which machine learning models could be built. In addition to the descriptors described in this work, we also compare attributes against the structural unit model (SUM), dislocation arrays (DA), common neighbor analysis (CNA), polyhedral template matching (PTM), Voronoi cell topology (VCT), and the polyhedral unit model (PUM), all of which were mentioned in the introduction.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Comparison of structural descriptors and their properties.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Property</bold></th>
<th valign="top" align="center"><bold>SUM</bold></th>
<th valign="top" align="center"><bold>DA</bold></th>
<th valign="top" align="center"><bold>CNA</bold></th>
<th valign="top" align="center"><bold>PTM</bold></th>
<th valign="top" align="center"><bold>VCT</bold></th>
<th valign="top" align="center"><bold>PUM</bold></th>
<th valign="top" align="center"><bold>SOAP</bold></th>
<th valign="top" align="center"><bold>LER</bold></th>
<th valign="top" align="center"><bold>ST</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Easily visualized</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td/>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
</tr>
<tr>
<td valign="top" align="left">Easily interpreted</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td/>
<td valign="top" align="center">R</td>
<td valign="top" align="center">&#x02713;</td>
</tr>
<tr>
<td valign="top" align="left">Comparison</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td/>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
</tr>
<tr>
<td valign="top" align="left">Invariance</td>
<td valign="top" align="center">R</td>
<td/>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
</tr>
<tr>
<td valign="top" align="left">Perturbations</td>
<td valign="top" align="center">R</td>
<td/>
<td/>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
</tr>
<tr>
<td valign="top" align="left">Smoothly varying</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
</tr>
<tr>
<td valign="top" align="left">3D GB structures</td>
<td/>
<td/>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
</tr>
<tr>
<td valign="top" align="left">Automation</td>
<td/>
<td/>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
</tr>
<tr>
<td valign="top" align="left">Connectivity</td>
<td valign="top" align="center">R</td>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="center">R</td>
<td/>
<td/>
<td valign="top" align="center">R</td>
</tr>
<tr>
<td valign="top" align="left">Multi-scale</td>
<td valign="top" align="center">R</td>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="center">R</td>
<td valign="top" align="center">R</td>
<td/>
<td valign="top" align="center">&#x02713;</td>
</tr>
<tr>
<td valign="top" align="left">Subunit discovery</td>
<td valign="top" align="center">R</td>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="center">R</td>
<td/>
<td valign="top" align="center">&#x02713;</td>
<td/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The structural unit model is abbreviated as SUM, dislocation arrays as DA, common neighbor analysis as CNA, polyhedral template matching as PTM, Voronoi cell topology as VCT, polyhedral unit model as PUM, averaged SOAP representation as SOAP, local environment representation as LER, and scattering transform as ST. A check mark (&#x02713;) indicates that the descriptor exhibits a particular property. &#x02018;R&#x02019; indicates that the researcher using the tool is largely responsible for whether the atomic structure description has a particular property (since that property is extracted manually)</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>We judge each descriptor based on its usefulness across several metrics. The properties of interest are: <italic>Easily Visualized</italic>&#x02013;one can convey the structures through visual means; <italic>Easily Interpreted</italic>&#x02013;one can easily identify the relevant characteristics and differences between structures; <italic>Comparison</italic>&#x02013;one can quantitatively compare the structures to one another; <italic>Invariance</italic>&#x02013;the characterization is invariant to rotations, permutations, and/or translations; <italic>Perturbations</italic>&#x02013;small perturbations in the structure are captured as small changes in the metric; <italic>Smoothly Varying</italic>&#x02013;the metric is continuous and varies smoothly for larger changes in structure; <italic>3D GB Structures</italic>&#x02013;the characterization works for quasi-2D and complex 3D GB structures; <italic>Automation</italic>&#x02013;the characterization process can be automated; <italic>Connectivity</italic>&#x02013;the technique characterizes how all the atoms in the GB are connected; <italic>Multi-scale</italic>&#x02013;the technique characterizes both short- and long-range structural information; and <italic>Subunit Discovery</italic>&#x02013;the technique does not require a preset list of structures and can discover them on its own.</p>
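<p>As a concrete illustration of the <italic>Invariance</italic> and <italic>Perturbations</italic> criteria, the Python sketch below checks both numerically for a deliberately simple toy descriptor: a Gaussian-smoothed histogram of pairwise interatomic distances. This toy descriptor, its parameters (<monospace>r_max</monospace>, <monospace>n_bins</monospace>, <monospace>sigma</monospace>), and the random test structure are illustrative assumptions. It is not one of the descriptors in <xref ref-type="table" rid="T2">Table 2</xref> and is not part of the workflow used in this work.</p>
<preformat preformat-type="code">
```python
import numpy as np


def distance_histogram(positions, r_max=5.0, n_bins=50, sigma=0.1):
    """Toy descriptor (hypothetical): Gaussian-smoothed histogram of all
    pairwise distances. Because it depends only on distances, it is invariant
    to rotations, translations, and permutations of the atoms."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    pair_dists = dists[np.triu_indices(len(positions), k=1)]  # each pair once
    centers = np.linspace(0.0, r_max, n_bins)
    # Gaussian smearing makes the output vary continuously with atom positions
    gauss = np.exp(-((centers[:, None] - pair_dists[None, :]) ** 2) / (2 * sigma**2))
    return gauss.sum(axis=1)


rng = np.random.default_rng(0)
atoms = rng.uniform(0.0, 4.0, size=(20, 3))  # random toy structure
ref = distance_histogram(atoms)

# Permute the atom order, apply a random orthogonal transform
# (a rotation, possibly combined with a reflection), then translate
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
moved = atoms[rng.permutation(len(atoms))] @ q.T + np.array([1.0, -2.0, 0.5])
print("invariance error:", np.abs(distance_histogram(moved) - ref).max())

# Small vs. larger random displacements of every atom
small = atoms + rng.normal(scale=1e-3, size=atoms.shape)
large = atoms + rng.normal(scale=1e-1, size=atoms.shape)
print("change (small):", np.linalg.norm(distance_histogram(small) - ref))
print("change (large):", np.linalg.norm(distance_histogram(large) - ref))
```
</preformat>
<p>Because the toy descriptor depends only on interatomic distances, the invariance error is at the level of floating-point round-off. The response to the 10<sup>&#x02212;3</sup> displacement is also much smaller than the response to the 10<sup>&#x02212;1</sup> displacement, which is the behavior the <italic>Perturbations</italic> criterion asks for.</p>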
<p>Although each descriptor has notable features, and some of the entries in <xref ref-type="table" rid="T2">Table 2</xref> are subjective, we focus here on a few properties of interest, in particular several of the properties that SOAP lacks.</p>
<p>First, the ability to <italic>automate</italic> the description is an essential requirement for moving GB science into the big data age. All of the descriptors in <xref ref-type="table" rid="T2">Table 2</xref> except the structural unit model and dislocation arrays share this property. Second is the ability to provide <italic>multi-scale</italic> characterization. The structural unit model, the polyhedral unit model, and SOAP can provide this when the researcher extracts the relevant information manually, but ST is the only technique that possesses this property inherently. Third and fourth are <italic>easily visualized</italic> and <italic>easily interpreted</italic>, two properties that are more subjective. Neither of these properties is a strength of SOAP<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>, but both could be strengths of ST, as evidenced by the heatmaps in <xref ref-type="fig" rid="F3">Figure 3</xref>. Fifth is <italic>connectivity</italic>. ST does not possess this property outright, in the sense that the structural unit model or a graph description does. However, while <xref ref-type="fig" rid="F3">Figure 3</xref> colors each atom by its relevance in predicting energy, the continuous nature of the ST and the inverse ST means that relevance scores are available everywhere in space, so one could produce high-resolution heatmaps. A detailed 3D &#x0201C;importance density&#x0201D; for a grain boundary would allow the connectivity between nearest-neighbor atoms in a graph to be quantified (for example, by integrating the density along the path connecting each pair of atoms). These edge weights in the connectivity graph could then be thresholded to provide alternate views of connectivity. This definition of connectivity differs somewhat from the traditional one: the heatmaps change with the property of interest rather than being static, which may make them more useful for discovering the physical underpinnings of structure-property relationships. This approach might also allow one to fulfill the final property of <italic>subunit discovery</italic>.
Again, this property is not currently present in ST, but one could imagine how the inverse ST heatmaps might enable it.</p>
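<p>The connectivity idea described above can be sketched in Python under several explicit assumptions. The function <monospace>importance_density</monospace> is a hypothetical analytic stand-in (a single Gaussian hot spot) for a continuous inverse-ST relevance map. Nearest neighbors are defined by a simple distance cutoff, each path is taken to be the straight segment between two atoms, and the line integral is approximated with the midpoint rule. None of these choices come from the present work. The sketch only shows how integrated edge weights and a threshold could yield alternate views of connectivity.</p>
<preformat preformat-type="code">
```python
import numpy as np


def importance_density(points):
    """Hypothetical stand-in for a continuous inverse-ST relevance map:
    a single Gaussian 'hot spot' centred at the origin."""
    return np.exp(-np.sum(points**2, axis=-1) / 2.0)


def edge_weights(positions, cutoff, density, n_samples=32):
    """Weight each neighbour pair (distance not above cutoff) by the line
    integral of the density along the straight segment joining the atoms."""
    weights = {}
    t = (np.arange(n_samples) + 0.5) / n_samples  # midpoint-rule nodes on [0, 1]
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            seg = positions[j] - positions[i]
            length = np.linalg.norm(seg)
            if length > cutoff:
                continue  # not a nearest-neighbour pair
            samples = positions[i] + t[:, None] * seg
            weights[(i, j)] = density(samples).mean() * length
    return weights


def thresholded_components(n_atoms, weights, threshold):
    """Keep edges whose weight exceeds the threshold and return the connected
    components of the resulting graph (simple union-find)."""
    parent = list(range(n_atoms))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for (i, j), w in weights.items():
        if w > threshold:
            parent[find(i)] = find(j)
    groups = {}
    for a in range(n_atoms):
        groups.setdefault(find(a), []).append(a)
    return sorted(groups.values())


# Toy chain of atoms along x, spaced 1.0 apart (illustrative coordinates)
atoms = np.array([[x, 0.0, 0.0] for x in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0)])
w = edge_weights(atoms, cutoff=1.1, density=importance_density)
for threshold in (0.1, 0.5):
    print(threshold, thresholded_components(len(atoms), w, threshold))
```
</preformat>
<p>For this toy chain, raising the threshold from 0.1 to 0.5 shrinks the connected cluster around the hot spot from five atoms to three. Different thresholds therefore give different views of which atoms are &#x0201C;connected&#x0201D; with respect to the property encoded in the density.</p>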
<p>Considering these attributes of ST, there is reason to believe that the ST, or something very similar, might become an important descriptor for GB data science. However, given the evidence presented here, one must proceed with caution and consider other ways to achieve the same goal of encoding the most useful information about GB structures for property prediction and for discovering the underlying physics.</p>
</sec>
<sec sec-type="conclusions" id="s4">
<title>4. Conclusion</title>
<p>The success of machine learning in GB data science will largely be guided by the development of tools that capture the physical essence of GB structure-property relationships. These tools must be automated and universally applicable to large and complex GB structures. Since machine learning is merely a stepping stone to discovering the underlying physics, these tools should also satisfy certain mathematical constraints related to invariance and smoothness.</p>
<p>We introduced a new descriptor, the Scattering Transform (ST) (Bownik, <xref ref-type="bibr" rid="B10">1997</xref>; Benedetto and Pfander, <xref ref-type="bibr" rid="B7">1998</xref>; Pfander and Benedetto, <xref ref-type="bibr" rid="B44">2002</xref>; Ben&#x000ED;tez et al., <xref ref-type="bibr" rid="B8">2010</xref>; Goh and Lee, <xref ref-type="bibr" rid="B22">2010</xref>; Goh et al., <xref ref-type="bibr" rid="B21">2011</xref>; Lanusse et al., <xref ref-type="bibr" rid="B33">2012</xref>; Mallat, <xref ref-type="bibr" rid="B36">2012</xref>), based on continuous, multi-scale wavelet transforms interleaved with modulus nonlinearities. We showed that this descriptor can be used to effectively learn GB structure-property relationships for energy and performs reasonably well for temperature-dependent mobility. Note, however, that the SOAP descriptor surpassed the ST in prediction accuracy and remains the best-performing descriptor for the properties and structures compared here.</p>
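<p>To illustrate what &#x0201C;wavelet transforms interleaved with modulus nonlinearities&#x0201D; means in practice, the following Python sketch computes a toy one-dimensional scattering transform of a periodic signal. It is a simplified illustration in the spirit of Mallat (<xref ref-type="bibr" rid="B36">2012</xref>), not the implementation used in this work. The Gaussian band-pass filter bank, its parameters (<monospace>xi0</monospace>, <monospace>sigma0</monospace>, <monospace>n_scales</monospace>), and the use of global averaging are assumptions chosen for brevity.</p>
<preformat preformat-type="code">
```python
import numpy as np


def gabor_filter_bank(n, n_scales, xi0=0.4, sigma0=0.1):
    """Hypothetical dyadic filter bank defined in the Fourier domain:
    Gaussian bumps centred at xi0 / 2**j (cycles per sample) with
    widths sigma0 / 2**j, so larger j means a coarser scale."""
    freqs = np.fft.fftfreq(n)
    return [
        np.exp(-((freqs - xi0 / 2**j) ** 2) / (2 * (sigma0 / 2**j) ** 2))
        for j in range(n_scales)
    ]


def wavelet_modulus(x, filt):
    # Circular convolution via the FFT, followed by the modulus nonlinearity
    return np.abs(np.fft.ifft(np.fft.fft(x) * filt))


def scattering_1d(x, n_scales=4):
    """Order-0, -1, and -2 scattering coefficients with global averaging,
    which makes every coefficient invariant to circular shifts of x."""
    bank = gabor_filter_bank(len(x), n_scales)
    coeffs = [x.mean()]                       # order 0
    first = [wavelet_modulus(x, f) for f in bank]
    coeffs += [u.mean() for u in first]       # order 1: mean of |x * psi_j|
    for j1, u in enumerate(first):            # order 2: mean of ||x * psi_j1| * psi_j2|
        for j2 in range(j1 + 1, n_scales):    # only coarser scales, j2 > j1
            coeffs.append(wavelet_modulus(u, bank[j2]).mean())
    return np.array(coeffs)


rng = np.random.default_rng(1)
signal = rng.normal(size=256)
s = scattering_1d(signal)
s_shifted = scattering_1d(np.roll(signal, 37))
print(len(s), np.abs(s - s_shifted).max())
```
</preformat>
<p>With four scales the sketch returns 11 coefficients (one zeroth-order, four first-order, and six second-order). Because each coefficient is a global average of a modulus, circularly shifting the signal leaves them unchanged up to round-off. The second-order terms, which cascade two wavelet-modulus steps, are what carry the multi-scale information discussed above.</p>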
<p>However, we also demonstrated that, although it does not match the prediction accuracy of SOAP, the ST has complementary features that may make it a useful descriptor of GB structure. First, the information content of the ST is different from and complementary to that of the SOAP descriptor. Second, the ST can encode multi-scale structural information and can be visualized using an inverse ST that generates a heatmap. Importantly, the inverse ST provides evidence for the prevailing wisdom that multi-level convolutional networks require large amounts of data in order to truly learn the physics underlying structure-property relationships. This helps contextualize the performance of the ST relative to the averaged SOAP representation and other SOAP-based representations. It also motivates the building of much larger GB databases.</p>
<p>The ST has the potential to be a powerful tool for understanding GB structure-property relationships. As we continue to push the limits of our understanding of GB structure-property relationships, it will be most valuable to (i) focus on building larger databases of GB structure-property mappings, since database size currently represents the greatest limitation, and (ii) continue to introduce new descriptors that satisfy as many of the desirable characteristics as possible.</p>
</sec>
<sec sec-type="data-availability" id="s5">
<title>Data Availability</title>
<p>The datasets for this manuscript are not publicly available. Requests to access the datasets should be directed to Stephen Foiles, <email>foiles&#x00040;sandia.gov</email>.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>CR, AN, EH, and GH conceived the idea for this work. AN wrote the code for the scattering transform. DH performed all the calculations. All authors were involved in writing the manuscript.</p>
<sec>
<title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baldock</surname> <given-names>R. J. N.</given-names></name> <name><surname>P&#x000E1;rtay</surname> <given-names>L. B.</given-names></name> <name><surname>Bart&#x000F3;k</surname> <given-names>A. P.</given-names></name> <name><surname>Payne</surname> <given-names>M. C.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name></person-group> (<year>2016</year>). <article-title>Determining pressure-temperature phase diagrams of materials</article-title>. <source>Phys. Rev. B</source> <volume>93</volume>:<fpage>174108</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevB.93.174108</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Balluffi</surname> <given-names>R. W.</given-names></name> <name><surname>Bristowe</surname> <given-names>P. D.</given-names></name></person-group> (<year>1984</year>). <article-title>On the structural unit/grain boundary dislocation model for grain boundary structure</article-title>. <source>Surface Sci.</source> <volume>144</volume>, <fpage>28</fpage>&#x02013;<lpage>43</lpage>.</citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Banadaki</surname> <given-names>A. D.</given-names></name> <name><surname>Patala</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>A three-dimensional polyhedral unit model for grain boundary structure in fcc metals</article-title>. <source>NPJ Comput. Mater.</source> <volume>3</volume>:<fpage>13</fpage>. <pub-id pub-id-type="doi">10.1038/s41524-017-0016-0</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bart&#x000F3;k</surname> <given-names>A. P.</given-names></name> <name><surname>De</surname> <given-names>S.</given-names></name> <name><surname>Poelking</surname> <given-names>C.</given-names></name> <name><surname>Bernstein</surname> <given-names>N.</given-names></name> <name><surname>Kermode</surname> <given-names>J. R.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Machine learning unifies the modeling of materials and molecules</article-title>. <source>Sci. Adv.</source> <volume>3</volume>, <fpage>e1701816</fpage>&#x02013;<lpage>B871</lpage>. <pub-id pub-id-type="doi">10.1126/sciadv.1701816</pub-id><pub-id pub-id-type="pmid">29242828</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bart&#x000F3;k</surname> <given-names>A. P.</given-names></name> <name><surname>Kondor</surname> <given-names>R.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name></person-group> (<year>2013</year>). <article-title>On representing chemical environments</article-title>. <source>Phys. Rev. B</source> <volume>87</volume>:<fpage>184115</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevB.87.184115</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bart&#x000F3;k</surname> <given-names>A. P.</given-names></name> <name><surname>Payne</surname> <given-names>M. C.</given-names></name> <name><surname>Kondor</surname> <given-names>R.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name></person-group> (<year>2010</year>). <article-title>Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons</article-title>. <source>Phys. Rev. Lett.</source> <volume>104</volume>:<fpage>136403</fpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-14067-9</pub-id><pub-id pub-id-type="pmid">20481899</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Benedetto</surname> <given-names>J. J.</given-names></name> <name><surname>Pfander</surname> <given-names>G. E.</given-names></name></person-group> (<year>1998</year>). <article-title>Wavelet periodicity detection algorithms,</article-title> in <source>Wavelet Applications in Signal and Imaging Processing VI</source>, eds <person-group person-group-type="editor"><name><surname>Laine</surname> <given-names>A. F.</given-names></name> <name><surname>Unser</surname> <given-names>M. A.</given-names></name> <name><surname>Aldroubi</surname> <given-names>A.</given-names></name></person-group> (<publisher-loc>San Diego, CA</publisher-loc>: <publisher-name>International Society for Optics and Photonics</publisher-name>), <fpage>48</fpage>&#x02013;<lpage>56</lpage>.</citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ben&#x000ED;tez</surname> <given-names>R.</given-names></name> <name><surname>Bol&#x000F3;s</surname> <given-names>V. J.</given-names></name> <name><surname>Ram&#x000ED;rez</surname> <given-names>M. E.</given-names></name></person-group> (<year>2010</year>). <article-title>A wavelet-based tool for studying non-periodicity</article-title>. <source>Comput. Math. Appl.</source> <volume>60</volume>, <fpage>634</fpage>&#x02013;<lpage>641</lpage>. <pub-id pub-id-type="doi">10.1016/j.camwa.2010.05.010</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bishop</surname> <given-names>G. H.</given-names></name> <name><surname>Chalmers</surname> <given-names>B.</given-names></name></person-group> (<year>1968</year>). <article-title>A coincidence-ledge-dislocation description of grain boundaries</article-title>. <source>Scrip. Metall. Mater.</source> <volume>2</volume>, <fpage>133</fpage>&#x02013;<lpage>140</lpage>.</citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bownik</surname> <given-names>M.</given-names></name></person-group> (<year>1997</year>). <article-title>Tight frames of multidimensional wavelets</article-title>. <source>J. Four. Anal. Appl.</source> <volume>3</volume>, <fpage>525</fpage>&#x02013;<lpage>542</lpage>.</citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cahn</surname> <given-names>J. W.</given-names></name> <name><surname>Mishin</surname> <given-names>Y.</given-names></name> <name><surname>Suzuki</surname> <given-names>A.</given-names></name></person-group> (<year>2006</year>). <article-title>Coupling grain boundary motion to shear deformation</article-title>. <source>Acta Mater.</source> <volume>54</volume>, <fpage>4953</fpage>&#x02013;<lpage>4975</lpage>. <pub-id pub-id-type="doi">10.1016/j.actamat.2006.08.004</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Caro</surname> <given-names>M. A.</given-names></name> <name><surname>Aarva</surname> <given-names>A.</given-names></name> <name><surname>Deringer</surname> <given-names>V. L.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name> <name><surname>Laurila</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>Reactivity of amorphous carbon surfaces: rationalizing the role of structural motifs in functionalization using machine learning</article-title>. <source>Chem. Mater.</source> <volume>30</volume>, <fpage>7446</fpage>&#x02013;<lpage>7455</lpage>. <pub-id pub-id-type="doi">10.1021/acs.chemmater.8b03353</pub-id><pub-id pub-id-type="pmid">30487663</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>T.</given-names></name> <name><surname>Guestrin</surname> <given-names>C.</given-names></name></person-group> (<year>2016</year>). <article-title>XGBoost: a scalable tree boosting system,</article-title> in <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, KDD &#x00027;16 (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>785</fpage>&#x02013;<lpage>794</lpage>.</citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cisneros</surname> <given-names>G. A.</given-names></name> <name><surname>Wikfeldt</surname> <given-names>K. T.</given-names></name> <name><surname>Ojam&#x000E4;e</surname> <given-names>L.</given-names></name> <name><surname>Lu</surname> <given-names>J.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Torabifard</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Modeling molecular interactions in water: From pairwise to many-body potential energy functions</article-title>. <source>Chem. Rev.</source> <volume>116</volume>, <fpage>7501</fpage>&#x02013;<lpage>7528</lpage>. <pub-id pub-id-type="doi">10.1021/acs.chemrev.5b00644</pub-id><pub-id pub-id-type="pmid">27186804</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De</surname> <given-names>S.</given-names></name> <name><surname>Bart&#x000F3;k</surname> <given-names>A. P.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name> <name><surname>Ceriotti</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>Comparing molecules and solids across structural and alchemical space</article-title>. <source>Phys. Chem. Chem. Phys.</source> <volume>18</volume>, <fpage>13754</fpage>&#x02013;<lpage>13769</lpage>. <pub-id pub-id-type="doi">10.1039/C6CP00415F</pub-id><pub-id pub-id-type="pmid">27101873</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dragoni</surname> <given-names>D.</given-names></name> <name><surname>Daff</surname> <given-names>T. D.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name> <name><surname>Marzari</surname> <given-names>N.</given-names></name></person-group> (<year>2018</year>). <article-title>Achieving DFT accuracy with a machine-learning interatomic potential: thermomechanics and defects in bcc ferromagnetic iron</article-title>. <source>Phys. Rev. Mater.</source> <volume>2</volume>:<fpage>013808</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevMaterials.2.013808</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Eickenberg</surname> <given-names>M.</given-names></name> <name><surname>Exarchakis</surname> <given-names>G.</given-names></name> <name><surname>Hirn</surname> <given-names>M.</given-names></name> <name><surname>Mallat</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>Solid harmonic wavelet scattering: predicting quantum molecular energy from invariant descriptors of 3D electronic densities,</article-title> in <source>Advances in Neural Information Processing Systems 30</source>, eds <person-group person-group-type="editor"><name><surname>Guyon</surname> <given-names>I.</given-names></name> <name><surname>Luxburg</surname> <given-names>U. V.</given-names></name> <name><surname>Bengio</surname> <given-names>S.</given-names></name> <name><surname>Wallach</surname> <given-names>H.</given-names></name> <name><surname>Fergus</surname> <given-names>R.</given-names></name> <name><surname>Vishwanathan</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>), <fpage>6540</fpage>&#x02013;<lpage>6549</lpage>.</citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Foiles</surname> <given-names>S. M.</given-names></name> <name><surname>Hoyt</surname> <given-names>J.</given-names></name></person-group> (<year>2006</year>). <article-title>Computation of grain boundary stiffness and mobility from boundary fluctuations</article-title>. <source>Acta Mater.</source> <volume>54</volume>, <fpage>3351</fpage>&#x02013;<lpage>3357</lpage>. <pub-id pub-id-type="doi">10.1016/j.actamat.2006.03.037</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frank</surname> <given-names>F. C.</given-names></name></person-group> (<year>1988</year>). <article-title>Orientation mapping</article-title>. <source>Metall. Trans. A</source> <volume>19</volume>, <fpage>403</fpage>&#x02013;<lpage>408</lpage>.</citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frost</surname> <given-names>H. J.</given-names></name> <name><surname>Spaepen</surname> <given-names>F.</given-names></name> <name><surname>Ashby</surname> <given-names>M. F.</given-names></name></person-group> (<year>1982</year>). <article-title>A second report on tilt boundaries in hard sphere F.C.C. crystals</article-title>. <source>Scrip. Metall. Mater.</source> <volume>16</volume>, <fpage>1165</fpage>&#x02013;<lpage>1170</lpage>.</citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goh</surname> <given-names>S. S.</given-names></name> <name><surname>Han</surname> <given-names>B.</given-names></name> <name><surname>Shen</surname> <given-names>Z.</given-names></name></person-group> (<year>2011</year>). <article-title>Tight periodic wavelet frames and approximation orders</article-title>. <source>Appl. Comput. Harmon. Analy.</source> <volume>31</volume>, <fpage>228</fpage>&#x02013;<lpage>248</lpage>. <pub-id pub-id-type="doi">10.1016/j.acha.2010.12.001</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Goh</surname> <given-names>S. S.</given-names></name> <name><surname>Lee</surname> <given-names>S. L.</given-names></name></person-group> (<year>2010</year>). <article-title>Wavelets, multiwavelets and wavelet frames for periodic functions,</article-title> in <source>Proceedings of the 6th IMT-GT Conference on Mathematics, Statistics and its Applications ICMSA</source>, <publisher-loc>Kuala Lumpur</publisher-loc>.</citation></ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gottstein</surname> <given-names>G.</given-names></name> <name><surname>Shvindlerman</surname> <given-names>L. S.</given-names></name></person-group> (<year>2010</year>). <source>Grain Boundary Migration in Metals</source>. <publisher-loc>Boca Raton</publisher-loc>: <publisher-name>CRC Press</publisher-name>.</citation></ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Han</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>W.-Y.</given-names></name> <name><surname>Mao</surname> <given-names>B.-H.</given-names></name></person-group> (<year>2005</year>). <article-title>Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning,</article-title> in <source>Proceedings of the 2005 International Conference on Advances in Intelligent Computing - Volume Part I</source>, ICIC&#x00027;05 (<publisher-loc>Berlin; Heidelberg</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>), <fpage>878</fpage>&#x02013;<lpage>887</lpage>.</citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Han</surname> <given-names>J.</given-names></name> <name><surname>Vitek</surname> <given-names>V.</given-names></name> <name><surname>Srolovitz</surname> <given-names>D. J.</given-names></name></person-group> (<year>2016</year>). <article-title>Grain-boundary metastability and its statistical properties</article-title>. <source>Acta Mater.</source> <volume>104</volume>, <fpage>259</fpage>&#x02013;<lpage>273</lpage>. <pub-id pub-id-type="doi">10.1016/j.actamat.2015.11.035</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Han</surname> <given-names>J.</given-names></name> <name><surname>Vitek</surname> <given-names>V.</given-names></name> <name><surname>Srolovitz</surname> <given-names>D. J.</given-names></name></person-group> (<year>2017</year>). <article-title>The grain-boundary structural unit model redux</article-title>. <source>Acta Mater.</source> <volume>133</volume>, <fpage>186</fpage>&#x02013;<lpage>199</lpage>. <pub-id pub-id-type="doi">10.1016/j.actamat.2017.05.002</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hirn</surname> <given-names>M.</given-names></name> <name><surname>Mallat</surname> <given-names>S.</given-names></name> <name><surname>Poilvert</surname> <given-names>N.</given-names></name></person-group> (<year>2017</year>). <article-title>Wavelet scattering regression of quantum chemical energies</article-title>. <source>Multiscale Model. Simulat.</source> <volume>15</volume>, <fpage>827</fpage>&#x02013;<lpage>863</lpage>. <pub-id pub-id-type="doi">10.1137/16M1075454</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hirn</surname> <given-names>M.</given-names></name> <name><surname>Poilvert</surname> <given-names>N.</given-names></name> <name><surname>Mallat</surname> <given-names>S.</given-names></name></person-group> (<year>2015</year>). <article-title>Quantum energy regression using scattering transforms</article-title>. <source>arXiv</source> [Preprint] arXiv:1502.02077.</citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Homer</surname> <given-names>E. R.</given-names></name> <name><surname>Foiles</surname> <given-names>S. M.</given-names></name> <name><surname>Holm</surname> <given-names>E. A.</given-names></name> <name><surname>Olmsted</surname> <given-names>D. L.</given-names></name></person-group> (<year>2013</year>). <article-title>Phenomenology of shear-coupled grain boundary motion in symmetric tilt and general grain boundaries</article-title>. <source>Acta Mater.</source> <volume>61</volume>, <fpage>1048</fpage>&#x02013;<lpage>1060</lpage>. <pub-id pub-id-type="doi">10.1016/j.actamat.2012.10.005</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Homer</surname> <given-names>E. R.</given-names></name> <name><surname>Holm</surname> <given-names>E. A.</given-names></name> <name><surname>Foiles</surname> <given-names>S. M.</given-names></name> <name><surname>Olmsted</surname> <given-names>D. L.</given-names></name></person-group> (<year>2014</year>). <article-title>Trends in grain boundary mobility: survey of motion mechanisms</article-title>. <source>J. Miner. Metals Mater. Soc.</source> <volume>66</volume>, <fpage>114</fpage>&#x02013;<lpage>120</lpage>. <pub-id pub-id-type="doi">10.1007/s11837-013-0801-2</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Honeycutt</surname> <given-names>J. D.</given-names></name> <name><surname>Andersen</surname> <given-names>H. C.</given-names></name></person-group> (<year>1987</year>). <article-title>Molecular dynamics study of melting and freezing of small Lennard-Jones clusters</article-title>. <source>J. Phys. Chem.</source> <volume>91</volume>, <fpage>4950</fpage>&#x02013;<lpage>4963</lpage>.</citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>John</surname> <given-names>S. T.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>Many-body coarse-grained interactions using Gaussian approximation potentials</article-title>. <source>J. Phys. Chem. B</source> <volume>121</volume>, <fpage>10934</fpage>&#x02013;<lpage>10949</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jpcb.7b09636</pub-id><pub-id pub-id-type="pmid">29117675</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lanusse</surname> <given-names>F.</given-names></name> <name><surname>Rassat</surname> <given-names>A.</given-names></name> <name><surname>Starck</surname> <given-names>J.-L.</given-names></name></person-group> (<year>2012</year>). <article-title>Spherical 3D isotropic wavelets</article-title>. <source>Astron. Astrophys.</source> <volume>540</volume>:<fpage>A92</fpage>. <pub-id pub-id-type="doi">10.1051/0004-6361/201118568</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Larsen</surname> <given-names>P. M.</given-names></name> <name><surname>Schmidt</surname> <given-names>S.</given-names></name> <name><surname>Schi&#x000F8;tz</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>Robust structural identification via polyhedral template matching</article-title>. <source>Model. Simulat. Mater. Sci. Eng.</source> <volume>24</volume>:<fpage>055007</fpage>. <pub-id pub-id-type="doi">10.1088/0965-0393/24/5/055007</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lazar</surname> <given-names>E. A.</given-names></name></person-group> (<year>2018</year>). <article-title>VoroTop: Voronoi cell topology visualization and analysis toolkit</article-title>. <source>Model. Simulat. Mater. Sci. Eng.</source> <volume>26</volume>:<fpage>015011</fpage>. <pub-id pub-id-type="doi">10.1088/1361-651X/aa9a01</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mallat</surname> <given-names>S.</given-names></name></person-group> (<year>2012</year>). <article-title>Group invariant scattering</article-title>. <source>Comm. Pure Appl. Math.</source> <volume>65</volume>, <fpage>1331</fpage>&#x02013;<lpage>1398</lpage>. <pub-id pub-id-type="doi">10.1002/cpa.21413</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maresca</surname> <given-names>F.</given-names></name> <name><surname>Dragoni</surname> <given-names>D.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name> <name><surname>Marzari</surname> <given-names>N.</given-names></name> <name><surname>Curtin</surname> <given-names>W. A.</given-names></name></person-group> (<year>2018</year>). <article-title>Screw dislocation structure and mobility in body centered cubic Fe predicted by a Gaussian Approximation Potential</article-title>. <source>NPJ Comput. Mater.</source> <volume>4</volume>:<fpage>69</fpage>. <pub-id pub-id-type="doi">10.1038/s41524-018-0125-4</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Medlin</surname> <given-names>D. L.</given-names></name> <name><surname>Foiles</surname> <given-names>S. M.</given-names></name> <name><surname>Cohen</surname> <given-names>D.</given-names></name></person-group> (<year>2001</year>). <article-title>A dislocation-based description of grain boundary dissociation: application to a 90&#x000B0; &#x027E8;110&#x027E9; tilt boundary in gold</article-title>. <source>Acta Mater.</source> <volume>49</volume>, <fpage>3689</fpage>&#x02013;<lpage>3697</lpage>. <pub-id pub-id-type="doi">10.1016/S1359-6454(01)00284-1</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mocanu</surname> <given-names>F. C.</given-names></name> <name><surname>Konstantinou</surname> <given-names>K.</given-names></name> <name><surname>Lee</surname> <given-names>T. H.</given-names></name> <name><surname>Bernstein</surname> <given-names>N.</given-names></name> <name><surname>Deringer</surname> <given-names>V. L.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name> <etal/></person-group> (<year>2018</year>). <article-title>Modeling the phase-change memory material, Ge<sub>2</sub>Sb<sub>2</sub>Te<sub>5</sub>, with a machine-learned interatomic potential</article-title>. <source>J. Phys. Chem. B</source> <volume>122</volume>, <fpage>8998</fpage>&#x02013;<lpage>9006</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jpcb.8b06476</pub-id><pub-id pub-id-type="pmid">30173522</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Olmsted</surname> <given-names>D. L.</given-names></name> <name><surname>Foiles</surname> <given-names>S. M.</given-names></name> <name><surname>Holm</surname> <given-names>E. A.</given-names></name></person-group> (<year>2009a</year>). <article-title>Survey of computed grain boundary properties in face-centered cubic metals: I. Grain boundary energy</article-title>. <source>Acta Mater.</source> <volume>57</volume>, <fpage>3694</fpage>&#x02013;<lpage>3703</lpage>. <pub-id pub-id-type="doi">10.1016/j.actamat.2009.04.007</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Olmsted</surname> <given-names>D. L.</given-names></name> <name><surname>Holm</surname> <given-names>E. A.</given-names></name> <name><surname>Foiles</surname> <given-names>S. M.</given-names></name></person-group> (<year>2009b</year>). <article-title>Survey of computed grain boundary properties in face-centered cubic metals-II: grain boundary mobility</article-title>. <source>Acta Mater.</source> <volume>57</volume>, <fpage>3704</fpage>&#x02013;<lpage>3713</lpage>. <pub-id pub-id-type="doi">10.1016/j.actamat.2009.04.015</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patala</surname> <given-names>S.</given-names></name> <name><surname>Mason</surname> <given-names>J. K.</given-names></name> <name><surname>Schuh</surname> <given-names>C. A.</given-names></name></person-group> (<year>2012</year>). <article-title>Improved representations of misorientation information for grain boundary science and engineering</article-title>. <source>Progress Mater. Sci.</source> <volume>57</volume>, <fpage>1383</fpage>&#x02013;<lpage>1425</lpage>. <pub-id pub-id-type="doi">10.1016/j.pmatsci.2012.04.002</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patala</surname> <given-names>S.</given-names></name> <name><surname>Schuh</surname> <given-names>C. A.</given-names></name></person-group> (<year>2013</year>). <article-title>Symmetries in the representation of grain boundary-plane distributions</article-title>. <source>Philos. Magaz.</source> <volume>93</volume>, <fpage>524</fpage>&#x02013;<lpage>573</lpage>. <pub-id pub-id-type="doi">10.1080/14786435.2012.722700</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pfander</surname> <given-names>G. E.</given-names></name> <name><surname>Benedetto</surname> <given-names>J. J.</given-names></name></person-group> (<year>2002</year>). <article-title>Periodic wavelet transforms and periodicity detection</article-title>. <source>SIAM J. Appl. Math.</source> <volume>62</volume>, <fpage>1329</fpage>&#x02013;<lpage>1368</lpage>. <pub-id pub-id-type="doi">10.1137/S0036139900379638</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Priedeman</surname> <given-names>J. L.</given-names></name> <name><surname>Rosenbrock</surname> <given-names>C. W.</given-names></name> <name><surname>Johnson</surname> <given-names>O. K.</given-names></name> <name><surname>Homer</surname> <given-names>E. R.</given-names></name></person-group> (<year>2018</year>). <article-title>Quantifying and connecting atomic and crystallographic grain boundary structure using local environment representation and dimensionality reduction techniques</article-title>. <source>Acta Mater.</source> <volume>161</volume>, <fpage>431</fpage>&#x02013;<lpage>443</lpage>. <pub-id pub-id-type="doi">10.1016/j.actamat.2018.09.011</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Randle</surname> <given-names>V.</given-names></name></person-group> (<year>2010</year>). <article-title>Grain boundary engineering: an overview after 25 years</article-title>. <source>Mater. Sci. Tech.</source> <volume>26</volume>, <fpage>253</fpage>&#x02013;<lpage>261</lpage>. <pub-id pub-id-type="doi">10.1179/026708309X12601952777747</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Read</surname> <given-names>W. T.</given-names></name> <name><surname>Shockley</surname> <given-names>W.</given-names></name></person-group> (<year>1950</year>). <article-title>Dislocation models of crystal grain boundaries</article-title>. <source>Phys. Rev.</source> <volume>78</volume>, <fpage>275</fpage>&#x02013;<lpage>289</lpage>.</citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rittner</surname> <given-names>J. D.</given-names></name> <name><surname>Seidman</surname> <given-names>D. N.</given-names></name></person-group> (<year>1996</year>). <article-title>&#x027E8;110&#x027E9; symmetric tilt grain-boundary structures in fcc metals with low stacking-fault energies</article-title>. <source>Phys. Rev. B</source> <volume>54</volume>, <fpage>6999</fpage>&#x02013;<lpage>7015</lpage>.</citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rosenbrock</surname> <given-names>C. W.</given-names></name> <name><surname>Homer</surname> <given-names>E. R.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name> <name><surname>Hart</surname> <given-names>G. L. W.</given-names></name></person-group> (<year>2017</year>). <article-title>Discovering the building blocks of atomic systems using machine learning: application to grain boundaries</article-title>. <source>NPJ Comput. Mater.</source> <volume>3</volume>:<fpage>29</fpage>. <pub-id pub-id-type="doi">10.1038/s41524-017-0027-x</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rosenbrock</surname> <given-names>C. W.</given-names></name> <name><surname>Priedeman</surname> <given-names>J. L.</given-names></name> <name><surname>Hart</surname> <given-names>G. L.</given-names></name> <name><surname>Homer</surname> <given-names>E. R.</given-names></name></person-group> (<year>2018</year>). <article-title>Structural characterization of grain boundaries and machine learning of grain boundary energy and mobility</article-title>. <source>arXiv</source> arXiv:1808.05292.</citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sosso</surname> <given-names>G. C.</given-names></name> <name><surname>Deringer</surname> <given-names>V. L.</given-names></name> <name><surname>Elliott</surname> <given-names>S. R.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). <article-title>Understanding the thermal properties of amorphous solids using machine-learning-based interatomic potentials</article-title>. <source>Mol. Simulat.</source> <volume>44</volume>, <fpage>866</fpage>&#x02013;<lpage>880</lpage>. <pub-id pub-id-type="doi">10.1080/08927022.2018.1447107</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spearot</surname> <given-names>D. E.</given-names></name></person-group> (<year>2008</year>). <article-title>Evolution of the E structural unit during uniaxial and constrained tensile deformation</article-title>. <source>Mech. Res. Commun.</source> <volume>35</volume>, <fpage>81</fpage>&#x02013;<lpage>88</lpage>. <pub-id pub-id-type="doi">10.1016/j.mechrescom.2007.09.002</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sutton</surname> <given-names>A. P.</given-names></name> <name><surname>Vitek</surname> <given-names>V.</given-names></name></person-group> (<year>1983</year>). <article-title>On the structure of tilt grain-boundaries in cubic metals. 1. symmetrical tilt boundaries</article-title>. <source>Philos. Trans. R. Soc. Math. Phys. Eng. Sci.</source> <volume>309</volume>, <fpage>1</fpage>&#x02013;<lpage>36</lpage>.</citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Szlachta</surname> <given-names>W. J.</given-names></name> <name><surname>Bart&#x000F3;k</surname> <given-names>A. P.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name></person-group> (<year>2014</year>). <article-title>Accuracy and transferability of Gaussian approximation potential models for tungsten</article-title>. <source>Phys. Rev. B</source> <volume>90</volume>:<fpage>104108</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevB.90.104108</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tadmor</surname> <given-names>E. B.</given-names></name> <name><surname>Miller</surname> <given-names>R. E.</given-names></name></person-group> (<year>2011</year>). <source>Modeling Materials: Continuum, Atomistic and Multiscale Techniques.</source> <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tschopp</surname> <given-names>M. A.</given-names></name> <name><surname>McDowell</surname> <given-names>D. L.</given-names></name></person-group> (<year>2007</year>). <article-title>Structural unit and faceting description of &#x003A3;3 asymmetric tilt grain boundaries</article-title>. <source>J. Mater. Sci.</source> <volume>42</volume>, <fpage>7806</fpage>&#x02013;<lpage>7811</lpage>. <pub-id pub-id-type="doi">10.1007/s10853-007-1626-6</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Watanabe</surname> <given-names>T.</given-names></name> <name><surname>Tsurekawa</surname> <given-names>S.</given-names></name> <name><surname>Zhao</surname> <given-names>X.</given-names></name> <name><surname>Zuo</surname> <given-names>L.</given-names></name></person-group> (<year>2009</year>). The coming of grain boundary engineering in the 21st Century, in <source>Microstructure and Texture in Steels</source>, eds <person-group person-group-type="editor"><name><surname>Haldar</surname> <given-names>A.</given-names></name> <name><surname>Suwas</surname> <given-names>S.</given-names></name> <name><surname>Bhattacharjee</surname> <given-names>D.</given-names></name></person-group> (<publisher-loc>London</publisher-loc>: <publisher-name>Springer London</publisher-name>), <fpage>43</fpage>&#x02013;<lpage>82</lpage>.</citation></ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Willatt</surname> <given-names>M. J.</given-names></name> <name><surname>Musil</surname> <given-names>F.</given-names></name> <name><surname>Ceriotti</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Feature optimization for atomistic machine learning yields a data-driven construction of the periodic table of the elements</article-title>. <source>Phys. Chem. Chem. Phys.</source> <volume>20</volume>, <fpage>29661</fpage>&#x02013;<lpage>29668</lpage>. <pub-id pub-id-type="doi">10.1039/C8CP05921G</pub-id><pub-id pub-id-type="pmid">30465679</pub-id></citation></ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wolf</surname> <given-names>D.</given-names></name></person-group> (<year>1989</year>). <article-title>A Read-Shockley model for high-angle grain boundaries</article-title>. <source>Scrip. Metal. Mater.</source> <volume>23</volume>, <fpage>1713</fpage>&#x02013;<lpage>1718</lpage>. <pub-id pub-id-type="doi">10.1016/0036-9748(89)90348-7</pub-id></citation></ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>T.</given-names></name> <name><surname>Yu</surname> <given-names>B.</given-names></name></person-group> (<year>2005</year>). <article-title>Boosting with early stopping: convergence and consistency</article-title>. <source>Ann. Stat.</source> <volume>33</volume>, <fpage>1538</fpage>&#x02013;<lpage>1579</lpage>. <pub-id pub-id-type="doi">10.1214/009053605000000255</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>This is available from the Python Package Index using <monospace>pip install pycsoap</monospace>.</p></fn>
<fn id="fn0002"><p><sup>2</sup>The interactive 2D playground at <ext-link ext-link-type="uri" xlink:href="https://playground.tensorflow.org">https://playground.tensorflow.org</ext-link> demonstrates this nicely.</p></fn>
<fn id="fn0003"><p><sup>3</sup>SOAP lends itself to interpretation in two ways: (i) optimizing a reference structure by minimizing the kernel metric distance, much like the local environment representation, or (ii) applying relevance propagation to the SOAP vector. However, the first approach provides only a local analog, and the second loses information because of the angular integral. Thus, while certainly useful, these inverse SOAP operations do not have the same global resolution as an inverse scattering transform.</p></fn>
</fn-group>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> DH, CR, AN, and GH are supported under ONR (MURI N00014-13-1-0635). EH is supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences under Award &#x00023;DE-SC0016441.</p>
</fn>
</fn-group>
</back>
</article>