<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frai.2023.1124718</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Learning and reasoning with graph data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Jaeger</surname> <given-names>Manfred</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/506686/overview"/>
</contrib>
</contrib-group>
<aff><institution>Department of Computer Science, Aalborg University</institution>, <addr-line>Aalborg</addr-line>, <country>Denmark</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Dursun Delen, Oklahoma State University, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Parisa Kordjamshidi, Michigan State University, United States; Fabrizio Riguzzi, University of Ferrara, Italy</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Manfred Jaeger <email>jaeger&#x00040;cs.aau.dk</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>22</day>
<month>08</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>6</volume>
<elocation-id>1124718</elocation-id>
<history>
<date date-type="received">
<day>15</day>
<month>12</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>07</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Jaeger.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Jaeger</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>Reasoning about graphs, and learning from graph data is a field of artificial intelligence that has recently received much attention in the machine learning areas of graph representation learning and graph neural networks. Graphs are also the underlying structures of interest in a wide range of more traditional fields ranging from logic-oriented knowledge representation and reasoning to graph kernels and statistical relational learning. In this review we outline a broad map and inventory of the field of learning and reasoning with graphs that spans the spectrum from reasoning in the form of logical deduction to learning node embeddings. To obtain a unified perspective on such a diverse landscape we introduce a simple and general semantic concept of a model that covers logic knowledge bases, graph neural networks, kernel support vector machines, and many other types of frameworks. Still at a high semantic level, we survey common strategies for model specification using probabilistic factorization and standard feature construction techniques. Based on this semantic foundation we introduce a taxonomy of reasoning tasks that casts problems ranging from transductive link prediction to asymptotic analysis of random graph models as queries of different complexities for a given model. Similarly, we express learning in different frameworks and settings in terms of a common statistical maximum likelihood principle. Overall, this review aims to provide a coherent conceptual framework that provides a basis for further theoretical analyses of respective strengths and limitations of different approaches to handling graph data, and that facilitates combination and integration of different modeling paradigms.</p></abstract>
<kwd-group>
<kwd>graph data</kwd>
<kwd>representation learning</kwd>
<kwd>statistical relational learning</kwd>
<kwd>graph neural networks</kwd>
<kwd>neuro-symbolic integration</kwd>
<kwd>inductive logic programming</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="2"/>
<equation-count count="31"/>
<ref-count count="71"/>
<page-count count="17"/>
<word-count count="14094"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Machine Learning and Artificial Intelligence</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Graphs are a very general mathematical abstraction for real-world networks such as social, sensor, biological, or traffic networks. These types of networks often generate large quantities of observational data, and using machine learning techniques to build predictive models for them is an area of substantial current interest. Graphs also arise as abstract models for knowledge, e.g., in the form of semantic models for a logic knowledge base, or directly as a knowledge graph. In most cases, an appropriate representation as a graph requires going beyond the fundamental graph model by allowing nodes to be annotated with attributes, and by admitting several distinct edge relations.</p>
<p>Evidently, in machine learning the main interest is in learning from graph data, whereas in knowledge representation and reasoning the primary focus is on deductive reasoning. However, both learning and reasoning play a role in all disciplines: making class label predictions from a learned model is a highly specialized (and limited) form of reasoning, and learning logic rules from examples has a long history in symbolic AI. It is the goal of this review to survey the large and diverse area of approaches for learning and reasoning with graphs in different areas of AI and adjacent fields of mathematics. Given the scope of the subject, our discussion will be mostly at a high, conceptual level. For more technical details, and more comprehensive literature reviews, we will point to relevant specialized surveys. The main objective of this review is to establish a coherent formal framework that facilitates a unified analysis of a wide variety of learning and reasoning frameworks. Even though this review aims to cover a broad range of methods and disciplines, there will be a certain focus on graph neural networks (GNNs) and statistical relational learning (SRL), whereas the fields of graph kernels and purely logic-based approaches receive a little less attention than they deserve.</p>
<p>The following examples illustrate the range of modeling, learning, and reasoning approaches that we aim to cover in this review. Each example describes a general task, a concrete instance of that task, and a particular approach for solving the task. The examples should not be construed as suggesting that the described solution approach is the only, or even the most suitable, one for the given task. The intention is to illustrate the diversity of tasks and solution techniques.</p>
<p>Example 1.1. (Node classification with graph neural networks) One of the most common tasks in machine learning with graphs is <italic>node classification</italic>, i.e., predicting an unobserved node label. A standard instance of such a classification task is subject prediction in a bibliographic data graph: nodes are scientific papers that are connected by citation links, and the node labels consist of a subject area classification of the paper. In the <italic>inductive</italic> version of this task, one is given one or several training graphs containing labeled <italic>training nodes</italic>. The task is to learn a model that allows one to predict class labels of unlabeled nodes that are not already contained in the training graphs. For example, the training graph may consist of the current version of a bibliographic database, whereas the unlabeled nodes are new publications when they are added to the database. In the <italic>transductive</italic> version of the task, both the labeled training nodes and the unlabeled test nodes reside in the same graph, which is already fully known at the time of learning. This is the case when an incomplete subject area labeling is to be completed for a given bibliographic database. Graph neural networks are a state-of-the-art approach to solve such classification tasks (e.g., Niepert et al., <xref ref-type="bibr" rid="B45">2016</xref>; Hamilton et al., <xref ref-type="bibr" rid="B22">2017</xref>; Welling and Kipf, <xref ref-type="bibr" rid="B67">2017</xref>; Veli&#x0010D;kovi&#x00107; et al., <xref ref-type="bibr" rid="B63">2018</xref>).</p>
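The neighborhood-aggregation computation at the core of such GNN node classifiers can be sketched as follows. This is an illustrative numpy implementation, not the architecture of any particular system cited above; all function names are our own, and the weights are random and untrained, so the predictions only become meaningful after learning:

```python
import numpy as np

def gnn_layer(X, A, W, U):
    """One message-passing layer: each node combines its own features (via W)
    with the mean of its neighbors' features (via U), followed by a ReLU."""
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                       # isolated nodes: avoid division by zero
    neighbor_mean = (A @ X) / deg             # mean-aggregate over graph neighbors
    return np.maximum(0.0, X @ W + neighbor_mean @ U)

def predict_labels(X, A, layers, C):
    """Stack message-passing layers, then score classes with a linear readout."""
    H = X
    for W, U in layers:
        H = gnn_layer(H, A, W, U)
    return (H @ C).argmax(axis=1)             # predicted class index per node

# Toy instance: 6 nodes, 4 input features, 3 classes, random untrained weights.
rng = np.random.default_rng(0)
n, d, h, k = 6, 4, 8, 3
A = rng.integers(0, 2, size=(n, n)).astype(float)
A = np.triu(A, 1); A = A + A.T                # undirected, no self-loops
X = rng.normal(size=(n, d))
layers = [(rng.normal(size=(d, h)), rng.normal(size=(d, h))),
          (rng.normal(size=(h, h)), rng.normal(size=(h, h)))]
C = rng.normal(size=(h, k))
pred = predict_labels(X, A, layers, C)        # one class label per node
```

Note that nothing in this computation depends on a fixed node set, which is what makes learned weights transferable to new graphs in the inductive setting.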
<p>Example 1.2. (Link prediction via node embeddings) Here the task is to predict whether two nodes are connected by an edge. This prediction problem is usually considered in a transductive setting, where all the nodes and some of the edges are given, and edges between certain test pairs of nodes have to be predicted. Bibliographic data graphs are again a popular testbed for link prediction approaches (Kipf and Welling, <xref ref-type="bibr" rid="B31">2016</xref>; Pan et al., <xref ref-type="bibr" rid="B46">2021</xref>). <italic>Recommender systems</italic> can also be seen as handling a link prediction problem: the underlying graph here contains <italic>user</italic> and <italic>product</italic> nodes, and edges connect users with products that the user has bought (or provided some other type of positive feedback for). The link prediction problem then amounts to predicting positive user/product relationships that have not yet been observed. Numerous different link prediction approaches exist (Kumar et al., <xref ref-type="bibr" rid="B36">2020</xref> gives a comprehensive survey). A variety of approaches is based on constructing, for each node in the graph, a <italic>d</italic>-dimensional real-valued <italic>embedding vector</italic>, and scoring the likelihood of the existence of an edge between two nodes by the proximity (according to a suitable metric) of their embedding vectors. This general paradigm encompasses approaches such as matrix factorization (Koren and Bell, <xref ref-type="bibr" rid="B34">2015</xref>) and random-walk-based approaches (Perozzi et al., <xref ref-type="bibr" rid="B47">2014</xref>; Grover and Leskovec, <xref ref-type="bibr" rid="B19">2016</xref>).</p>
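The embedding-based scoring step can be illustrated with a minimal sketch. The embeddings below are hypothetical hand-picked vectors; in practice they would be produced by matrix factorization or random-walk training:

```python
import numpy as np

def score_links(Z, pairs):
    """Score each candidate edge (u, v) by the inner product of the endpoint
    embeddings, squashed to (0, 1) with a logistic function."""
    s = np.array([Z[u] @ Z[v] for u, v in pairs])
    return 1.0 / (1.0 + np.exp(-s))

# Hypothetical 2-dimensional embeddings for three nodes.
Z = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [-1.0, 0.2]])
scores = score_links(Z, [(0, 1), (0, 2)])
```

Here node 1, whose embedding is close to that of node 0, receives a higher link score than the distant node 2.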
 <p>A good and concise monograph covering the modern machine learning methods described in the preceding two examples is that of Hamilton (<xref ref-type="bibr" rid="B21">2020</xref>).</p>
<p>Example 1.3. (Graph Classification with inductive logic programming) One may also want to predict a class label associated with a whole graph. A classic example is predicting properties of molecules, where molecules are represented as graphs consisting of nodes representing the atoms, and links representing bonds between the atoms. The famous <italic>Mutagenesis</italic> dataset (Srinivasan et al., <xref ref-type="bibr" rid="B60">1996</xref>), for example, consists of 188 molecules with a Boolean <italic>mutagenic</italic> class label. A predictor for this label may be given in the form of a logic program, such as the following:</p>
<disp-formula id="E1"><mml:math id="M1"><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x0005F;</mml:mo><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>B</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x02190;</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>A</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>B</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>B</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x0005F;</mml:mo><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>B</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x02190;</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>B</mml:mi><mml:mo 
stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x0005F;</mml:mo><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>C</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>B</mml:mi><mml:mo>,</mml:mo><mml:mi>C</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x0005F;</mml:mo><mml:mi>c</mml:mi><mml:mi>y</mml:mi><mml:mi>c</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x02190;</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x0005F;</mml:mo><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>A</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mi>m</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>i</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x02190;</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x0005F;</mml:mo><mml:mi>c</mml:mi><mml:mi>y</mml:mi><mml:mi>c</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>(This program is given purely for expository purposes and does not resemble realistic classification programs for this task.) A particular molecule, specified by a list of <italic>ground facts</italic> such as <italic>carbon</italic>(<italic>at</italic>_1), <italic>carbon</italic>(<italic>at</italic>_2), <italic>nitrogen</italic>(<italic>at</italic>_3), &#x02026;, <italic>bond</italic>(<italic>at</italic>_1, <italic>at</italic>_3), would then be classified as mutagenic if <italic>mutagenic</italic> can be proven by the program from the ground facts. This will be the case if and only if the molecule contains a cycle consisting of carbon atoms. Parts of the program (e.g., the definitions of carbon path and cycle) may be provided by experts as background knowledge, whereas other parts (e.g., the dependence of mutagenic on the existence of a carbon cycle) would be learned from labeled examples.</p>
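For illustration, the behavior of such a logic program can be emulated by a small forward-chaining procedure. The following is a sketch in Python, with bond facts treated as directed exactly as listed, and with helper names of our own choosing:

```python
def mutagenic(atoms, bonds):
    """Emulate the logic program: derive all carbon_path facts by forward
    chaining, then report mutagenic iff carbon_path(A, A) holds for some A."""
    carbon = {a for a, elem in atoms.items() if elem == "carbon"}
    # carbon_path(A, B) <- carbon(A), carbon(B), bond(A, B)
    path = {(a, b) for a, b in bonds if a in carbon and b in carbon}
    # close carbon_path under composition with further carbon_path steps
    changed = True
    while changed:
        new = {(a, c) for a, b in path for b2, c in path if b == b2} - path
        path |= new
        changed = bool(new)
    # mutagenic <- carbon_cycle <- carbon_path(A, A)
    return any(a == b for a, b in path)

# A carbon ring (with a nitrogen side atom) vs. an open carbon chain.
ring = mutagenic({1: "carbon", 2: "carbon", 3: "carbon", 4: "nitrogen"},
                 [(1, 2), (2, 3), (3, 1), (3, 4)])
chain = mutagenic({1: "carbon", 2: "carbon", 3: "carbon"},
                  [(1, 2), (2, 3)])
```

For the ring, the closure contains a fact of the form carbon_path(A, A), so the molecule is classified as mutagenic; the open chain is not.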
<p>Example 1.4. (Graph similarity and classification with graph kernels) The problem of graph classification (especially in bio-molecular application domains) has also been approached extensively with kernel techniques. Graph kernels (see Kriege et al., <xref ref-type="bibr" rid="B35">2020</xref> for an excellent survey) are functions <italic>k</italic> that map pairs of graphs <italic>G, H</italic> to a value <italic>k</italic>(<italic>G, H</italic>)&#x02208;&#x0211D; that is usually interpreted as a similarity measure for <italic>G</italic> and <italic>H</italic>, and which must be of the form <italic>k</italic>(<italic>G, H</italic>) &#x0003D; &#x003D5;(<italic>G</italic>)&#x000B7;&#x003D5;(<italic>H</italic>) for some finite- or infinite-dimensional real-valued feature vectors &#x003D5;(<italic>G</italic>), &#x003D5;(<italic>H</italic>). The graph kernel can then be used for graph classification by using it as input for a support vector machine classifier. Based on their interpretation as similarity measures, graph kernels can also support other types of similarity-based analyses, e.g., clustering. Most graph kernels are defined by an explicit definition of the mapping from graphs <italic>G</italic> to feature vectors &#x003D5;(<italic>G</italic>). Important examples are the <italic>Weisfeiler-Lehman kernel (WLK)</italic> (Shervashidze et al., <xref ref-type="bibr" rid="B58">2011</xref>), the <italic>graphlet kernel</italic> (Shervashidze et al., <xref ref-type="bibr" rid="B57">2009</xref>), and the <italic>random walk kernel</italic> (G&#x000E4;rtner et al., <xref ref-type="bibr" rid="B17">2003</xref>). In the first two, the components of the feature vectors contain statistics on the occurrence of local neighborhood structures in <italic>G</italic>, whereas the features of the random walk kernel represent statistics on node label sequences that are generated by random walks on the graph.</p>
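As an illustration of the explicit feature-map view, the following sketch computes a simplified one-round Weisfeiler-Lehman-style feature vector and the resulting kernel value. This is a toy variant for exposition, not the full WLK cited above:

```python
from collections import Counter

def wl_features(labels, edges, rounds=1):
    """phi(G): count node labels after `rounds` of Weisfeiler-Lehman-style
    relabeling, where a node's new label pairs its old label with the sorted
    multiset of its neighbors' labels."""
    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    cur = dict(labels)
    feats = Counter(cur.values())
    for _ in range(rounds):
        cur = {v: (cur[v], tuple(sorted(cur[u] for u in adj[v]))) for v in cur}
        feats.update(cur.values())
    return feats

def kernel(f, g):
    """k(G, H) = phi(G) . phi(H), as a sparse dot product over shared features."""
    return sum(f[x] * g[x] for x in f if x in g)

# A triangle and a path, all three nodes labeled "C" in both graphs.
tri = wl_features({1: "C", 2: "C", 3: "C"}, [(1, 2), (2, 3), (1, 3)])
path = wl_features({1: "C", 2: "C", 3: "C"}, [(1, 2), (2, 3)])
```

The triangle is more similar to itself than to the path, since the two graphs share only part of their local neighborhood patterns.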
<p>The previous examples were concerned with specific prediction tasks. In the following examples we move toward more general forms of reasoning.</p>
<p>Example 1.5. (Probabilistic inference with SRL) The task is to learn a probabilistic graph model that supports a rich class of queries. An example is a probabilistic model for the genotypes of people in a <italic>pedigree</italic> graph. The model should support a spectrum of queries. A basic type is the conditional probability query: given (partial) genotype information for some individuals, what are the probabilities for the genotypes of the other individuals? This is still very similar to the node classification task of Example 1.1. A query that goes beyond what has been described in previous examples is a <italic>most probable explanation (MPE)</italic> query: again, given partial genotype information, what is the most probable joint configuration of the genotypes for all individuals? <italic>Statistical relational learning (SRL)</italic> approaches such as <italic>Relational Bayesian Networks (RBNs)</italic> (Jaeger, <xref ref-type="bibr" rid="B26">1997</xref>), <italic>Markov logic networks (MLNs)</italic> (Richardson and Domingos, <xref ref-type="bibr" rid="B51">2006</xref>), or <italic>ProbLog</italic> (De Raedt et al., <xref ref-type="bibr" rid="B11">2007</xref>) provide modeling and inference frameworks for solving such tasks.</p>
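For a minimal pedigree with two founders and one child, both query types can be answered by brute-force enumeration of all joint genotype configurations. This is a toy sketch assuming Mendelian inheritance of a single gene with alleles A and a, and a uniform prior for the founders; real SRL systems replace this enumeration with structured inference:

```python
from itertools import product

GENOTYPES = ["AA", "Aa", "aa"]

def allele_prob(genotype, allele):
    """Probability that a parent with this genotype passes on the allele."""
    return genotype.count(allele) / 2.0

def inherit_prob(child, mother, father):
    """Mendelian P(child | mother, father): one allele drawn from each parent."""
    p = 0.0
    for x in "Aa":
        for y in "Aa":
            if "".join(sorted(x + y)) == "".join(sorted(child)):
                p += allele_prob(mother, x) * allele_prob(father, y)
    return p

def joint(m, f, c):
    """Uniform prior over founder genotypes, child conditioned on both parents."""
    return (1 / 3) * (1 / 3) * inherit_prob(c, m, f)

def query(evidence):
    """Return P(evidence) and the MPE world, by enumerating all worlds."""
    worlds = [dict(zip(("mother", "father", "child"), w), p=joint(*w))
              for w in product(GENOTYPES, repeat=3)]
    consistent = [w for w in worlds
                  if all(w[k] == v for k, v in evidence.items())]
    z = sum(w["p"] for w in consistent)
    return z, max(consistent, key=lambda w: w["p"])
```

For instance, conditioning on mother AA and father aa yields an MPE world in which the child is heterozygous, with the conditional probability of that genotype equal to one.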
<p>Example 1.6. (Logical reasoning with first-order logic) Going beyond the flexible, but still rather structured, type of queries considered in Example 1.5, we can consider more general logical reasoning tasks. There are no standard example instances of this, so we illustrate the task, and its solution by deduction in first-order logic, with an example: given the following knowledge about a social network:</p>
<list list-type="bullet">
<list-item><p>Every user follows at least one other user. Expressed as a first-order logic formula, this reads as:</p></list-item>
</list>
<disp-formula id="E2"><label>(1)</label><mml:math id="M2"><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mo>&#x02200;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x02203;</mml:mo><mml:mi>y</mml:mi><mml:mi>f</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>w</mml:mi><mml:mi>s</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<list list-type="bullet">
<list-item><p>There is a user who is not followed by anyone:</p></list-item>
</list>
<disp-formula id="E3"><label>(2)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mo>&#x02203;</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x000AC;</mml:mo><mml:mo>&#x02203;</mml:mo><mml:mi>x</mml:mi><mml:mtext class="textit" mathvariant="italic">follows</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Does this knowledge imply that there must be a user who has at least two followers, or that there must be at least four different users:</p>
<disp-formula id="E4"><label>(3)</label><mml:math id="M4"><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x02203;</mml:mo><mml:mi>y</mml:mi><mml:msup><mml:mo>&#x02203;</mml:mo><mml:mrow><mml:mo>&#x02265;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mi>x</mml:mi><mml:mi>f</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>w</mml:mi><mml:mi>s</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02228;</mml:mo><mml:msup><mml:mo>&#x02203;</mml:mo><mml:mrow><mml:mo>&#x02265;</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:msup><mml:mi>x</mml:mi><mml:mo>?</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>Considering first all finite graphs, one finds that when (1) and (2) are true, there must be a node with at least two incoming edges, i.e., the first part of the disjunction in (3) is true. When the graph is infinite, this implication no longer holds, but then the second disjunct of (3) will be true (where the number 4 may be replaced with any natural number). Thus, we find that (1) and (2) logically entail (3). This inference can be performed by automated theorem provers. For example, the SPASS prover (Weidenbach et al., <xref ref-type="bibr" rid="B66">2009</xref>) can answer our query.</p>
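For the finite case, the entailment can also be verified mechanically by exhaustive model checking over all directed graphs up to a small size. The following brute-force sketch checks the implication only on bounded finite domains; a pigeonhole argument (at least n edges entering at most n &#x02212; 1 nodes) extends it to all finite sizes, and the theorem prover covers the general case:

```python
from itertools import product

def check_entailment(n):
    """Check, for every directed graph on n nodes (every 'follows' relation),
    that formulas (1) and (2) entail the first disjunct of (3): if every node
    has out-degree >= 1 and some node has in-degree 0, then some node has
    in-degree >= 2."""
    nodes = range(n)
    for bits in product([0, 1], repeat=n * n):
        E = [[bits[i * n + j] for j in nodes] for i in nodes]
        outdeg = [sum(E[i]) for i in nodes]
        indeg = [sum(E[i][j] for i in nodes) for j in nodes]
        if all(o >= 1 for o in outdeg) and any(d == 0 for d in indeg):
            if not any(d >= 2 for d in indeg):
                return False   # counterexample found
    return True
```

Running the check for all graph sizes up to four confirms the entailment on these finite domains.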
<p>Example 1.7. (Limit behavior of random graphs) Many probabilistic models have been developed for the temporal evolution of growing networks. The simplest possible model is to assume that every new node is connected to the already existing nodes with a fixed probability <italic>p</italic>, and these connections are formed independently of each other. One may then ask how the probabilities of certain graph properties develop as the size of the graph grows to infinity. Classic results of the Erd&#x00151;s-R&#x000E9;nyi random graph theory establish, for example, that the limiting probability for the evolving graph to be connected is 1 (Erd&#x00151;s and R&#x000E9;nyi, <xref ref-type="bibr" rid="B13">1960</xref>) [the results of Erd&#x00151;s-R&#x000E9;nyi random graph theory are in fact much more sophisticated, as they pertain to models where the edge probability is a function <italic>p</italic>(<italic>n</italic>) of the number <italic>n</italic> of vertices]. Similarly, it is known that the probability of every property that can be expressed in first-order logic converges to either 0 or 1 (Fagin, <xref ref-type="bibr" rid="B14">1976</xref>). Reasoning about such limiting probabilities goes beyond reasoning about all graphs as in Example 1.6, since we now consider probability distributions over infinite sequences of graphs. Some types of queries about the limit behavior of random graphs are formally decidable. This is the case, for example, for the limit probability of a first-order sentence (Grandjean, <xref ref-type="bibr" rid="B18">1983</xref>). However, the computational complexity of these reasoning tasks puts them outside the range of practically feasible implementations.</p>
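The convergence of the connectivity probability can be observed empirically with a small simulation of the fixed-<italic>p</italic> model. This is an illustrative Monte Carlo sketch that estimates probabilities by sampling rather than deriving the classic asymptotic results:

```python
import random

def gnp_connected(n, p, rng):
    """Sample a G(n, p) graph and test connectivity via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:            # edge present with probability p
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)}) == 1

def connectivity_rate(n, p=0.5, trials=200, seed=0):
    """Monte Carlo estimate of the probability that G(n, p) is connected."""
    rng = random.Random(seed)
    return sum(gnp_connected(n, p, rng) for _ in range(trials)) / trials
```

For fixed p = 0.5, the estimated connectivity probability rises quickly toward 1 as n grows, in line with the limiting result quoted above.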
<p><xref ref-type="fig" rid="F1">Figure 1</xref> arranges the landscape of reasoning scenarios we have considered in the preceding examples in two dimensions: one dimension characterizes the domain of graphs that we reason about: at the bottom of this dimension is the transductive setting of Examples 1.1 and 1.2, in which reasoning is limited to a single given graph. At the next level, labeled &#x0201C;one graph at a time,&#x0201D; any single reasoning task is about a specific graph, but the graph under consideration can vary without the need to change or retrain the model. This corresponds to the inductive setting in Example 1.1, as well as the tasks described in Examples 1.4 and 1.5. At the third level, the reasoning concerns several or all graphs at once (Example 1.6). Finally, we may go beyond reasoning about properties of individual graphs, and consider global properties of the space of all graphs. This is exemplified by Example 1.7, where the reasoning pertains to the relationship between different probability distributions on graphs. These informal distinctions about the domain of reasoning will be partly formalized by technical definitions in Section 5.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Reasoning landscape.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1124718-g0001.tif"/>
</fig>
<p>The second dimension in <xref ref-type="fig" rid="F1">Figure 1</xref> is correlated with the first, but describes the type of reasoning that is performed. The most restricted type here is to perform a narrowly defined prediction task such as node classification or link prediction. Based on the distinction between &#x0201C;model checking&#x0201D; and &#x0201C;theorem proving&#x0201D; promoted by Halpern and Vardi (<xref ref-type="bibr" rid="B20">1991</xref>), we label the next level as &#x0201C;model checking&#x0201D;: this refers to evaluating properties of a single given structure, where the class of properties that can be queried is defined by a rich and flexible query language. &#x0201C;Deduction&#x0201D; then refers to (logical) inference about a class of structures, and in &#x0201C;model theory&#x0201D; such a class of structures becomes the object of reasoning itself. Within this two-dimensional schema, <xref ref-type="fig" rid="F1">Figure 1</xref> indicates what areas of reasoning are covered by different types of frameworks. The reasoning landscape delineated here extends beyond the boundaries of what can currently be tackled with automated, algorithmic reasoning methods in AI, reaching into human mathematical inference. In the remainder of this review we will focus on algorithmic reasoning. However, it remains an open challenge to extend the scope of what can be accomplished by algorithmic means.</p>
<p><xref ref-type="fig" rid="F1">Figure 1</xref> focuses on reasoning rather than learning. This is for two reasons: first, reasoning covers a somewhat wider ground that includes scenarios (e.g., logical deduction) not yet supported by learning methods. Second, a taxonomy of learning scenarios in terms of data requirements and learning objectives will naturally follow from the reasoning taxonomy (cf. Section 6). The link between reasoning and learning is a <italic>model</italic> that can be learned from data and that supports the reasoning tasks under consideration. In this review we use a general probabilistic concept of a model, formally defined in Section 3, which allows us to express a wide range of reasoning tasks as different forms of querying a model (Section 5). Similarly, a wide range of learning scenarios can be understood in a coherent manner as constructing models from data under a maximum likelihood objective (Section 6). This review is partly based on Jaeger (<xref ref-type="bibr" rid="B29">2022</xref>), which already contains an in-depth exploration of GNN and SRL methods for learning and reasoning with graphs. The current review takes a much broader high-level perspective. It contains fewer technical details on GNNs and SRL, and instead develops a general conceptual framework for describing a much wider range of learning and reasoning frameworks.</p>
</sec>
<sec id="s2">
<title>2. Graphs</title>
<p>In this section we establish the basic definitions and terminology we use for graphs. <xref ref-type="table" rid="T1">Table 1</xref> collects the main notations. Our graphs will actually be multi-relational, attributed hyper-graphs, but for brevity we will simply call them &#x0201C;graphs.&#x0201D;</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Overview of notation.</p></caption>
<table frame="box" rules="all">
<thead>
<tr>
<th valign="top" align="left"><bold>[<italic>n</italic>]</bold></th>
<th valign="top" align="left"><bold>The (node) set {1, &#x02026;, <italic>n</italic>}</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M5"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="left">Signature of relation symbols</td>
</tr> <tr>
<td valign="top" align="left"><inline-formula><mml:math id="M6"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula><sub><italic>in</italic></sub>, <inline-formula><mml:math id="M7"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula><sub><italic>out</italic></sub></td>
<td valign="top" align="left">Sets of designated input and output relations</td>
</tr> <tr>
<td valign="top" align="left"><italic>e, r</italic>, &#x02026;</td>
<td valign="top" align="left">(Lower case) specific symbols in <inline-formula><mml:math id="M8"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula></td>
</tr> <tr>
<td valign="top" align="left"><italic>i, j</italic>, &#x02026;</td>
<td valign="top" align="left">Nodes</td>
</tr> <tr>
<td valign="top" align="left"><bold>i</bold>, <bold>j</bold>, &#x02026;</td>
<td valign="top" align="left">Tuples of nodes</td>
</tr> <tr>
<td valign="top" align="left"><italic>E, R</italic>, &#x02026;</td>
<td valign="top" align="left">(Upper case) interpretations of <italic>e, r</italic>, &#x02026; in a specific graph</td>
</tr> <tr>
<td valign="top" align="left"><bold>R</bold></td>
<td valign="top" align="left">Tuple of interpretations for all <italic>r</italic>&#x02208;<inline-formula><mml:math id="M9"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula></td>
</tr> <tr>
<td valign="top" align="left"><inline-formula><mml:math id="M10"><mml:mrow><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo>,</mml:mo><mml:msup><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo>&#x02032;</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="left">Partial interpretation(s)</td>
</tr> <tr>
<td valign="top" align="left"><inline-formula><mml:math id="M11"><mml:mrow><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo>&#x0227C;</mml:mo><mml:msup><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo>&#x02032;</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="left"><inline-formula><mml:math id="M12"><mml:mover accent="true"><mml:mrow><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula> Extends partial interpretation <inline-formula><mml:math id="M13"><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula></td>
</tr> <tr>
<td valign="top" align="left"><inline-formula><mml:math id="M14"><mml:mrow><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>V</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x0211B;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="left">Set of interpretations of <inline-formula><mml:math id="M15"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula> over domain <italic>V</italic></td>
</tr> <tr>
<td valign="top" align="left"><inline-formula><mml:math id="M16"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="left">Set of graphs for signature <inline-formula><mml:math id="M17"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula> with domain <italic>V</italic></td>
</tr> <tr>
<td valign="top" align="left"><inline-formula><mml:math id="M18"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x0221E;</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="left">Set of all finite graphs for signature <inline-formula><mml:math id="M19"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula></td>
</tr> <tr>
<td valign="top" align="left"><inline-formula><mml:math id="M20"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="left">Set of all graphs for signature <inline-formula><mml:math id="M21"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula></td>
</tr> <tr>
<td valign="top" align="left"><inline-formula><mml:math id="M22"><mml:mover accent="true"><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x02026;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="left">Corresponding sets of partial graphs</td>
</tr> <tr>
<td valign="top" align="left"><inline-formula><mml:math id="M23"><mml:mi>&#x00394;</mml:mi><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="left">Space of probability distributions on <inline-formula><mml:math id="M24"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></td>
</tr> <tr>
<td valign="top" align="left"><inline-formula><mml:math id="M25"><mml:mrow><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mstyle mathvariant='bold'><mml:mi>R</mml:mi></mml:mstyle><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x0211B;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td valign="top" align="left">Set of completions of partial interpretation <inline-formula><mml:math id="M26"><mml:mover accent='true'><mml:mstyle mathvariant='bold'><mml:mi>R</mml:mi></mml:mstyle><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:math></inline-formula></td>
</tr>
<tr>
<td valign="top" align="left">{|&#x02026;|}</td>
<td valign="top" align="left">Delimiters for multisets</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Different attributes and (hyper-) edge relations are collected in a <italic>signature</italic>: a set <inline-formula><mml:math id="M27"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> of <italic>relation symbols</italic>. Each relation symbol has an <italic>arity</italic>(<italic>r</italic><sub><italic>i</italic></sub>)&#x02208;{0, 1, &#x02026;}. Relations of arity 0 are global graph attributes, such as <italic>toxic</italic> for a graph representing a chemical molecule. Relations of arity 1 are node attributes, and relations of arity 2 are ordinary (binary) edge relations. Relations of arity &#x02265;3 can in principle be reduced to a set of binary relations by materializing tuples as nodes. For example, the 3-ary relation <italic>shortest_path</italic>(<italic>a, b, c</italic>) representing that node <italic>b</italic> lies on the shortest path from <italic>a</italic> to <italic>c</italic> can be encoded by creating a new shortest path node <italic>sp</italic>, and three binary relations <italic>start, on, end</italic>, so that <italic>start</italic>(<italic>sp, a</italic>), <italic>on</italic>(<italic>sp, b</italic>), <italic>end</italic>(<italic>sp, c</italic>) are true. However, this leads to very unnatural encodings, and therefore we allow relations of arity &#x02265;3.</p>
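<p>The reduction just described can be sketched in a few lines of code (a sketch of ours, not from the article; the relation names <italic>start, on, end</italic> follow the text, while the fresh node names <monospace>sp0, sp1, &#x02026;</monospace> are an illustrative choice):</p>

```python
def materialize_ternary(triples):
    """Encode a set of (a, b, c) triples of the ternary relation
    shortest_path as three binary relations start, on, end over
    fresh 'sp' nodes, one per materialized tuple."""
    start, on, end = set(), set(), set()
    for k, (a, b, c) in enumerate(sorted(triples)):
        sp = f"sp{k}"          # fresh node materializing the tuple (a, b, c)
        start.add((sp, a))
        on.add((sp, b))
        end.add((sp, c))
    return start, on, end

# shortest_path(1, 2, 3) becomes start(sp0, 1), on(sp0, 2), end(sp0, 3)
s, o, e = materialize_ternary({(1, 2, 3)})
```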
<p>A <inline-formula><mml:math id="M28"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula>-<italic>graph</italic> is a structure (<italic>V</italic>, <bold>R</bold>), where <italic>V</italic> is a finite or countably infinite set of nodes (also referred to as a <italic>domain</italic>), and <bold>R</bold> &#x0003D; (<italic>R</italic><sub>1</sub>, &#x02026;, <italic>R</italic><sub><italic>m</italic></sub>) are the <italic>interpretations</italic> of the relation symbols: <inline-formula><mml:math id="M29"><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:msup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textit" mathvariant="italic">arity</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x02192;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>. In the case of <italic>arity</italic>(<italic>r</italic><sub><italic>i</italic></sub>) &#x0003D; 0 this is just a constant 0 or 1.</p>
<p>We write <italic>IntV</italic>(<italic>r</italic>) for the set of possible interpretations of the relation symbol <inline-formula><mml:math id="M30"><mml:mi>r</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula> over the domain <italic>V</italic>, and <inline-formula><mml:math id="M31"><mml:mstyle class="text"><mml:mtext class="textit" mathvariant="italic">Int</mml:mtext></mml:mstyle><mml:mi>V</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x000D7;</mml:mo></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mstyle class="text"><mml:mtext class="textit" mathvariant="italic">Int</mml:mtext></mml:mstyle><mml:mi>V</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> for the set of all interpretations of the whole signature <inline-formula><mml:math id="M32"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula>. In most cases it is sufficient to consider the domains <italic>V</italic> &#x0003D; [<italic>n</italic>]: &#x0003D; {1, &#x02026;, <italic>n</italic>} for <italic>n</italic>&#x02208;&#x02115;. We use <italic>i, j</italic>, &#x02026; as generic symbols for nodes, and bold face <bold>i</bold>, <bold>j</bold>, &#x02026; for tuples of nodes. We here use somewhat logic-inspired terminology and notation. 
Taking the logic conventions even further, we can equivalently define an interpretation of <italic>r</italic> as an assignment of true/false values to <italic>ground atoms</italic> <italic>r</italic>(<bold>i</bold>) [<bold>i</bold>&#x02208;<italic>V</italic><sup><italic>arity</italic>(<italic>r</italic>)</sup>]. Going in the opposite direction toward the terminological conventions of neural networks, such an interpretation can also be seen as a |<italic>V</italic>| &#x000D7; &#x022EF; &#x000D7; |<italic>V</italic>|-dimensional (<italic>arity</italic>(<italic>r</italic>) many factors) tensor.</p>
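<p>The equivalence of the ground-atom and tensor views can be made concrete with a small sketch of ours (using a dictionary keyed by node tuples as a stand-in for the Boolean tensor):</p>

```python
from itertools import product

def atoms_to_tensor(n, arity, true_atoms):
    """Interpretation of an arity-k relation symbol over [n] = {1,...,n}:
    a truth assignment to all ground atoms r(i), i in [n]^k, which is
    the same data as an n x ... x n (k factors) Boolean tensor.
    Here the tensor is represented as a dict keyed by node tuples."""
    return {i: int(i in true_atoms)
            for i in product(range(1, n + 1), repeat=arity)}

# Binary edge relation e over [3] with the true atoms e(1,2) and e(2,3):
E = atoms_to_tensor(3, 2, {(1, 2), (2, 3)})
```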
<p>In the following, we usually take the signature <inline-formula><mml:math id="M33"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula> as given by the context, and do not refer to it explicitly, thus saying &#x0201C;graph&#x0201D; rather than <inline-formula><mml:math id="M34"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula>-<italic>graph</italic>, and also abbreviating <inline-formula><mml:math id="M35"><mml:mstyle class="text"><mml:mtext class="textit" mathvariant="italic">Int</mml:mtext></mml:mstyle><mml:mi>V</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> by <italic>IntV</italic>. Also, when the intended meaning is clear from the context, we use the simple term &#x0201C;relation&#x0201D; to refer either to a relation symbol <italic>r</italic> or to an interpretation <italic>R</italic> in a specific graph.</p>
<p>Note that according to our definitions all relations are <italic>directed</italic>. Undirected edges/relations are obtained as the special case where the interpretation <italic>R</italic><sub><italic>i</italic></sub> for a tuple <bold>i</bold> only depends on the elements of <bold>i</bold>, not their order. Furthermore, only Boolean node attributes are permitted. Multi-valued attributes can be represented by multiple Boolean ones in a one-hot encoding.</p>
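<p>The one-hot encoding of a multi-valued attribute into several Boolean unary relations can be sketched as follows (our illustration; the attribute <monospace>color</monospace> and its values are made-up examples):</p>

```python
def one_hot_attribute(values, domain_values):
    """Represent a multi-valued node attribute as several Boolean
    unary relations, one per possible attribute value (one-hot
    encoding). Each unary relation is given as the set of nodes
    for which it is true."""
    return {v: {node for node, val in values.items() if val == v}
            for v in domain_values}

# The attribute color: {node: value} becomes three Boolean unary
# relations; every node belongs to exactly one of them.
color = {1: "red", 2: "blue", 3: "red"}
rels = one_hot_attribute(color, ["red", "green", "blue"])
```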
<p>For a given domain <italic>V</italic> and signature <inline-formula><mml:math id="M36"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula> we denote with <inline-formula><mml:math id="M37"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> the set of all <inline-formula><mml:math id="M38"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula>-graphs with domain <italic>V</italic>, with <inline-formula><mml:math id="M39"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x0221E;</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> the set of all finite <inline-formula><mml:math id="M40"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula>-graphs, and with <inline-formula><mml:math id="M41"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> the set of all graphs, also allowing (countably) infinite domains <italic>V</italic>. 
As a basis for probabilistic graph models, we denote with <inline-formula><mml:math id="M42"><mml:mi>&#x00394;</mml:mi><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> the set of probability distributions over <inline-formula><mml:math id="M43"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> (in the case of infinite <italic>V</italic> this must be based on suitable measure-theoretic definitions that we do not elaborate here).</p>
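<p>As a concrete illustration (our example, not the article's), the classic Erd&#x00151;s&#x02013;R&#x000E9;nyi random graph model is an element of <inline-formula><mml:math id="M90"><mml:mi>&#x00394;</mml:mi><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mi>n</mml:mi><mml:mo stretchy="false">]</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mi>e</mml:mi><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> for a single binary edge relation <italic>e</italic>:</p>

```python
def erdos_renyi_prob(n, p, edges):
    """Probability of a specific graph over [n] under the (directed,
    loop-free) Erdos-Renyi distribution: each of the n*(n-1) possible
    directed edges (i, j), i != j, is present independently with
    probability p."""
    m = n * (n - 1)                      # number of possible directed edges
    k = len(edges)                       # edges actually present
    return (p ** k) * ((1 - p) ** (m - k))

# Over [2] there are 2 possible directed edges; with p = 0.5 every
# one of the 4 graphs has probability 0.25
prob = erdos_renyi_prob(2, 0.5, {(1, 2)})
```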
</sec>
<sec id="s3">
<title>3. Models</title>
<p>We introduce a general concept of a model, so that all reasoning tasks become different forms of querying a model. Ours will be a high-level, purely semantic notion of a model that imposes no restrictions on how models are represented, implemented or constructed. Our model definition is probabilistic in nature. As we shall see, this still allows us to capture purely qualitative, logic-based frameworks, although at the cost of casting them in a slightly contrived way into the probabilistic mold via extreme 0, 1-valued probabilities. Roughly speaking, a model in our sense will be a mapping from partly specified graphs to probability distributions over their possible completions.</p>
<p>A <italic>partial graph</italic> is a structure <inline-formula><mml:math id="M44"><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></inline-formula> where <italic>V</italic> is a finite or countably infinite domain, and <inline-formula><mml:math id="M45"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> are <italic>partial interpretations</italic> of the relation symbols: <inline-formula><mml:math id="M46"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:msup><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textit" mathvariant="italic">arity</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x02192;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>?</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>. We write <inline-formula><mml:math id="M47"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0227C;</mml:mo><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> (<inline-formula><mml:math id="M48"><mml:mrow><mml:msubsup><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>i</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> <italic>extends</italic> <inline-formula><mml:math id="M49"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>) if</p>
<disp-formula id="E5"><mml:math id="M50"><mml:mrow><mml:mo>&#x02200;</mml:mo><mml:mstyle mathvariant='bold'><mml:mi>i</mml:mi></mml:mstyle><mml:mo>:</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mstyle mathvariant='bold'><mml:mi>i</mml:mi></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02260;</mml:mo><mml:mo>?</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x021D2;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:msup><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mstyle mathvariant='bold'><mml:mi>i</mml:mi></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mstyle mathvariant='bold'><mml:mi>i</mml:mi></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>If <inline-formula><mml:math id="M51"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0227C;</mml:mo><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> for <italic>i</italic> &#x0003D; 1, &#x02026;, <italic>m</italic>, we write <inline-formula><mml:math id="M52"><mml:mover accent="true"><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mo>&#x0227C;</mml:mo><mml:msup><mml:mover accent="true"><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mo>&#x02032;</mml:mo></mml:msup></mml:math></inline-formula>.</p>
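<p>The extension relation between partial interpretations can be checked directly (a sketch of ours, representing the value &#x0201C;?&#x0201D; by <monospace>None</monospace>):</p>

```python
def extends(R_part, R_part2):
    """Check whether R~' extends R~ (i.e., R~ <= R~'): wherever R~
    is determined (0 or 1, not '?'), R~' must agree; '?' entries
    of R~ may be filled in arbitrarily."""
    return all(v is None or R_part2[i] == v for i, v in R_part.items())

# Partial unary relation over [3]; None plays the role of '?'
R1 = {(1,): 1, (2,): None, (3,): 0}
R2 = {(1,): 1, (2,): 1, (3,): 0}     # completes the '?' at node 2
```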
<p><inline-formula><mml:math id="M53"><mml:mover accent="true"><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the set of all partial <inline-formula><mml:math id="M54"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula>-graphs with domain <italic>V</italic>, and <inline-formula><mml:math id="M55"><mml:mover accent="true"><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x0221E;</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> the set of all finite partial <inline-formula><mml:math id="M56"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula>-graphs. 
Finally, <inline-formula><mml:math id="M57"><mml:msub><mml:mstyle class="text"><mml:mtext class="textit" mathvariant="italic">Int</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the set of complete interpretations <inline-formula><mml:math id="M58"><mml:mrow><mml:mstyle mathvariant='bold'><mml:mi>R</mml:mi></mml:mstyle><mml:mo>&#x02208;</mml:mo><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mi>V</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x0211B;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></inline-formula> with <inline-formula><mml:math id="M59"><mml:mrow><mml:mover accent='true'><mml:mstyle mathvariant='bold'><mml:mi>R</mml:mi></mml:mstyle><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo>&#x0227C;</mml:mo><mml:mstyle mathvariant='bold'><mml:mi>R</mml:mi></mml:mstyle></mml:mrow></mml:math></inline-formula>.</p>
<p>Definition 3.1. A <inline-formula><mml:math id="M60"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula>-model <inline-formula><mml:math id="M61"><mml:mrow><mml:mi mathvariant="script">M</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">I</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> consists of</p>
<list list-type="bullet">
<list-item><p>a subset <inline-formula><mml:math id="M62"><mml:mrow><mml:mi mathvariant="script">I</mml:mi></mml:mrow><mml:mo>&#x02286;</mml:mo><mml:mover accent="true"><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x0221E;</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item>
<list-item><p>a mapping</p></list-item>
</list>
<disp-formula id="E6"><mml:math id="M63"><mml:mrow><mml:mi>&#x003BC;</mml:mi><mml:mo>:</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>R</mml:mi></mml:mstyle><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x021A6;</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>R</mml:mi></mml:mstyle><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>&#x00394;</mml:mi><mml:mi mathvariant='script'>G</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x0211B;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>defined for all <inline-formula><mml:math id="M64"><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover><mml:mi>R</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02208;</mml:mo><mml:mi>&#x02110;</mml:mi></mml:mrow></mml:math></inline-formula>, such that</p>
<disp-formula id="E7"><mml:math id="M65"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>R</mml:mi></mml:mstyle><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>R</mml:mi></mml:mstyle><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x0211B;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>1.</mml:mn></mml:mrow></mml:math></disp-formula>
<p>In the case where the input just consists of a domain <italic>V</italic> (i.e., <inline-formula><mml:math id="M66"><mml:mrow><mml:mover accent='true'><mml:mstyle mathvariant='bold'><mml:mi>R</mml:mi></mml:mstyle><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> is completely unspecified), we simply write <italic>P</italic><sub><italic>V</italic></sub> for <inline-formula><mml:math id="M68"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
<p>Definition 3.1 has several important special cases: when a model is for classification of a specific label relation <italic>l</italic>, then <inline-formula><mml:math id="M69"><mml:mrow><mml:mi mathvariant="script">I</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant='script'>G</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x0221E;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x0211B;</mml:mi><mml:mo>&#x02216;</mml:mo><mml:mo>&#x0007B;</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x0007D;</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></inline-formula>. The model then defines for every input graph with complete specifications of relations <inline-formula><mml:math id="M70"><mml:mi>r</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi mathvariant="script">R</mml:mi><mml:mo>\</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> a distribution over interpretations <italic>L</italic> for <italic>l</italic>. We refer to such models as <italic>discriminative</italic>. 
A model that, in contrast, takes as input only a finite domain, i.e., <inline-formula><mml:math id="M71"><mml:mi mathvariant="bold-script">I</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x0221E;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x02205;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, and thereby maps a domain <italic>V</italic> to a probability distribution over <inline-formula><mml:math id="M72"><mml:mi mathvariant="script">G</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="script">R</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> is called <italic>fully generative</italic>. In between these two extremes are models where <inline-formula><mml:math id="M73"><mml:mi mathvariant="script">R</mml:mi></mml:math></inline-formula> is partitioned into a set of <italic>input</italic> relations <inline-formula><mml:math id="M74"><mml:mi mathvariant="script">R</mml:mi></mml:math></inline-formula><sub><italic>in</italic></sub> and output relations <inline-formula><mml:math id="M75"><mml:mi mathvariant="script">R</mml:mi></mml:math></inline-formula><sub><italic>out</italic></sub>, and <inline-formula><mml:math id="M76"><mml:mi mathvariant="bold-script">I</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="script">G</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x0221E;</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textit" mathvariant="italic">in</mml:mtext></mml:mstyle></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. This is the typical case for SRL models. 
We refer to such models as <italic>conditionally generative</italic> (discriminative then is just a borderline case of conditionally generative). In all these special cases the input graph contains complete specifications of a selected set of relations. This covers many, but not all models used in practice: e.g., models for link prediction (cf. Example 3.3 below) operate on inputs with partial specifications of the edge relation. Transductive models are characterized as the special case |<inline-formula><mml:math id="M77"><mml:mi mathvariant="script">I</mml:mi></mml:math></inline-formula>| &#x0003D; 1.</p>
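<p>The abstract structure of Definition 3.1 can be rendered as a small interface sketch (names and representation choices are ours, not the article's; a distribution is represented naively as a dictionary mapping completions to probabilities):</p>

```python
class Model:
    """Sketch of Definition 3.1: a model consists of a set I of
    admissible partial input graphs (here a membership predicate)
    and a mapping mu from inputs (V, R~) to probability
    distributions over their completions."""
    def __init__(self, admissible, mu):
        self.admissible = admissible   # membership test for the input set I
        self.mu = mu                   # (V, R~) -> {completion: probability}

    def query(self, V, R_part):
        if not self.admissible(V, R_part):
            raise ValueError("input not in I")
        dist = self.mu(V, R_part)
        # all probability mass must lie on completions of (V, R~)
        assert abs(sum(dist.values()) - 1.0) < 1e-9
        return dist

# Toy fully generative model: signature = one unary relation l,
# domain V = [1]; the distribution is over the value of l(1).
toy = Model(lambda V, R: all(v is None for v in R.values()),
            lambda V, R: {0: 0.3, 1: 0.7})
dist = toy.query([1], {(1,): None})
```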
<p>The abstract definition 3.1 accommodates a multitude of concrete approaches for learning and reasoning about graphs. We call a <italic>modeling framework</italic> any approach that provides computational tools for representing, learning, and reasoning with models. <xref ref-type="fig" rid="F2">Figure 2</xref> gives an overview of several classes of modeling frameworks, and representatives for each class. The selection of representatives does not attempt a complete coverage of even the most important examples. In the following we continue the examples of Section 1 to illustrate how these frameworks indeed fit into our general Definition 3.1.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Framework classes and representatives.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1124718-g0002.tif"/>
</fig>
<p>Example 3.2. (Node classification; GNNs) In the standard node classification scenario, the signature <inline-formula><mml:math id="M78"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula> consists of one binary edge relation <italic>e</italic>, observed node attributes <bold>a</bold> &#x0003D; <italic>a</italic><sub>1</sub>, &#x02026;, <italic>a</italic><sub><italic>l</italic></sub>, and a node label <italic>l</italic>. A partial input graph has the form <inline-formula><mml:math id="M79"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, where in the partial interpretation <inline-formula><mml:math id="M80"><mml:mover accent="true"><mml:mrow><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>E</mml:mi><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>A</mml:mi></mml:mstyle><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> the edge relation and node attributes are fully observed, and the node label is unobserved for some (possibly all) nodes. Given such an input, graph neural networks then define label probabilities</p>
<disp-formula id="E8"><mml:math id="M81"><mml:mrow><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>E</mml:mi><mml:mo>,</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>A</mml:mi></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>:</mml:mo><mml:mover accent='true'><mml:mi>L</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mo>?</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>for the unlabeled nodes. Since these predictions are independent for all nodes, they define a distribution over completions <italic>L</italic> of <inline-formula><mml:math id="M82"><mml:mover accent="true"><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula> via</p>
<disp-formula id="E9"><label>(4)</label><mml:math id="M83"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>&#x0211B;</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>L</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x003A0;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>:</mml:mo><mml:mover accent='true'><mml:mi>L</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mo>?</mml:mo></mml:mrow></mml:msub><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mi>L</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>E</mml:mi><mml:mo>,</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>A</mml:mi></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
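<p>The product in (4) can be sketched in a few lines of code. The per-node probabilities are assumed to be already given (in practice they would come from a trained GNN); only the combination of these independent predictions into a distribution over completions is shown, and all numbers are illustrative.</p>

```python
# Sketch of Equation (4): a distribution over label completions, assuming a
# hypothetical per-node predictor has already produced P(l(i) = 1) for every
# node; any trained GNN could fill that role.

def completion_probability(node_probs, observed, completion):
    """Probability of a full labeling `completion`.

    node_probs[i]: the model's probability that node i has label 1.
    observed[i]:   the input label of node i, or None for '?'.
    The product runs only over nodes whose label was unobserved.
    """
    p = 1.0
    for i, obs in enumerate(observed):
        if obs is not None:          # label was part of the input: no factor
            continue
        q = node_probs[i]
        p *= q if completion[i] == 1 else 1.0 - q
    return p

# Toy check: node 0 observed with label 1, nodes 1 and 2 unlabeled.
probs = [0.9, 0.8, 0.3]
print(completion_probability(probs, [1, None, None], [1, 1, 0]))  # 0.8 * 0.7
```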
<p>In <xref ref-type="fig" rid="F2">Figure 2</xref>, the class of GNN frameworks is divided into the sub-classes of <italic>message-passing</italic> and <italic>recurrent</italic> GNNs. MP-GNNs compute node feature vectors in a fixed number of iterations, whereas R-GNNs perform feature updates until a fixed point is reached. We return to this distinction in Section 4.2.</p>
<p>Example 3.3. (Link prediction; shallow embeddings) In its most basic form, a transductive link prediction problem is given by a single input graph <inline-formula><mml:math id="M84"><mml:mrow><mml:mi mathvariant="script">I</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x01EBC;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> with a partially observed edge relation. Usually, all edges that are not observed as existent are candidates for being predicted, so that &#x01EBC;(<italic>i, j</italic>)&#x02208;{1, ?} for all <italic>i, j</italic>&#x02208;<italic>V</italic> (in practice, however, represented in the form &#x01EBC;(<italic>i, j</italic>)&#x02208;{1, 0}, with the understanding that &#x01EBC;(<italic>i, j</italic>) &#x0003D; 0 still allows the prediction <italic>E</italic>(<italic>i, j</italic>) &#x0003D; 1). <italic>Shallow embeddings</italic> construct for each node <italic>i</italic>&#x02208;<italic>V</italic> an embedding vector <italic>em</italic>(<italic>i</italic>), and define a scoring function <italic>score</italic>[<italic>em</italic>(<italic>i</italic>), <italic>em</italic>(<italic>j</italic>)] for the likelihood of <italic>E</italic>(<italic>i, j</italic>) &#x0003D; 1. As the objective usually is just to rank candidate edges, the score need not be probabilistic [concrete examples are the dot product or cosine similarity of <italic>em</italic>(<italic>i</italic>), <italic>em</italic>(<italic>j</italic>)]. 
However, turning the scores into probabilities by applying a sigmoid function does not affect the ranking, and hence we may assume that the link prediction model defines edge probabilities <italic>P</italic>[<italic>e</italic>(<italic>i, j</italic>)|(<italic>V</italic>, &#x01EBC;)] (<italic>i, j</italic>:&#x01EBC;(<italic>i, j</italic>) &#x0003D; ?). Via the same implicit independence assumptions as in the previous example, a distribution in the sense of Definition 3.1 is then defined as</p>
<disp-formula id="E10"><label>(5)</label><mml:math id="M85"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>E</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>E</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x003A0;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>:</mml:mo><mml:mover accent='true'><mml:mi>E</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mo>?</mml:mo></mml:mrow></mml:msub><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>e</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mi>E</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>E</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
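<p>A minimal sketch of this shallow-embedding pipeline: the embedding vectors are hand-picked here (in practice they would be learned), the score is the dot product, and a sigmoid turns scores into the edge probabilities multiplied in (5). All function names and numbers are illustrative, not taken from any particular library.</p>

```python
import math

# Shallow-embedding link prediction as in Equation (5): dot-product scores,
# squashed by a sigmoid, combined under the implicit independence assumption.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def edge_probability(em, i, j):
    score = sum(a * b for a, b in zip(em[i], em[j]))   # dot-product score
    return sigmoid(score)

def completion_probability(em, observed, completion, nodes):
    """Probability of a full edge relation, multiplying edge probabilities
    over the pairs not observed as existent (the '?' pairs of the input)."""
    p = 1.0
    for i in nodes:
        for j in nodes:
            if (i, j) in observed:                     # observed edge: no factor
                continue
            q = edge_probability(em, i, j)
            p *= q if (i, j) in completion else 1.0 - q
    return p

em = {0: [1.0, 0.0], 1: [1.0, 1.0]}
print(edge_probability(em, 0, 1))                      # sigmoid(1.0), about 0.731
print(completion_probability(em, {(0, 1)}, {(0, 1), (1, 0)}, [0, 1]))
```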
<p>Example 3.4. (ILP) A logic program as shown in Example 1.3 can be interpreted as a model in our sense. For any input <inline-formula><mml:math id="M86"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M87"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula> is specified by a list of true ground facts, the program defines the unique interpretation <bold>R</bold><sup>&#x0002A;</sup> in which a ground fact <italic>r</italic>(<bold>i</bold>) is true, iff it is provable from the program and <inline-formula><mml:math id="M88"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula>. This can be expressed in the form of a degenerate probability distribution with</p>
<disp-formula id="E11"><mml:math id="M89"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>R</mml:mi></mml:mstyle><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>R</mml:mi></mml:mstyle><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mn>1.</mml:mn></mml:mrow></mml:math></disp-formula>
<p>The set <inline-formula><mml:math id="M90"><mml:mrow><mml:mi mathvariant="script">I</mml:mi></mml:mrow></mml:math></inline-formula> of possible inputs for this model consists of all <inline-formula><mml:math id="M91"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> where <inline-formula><mml:math id="M92"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula> is defined by positive ground facts only, i.e., <inline-formula><mml:math id="M93"><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>i</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>?</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> for all <inline-formula><mml:math id="M94"><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>i</mml:mi></mml:mstyle></mml:math></inline-formula>.</p>
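<p>The provability semantics underlying Example 3.4 can be sketched by naive forward chaining to a fixed point. The ancestor program below is a hypothetical stand-in for the program of Example 1.3; the derived set plays the role of the unique interpretation <bold>R</bold><sup>&#x0002A;</sup>.</p>

```python
# Naive forward chaining: iterate the rules over the current set of ground
# facts until nothing new is derivable. The result is the set of all facts
# provable from the program and the input facts.

def forward_chain(facts, rules):
    """facts: set of ground atoms (tuples); rules: functions mapping the
    current fact set to derivable atoms. Iterate to a fixed point."""
    derived = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(derived) - derived
        if not new:
            return derived
        derived |= new

# Input facts R~ (positive ground facts only), for a made-up family.
parents = {("parent", "ann", "bob"), ("parent", "bob", "carl")}
rules = [
    # ancestor(X,Y) :- parent(X,Y).
    lambda fs: {("ancestor", x, y) for (r, x, y) in fs if r == "parent"},
    # ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
    lambda fs: {("ancestor", x, z)
                for (r1, x, y) in fs if r1 == "parent"
                for (r2, y2, z) in fs if r2 == "ancestor" and y2 == y},
]
closure = forward_chain(parents, rules)
print(("ancestor", "ann", "carl") in closure)  # True
```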
<p>Example 3.5. (Graph Classification; graph kernels) A graph kernel <italic>k</italic>(<italic>G, H</italic>) alone is not a model in our sense. However, a support-vector machine classifier built on the kernel (denoted <italic>k</italic>-SVM) is such a model: a <italic>k</italic>-SVM maps graphs to label values in {0, 1}, which, similar to Example 3.4, can be seen as a degenerate probabilistic model. Alternatively, since the <italic>k</italic>-SVM actually produces numeric scores for the labels (as the distance to the decision boundary), one can transform the scores by a sigmoid function into non-degenerate probability values.</p>
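<p>The following sketch shows a <italic>k</italic>-SVM viewed as a sigmoid-softened probabilistic graph classifier. The kernel (a dot product of node-count and edge-count features), the support vectors, coefficients, and bias are made-up toy values, not a trained model.</p>

```python
import math

# A kernel SVM decision function: weighted kernel evaluations against the
# support vectors plus a bias; a sigmoid converts the signed score into a
# non-degenerate label probability.

def k(g, h):                      # toy graph kernel on (n_nodes, n_edges)
    return g[0] * h[0] + g[1] * h[1]

support = [((3, 2), 1, 0.5), ((4, 6), -1, 0.3)]  # (graph, label, alpha)
bias = -1.0

def decision(g):                  # distance-like SVM score
    return sum(a * y * k(sv, g) for sv, y, a in support) + bias

def label_probability(g):         # sigmoid turns the score into P(label = 1)
    return 1.0 / (1.0 + math.exp(-decision(g)))

g = (3, 3)
print(decision(g), label_probability(g) > 0.5)
```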
<p>Example 3.6. (SRL) To solve the probabilistic inference tasks of Example, an SRL model will be defined for input structures (<italic>V, E</italic>) with a fully observed edge relation <italic>e</italic> defining the pedigree structure. The model will then define a distribution <italic>P</italic><sub>(<italic>V, E</italic>)</sub>(<bold>A</bold>) over interpretations <bold>A</bold> of node attributes <bold>a</bold>. In a typical solution, this is done by defining a marginal distribution over genotypes for the individuals in <italic>V</italic> whose parents are not included in <italic>V</italic>, and a conditional distribution over child genotypes given parent genotypes. In order to solve the reasoning tasks described in Example, the framework must support queries about conditional probabilities of the form <inline-formula><mml:math id="M95"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mi>E</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>a</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>A</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> (genotype probabilities for individual <italic>i</italic> given partial information <inline-formula><mml:math id="M96"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>A</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula> about genotypes in the pedigree) and <inline-formula><mml:math id="M97"><mml:mrow><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mi>&#x00020;</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mi>A</mml:mi></mml:msub><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mi>E</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>A</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:mover accent='true'><mml:mi>A</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></inline-formula> (MPE inference).</p>
<p>The model outlined here falls into the sub-class of <italic>directed</italic> SRL frameworks, where the distributions defined by a model can be represented in the form of directed probabilistic graphical models. Other important sub-classes of SRL distinguished in <xref ref-type="fig" rid="F2">Figure 2</xref> are frameworks based on <italic>undirected</italic> probabilistic graphical models, and probabilistic generalizations of inductive logic programming. Because of their close relationship with the most popular undirected SRL framework, Markov logic networks, <xref ref-type="fig" rid="F2">Figure 2</xref> also lists <italic>exponential random graph models</italic> under the U-SRL class, even though in terms of historical background and applications, exponential random graphs rather fall into the random graph model category.</p>
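<p>The directed pedigree model outlined above can be sketched by brute-force enumeration: founders (individuals whose parents are not in <italic>V</italic>) receive an assumed marginal genotype distribution, children a Mendelian conditional given their parents, and conditional queries are answered by summing the joint over all completions. All probability values below are illustrative, not real genetics parameters.</p>

```python
import itertools

GENOS = ("AA", "Aa", "aa")
FOUNDER = {"AA": 0.25, "Aa": 0.5, "aa": 0.25}   # assumed founder marginal
PASS_A = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}      # P(transmitting allele A)

def child_given_parents(child, mother, father):
    # Mendelian conditional: each parent passes one allele independently.
    pm, pf = PASS_A[mother], PASS_A[father]
    return {"AA": pm * pf,
            "Aa": pm * (1 - pf) + (1 - pm) * pf,
            "aa": (1 - pm) * (1 - pf)}[child]

def joint(assign, pedigree):
    """pedigree: node -> (mother, father), or None for founders."""
    p = 1.0
    for node, parents in pedigree.items():
        if parents is None:
            p *= FOUNDER[assign[node]]
        else:
            m, f = parents
            p *= child_given_parents(assign[node], assign[m], assign[f])
    return p

def query(node, evidence, pedigree):
    """P(a(node) = g | partial genotype evidence), for each genotype g."""
    nodes = list(pedigree)
    num, den = {g: 0.0 for g in GENOS}, 0.0
    for combo in itertools.product(GENOS, repeat=len(nodes)):
        assign = dict(zip(nodes, combo))
        if any(assign[n] != g for n, g in evidence.items()):
            continue
        p = joint(assign, pedigree)
        num[assign[node]] += p
        den += p
    return {g: num[g] / den for g in GENOS}

ped = {"mother": None, "father": None, "child": ("mother", "father")}
print(query("child", {"mother": "Aa", "father": "Aa"}, ped))  # 1:2:1 ratio
```

MPE inference could be sketched the same way, by taking the arg max over completions instead of summing them.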
<p>Example 3.7. (First-order logic) A logical knowledge base <italic>KB</italic> as exemplified by (1) and (2) in Example 1.6 can be seen as a discriminative model in our sense: for any graph <italic>G</italic> &#x0003D; (<italic>V</italic>, <bold>R</bold>)&#x02208;<inline-formula><mml:math id="M98"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:math></inline-formula>(<inline-formula><mml:math id="M99"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula>) the semantics of the logic defines whether <italic>KB</italic> is true in <italic>G</italic>.<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> To formalize this as a discriminative model we augment the signature <inline-formula><mml:math id="M100"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula> with a binary graph label <italic>l</italic><sub><italic>KB</italic></sub>, set <inline-formula><mml:math id="M101"><mml:mrow><mml:mi mathvariant="script">I</mml:mi></mml:mrow></mml:math></inline-formula> &#x0003D; <inline-formula><mml:math id="M102"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:math></inline-formula>(<inline-formula><mml:math id="M103"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula>), and <italic>P</italic><sub><italic>G</italic></sub>(<italic>l</italic><sub><italic>KB</italic></sub> &#x0003D; 1) &#x0003D; 1 if <italic>KB</italic> is true in <italic>G</italic>, and <italic>P</italic><sub><italic>G</italic></sub>(<italic>l</italic><sub><italic>KB</italic></sub> &#x0003D; 0) &#x0003D; 1, otherwise. Logical inference as described in Example 1.6 then amounts to determining whether for all graphs <italic>G</italic> in which (3) is false one has <italic>P</italic><sub><italic>G</italic></sub>(<italic>l</italic><sub><italic>KB</italic></sub> &#x0003D; 0) &#x0003D; 1. This is a probabilistic rendition of the task of automated theorem proving.</p>
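<p>On a finite graph, the degenerate distribution of Example 3.7 amounts to a model check by enumeration. The knowledge base below (edge symmetry and irreflexivity) is a hypothetical stand-in for the formulas (1) and (2) of the running example.</p>

```python
# Checking the truth of a first-order KB in a finite graph (V, E), and
# reading the result as the degenerate probability P_G(l_KB = 1) in {0, 1}.

def kb_true(V, E):
    symmetric = all((j, i) in E for (i, j) in E)
    irreflexive = all((i, i) not in E for i in V)
    return symmetric and irreflexive

def p_label(V, E):
    """The 0/1 'probability' that the graph label l_KB is 1."""
    return 1.0 if kb_true(V, E) else 0.0

V = {0, 1, 2}
print(p_label(V, {(0, 1), (1, 0)}), p_label(V, {(0, 1)}))  # 1.0 0.0
```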
<p>Example 3.8. (Random graph models) Classical random graph models are generative models in our sense for the signature <inline-formula><mml:math id="M104"><mml:mi mathvariant="script">R</mml:mi></mml:math></inline-formula> &#x0003D; {<italic>e</italic>} containing a single edge relation.</p>
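<p>A fully generative model in its simplest form is sketched below: a directed Erd&#x00151;s-R&#x000E9;nyi-style <italic>G</italic>(<italic>n, p</italic>) model without self-loops, which from a node set alone defines both a sampler for the edge relation and a closed-form probability for any concrete edge set.</p>

```python
import math
import random

# G(n, p): every ordered pair of distinct nodes is an edge independently
# with probability p.

def sample_gnp(V, p, rng=random):
    return {(i, j) for i in V for j in V if i != j and rng.random() < p}

def log_prob_gnp(V, E, p):
    n_pairs = len(V) * (len(V) - 1)            # ordered pairs, no self-loops
    return len(E) * math.log(p) + (n_pairs - len(E)) * math.log(1 - p)

V = {0, 1, 2}
print(round(log_prob_gnp(V, {(0, 1)}, 0.5), 4))  # log(0.5**6), about -4.1589
```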
<p>The preceding examples show that the quite simple Definition 3.1 is sufficient as a unifying semantic foundation for a large variety of frameworks for reasoning about graphs. Of course, most of these frameworks are also (or primarily) designed for learning models from graph data, but our initial focus here is on modeling and reasoning, before we turn to learning in Section 6.</p>
</sec>
<sec id="s4">
<title>4. Modeling tools</title>
<p>Our Definition 3.1 of a model is completely abstract and does not entail any assumptions or prescriptions about the syntactic form and computational tools for model specification and reasoning. In the following, we consider several key modeling elements that are used across multiple concrete frameworks.</p>
<sec>
<title>4.1. Factorization</title>
<p>We first consider conditionally generative models that for an input graph (<italic>V</italic>, <bold>R</bold><sub><italic>in</italic></sub>) define a probability distribution <italic>P</italic><sub>(<italic>V</italic>,<sub><bold>R</bold></sub><sub><italic>in</italic></sub>)</sub> (in the following abbreviated as <italic>P</italic>) over interpretations <bold>R</bold><sub><italic>out</italic></sub> of output relations. Enumerating <inline-formula><mml:math id="M105"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula><sub><italic>out</italic></sub> as <italic>r</italic><sub>1</sub>, &#x02026;, <italic>r</italic><sub><italic>n</italic></sub>, one can factorize <italic>P</italic> as</p>
<disp-formula id="E12"><label>(6)</label><mml:math id="M106"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn>3</mml:mn></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo 
stretchy='false'>)</mml:mo><mml:mo>&#x022EF;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Thus, a generative model is decomposed into a product of discriminative models. This factorization at the relation level is often used in SRL models that are based on directed graphical models (Breese et al., <xref ref-type="bibr" rid="B6">1994</xref>; Ngo and Haddawy, <xref ref-type="bibr" rid="B44">1995</xref>; Jaeger, <xref ref-type="bibr" rid="B26">1997</xref>; Laskey and Mahoney, <xref ref-type="bibr" rid="B38">1997</xref>; Friedman et al., <xref ref-type="bibr" rid="B16">1999</xref>; Kersting and De Raedt, <xref ref-type="bibr" rid="B30">2001</xref>; Heckerman et al., <xref ref-type="bibr" rid="B24">2007</xref>; Laskey, <xref ref-type="bibr" rid="B37">2008</xref>).</p>
<p>Individual discriminative factors <italic>P</italic>(<italic>R</italic><sub><italic>k</italic></sub>|<italic>R</italic><sub>1</sub>, &#x02026;, <italic>R</italic><sub><italic>k</italic>&#x02212;1</sub>) can further be decomposed into a product of atom probabilities:</p>
<disp-formula id="E13"><label>(7)</label><mml:math id="M107"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mi>V</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:mi>P</mml:mi></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:msub><mml:mi>R</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo 
stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>We have seen that this factorization is implicitly present in GNNs (4) and shallow embeddings (5). Unlike (6), which is a generally valid application of the chain rule, the factorization (7) represents the quite restrictive assumption that the <italic>R</italic><sub><italic>k</italic></sub>-atoms are conditionally independent given relations <italic>R</italic><sub>1</sub>, &#x02026;, <italic>R</italic><sub><italic>k</italic>&#x02212;1</sub>. This leads to some challenges, e.g., for modeling <italic>homophily</italic> properties of <italic>R</italic><sub><italic>k</italic></sub>. A useful strategy to address such limitations of (7) is to include, among the <italic>R</italic><sub>1</sub>, &#x02026;, <italic>R</italic><sub><italic>k</italic>&#x02212;1</sub>, <italic>latent relations</italic> that only serve to induce dependencies among the <italic>R</italic><sub><italic>k</italic></sub>-atoms.</p>
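<p>The latent-relation strategy can be illustrated with a stochastic-block-model-style sketch: edges are conditionally independent given a latent community attribute <italic>z</italic>, as in (7), but marginalizing <italic>z</italic> out couples the edge atoms and produces homophily. All numbers below are illustrative.</p>

```python
import itertools

P_Z = {0: 0.5, 1: 0.5}                       # latent community prior
P_EDGE = {(0, 0): 0.8, (1, 1): 0.8,          # within-community edge prob.
          (0, 1): 0.1, (1, 0): 0.1}          # across-community edge prob.

def marginal_edges(nodes, pairs, values):
    """P(e(i,j) = values[k] for pairs[k]), marginalizing the latent labels."""
    total = 0.0
    for zs in itertools.product(P_Z, repeat=len(nodes)):
        z = dict(zip(nodes, zs))
        p = 1.0
        for n in nodes:
            p *= P_Z[z[n]]
        for (i, j), v in zip(pairs, values):
            q = P_EDGE[(z[i], z[j])]
            p *= q if v == 1 else 1 - q
        total += p
    return total

nodes = ["a", "b", "c"]
p_ab = marginal_edges(nodes, [("a", "b")], [1])
p_ab_given_shared = (
    marginal_edges(nodes, [("a", "b"), ("a", "c"), ("b", "c")], [1, 1, 1])
    / marginal_edges(nodes, [("a", "c"), ("b", "c")], [1, 1]))
print(p_ab < p_ab_given_shared)   # a common neighbor raises edge probability
```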
<p>Frameworks that are based on undirected probabilistic models, notably <italic>exponential random graph models</italic> and the closely related <italic>Markov logic networks (MLNs)</italic> (Richardson and Domingos, <xref ref-type="bibr" rid="B51">2006</xref>), decompose the joint distribution <italic>P</italic>(<bold>R</bold><sub><italic>out</italic></sub>) into factors that are defined by <italic>features</italic> <italic>F</italic>(<bold>i</bold>) of node tuples <bold>i</bold>. Examples of such features are the degree of a node <italic>F</italic>(<italic>i</italic>) &#x0003D; |{<italic>j</italic>|<italic>edge</italic>(<italic>i, j</italic>)}|, or, in MLNs, 0,1-valued features expressed by Boolean formulas over ground atoms, e.g., <italic>F</italic>(<italic>i, j</italic>) &#x0003D; <italic>edge</italic>(<italic>i, j</italic>)&#x02227;<italic>r</italic>(<italic>i</italic>)&#x02227;&#x000AC;<italic>r</italic>(<italic>j</italic>). Every such feature has an arity, and the distribution defined by features <italic>F</italic><sub>1</sub>, &#x02026;, <italic>F</italic><sub><italic>K</italic></sub> then is</p>
<disp-formula id="E14"><label>(8)</label><mml:math id="M108"><mml:mrow><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>Z</mml:mi></mml:mfrac><mml:mtext>exp</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:munderover><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mi>V</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>F</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mstyle></mml:mrow></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <italic>Z</italic> is a normalizing constant, and on the right-hand side we make explicit that the feature is a function of the interpretations <bold>R</bold><sub><italic>out</italic></sub> and <bold>R</bold><sub><italic>in</italic></sub>.</p>
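<p>The exponential-family form (8) can be sketched by brute force on a tiny domain: two toy features (edge count and number of reciprocated edges) are weighted, summed, exponentiated, and normalized by enumerating every interpretation of a single edge relation. The feature weights 0.2 and 0.5 are illustrative.</p>

```python
import itertools
import math

V = [0, 1, 2]
PAIRS = [(i, j) for i in V for j in V if i != j]

def score(E):
    n_edges = len(E)                              # arity-2 feature: edge count
    n_recip = sum((j, i) in E for (i, j) in E)    # reciprocity feature
    return 0.2 * n_edges + 0.5 * n_recip          # weighted feature sum

def distribution():
    weights = {}
    for bits in itertools.product([0, 1], repeat=len(PAIRS)):
        E = frozenset(p for p, b in zip(PAIRS, bits) if b)
        weights[E] = math.exp(score(E))
    Z = sum(weights.values())                     # normalizing constant
    return {E: w / Z for E, w in weights.items()}

P = distribution()
# Reciprocated pairs are favored by the second feature:
print(P[frozenset({(0, 1), (1, 0)})] > P[frozenset({(0, 1), (0, 2)})])  # True
```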
</sec>
<sec>
<title>4.2. Feature construction</title>
<p>Both the basic factors <italic>P</italic>(<italic>R</italic><sub><italic>k</italic></sub>(<bold>i</bold>)|<italic>R</italic><sub>1</sub>, &#x02026;, <italic>R</italic><sub><italic>k</italic>&#x02212;1</sub>, <bold>R</bold><sub><italic>in</italic></sub>) in (7) and <italic>F</italic>(<bold>i</bold>)(<bold>R</bold><sub><italic>out</italic></sub>, <bold>R</bold><sub><italic>in</italic></sub>) in (8) are functions that take as input a graph (<italic>V</italic>, <bold>R</bold>&#x02032;) containing interpretations <bold>R</bold>&#x02032; for some subset <inline-formula><mml:math id="M109"><mml:mrow><mml:msup><mml:mi>&#x0211B;</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>&#x02286;</mml:mo><mml:mi>&#x0211B;</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x0211B;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mstyle mathsize='140%' displaystyle='true'><mml:mo>&#x0222A;</mml:mo></mml:mstyle><mml:mtext>&#x0200B;</mml:mtext></mml:msup><mml:msub><mml:mi>&#x0211B;</mml:mi><mml:mrow><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, and return a mapping of entity tuples <bold>i</bold> into &#x0211D;. Such entity feature functions also lie at the core of many modeling frameworks other than those that use them inside a probabilistic factorization approach: graph neural networks define a sequence of node feature vectors (a.k.a. embedding or representation vectors). Each component of such a vector is a feature function in our sense. A FOL formula with <italic>k</italic> free variables defines a 0,1-valued feature of arity <italic>k</italic>.</p>
<p>Graph kernels, by definition, are based on feature functions defined on whole graphs, i.e., features of arity zero in our sense. These features, however, often are aggregates of features defined at the single node level (e.g., in the Weisfeiler-Lehman kernel), or <italic>k</italic>-tuple level (e.g., in the graphlet kernel, whose features are closely related to the MLN features).</p>
<p>In many cases, feature functions are nested constructs where complex features are built from simpler ones. An important consideration for feature construction is whether they are used in models for transductive or inductive reasoning tasks. In the latter case the required generalization capabilities of the model imply that all features should be <italic>invariant under isomorphisms</italic>, i.e., <inline-formula><mml:math id="M114"><mml:mi>F</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>i</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x01E7C;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>F</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>i</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> whenever there is a graph isomorphism from (<italic>V</italic>, <bold>R</bold>&#x02032;) to <inline-formula><mml:math id="M115"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x01E7C;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle 
mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> that maps <bold>i</bold> to <inline-formula><mml:math id="M116"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>i</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula>.</p>
<p><xref ref-type="table" rid="T2">Table 2</xref> summarizes some characteristics of the feature functions used in different frameworks. The column &#x0201C;Arity&#x0201D; indicates whether the framework uses features of graphs (0), nodes (1), or tuples of any arities &#x02265;0. The remaining columns are addressed in the following sub-sections.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Properties of feature functions underlying different frameworks.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Framework</bold></th>
<th valign="top" align="left"><bold>Arity</bold></th>
<th valign="top" align="left"><bold>Initial</bold></th>
<th valign="top" align="left"><bold>Aggregation</bold></th>
<th valign="top" align="left"><bold>Final</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Shallow embedding</td>
<td valign="top" align="left">0,1</td>
<td valign="top" align="left">n/a</td>
<td valign="top" align="left">n/a</td>
<td valign="top" align="left">Shallow</td>
</tr> <tr>
<td valign="top" align="left">Kernel</td>
<td valign="top" align="left">&#x02265;0</td>
<td valign="top" align="left">at</td>
<td valign="top" align="left"><italic>div</italic></td>
<td valign="top" align="left">Deep, shallow</td>
</tr> <tr>
<td valign="top" align="left">MP-GNN</td>
<td valign="top" align="left">0,1</td>
<td valign="top" align="left">id,at</td>
<td valign="top" align="left">Sum, mean, max,&#x02026;</td>
<td valign="top" align="left">Deep</td>
</tr> <tr>
<td valign="top" align="left">R-GNN</td>
<td valign="top" align="left">0,1</td>
<td valign="top" align="left">id,at</td>
<td valign="top" align="left">Sum, mean, max,&#x02026;</td>
<td valign="top" align="left">Sat</td>
</tr> <tr>
<td valign="top" align="left">D-SRL</td>
<td valign="top" align="left">&#x02265;0</td>
<td valign="top" align="left">at,rel</td>
<td valign="top" align="left">Noisy-or, mean,&#x02026;</td>
<td valign="top" align="left">Deep</td>
</tr> <tr>
<td valign="top" align="left">U-SRL</td>
<td valign="top" align="left">&#x02265;0</td>
<td valign="top" align="left">at,rel</td>
<td valign="top" align="left">n/a</td>
<td valign="top" align="left">Shallow</td>
</tr> <tr>
<td valign="top" align="left">I-SRL</td>
<td valign="top" align="left">&#x02265;0</td>
<td valign="top" align="left">at,rel</td>
<td valign="top" align="left">&#x02203;</td>
<td valign="top" align="left">Sat</td>
</tr> <tr>
<td valign="top" align="left">ILP</td>
<td valign="top" align="left">&#x02265;0</td>
<td valign="top" align="left">at,rel</td>
<td valign="top" align="left">&#x02203;</td>
<td valign="top" align="left">Sat</td>
</tr> <tr>
<td valign="top" align="left">FOL</td>
<td valign="top" align="left">&#x02265;0</td>
<td valign="top" align="left">at,rel</td>
<td valign="top" align="left">&#x02203;, &#x02200;</td>
<td valign="top" align="left">Deep</td>
</tr>
<tr>
<td valign="top" align="left">Random graph models</td>
<td valign="top" align="left"><italic>div</italic></td>
<td valign="top" align="left"><italic>div</italic></td>
<td valign="top" align="left"><italic>div</italic></td>
<td valign="top" align="left"><italic>div</italic></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>div</italic>: the framework class is too loosely defined to allow a meaningful entry.</p>
</table-wrap-foot>
</table-wrap>
<sec>
<title>4.2.1. Initial features</title>
<p>All feature constructions start from some initial base features (we note that &#x0201C;construction,&#x0201D; here and in the following, is not meant to imply manual construction; it may very well be automated construction by a learner). Initial features often are node features. Already here, important distinctions arise that essentially determine whether the model will have a transductive or an inductive use. If one uses <italic>unique node identifiers</italic> as initial features (denoted &#x0201C;id&#x0201D; in the &#x0201C;Initial&#x0201D; column of <xref ref-type="table" rid="T2">Table 2</xref>), then models constructed from these features are not invariant under isomorphisms, and will be limited to transductive reasoning. As an example, consider the feature <italic>F</italic>(<italic>i</italic>): &#x0201C;<italic>i</italic> is at most three edges away from node &#x02018;26&#x00027;.&#x0201D; This feature is illustrated by the red coloring of nodes in <xref ref-type="fig" rid="F3">Figure 3A</xref>. While this feature can be useful for one specific graph (e.g., for predicting a node label), it does not generalize in a meaningful way to other graphs, even if they also contain a node with identifier &#x0201C;26.&#x0201D; Node identifiers are mostly used for transductive reasoning with GNNs. They are usually not used in SRL or kernel frameworks.</p>
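<p>For illustration, such an identifier-based feature can be evaluated by a plain breadth-first search. The following Python sketch is our own and not part of any of the discussed frameworks; it assumes an adjacency-list encoding of one specific graph:</p>

```python
from collections import deque

def within_k_hops(adj, source, k):
    """All nodes at most k edges away from `source` (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] < k:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return set(dist)

# F(i): "i is at most three edges away from node 26" -- tied to one graph,
# so it does not generalize to other graphs (transductive use only)
adj = {26: [1], 1: [26, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
F = {i: i in within_k_hops(adj, 26, 3) for i in adj}
```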
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Initial node features and their use: <bold>(A)</bold> node identifiers; <bold>(B)</bold> node attributes; <bold>(C)</bold> vacuous.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1124718-g0003.tif"/>
</fig>
<p>The most commonly used initial features are <italic>node attributes</italic> (&#x0201C;at&#x0201D; in <xref ref-type="table" rid="T2">Table 2</xref>). For example, if nodes have a color attribute with values &#x0201C;blue&#x0201D; and &#x0201C;yellow,&#x0201D; then with these initial features one can define a feature <italic>F</italic>(<italic>i</italic>): &#x0201C;<italic>i</italic> is at most two edges away from a blue node.&#x0201D; This feature, illustrated by a red coloring in <xref ref-type="fig" rid="F3">Figure 3B</xref>, also applies to other graphs sharing the same signature <inline-formula><mml:math id="M117"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula> containing the blue/yellow attribute (we note that the regular grid structures of the graphs in <xref ref-type="fig" rid="F3">Figures 3A, B</xref> are for illustrative clarity only; these features are in no way linked to such regular structures).</p>
<p>In some cases, no informative initial features are used, or none are available. Formally, this can be expressed by a single <italic>vacuous</italic> node attribute that has a constant value for all nodes. Such a vacuous initial feature can still be the basis for the construction of useful complex features using the construction methods described below. <xref ref-type="fig" rid="F3">Figure 3C</xref> illustrates this for a constructed feature <italic>F</italic>(<italic>i</italic>): &#x0201C;<italic>i</italic> is at most two edges away from a node with degree &#x02265;5.&#x0201D;</p>
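<p>Both the attribute-based feature above and the degree-based feature of <xref ref-type="fig" rid="F3">Figure 3C</xref> can be sketched in a few lines of Python (our illustration; the helper <italic>ball</italic> and the toy graph are hypothetical):</p>

```python
def ball(adj, i, radius):
    """Nodes reachable from i in at most `radius` edges."""
    reach = {i}
    for _ in range(radius):
        reach |= {w for v in reach for w in adj[v]}
    return reach

def near_blue(adj, color, i):
    """F(i): i is at most two edges away from a blue node."""
    return any(color[j] == "blue" for j in ball(adj, i, 2))

def near_high_degree(adj, i):
    """F(i): i is at most two edges away from a node with degree >= 5."""
    return any(len(adj[j]) >= 5 for j in ball(adj, i, 2))

# toy graph: hub node 0 has degree 5; node 2 is the only blue node
adj = {0: [1, 2, 3, 4, 5], 1: [0, 6], 2: [0], 3: [0],
       4: [0], 5: [0], 6: [1, 7], 7: [6]}
color = {n: ("blue" if n == 2 else "yellow") for n in adj}
```

<p>Unlike the identifier-based feature, both functions apply unchanged to any graph over the same signature.</p>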
<p>Similarly to node attributes, <italic>non-unary relations</italic> of <inline-formula><mml:math id="M118"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula>&#x02032; can serve as initial features (&#x0201C;rel&#x0201D; in <xref ref-type="table" rid="T2">Table 2</xref>). Most logic-based or SRL frameworks allow binary initial features, enabling, e.g., the construction of features like <italic>F</italic>(<italic>i, j, k</italic>) &#x0003D; <italic>edge</italic>(<italic>i, j</italic>)&#x02227;<italic>edge</italic>(<italic>i, k</italic>)&#x02227;<italic>edge</italic>(<italic>j, k</italic>), expressing that <italic>i, j, k</italic> form a triangle. Such features are outside the reach of most GNN frameworks, though higher-order GNNs (Morris et al., <xref ref-type="bibr" rid="B42">2019</xref>) overcome this limitation.</p>
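<p>The triangle feature is directly computable from the binary <italic>edge</italic> relation (a minimal sketch of our own, storing the symmetric edge relation as a set of ordered pairs):</p>

```python
def triangle(edges, i, j, k):
    """F(i, j, k) = edge(i, j) and edge(i, k) and edge(j, k)."""
    return (i, j) in edges and (i, k) in edges and (j, k) in edges

# undirected toy graph, stored as a symmetric set of ordered pairs
pairs = [(1, 2), (1, 3), (2, 3), (3, 4)]
edges = {(u, v) for u, v in pairs} | {(v, u) for u, v in pairs}
```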
<p>A special approach that has been proposed in connection with GNNs is the use of <italic>random node attributes</italic> (Abboud et al., <xref ref-type="bibr" rid="B1">2021</xref>; Sato et al., <xref ref-type="bibr" rid="B54">2021</xref>) as initial features. Such random attributes can serve as a substitute for unique node identifiers. Due to their random nature, however, it does not make sense to construct features based on their absolute value, like the one illustrated in <xref ref-type="fig" rid="F3">Figure 3A</xref>. However, they enable the construction of features based on equalities <italic>rid</italic>(<italic>i</italic>) &#x0003D; <italic>rid</italic>(<italic>j</italic>) (<italic>rid</italic> being the random attribute), which with high probability just encodes identity <italic>i</italic> &#x0003D; <italic>j</italic>, and which is robust with regard to the random actual values. One can then construct features like <italic>F</italic>(<italic>i</italic>): &#x0201C;<italic>i</italic> lies on a cycle of length 5&#x0201D; as &#x0201C;<italic>j</italic> is reachable from <italic>i</italic> in 5 steps, and <italic>rid</italic>(<italic>i</italic>) &#x0003D; <italic>rid</italic>(<italic>j</italic>).&#x0201D; A similar capability for constructing equality-based features is presented by Vignac et al. (<xref ref-type="bibr" rid="B64">2020</xref>). Here the initial features are in fact unique identifiers, but the subsequent constructions are limited in such a way that again absolute values are not used in an informative manner, and invariance under isomorphisms is ensured.</p>
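<p>The equality-based construction can be sketched as follows (our illustration only; the depth-first enumeration of simple paths is one naive way to realize the reachability condition, and is not how GNNs actually compute such features). The point is that only the equality <italic>rid</italic>(<italic>i</italic>) &#x0003D; <italic>rid</italic>(<italic>j</italic>) is used, never the absolute random values:</p>

```python
import random

def lies_on_cycle(adj, i, rid, length=5):
    """F(i): i lies on a cycle of the given length, expressed via the
    equality rid(i) == rid(j) instead of the identity i == j."""
    def dfs(v, steps, seen):
        if steps == length:
            return rid[v] == rid[i]   # with high probability: v == i
        for w in adj[v]:
            if w == i and steps == length - 1:
                if dfs(w, steps + 1, seen):
                    return True
            elif w not in seen:
                if dfs(w, steps + 1, seen | {w}):
                    return True
        return False
    return dfs(i, 0, {i})

random.seed(0)
cycle5 = {v: [(v - 1) % 5, (v + 1) % 5] for v in range(5)}   # 5-cycle
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}    # path, no cycle
rid = {v: random.random() for v in range(5)}                  # random ids
```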
</sec>
<sec>
<title>4.2.2. Construction by aggregation</title>
<p>Complex features are constructed out of simpler ones using simple operations such as Boolean or arithmetic operators, and linear or non-linear transformations. The essential tool for feature construction in graphs, however, is the aggregation of feature values from related entities. The general structure of such a construction can be described as</p>
<disp-formula id="E15"><label>(9)</label><mml:math id="M119"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:mi>a</mml:mi><mml:mi>g</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mi>j</mml:mi><mml:mo>:</mml:mo><mml:mi>&#x003D5;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <italic>F</italic><sub><italic>old</italic></sub>(<bold>i</bold>, <bold>j</bold>) is an already constructed feature, the delimiters {|, |} are used to denote a multiset, &#x003D5;(<bold>i</bold>, <bold>j</bold>) expresses a relationship between <bold>i</bold> and <bold>j</bold>, and <italic>agg</italic> is an aggregation function that maps the multiset of values <italic>F</italic><sub><italic>old</italic></sub>(<bold>i</bold>, <bold>j</bold>) to a single number. For better readability we here omit the common dependence of <italic>F</italic><sub><italic>old</italic></sub> and <italic>F</italic><sub><italic>new</italic></sub> on the input graph (<italic>V</italic>, <bold>R</bold>&#x02032;). As usual, vector-valued features are also covered by considering each vector component as a scalar feature. In the case of GNNs, Equation (9) takes the special form <italic>F</italic><sub><italic>new</italic></sub>(<italic>i</italic>): &#x0003D; <italic>agg</italic>{|<italic>F</italic><sub><italic>old</italic></sub>(<italic>j</italic>)|<italic>j</italic>:<italic>edge</italic>(<italic>i, j</italic>)|}, and is often referred to as a <italic>message passing</italic> operation. Another special form of aggregation used in GNNs is the <italic>readout</italic> operation that aggregates node features into a single graph-level feature: <italic>F</italic><sub><italic>readout</italic></sub>(): &#x0003D; <italic>agg</italic>{|<italic>F</italic><sub><italic>node</italic></sub>(<italic>j</italic>)|<italic>j</italic>:<italic>true</italic>(<italic>j</italic>)|}, where <italic>true</italic>(<italic>j</italic>) stands for a tautological condition that holds for all nodes <italic>j</italic>. D-SRL and I-SRL frameworks are particularly flexible with regard to aggregate feature construction by supporting rich classes of relationships &#x003D5;. 
U-SRL frameworks, on the other hand, do not support the nested construction of features by (9) in general, and are limited to a single sum-aggregation of basic features <italic>F</italic><sub><italic>k</italic></sub> via (8).</p>
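<p>The generic scheme (9), together with its message-passing and readout special cases, can be sketched as follows (our illustration; all names and the toy graph are hypothetical):</p>

```python
def aggregate(agg, F_old, phi, V, i):
    """F_new(i) := agg {| F_old(i, j) | j : phi(i, j) |}   (Equation 9)."""
    return agg([F_old(i, j) for j in V if phi(i, j)])

adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
V = list(adj)

edge = lambda i, j: j in adj[i]
degree = lambda i, j: len(adj[j])        # a node feature of j, ignoring i

# message passing: phi is the edge relation, aggregating over neighbors of i
msg0 = aggregate(sum, degree, edge, V, 0)

# readout: a tautological condition phi aggregates over all nodes
true = lambda i, j: True
readout = aggregate(sum, degree, true, V, None)
```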
<p>Applicable aggregation functions <italic>agg</italic> depend on the type of the feature values. When in a logic-based framework all features are Boolean, then <italic>agg</italic> usually is existential or universal quantification. When in a probabilistic framework all feature values are (conditional) probabilities, then <italic>noisy-or</italic> as the probabilistic counterpart of existential quantification is a common aggregator. When features have unconstrained numeric values, then standard aggregators are <italic>sum, mean, min</italic>, or <italic>max</italic>. <xref ref-type="table" rid="T2">Table 2</xref> lists under &#x0201C;Aggregation&#x0201D; characteristic forms of aggregation functions in different frameworks (n/a for frameworks that do not support nested aggregations).</p>
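<p>For example, noisy-or can be realized as follows (our sketch); applied to 0/1-valued inputs it behaves exactly like existential quantification, which is why it is regarded as its probabilistic counterpart:</p>

```python
def noisy_or(probs):
    """Probabilistic counterpart of existential quantification:
    1 - prod(1 - p) over the aggregated (conditional) probabilities."""
    result = 1.0
    for p in probs:
        result *= 1.0 - p
    return 1.0 - result

# over Boolean (0/1) inputs noisy-or coincides with "exists";
# min/max play the analogous roles for "forall"/"exists"
values = [0.5, 0.5, 0.0]
```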
<p>In (9), we have written the aggregation function as operating on a multiset. In this form one immediately obtains that if <italic>F</italic><sub><italic>old</italic></sub> is invariant under isomorphisms, then so is <italic>F</italic><sub><italic>new</italic></sub>. However, in practice the multiset {|<italic>F</italic><sub><italic>old</italic></sub>(<bold>i</bold>, <bold>j</bold>)|&#x02026;|} will be stored as a vector. When <italic>agg</italic> is then defined for vector inputs, one has to require that <italic>agg</italic> is <italic>permutation invariant</italic> (which just means that only the multiset content of the vector affects the computed value) in order to ensure invariance under isomorphisms.</p>
</sec>
<sec>
<title>4.2.3. Final features</title>
<p>The inductive feature construction described in the preceding subsections leads to features with a certain <italic>depth</italic> of nesting of aggregation functions. Often this depth corresponds to a maximal radius of the graph neighborhood of <bold>i</bold> that can affect the value <italic>F</italic>(<bold>i</bold>). This is the case, for example, for features constructed by message passing aggregation in GNNs (but not for <italic>readout</italic> features), and for the features of the WLK. Many modeling frameworks are based only on features of a limited depth that are obtained by a fixed sequence of feature constructions. This case is denoted by &#x0201C;deep&#x0201D; in the &#x0201C;Final&#x0201D; column of <xref ref-type="table" rid="T2">Table 2</xref>, whereas &#x0201C;shallow&#x0201D; stands for feature constructions that do not support nested aggregation.</p>
<p>In contrast, I-SRL and R-GNN models are based on an a priori unbounded sequence of feature constructions that for each input graph (<italic>V</italic>, <bold>R</bold>&#x02032;) proceeds until a saturation point is reached (&#x0201C;sat&#x0201D; in <xref ref-type="table" rid="T2">Table 2</xref>). This enables models based on final features that are outside the reach of any fixed-depth construction. An example is the &#x0201C;contains cycle&#x0201D; graph feature of Example 1.3.</p>
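<p>As a concrete illustration of computation until saturation (our own sketch, not the specific I-SRL or R-GNN constructions), the &#x0201C;contains cycle&#x0201D; feature can be computed by iterated deletion of nodes of degree at most one; the number of iterations is not bounded a priori but depends on the input graph:</p>

```python
def contains_cycle(adj):
    """Iterate a simple construction until saturation: repeatedly delete
    nodes of degree <= 1; a cycle exists iff some nodes survive."""
    adj = {v: set(ws) for v, ws in adj.items()}
    changed = True
    while changed:
        changed = False
        for v in [v for v, ws in adj.items() if len(ws) <= 1]:
            for w in adj[v]:
                adj[w].discard(v)
            del adj[v]
            changed = True
    return bool(adj)
```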
</sec>
<sec>
<title>4.2.4. Numeric and symbolic representations</title>
<p>In the previous sections we have considered features mostly at an abstract semantic level. For any given semantic feature there can be very different forms of formal representation. We here illustrate different paradigms on a concrete example. Assume that we have a signature containing a single binary <italic>edge</italic> relation, <italic>l</italic> different Boolean node attributes <italic>a</italic><sub>1</sub>, &#x02026;, <italic>a</italic><sub><italic>l</italic></sub>, and a node class attribute <italic>class</italic>. A node classification model may depend on the node feature <italic>F</italic>(<italic>i</italic>) defined as the number of distinct paths of length 2 that lead from <italic>i</italic> to a node <italic>j</italic> for which <italic>a</italic><sub>1</sub>(<italic>j</italic>) is true. The concrete classification model then can be defined as a logistic regression model based on <italic>F</italic>:</p>
<disp-formula id="E16"><label>(10)</label><mml:math id="M120"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>c</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>a</mml:mi><mml:mi>F</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <italic>a, b</italic>&#x02208;&#x0211D;, and &#x003C3; denotes the sigmoid function.</p>
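<p>A direct computation of <italic>F</italic> and of model (10) may be sketched as follows (our illustration; the graph and attribute encodings are hypothetical):</p>

```python
import math

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
a1 = {0: False, 1: False, 2: False, 3: True}

def F(adj, a1, i):
    """Number of distinct length-2 paths i -> h -> j ending in a node j
    with a1(j) true."""
    return sum(1 for h in adj[i] for j in adj[h] if a1[j])

def class_prob(adj, a1, i, a, b):
    """Model (10): P(class(i) = 1) = sigma(a * F(i) + b)."""
    return 1.0 / (1.0 + math.exp(-(a * F(adj, a1, i) + b)))
```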
<p>In an MP-GNN framework, the construction of <italic>F</italic> and the classification model can be implemented in a two-layer network with a generic structure such as</p>
<disp-formula id="E17"><label>(11)</label><mml:math id="M121"><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msup><mml:mi>h</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi><mml:mi>u</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mi>U</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mi>a</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>h</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mi>h</mml:mi><mml:mo>:</mml:mo><mml:mi>e</mml:mi><mml:mi>d</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>&#x02295;</mml:mo><mml:msup><mml:mi>V</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mi>a</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msup><mml:mi>h</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>2</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo></mml:mrow></mml:mtd><mml:mtd 
columnalign='left'><mml:mrow><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi><mml:mi>u</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mi>U</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>2</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:msup><mml:mi>h</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:mi>h</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mi>h</mml:mi><mml:mo>:</mml:mo><mml:mi>e</mml:mi><mml:mi>d</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>&#x02295;</mml:mo><mml:msup><mml:mi>V</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>2</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:msup><mml:mi>h</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>W</mml:mi><mml:msup><mml:mi>h</mml:mi><mml:mrow><mml:mo 
stretchy='false'>(</mml:mo><mml:mn>2</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>where <bold>h</bold><sup>(<italic>k</italic>)</sup> are the embedding vectors computed at hidden layer <italic>k</italic>, <bold>a</bold>(<italic>i</italic>) is the attribute vector of <italic>i</italic>, the <italic>U</italic><sup>(<italic>k</italic>)</sup>, <italic>V</italic><sup>(<italic>k</italic>)</sup> and <italic>W</italic> are weight matrices, <bold>b</bold> is a bias vector, &#x02295; denotes vector concatenation, and <italic>relu</italic> is component-wise application of the <italic>relu</italic> activation function.</p>
<p>Finally, <italic><bold>out</bold></italic> is a two-dimensional output vector whose components represent the probabilities for <italic>class</italic>(<italic>i</italic>) &#x0003D; 1 and <italic>class</italic>(<italic>i</italic>) &#x0003D; 0. The embedding vector <bold>h</bold><sup>(2)</sup>(<italic>i</italic>) defines a whole set of (scalar) features. The feature <italic>F</italic>(<italic>i</italic>) can be obtained as the first component of <bold>h</bold><sup>(2)</sup>(<italic>i</italic>) when in both matrices <italic>U</italic><sup>(1)</sup> and <italic>U</italic><sup>(2)</sup> the first row is set to (1, 0, &#x02026;, 0). With a suitable setting of <italic>W</italic> and <bold>b</bold>, <italic><bold>out</bold></italic>(<italic>i</italic>) can then represent the model (10). Clearly, the representational capacity of (11) is not nearly exhausted when used in this manner to implement (10). The architecture (11) would usually encode a model where the output probabilities are a complex function of a multitude of different features encoded in the hidden embedding vectors.</p>
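<p>The generic architecture (11) can be sketched in plain Python as follows (our illustration; the toy weight matrices are chosen arbitrarily and are not trained):</p>

```python
import math

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def relu(x):
    return [max(v, 0.0) for v in x]

def softmax(x):
    mx = max(x)
    e = [math.exp(v - mx) for v in x]
    s = sum(e)
    return [v / s for v in e]

def layer(adj, feats, U, V):
    """One layer of (11): relu( sum over neighbors h of U f(h), concatenated
    with V f(i) )."""
    out = {}
    for i in adj:
        msg = [0.0] * len(U)
        for h in adj[i]:
            msg = [a + b for a, b in zip(msg, matvec(U, feats[h]))]
        out[i] = relu(msg + matvec(V, feats[i]))   # list "+" is concatenation
    return out

def mp_gnn(adj, attrs, U1, V1, U2, V2, W, b):
    h1 = layer(adj, attrs, U1, V1)
    h2 = layer(adj, h1, U2, V2)
    return {i: softmax([m + c for m, c in zip(matvec(W, h2[i]), b)])
            for i in adj}

# toy instance: 2 node attributes, identity-like weight matrices (our choice)
adj = {0: [1, 2], 1: [0], 2: [0]}
attrs = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
I2 = [[1.0, 0.0], [0.0, 1.0]]
P24 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
out = mp_gnn(adj, attrs, I2, I2, P24, P24, P24, [0.0, 0.0])
```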
<p>A symbolic representation in a D-SRL framework would take a form like</p>
<disp-formula id="E18"><label>(12)</label><mml:math id="M122"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mi>&#x00023;</mml:mi><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mtext>_</mml:mtext><mml:mi>neighbors</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x02190;</mml:mo></mml:mtd><mml:mtd><mml:mi>sum</mml:mi><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:mi>h</mml:mi><mml:mo>:</mml:mo><mml:mi>edge</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>F</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x02190;</mml:mo></mml:mtd><mml:mtd><mml:mi>sum</mml:mi><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mi>&#x00023;</mml:mi><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mi>_</mml:mi><mml:mi>neighbors</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:mi>h</mml:mi><mml:mo>:</mml:mo><mml:mi>edge</mml:mi><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>class</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x02190;</mml:mo></mml:mtd><mml:mtd><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>a</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The first two lines define the feature <italic>F</italic>, while the last implements the classification rule. In contrast to (11), where <italic>F</italic> is defined numerically through the entries in the parameter matrices, it is here defined symbolically in a formal language that combines elements of logic and of functional programming languages. The only numeric parameters in the specification are the coefficients <italic>a, b</italic> of the logistic regression model. The representation (12) is more interpretable than (11). However, a new classification model depending on features other than <italic>F</italic> would here require a whole new specification, whereas in (11) this is accomplished simply by a different parameter setting. Also, the symbolic representation will grow in size (and lose interpretability) when more complex models depending on a large number of features are needed.</p>
<p>Though on different sides of the symbolic/numeric divide, the model specifications (11) and (12) are very similar in nature, in that they use the same three-step strategy: first define the number of direct <italic>a</italic><sub>1</sub>-neighbors as an auxiliary node feature, then define <italic>F</italic>, and finally define the logistic regression model based on <italic>F</italic>. This iterative construction is not supported in U-SRL frameworks. However, Equation (10) can still be implemented in a U-SRL framework using a specification of the form</p>
<disp-formula id="E19"><label>(13)</label><mml:math id="M123"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mi>class</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02227;</mml:mo><mml:mi>edge</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02227;</mml:mo><mml:mi>edge</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02227;</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x000AC;</mml:mo><mml:mi>class</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02227;</mml:mo><mml:mi>edge</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02227;</mml:mo><mml:mi>edge</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02227;</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>class</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>This is an MLN-type specification that defines a model of the form (8) with <italic>K</italic> &#x0003D; 3 features defined by Boolean properties and associated weights. The feature <italic>F</italic><sub><italic>k</italic></sub>(<bold>i</bold>) evaluates to <italic>w</italic><sub><italic>k</italic></sub> if the Boolean property is satisfied by <bold>i</bold>, and otherwise evaluates to zero [<bold>i</bold> &#x0003D; (<italic>i, j, h</italic>) for the first two features, and <bold>i</bold> &#x0003D; (<italic>i</italic>) for the third]. This defines a fully generative model without a separation into feature construction and classification model. However, with a suitable setting of the weights <italic>w</italic><sub><italic>k</italic></sub>, the conditional distributions of the <italic>class</italic>(<italic>i</italic>) atoms, given full instantiations of the <italic>edge</italic> relation and the node attributes, coincide with the model (10).</p>
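<p>This last point can be checked numerically (our own sketch of the standard log-linear MLN semantics): flipping <italic>class</italic>(<italic>i</italic>) changes the log-score by (<italic>w</italic><sub>1</sub>&#x02212;<italic>w</italic><sub>2</sub>)<italic>F</italic>(<italic>i</italic>) &#x0002B; <italic>w</italic><sub>3</sub>, so the conditional distribution has the form (10) with <italic>a</italic> &#x0003D; <italic>w</italic><sub>1</sub>&#x02212;<italic>w</italic><sub>2</sub> and <italic>b</italic> &#x0003D; <italic>w</italic><sub>3</sub>:</p>

```python
import itertools
import math

# toy world: symmetric edge relation, a1 true only at node 3
E = {(0, 1), (1, 0), (0, 2), (2, 0), (1, 3), (3, 1), (2, 3), (3, 2)}
V = [0, 1, 2, 3]
edge = lambda i, j: (i, j) in E
a1 = lambda h: h == 3
cls = {0: False, 1: False, 2: True, 3: True}

def mln_score(cls, w1, w2, w3):
    """Unnormalized log-score of a world under (13): every satisfied
    grounding of a weighted formula contributes its weight."""
    s = 0.0
    for i, j, h in itertools.product(V, repeat=3):
        if edge(i, j) and edge(j, h) and a1(h):
            s += w1 if cls[i] else w2
    s += sum(w3 for i in V if cls[i])
    return s

def conditional(cls, i, w1, w2, w3):
    """P(class(i) = 1 | all other atoms), from the score difference of the
    two worlds that differ only in class(i)."""
    s1 = mln_score({**cls, i: True}, w1, w2, w3)
    s0 = mln_score({**cls, i: False}, w1, w2, w3)
    return 1.0 / (1.0 + math.exp(s0 - s1))
```

<p>In this toy world, node 0 has <italic>F</italic>(0) &#x0003D; 2 length-2 paths to an <italic>a</italic><sub>1</sub>-node, and the conditional equals &#x003C3;((<italic>w</italic><sub>1</sub>&#x02212;<italic>w</italic><sub>2</sub>)&#x000B7;2 &#x0002B; <italic>w</italic><sub>3</sub>).</p>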
</sec>
<sec>
<title>4.2.5. Expressivity</title>
<p>The general question of expressivity of feature construction frameworks can be viewed from an <italic>absolute</italic> or <italic>comparative</italic> perspective. From the absolute point of view, one asks whether a feature construction framework is generally able to distinguish different entities, i.e., whether for <bold>i</bold>&#x02208;(<italic>V</italic>, <bold>R</bold>&#x02032;) and <inline-formula><mml:math id="M124"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>i</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x01E7C;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> a feature <italic>F</italic> can be constructed with <inline-formula><mml:math id="M125"><mml:mi>F</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>i</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x01E7C;</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02260;</mml:mo><mml:mi>F</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>i</mml:mi></mml:mstyle></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> (usually only required or desired when <bold>i</bold> and <inline-formula><mml:math id="M126"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>i</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula> are not isomorphic). In the context of graph kernels this question was first investigated by G&#x000E4;rtner et al. (<xref ref-type="bibr" rid="B17">2003</xref>), who showed that graph kernels that are maximally expressive in this sense will be computationally intractable due to their implicit ability to solve the subgraph isomorphism problem.</p>
<p><xref ref-type="fig" rid="F4">Figure 4</xref> (adapted from Abboud et al., <xref ref-type="bibr" rid="B1">2021</xref>) gives an example of a graph whose nodes do not have any attributes. The nodes on the three-node cycle are not isomorphic to the nodes on the four-node cycle. However, features that are constructed starting with the vacuous initial feature using the aggregation mechanisms of WLK or GNNs will not be able to distinguish these nodes. As already mentioned in Section 4.2.1, GNNs with random node attributes as initial features, on the other hand, will be able to distinguish the nodes on the three-cycle from the nodes on the four-cycle. Without the need for random node attributes, this distinction is also enabled by most SRL frameworks due to their support for binary feature constructions, which here can be used to first construct Boolean features <italic>k-path</italic>(<italic>i, j</italic>) representing whether there exists a path of length <italic>k</italic> from <italic>i</italic> to <italic>j</italic>, and then distinguish the nodes on the three-cycle by the feature <italic>3-cycle</italic>(<italic>i</italic>): &#x0003D; <italic>3-path</italic>(<italic>i, i</italic>).</p>
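<p>A minimal sketch of this example: 1-WL color refinement from a constant initial coloring leaves all seven nodes of the disjoint 3-cycle/4-cycle graph indistinguishable, while the binary feature <italic>3-path</italic>(<italic>i, i</italic>) (implemented here with walks, which suffice for this graph) separates them. The code is our own illustration of the argument.</p>

```python
# Disjoint union of a 3-cycle (nodes 0-2) and a 4-cycle (nodes 3-6);
# all nodes have degree 2 and carry no attributes.
edges = {(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 6), (6, 3)}
adj = {v: set() for v in range(7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# 1-WL color refinement starting from a vacuous (constant) coloring.
color = {v: 0 for v in adj}
for _ in range(7):
    color = {v: hash((color[v], tuple(sorted(color[u] for u in adj[v]))))
             for v in adj}
assert len(set(color.values())) == 1  # WL cannot separate the two cycles

# Binary feature k-path(i, j): a walk of length k from i to j exists.
def k_path(i, j, k):
    if k == 0:
        return i == j
    return any(k_path(h, j, k - 1) for h in adj[i])

three_cycle = {v: k_path(v, v, 3) for v in adj}
assert three_cycle[0] and not three_cycle[3]  # only triangle nodes satisfy it
```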
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Indistinguishable nodes.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1124718-g0004.tif"/>
</fig>
<p>The discriminative capabilities of feature functions largely depend on the discriminative capabilities of the aggregators (9). It has been suggested that in combination with suitable feature transformation functions applied to <italic>F</italic><sub><italic>old</italic></sub> before aggregation, and <italic>F</italic><sub><italic>new</italic></sub> after aggregation, <italic>sum</italic> is a universally expressive aggregation function (Zaheer et al., <xref ref-type="bibr" rid="B71">2017</xref>). As pointed out by Wagstaff et al. (<xref ref-type="bibr" rid="B65">2019</xref>), however, the requirements on the transformation functions are not realistic for actual implementations. See also Jaeger (<xref ref-type="bibr" rid="B29">2022</xref>) for a detailed discussion.</p>
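<p>Small numerical examples make the limits of individual aggregators concrete. The multisets below are our own illustrations: mean and max cannot separate the first pair while sum can, and plain sum fails on the second pair until an injective per-element transformation is applied before aggregation, in the spirit of the sum-decomposition results of Zaheer et al. (2017).</p>

```python
# Neighbor-feature multisets that mean and max confuse, but sum separates:
A, B = [1, 2], [1, 1, 2, 2]
assert sum(A) / len(A) == sum(B) / len(B) and max(A) == max(B)
assert sum(A) != sum(B)

# Multisets that plain sum confuses; an injective transformation
# (here x -> 3**x) applied before summing separates them:
C, D = [0, 2], [1, 1]
assert sum(C) == sum(D)
assert sum(3 ** x for x in C) != sum(3 ** x for x in D)
```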
<p>In the comparative view, one asks whether all features that can be constructed in a framework ABC can also be constructed in a framework XYZ. If that is the case, we write ABC&#x0227C;<sub><italic>F</italic></sub> XYZ. We use the subscript <italic>F</italic> here in order to emphasize that this is an expressivity relationship about the feature construction capabilities of the frameworks. Different frameworks use features in different ways, and therefore ABC&#x0227C;<sub><italic>F</italic></sub>XYZ does not directly imply that every model of ABC can also be represented as a model in XYZ. However, feature expressivity is the most fundamental ingredient for modeling capacity.</p>
<p><xref ref-type="fig" rid="F5">Figure 5</xref> gives a small and simplistic overview of some expressivity relationships between different frameworks. In this overview we gloss over many technical details regarding the exact representatives of larger classes such as MP-GNN for which the relations have been proven, and the fact that in some cases the comparison so far only has been conducted for graphs with a single edge relation (though generalizations to multi-relational graphs seem mostly straightforward). The relationship FOL&#x0227C;<sub><italic>F</italic></sub>D-SRL has been shown in Jaeger (<xref ref-type="bibr" rid="B26">1997</xref>). That MP-GNNs are at least as expressive as the 2-variable fragment of first-order logic with counting quantifiers (2FOLC) is shown by Barcel&#x000F3; et al. (<xref ref-type="bibr" rid="B2">2020</xref>) on the basis of MP-GNNs that allow feature constructions using both message-passing and readout aggregations. The relationship between MP-GNNs and the Weisfeiler-Lehman graph-isomorphism test was demonstrated by Morris et al. (<xref ref-type="bibr" rid="B42">2019</xref>) and Xu et al. (<xref ref-type="bibr" rid="B68">2019</xref>). A detailed account of expressivity of different GNN frameworks and their relationship to Weisfeiler-Lehman tests is given by Sato (<xref ref-type="bibr" rid="B53">2020</xref>). The MP-GNN&#x0227C;<sub><italic>F</italic></sub>D-SRL relation is demonstrated in Jaeger (<xref ref-type="bibr" rid="B29">2022</xref>), and FOL&#x0227C;<sub><italic>F</italic></sub>U-SRL is shown by Richardson and Domingos (<xref ref-type="bibr" rid="B51">2006</xref>). It must be emphasized, however, that these results only pertain to the feature expressivity of the frameworks. The reasoning capabilities of logical deduction in FOL are not provided by the SRL frameworks.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Some expressivity relations.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1124718-g0005.tif"/>
</fig>
<p>The figure also shows that R-GNN and I-SRL are (presumably) incomparable to the other frameworks shown here: their saturation-based feature construction, on the one hand, provides expressivity not available to the fixed-depth features of the other frameworks. On the other hand, I-SRL has limitations in expressing logical features involving negation or universal quantification, and R-GNN cannot aggregate features over a fixed number of steps using different aggregators at each step.</p>
</sec>
</sec>
</sec>
<sec id="s5">
<title>5. Reasoning: tasks and techniques</title>
<p>Given a model in the sense of Definition 3.1, we consider several classes of reasoning tasks. Narrowing down the range of possible reasoning tasks considered in Section 1, we are now only concerned with algorithmic reasoning.</p>
<sec>
<title>5.1. Sampling</title>
<p>The sampling task consists of generating a random graph <inline-formula><mml:math id="M127"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> according to the distribution <inline-formula><mml:math id="M128"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>. Here we consider this sampling problem as a task in itself, which is to be distinguished from sampling as a method for approximately solving other tasks (see below). Sampling as a main reasoning task is mostly considered in the context of graph evolution models, where the comparison of observed statistics in the random sample with real-world graphs is used to validate the evolution model (e.g., Leskovec et al., <xref ref-type="bibr" rid="B39">2007</xref>). A concrete application of sampling from generative models is to create realistic benchmark datasets for other computational tasks on graphs. Bonifati et al. (<xref ref-type="bibr" rid="B5">2020</xref>) give a comprehensive overview of graph generator models in this application context.</p>
<p>Sampling is typically a task for fully generative models. However, not all models that are fully generative in the semantic sense of Section 3 necessarily support efficient sampling procedures. This is the case, in particular, for U-SRL models whose specification (8) does not translate into an operational sampling procedure. D-SRL models following the factorization strategy of (6) and (7), on the other hand, allow for an efficient sampling procedure by successively sampling truth values for ground atoms <italic>R</italic><sub><italic>k</italic></sub>(<bold>i</bold>). The same is true for dedicated graph evolution models, and, with slight modifications, for I-SRL models.</p>
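<p>The successive-sampling strategy can be sketched as ancestral sampling: atoms are drawn in an order consistent with the factorization, each conditioned on the atoms sampled before it. The concrete conditional probabilities below are invented for illustration and are not taken from any particular D-SRL model.</p>

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_graph(V, seed=None):
    """Ancestral sampling sketch for a factorized generative model:
    attribute atoms a1(i) first, then edge atoms edge(i, j), then
    class(i) conditioned on the atoms already sampled. All
    conditional forms are illustrative assumptions."""
    rng = random.Random(seed)
    a1 = {i: rng.random() < 0.5 for i in V}
    edge = {(i, j): rng.random() < (0.6 if a1[i] == a1[j] else 0.2)
            for i in V for j in V if i != j}
    cls = {}
    for i in V:
        n = sum(edge[(i, j)] and a1[j] for j in V if j != i)
        cls[i] = rng.random() < sigmoid(n - 1.0)
    return a1, edge, cls
```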
<p>While mostly designed for prediction tasks, MP-GNNs have also been adapted as models for random graph generation (e.g., Li et al., <xref ref-type="bibr" rid="B40">2018</xref>; Simonovsky and Komodakis, <xref ref-type="bibr" rid="B59">2018</xref>; You et al., <xref ref-type="bibr" rid="B69">2018</xref>; Dai et al., <xref ref-type="bibr" rid="B9">2020</xref>). Differently from more traditional graph evolution models, which typically only have a small number of calibration parameters, generative GNNs are highly parameterized and can be fitted to training data sets of graphs with different structural properties.</p>
</sec>
<sec>
<title>5.2. Domain-specific queries</title>
<p>A large class of reasoning tasks consists of performing inference, based on a given model, for a single graph under consideration. This covers what the introduction informally referred to as reasoning about a single graph, and reasoning about one graph at a time.</p>
<p>Definition 5.1. Let <inline-formula><mml:math id="M129"><mml:mrow><mml:mi mathvariant="script">M</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">I</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> be a model. A <italic>domain specific query</italic> is given by</p>
<list list-type="simple">
<list-item><p>(a.) an input graph <inline-formula><mml:math id="M130"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mi mathvariant="script">I</mml:mi></mml:math></inline-formula>,</p></list-item>
<list-item><p>(b.) a <italic>query set</italic> <inline-formula><mml:math id="M131"><mml:mrow><mml:mi>&#x003C1;</mml:mi><mml:mo>&#x02286;</mml:mo><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x0211B;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item>
<list-item><p>(c.) a <italic>query objective</italic> that consists of calculating a property of the set <inline-formula><mml:math id="M132"><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle><mml:mo>&#x02208;</mml:mo><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>.</p></list-item>
</list>
<p>Query sets are typically specified by one or several query atoms <italic>r</italic><sub><italic>k</italic><sub>1</sub></sub>(<bold>i</bold><sub>1</sub>), &#x02026;, <italic>r</italic><sub><italic>k</italic><sub><italic>q</italic></sub></sub>(<bold>i</bold><sub><italic>q</italic></sub>), where &#x003C1; then consists of all interpretations <bold>R</bold> in which the query atoms are true. A slightly more general class of queries is obtained when the query atoms can also be negated, with the corresponding condition that the negated atoms are false in <bold>R</bold>. Part (c.) of Definition 5.1 is formulated quite loosely. The most common query objective is to calculate the probability of &#x003C1;:</p>
<disp-formula id="E20"><label>(14)</label><mml:math id="M133"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003C1;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>R</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>&#x003C1;</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>When &#x003C1; is defined by a list of query atoms we can write for <inline-formula><mml:math id="M134"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> also more intuitively</p>
<disp-formula id="E21"><label>(15)</label><mml:math id="M135"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>q</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mi>q</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>The prediction tasks of Examples 1.1&#x02013;1.4 all fall into the category of computing (15) for a single query atom. More general <italic>probabilistic inference</italic> is concerned with computing (15) for general lists of (possibly negated) query atoms. Furthermore, we often are interested in conditional queries of the form</p>
<disp-formula id="E22"><label>(16)</label><mml:math id="M136"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>q</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mi>q</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mi>q</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mrow><mml:mi>q</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <italic>r</italic><sub><italic>k</italic><sub><italic>q</italic>&#x0002B;1</sub></sub>(<bold>i</bold><sub><italic>q</italic>&#x0002B;1</sub>), &#x02026;, <italic>r</italic><sub><italic>k</italic><sub><italic>p</italic></sub></sub>(<bold>i</bold><sub><italic>p</italic></sub>) represent <italic>observed evidence</italic>, and <italic>r</italic><sub><italic>k</italic><sub>1</sub></sub>(<bold>i</bold><sub>1</sub>), &#x02026;, <italic>r</italic><sub><italic>k</italic><sub><italic>q</italic></sub></sub>(<bold>i</bold><sub><italic>q</italic></sub>) represent the uncertain <italic>target</italic> of our inference. If the inference framework permits arbitrary unconditional queries (15), then (16) can be computed from the definition of conditional probabilities as the ratio of two unconditional queries. Thus, Equation (15) already covers most cases of probabilistic inference.</p>
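<p>For a model small enough to enumerate, the summation (14) and the ratio computation for conditional queries (16) can be written out directly. The toy distribution below, with weights proportional to 2 raised to the number of true atoms, is an illustrative assumption.</p>

```python
from itertools import product

# Toy distribution over interpretations of three ground atoms, with
# illustrative unnormalized weights 2^(number of true atoms).
atoms = ("edge(1,2)", "edge(2,1)", "class(1)")
weight = {R: 2.0 ** sum(R) for R in product((0, 1), repeat=3)}
Z = sum(weight.values())
P = {R: w / Z for R, w in weight.items()}

def query(true_atoms, false_atoms=()):
    """P(rho) by the explicit summation (14) over the query set rho,
    with rho given by (possibly negated) query atoms."""
    ti = [atoms.index(a) for a in true_atoms]
    fi = [atoms.index(a) for a in false_atoms]
    return sum(p for R, p in P.items()
               if all(R[i] for i in ti) and not any(R[i] for i in fi))

def conditional(target, evidence):
    """A conditional query (16) as the ratio of two unconditional ones."""
    return query(tuple(target) + tuple(evidence)) / query(evidence)
```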
<p>In the most probable genotype problem of Example 3.6 the query set is given by a list of atoms (possibly negated) representing observed genotype data, and the query objective is to compute</p>
<disp-formula id="E23"><label>(17)</label><mml:math id="M137"><mml:mrow><mml:msub><mml:mrow><mml:mi>arg</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>&#x003C1;</mml:mi></mml:mrow></mml:msub><mml:mi>max</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>The computational tools and computational complexity for computing a query differ widely according to the underlying modeling framework, and the class of queries it supports. The simplest case is given by specialized prediction models that only support single (class label) atoms <italic>class</italic>(<bold>i</bold>) as queries. When, moreover, the model (as in GNNs) is directly defined by functional expressions for the conditional probabilities <italic>P</italic>(<italic>class</italic>(<bold>i</bold>)|<bold>R</bold><sub><italic>in</italic></sub>), then the computation consists of a single <italic>forward computation</italic> that is typically linear in the size of the input graph <inline-formula><mml:math id="M138"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textit" mathvariant="italic">in</mml:mtext></mml:mstyle></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> and the model specification. Similarly, performing prediction by a kernel-SVM is usually computationally tractable, though the necessary evaluations of the kernel function here add another source of complexity.</p>
<p>Dedicated classification models provide for efficient predictive inference, because for their limited class of supported queries (14) can be evaluated without explicitly performing the summation over <bold>R</bold>&#x02208;&#x003C1;. The situation is very different for SRL models that are designed to support a much richer class of queries defined by arbitrary lists of atoms in the <inline-formula><mml:math id="M139"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula><sub><italic>out</italic></sub> relations, and where the summation in (14) cannot be bypassed. A naive execution of the summation over all <bold>R</bold> is usually infeasible, since the number of <bold>R</bold> one needs to sum over is exponential in the number of unobserved atoms, i.e., atoms that are not instantiated in the input <inline-formula><mml:math id="M140"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula>, or included among the query atoms. The number of such unobserved atoms, in turn, is typically polynomial in |<italic>V</italic>|. 
For SRL models, the basic strategy to evaluate (14) is to compile the distribution <inline-formula><mml:math id="M141"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and the query &#x003C1; into a standard inference problem in a probabilistic graphical model (e.g., Ngo and Haddawy, <xref ref-type="bibr" rid="B44">1995</xref>; Jaeger, <xref ref-type="bibr" rid="B26">1997</xref>; Richardson and Domingos, <xref ref-type="bibr" rid="B51">2006</xref>), or into a <italic>weighted model counting</italic> problem (e.g., Fierens et al., <xref ref-type="bibr" rid="B15">2011</xref>). These compilation targets are <italic>ground</italic> models in the sense that they are expressed in terms of ground atoms <italic>r</italic>(<bold>i</bold>) as basic random variables. While often effective, they still may exhibit a computational complexity that is exponential in |<italic>V</italic>|. 
The idea of <italic>lifted inference</italic> is to exploit uniformities and symmetries in the model <inline-formula><mml:math id="M142"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, which may allow us to sum over whole classes of essentially indistinguishable interpretations <bold>R</bold> at once (Poole, <xref ref-type="bibr" rid="B49">2003</xref>; Van den Broeck, <xref ref-type="bibr" rid="B62">2011</xref>). While more scalable than ground inference in some cases, the potential of such lifted techniques is still limited by general intractability results: Jaeger (<xref ref-type="bibr" rid="B28">2000</xref>) shows that inference for single atom queries is exponential in |<italic>V</italic>| in the worst case,<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> if the modeling framework is expressive enough to support FOL features. Van den Broeck and Davis (<xref ref-type="bibr" rid="B61">2012</xref>) obtain a related result under weaker assumptions on the expressiveness of the modeling framework, but for more complex queries.</p>
<p>When exact computation of (14) becomes infeasible, one typically resorts to approximate inference via sampling random <bold>R</bold> according to <inline-formula><mml:math id="M143"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, and taking the empirical frequency of samples that belong to &#x003C1; as an estimate of <inline-formula><mml:math id="M144"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. When directly sampling from <inline-formula><mml:math id="M145"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> is not supported (cf. 
Section 5.1), then this is performed by a <italic>Markov Chain Monte Carlo</italic> approach where samples follow the distribution <inline-formula><mml:math id="M146"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> only in the limit of the sampling process.</p>
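<p>A minimal Gibbs-sampler sketch of this approach: each ground atom is resampled in turn from its conditional distribution given all others, and the empirical frequency of samples falling in the query set estimates its probability. The interface, with an unnormalized probability function over 0/1 atom vectors, is an illustrative assumption.</p>

```python
import random

def gibbs_estimate(p_unnorm, n_atoms, in_rho, n_samples=20000, seed=0):
    """Estimate P(rho) by Markov Chain Monte Carlo (Gibbs) sampling
    when direct sampling from the model is not supported.
    p_unnorm maps a 0/1 tuple to an unnormalized probability."""
    rng = random.Random(seed)
    R = tuple(rng.randint(0, 1) for _ in range(n_atoms))
    hits = 0
    for _ in range(n_samples):
        for k in range(n_atoms):
            R1 = R[:k] + (1,) + R[k + 1:]
            R0 = R[:k] + (0,) + R[k + 1:]
            p1 = p_unnorm(R1) / (p_unnorm(R0) + p_unnorm(R1))
            R = R1 if rng.random() < p1 else R0
        hits += in_rho(R)
    return hits / n_samples
```

<p>For instance, with unnormalized weights 2 raised to the number of true atoms, each atom is independently true with probability 2/3, and the estimate for a single-atom query converges to that value.</p>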
</sec>
<sec>
<title>5.3. Cross-domain queries</title>
<p>The reasoning tasks considered in the previous section arise when a single known domain of entities is the subject of inference. Cross-domain queries in the sense of the following definition capture reasoning tasks that arise when the exact domain is unknown, or one wants to reason about a range of possible domains at once.</p>
<p>Definition 5.2. Let <inline-formula><mml:math id="M147"><mml:mrow><mml:mi mathvariant="script">M</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">I</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> be a model. A <italic>cross-domain query</italic> is given by</p>
<list list-type="bullet">
<list-item><p>a set <inline-formula><mml:math id="M148"><mml:mrow><mml:mi mathvariant="script">J</mml:mi></mml:mrow><mml:mo>&#x02286;</mml:mo><mml:mrow><mml:mi mathvariant="script">I</mml:mi></mml:mrow></mml:math></inline-formula> of input graphs</p></list-item>
<list-item><p>a <italic>query set</italic> <inline-formula><mml:math id="M149"><mml:mi>&#x003C1;</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">J</mml:mi></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>, where each <inline-formula><mml:math id="M150"><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x02286;</mml:mo><mml:mstyle class="text"><mml:mtext class="textit" mathvariant="italic">Int</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item>
<list-item><p>a <italic>query objective</italic> that consists of calculating a property of the set of sets</p></list-item></list>
<disp-formula id="E24"><label>(18)</label><mml:math id="M151"><mml:mrow><mml:mo>&#x0007B;</mml:mo><mml:mo>&#x0007B;</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>R</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>&#x003C1;</mml:mi><mml:mo>&#x0007D;</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02208;</mml:mo><mml:mi>J</mml:mi><mml:mo>&#x0007D;</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Definition 5.1 is now simply the special case |<inline-formula><mml:math id="M152"><mml:mrow><mml:mi mathvariant="script">J</mml:mi></mml:mrow></mml:math></inline-formula>| &#x0003D; 1 of Definition 5.2.</p>
<p>Example 5.3. (Deduction, cont.) Consider a logic knowledge base <italic>KB</italic> as a discriminative model as described in Example 3.7. The question of whether <italic>KB</italic> implies a statement &#x003D5; is then a cross-domain query where <inline-formula><mml:math id="M153"><mml:mi mathvariant="script">J</mml:mi></mml:math></inline-formula> consists of the graphs <inline-formula><mml:math id="M154"><mml:mi>G</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x0221E;</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-script">R</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> in which &#x003D5; does not hold, &#x003C1;<sub><italic>G</italic></sub> is the extension of <italic>G</italic> in which <italic>l</italic><sub><italic>KB</italic></sub> &#x0003D; 1, and the query objective is to decide whether <italic>P</italic><sub><italic>G</italic></sub>(&#x003C1;<sub><italic>G</italic></sub>) &#x0003D; 0 for all <italic>G</italic>&#x02208;<inline-formula><mml:math id="M156"><mml:mrow><mml:mi mathvariant="script">J</mml:mi></mml:mrow></mml:math></inline-formula>.</p>
<p>Example 5.4. (Model explanation) Consider a GNN model for the graph classification task described in Example 1.3. This will be a discriminative model that takes fully specified graphs representing molecules as input, and returns a probability distribution over the <italic>mutagenic</italic> graph label as output. An approach to <italic>explain</italic> such a model is to identify, for a given target value of the graph label, e.g., <italic>mutagenic</italic> &#x0003D; <italic>true</italic>, the input molecule that leads to the highest probability for that label (which can be interpreted as an ideal prototype for the label&#x02014;according to the model we are trying to explain; Yuan et al., <xref ref-type="bibr" rid="B70">2020</xref>). Finding such an explanation is a cross-domain query in our sense: the set <inline-formula><mml:math id="M157"><mml:mrow><mml:mi mathvariant="script">J</mml:mi></mml:mrow></mml:math></inline-formula> is the set of all possible input molecules (or the set of all molecules with a given size). For all (<italic>V</italic>, <bold>R</bold>)&#x02208;<inline-formula><mml:math id="M158"><mml:mi mathvariant="script">J</mml:mi></mml:math></inline-formula> the query set is the single labeled molecule &#x003C1;<sub>(<italic>V</italic>, <bold>R</bold>)</sub> &#x0003D; {(<italic>V</italic>, <bold>R</bold>, <italic>mutagenic</italic> &#x0003D; <italic>true</italic>)}, and the objective is to find the argmax of (18). This is very similar to the most probable genotype queries considered above, but subtly different: since the underlying model here is discriminative only for the class label, the maximization of (18) does not depend on any prior probabilities for the input graphs (<italic>V</italic>, <bold>R</bold>). 
Also, what is a cross-domain query for a discriminative model can be just a single-domain query for a fully generative model: if we are looking for an explanation of a fixed size <italic>n</italic>, then we only need to consider the single input domain <italic>V</italic> &#x0003D; [<italic>n</italic>], the query set &#x003C1;<sub>(<italic>V</italic>)</sub> &#x0003D; {([<italic>n</italic>], <bold>R</bold>, <italic>mutagenic</italic> &#x0003D; <italic>true</italic>)|<bold>R</bold>&#x02208;<italic>Int</italic>([<italic>n</italic>])(<inline-formula><mml:math id="M159"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula>)}, and the query objective (17). While this may seem a merely technical semantic distinction, it can make a significant difference in practice: in the first case the existing framework may not directly support the argmax computation for (18), whereas in the second case the computation of (17) may be supported by native algorithmic tools in the framework.</p>
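To make the argmax query of Example 5.4 concrete, the following sketch (not from the original article) enumerates all graphs on a small domain [<italic>n</italic>] and returns the input to which a discriminative model assigns the highest target-label probability. The model <monospace>mu</monospace> below is a hypothetical stand-in for a trained classifier, not a real GNN:

```python
from itertools import combinations

def argmax_explanation(n, mu):
    """Brute-force search for the input graph on [n] that maximizes the
    model's probability for the target label -- the 'prototype' explanation.
    mu(edges) is a stand-in discriminative model returning P(label=true)."""
    pairs = list(combinations(range(n), 2))
    best, best_p = None, -1.0
    for k in range(len(pairs) + 1):          # all edge sets, by size
        for edges in combinations(pairs, k):
            p = mu(frozenset(edges))
            if p > best_p:
                best, best_p = frozenset(edges), p
    return best, best_p

# Hypothetical model: prefers graphs with exactly two edges.
mu = lambda edges: 1.0 / (1.0 + abs(len(edges) - 2))
best, p = argmax_explanation(3, mu)
print(len(best), p)  # a two-edge graph achieves probability 1.0
```

Note that, as in the text, no prior over the inputs appears anywhere: only the discriminative scores are compared.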
<p>Example 5.5. (Limit behavior, cont.) For a fully generative model consider <inline-formula><mml:math id="M160"><mml:mrow><mml:mi mathvariant='script'>J</mml:mi><mml:mo>=</mml:mo><mml:mi>&#x02110;</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant='script'>G</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x0221E;</mml:mi><mml:mo>,</mml:mo><mml:mo>&#x02205;</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></inline-formula>. For each input graph <italic>G</italic><sub><italic>n</italic></sub>: &#x0003D; ([<italic>n</italic>], &#x02205;) let &#x003C1;<sub><italic>n</italic></sub>: &#x0003D; &#x003C1;<sub><italic>G</italic><sub><italic>n</italic></sub></sub> be the set of graphs with some property of interest, such as being connected, or satisfying a given logic formula. An important cross-domain reasoning task is then characterized by the query objective of determining the existence and value of the limit <inline-formula><mml:math id="M162"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">lim</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>&#x0221E;</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
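The limit query of Example 5.5 can be illustrated by Monte Carlo simulation under a simple hypothetical generative model that draws each potential edge independently with probability <italic>p</italic>, taking &#x003C1;<sub><italic>n</italic></sub> to be the set of connected graphs on [<italic>n</italic>]. This is only an illustrative estimation, not one of the symbolic analysis methods discussed next:

```python
import random

def is_connected(n, edges):
    """Connectivity check via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(i) for i in range(n)}) == 1

def estimate_rho_n(n, p=0.5, samples=2000, seed=0):
    """Monte Carlo estimate of P_n(rho_n), where the generative model draws
    each edge independently with probability p and rho_n is the set of
    connected graphs on [n]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        edges = [(u, v) for u in range(n) for v in range(u + 1, n)
                 if rng.random() < p]
        hits += is_connected(n, edges)
    return hits / samples

for n in (2, 4, 8, 16):
    print(n, estimate_rho_n(n))  # estimates approach the limit value 1
```

Of course, simulation can only suggest the limit for this particular property; deciding existence and value of the limit in general is exactly the cross-domain reasoning task described above.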
<p>Automated theorem provers provide algorithmic solutions for the reasoning tasks described in Example 5.3. For the analysis of limit probabilities as in Example 5.5 there exist a few theoretical results that put them within the reach of general automated inference methods (Grandjean, <xref ref-type="bibr" rid="B18">1983</xref>; Jaeger, <xref ref-type="bibr" rid="B27">1998</xref>; Cozman and Mau&#x000E1;, <xref ref-type="bibr" rid="B7">2019</xref>; Koponen and Weitk&#x000E4;mper, <xref ref-type="bibr" rid="B33">2023</xref>). However, these results have not yet been carried over to operational implementations.</p>
</sec>
</sec>
<sec id="s6">
<title>6. Learning: settings and techniques</title>
<p>So far our discussion has largely focused on modeling and reasoning. Our formal definitions in Sections 3 and 5 draw a close link between types of models and the reasoning tasks they support (discriminative, generative, transductive, inductive, &#x02026;). Turning now to the question of how a model is learned from data, a similar close linkage arises between model types and learning scenarios. Based on the unifying probabilistic view of models according to Definition 3.1, different learning scenarios are essentially just distinguished by the structure of the training data, but unified by a common maximum likelihood learning principle.</p>
<p>The class of input structures <inline-formula><mml:math id="M163"><mml:mi mathvariant="script">I</mml:mi></mml:math></inline-formula> of a model <inline-formula><mml:math id="M164"><mml:mrow><mml:mi mathvariant="script">I</mml:mi></mml:mrow></mml:math></inline-formula> &#x0003D; (<inline-formula><mml:math id="M165"><mml:mi mathvariant="script">I</mml:mi></mml:math></inline-formula>, &#x003BC;) characterizes its basic functionality and will be assumed to be fixed a priori by the learning/reasoning task at hand. What is to be learned is the mapping &#x003BC;. For this, one needs training data of the following form.</p>
<p>Definition 6.1. Let <inline-formula><mml:math id="M166"><mml:mrow><mml:mi>&#x02110;</mml:mi><mml:mo>&#x02286;</mml:mo><mml:mover accent='true'><mml:mi>G</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x0221E;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x0211B;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></inline-formula> be given. A dataset for learning the mapping &#x003BC; consists of a set of examples</p>
<disp-formula id="E25"><label>(19)</label><mml:math id="M169"><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover></mml:mrow><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover></mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M170"><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover accent='true'><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>R</mml:mi></mml:mstyle><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover></mml:mrow><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02208;</mml:mo><mml:mi>&#x02110;</mml:mi></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M171"><mml:mrow><mml:msubsup><mml:mrow><mml:mover accent='true'><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mi>R</mml:mi></mml:mstyle><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover></mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>&#x02208;</mml:mo><mml:mover accent='true'><mml:mi mathvariant='script'>G</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi>&#x0211B;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></inline-formula> complements <inline-formula><mml:math id="M174"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> in the sense that <inline-formula><mml:math id="M175"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>i</mml:mi></mml:mstyle></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02260;</mml:mo><mml:mo>?</mml:mo><mml:mo>&#x021D2;</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>i</mml:mi></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>?</mml:mo></mml:math></inline-formula> (<italic>i</italic> &#x0003D; 1, &#x02026;, <italic>m</italic>).</p>
<p>The unifying learning principle is to find the mapping &#x003BC; that maximizes the log-likelihood</p>
<disp-formula id="E26"><label>(20)</label><mml:math id="M176"><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mrow><mml:mi>log</mml:mi></mml:mrow></mml:mstyle><mml:mi>&#x003BC;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover></mml:mrow><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mrow><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover></mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>In practice, the pure likelihood (20) will often be modified by regularization terms or prior probabilities, which, to simplify matters, we do not consider here. Furthermore, some learning objectives, such as the max-margin objective in learning a kernel-SVM, are not based on the log-likelihood as the central element at all. However, as the following examples show, Equation (20) still covers a fairly wide range of learning approaches.</p>
<p>Example 6.2. (Node classification with GNN) Consider <inline-formula><mml:math id="M177"><mml:mi mathvariant="script">R</mml:mi></mml:math></inline-formula> consisting of a single binary <italic>edge</italic> relation, node attributes <italic>a</italic><sub>1</sub>, &#x02026;, <italic>a</italic><sub><italic>k</italic></sub>, and a node label <italic>l</italic>. For training a discriminative model with <inline-formula><mml:math id="M178"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula><sub><italic>in</italic></sub> &#x0003D; {<italic>edge, a</italic><sub>1</sub>, &#x02026;, <italic>a</italic><sub><italic>k</italic></sub>}, and <inline-formula><mml:math id="M179"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula><sub><italic>out</italic></sub> &#x0003D; {<italic>l</italic>}, the training examples consist of complete input graphs (<italic>V</italic><sub><italic>n</italic></sub>, <bold>R</bold><sub><italic>n</italic></sub>)&#x02208;<inline-formula><mml:math id="M180"><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:math></inline-formula>(&#x0003C; &#x0221E;, <inline-formula><mml:math id="M181"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula><sub><italic>in</italic></sub>), and <inline-formula><mml:math id="M182"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> is a partial interpretation <inline-formula><mml:math id="M183"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> of <italic>l</italic>. 
In the transductive setting, <italic>N</italic> &#x0003D; 1, and (<italic>V</italic><sub>1</sub>, <bold>R</bold><sub>1</sub>) is equal to the single input graph for which the model is defined. Under the factorization (7), the log-likelihood (20) then becomes</p>
<disp-formula id="E27"><mml:math id="M184"><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>:</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>L</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02260;</mml:mo><mml:mo>?</mml:mo></mml:mrow></mml:munder><mml:mrow><mml:mi>log</mml:mi></mml:mrow></mml:mstyle></mml:mrow></mml:mstyle><mml:mi>&#x003BC;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>L</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>The usual training objective for GNNs of minimizing the <italic>log-loss</italic> is equivalent to maximizing this log-likelihood.</p>
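As a minimal illustration of this equivalence (a sketch, not from the original article), the following computes the log-likelihood of partially observed node labels from per-node predicted distributions. Here <monospace>probs</monospace> is a hypothetical model output standing in for &#x003BC;(<italic>V</italic>, <bold>R</bold>), and <monospace>None</monospace> marks an unobserved label ("?"):

```python
import math

def node_log_likelihood(probs, partial_labels):
    """Log-likelihood (20) specialized to node classification.

    probs[i][c] is the model's probability that node i carries label c
    (a stand-in for the GNN output); partial_labels[i] is the observed
    label of node i, or None if unobserved ('?')."""
    ll = 0.0
    for p, label in zip(probs, partial_labels):
        if label is not None:  # the sum runs only over labeled nodes
            ll += math.log(p[label])
    return ll

# Three nodes, two classes; node 1 is unlabeled.
probs = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
labels = [0, None, 1]
print(node_log_likelihood(probs, labels))  # log 0.9 + log 0.8
```

Minimizing the log-loss in a standard GNN training loop is exactly maximizing this quantity (up to sign).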
<p>Example 6.3. (Generative models from incomplete data) The discriminative learning scenario of Example 6.2 requires training data in which input relations <inline-formula><mml:math id="M185"><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:math></inline-formula><sub><italic>in</italic></sub> are completely observed. When some of the attributes <italic>a</italic><sub><italic>i</italic></sub>, and possibly also the <italic>edge</italic> relation, are only incompletely observed in the training data, no valid input structure for a discriminative model is available. However, no matter which relations are fully or partially observed, a partial interpretation <inline-formula><mml:math id="M186"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> can always be written as a valid training example</p>
<disp-formula id="E28"><label>(21)</label><mml:math id="M187"><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02205;</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover></mml:mrow><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></disp-formula>
<p>for a fully generative model. Without any assumptions on the factorization of the distributions <italic>P</italic><sub><italic>n</italic></sub>, the log-likelihood now takes the general form</p>
<disp-formula id="E29"><mml:math id="M188"><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mrow><mml:mi>log</mml:mi></mml:mrow></mml:mstyle><mml:mi>&#x003BC;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02205;</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mrow><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover></mml:mrow><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mtext>&#x000A0;</mml:mtext></mml:mstyle><mml:mtext>&#x000A0;</mml:mtext><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>R</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover accent='true'><mml:mi>R</mml:mi><mml:mo stretchy='true'>&#x002DC;</mml:mo></mml:mover></mml:mrow><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mi>&#x0211B;</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mi>log</mml:mi></mml:mrow></mml:mstyle><mml:mi>&#x003BC;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02205;</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>While always well-defined, this likelihood may be intractable for optimization. An explicit summation over all <inline-formula><mml:math id="M189"><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle><mml:mo>&#x02208;</mml:mo><mml:mstyle class="text"><mml:mtext class="textit" mathvariant="italic">Int</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> is almost always infeasible. When optimization of the <italic>complete data likelihood</italic> &#x003BC;(<italic>V</italic><sub><italic>n</italic></sub>, &#x02205;)(<bold>R</bold>) for <inline-formula><mml:math id="M190"><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mstyle class="text"><mml:mtext class="textit" mathvariant="italic">Int</mml:mtext></mml:mstyle><mml:mi>V</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> is tractable, the <italic>expectation-maximization</italic> strategy can be applied, where one iteratively imputes expected completions <bold>R</bold><sub><italic>n</italic></sub> for the incomplete observations <inline-formula><mml:math id="M190a"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> under a current model &#x003BC;, and then optimizes &#x003BC; under the likelihood induced by this complete data. 
For U-SRL models, even the complete data likelihood usually is intractable due to its dependence on the <italic>partition function</italic> <italic>Z</italic> &#x0003D; <italic>Z</italic>(&#x003BC;) in (8). In this case the true likelihood may be approximated by a <italic>pseudo-likelihood</italic> (Besag, <xref ref-type="bibr" rid="B3">1975</xref>; Richardson and Domingos, <xref ref-type="bibr" rid="B51">2006</xref>).</p>
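The expectation-maximization strategy can be illustrated (as a sketch, not from the original article) on a deliberately tiny generative model: every atom is an independent Bernoulli(&#x003B8;) coin, and missing observations are marked <monospace>None</monospace>. In this special case both the E-step imputation and the M-step update are available in closed form:

```python
def em_bernoulli(observations, theta=0.5, iters=50):
    """EM for a toy generative model: each atom is an independent
    Bernoulli(theta) draw; observations may be missing (None).

    E-step: impute the expected value (theta) for each missing atom.
    M-step: re-estimate theta as the mean of the completed data."""
    for _ in range(iters):
        completed = [theta if x is None else float(x) for x in observations]
        theta = sum(completed) / len(completed)
    return theta

# Two 1s, one 0, two missing atoms: EM converges to the mean of the
# observed atoms, 2/3, as maximum-likelihood theory predicts.
print(em_bernoulli([1, 1, 0, None, None]))
```

In realistic models the E-step requires probabilistic inference over the unobserved atoms, and for U-SRL models even the M-step inherits the intractability of the partition function, motivating the pseudo-likelihood approximations cited above.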
<p>Example 6.4. (Learning logic theories) Consider now the case of a logical framework where a model is a knowledge base <italic>KB</italic> that we consider as a discriminative model in the sense of Example 3.7. Learning logical theories or concepts is usually framed in terms of learning from positive and negative examples, where the example data can consist of full interpretations, logical statements, or even proofs (De Raedt, <xref ref-type="bibr" rid="B10">1997</xref>). The learning from interpretations setting most closely fits our general data and learning setup: the examples are then fully observed graphs <italic>G</italic><sub><italic>n</italic></sub> &#x0003D; (<italic>V</italic><sub><italic>n</italic></sub>, <bold>R</bold><sub><italic>n</italic></sub>) together with a label <inline-formula><mml:math id="M191"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>R</mml:mi></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>, and optimizing (20) amounts to finding a knowledge base <italic>KB</italic> such that <italic>KB</italic> is true in all <italic>G</italic><sub><italic>n</italic></sub> with <italic>L</italic><sub><italic>n</italic></sub> &#x0003D; 1, and false for <italic>G</italic><sub><italic>n</italic></sub> with <italic>L</italic><sub><italic>n</italic></sub> &#x0003D; 0, i.e., the standard objective in logic concept learning.</p>
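The learning-from-interpretations objective can be sketched as a toy search for a hypothesis consistent with labeled interpretations. For illustration only, the hypothesis space is restricted here to conjunctions of propositional atoms, and each interpretation is represented as the set of atoms true in it; all names are hypothetical:

```python
from itertools import combinations

def learn_conjunction(positives, negatives, atoms):
    """Return a conjunction (as a set of atoms) that holds in every
    positive interpretation and in no negative one, if one exists.
    A conjunction holds in an interpretation iff it is a subset of it."""
    for size in range(len(atoms) + 1):       # prefer shorter hypotheses
        for hyp in combinations(atoms, size):
            hyp = set(hyp)
            if all(hyp <= pos for pos in positives) and \
               not any(hyp <= neg for neg in negatives):
                return hyp
    return None  # no consistent conjunction in this hypothesis space

positives = [{"a", "b"}, {"a", "b", "c"}]
negatives = [{"a"}, {"b", "c"}]
print(learn_conjunction(positives, negatives, ["a", "b", "c"]))  # {'a', 'b'}
```

Returning a hypothesis true in all positives and no negatives is precisely the degenerate 0/1 form that the likelihood (20) takes for the logical discriminative model.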
<sec>
<title>6.1. Numeric and symbolic optimization</title>
<p>Examples 6.2&#x02013;6.4 present a uniform perspective on learning in very different frameworks. However, the required techniques for solving the likelihood optimization problem are quite diverse. Corresponding to the combination of symbolic and numeric representation elements of a modeling framework, the learning problem decomposes into a <italic>structure learning</italic> part for the symbolic representation, and a <italic>parameter learning</italic> part for the numeric elements. Since the numeric parameterization usually depends on the chosen structure, this can lead to a nested optimization in which structure learning is performed in an outer loop that contains parameter learning as an inner loop. Structure learning amounts to search in a potentially infinite combinatorial space. Parameter learning, on the other hand, is typically reduced to the optimization of a differentiable objective function, for which powerful gradient-based methods are available.</p>
<p>The (empirical) fact that numeric optimization of parameters is somewhat easier than combinatorial optimization of symbolic structure favors frameworks that are primarily numeric, notably GNNs. As illustrated in Section 4.2.4, feature constructions that in other frameworks require symbolic specifications (12), (13) are here encoded in numeric parameter matrices (11). As a result, learning that in symbolic frameworks requires a search over symbolic representations is here accomplished by numeric optimization. However, even GNNs are not completely devoid of &#x0201C;symbolic&#x0201D; representation elements: the neural network architecture is a component of the model specification in a discrete design space, and finding the best architecture via <italic>neural architecture search</italic> (Elsken et al., <xref ref-type="bibr" rid="B12">2019</xref>) leads to optimization problems in discrete, combinatorial search spaces that have much in common with structure learning in more manifestly symbolic frameworks.</p>
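The nested optimization described above can be sketched as follows (an illustration under simplifying assumptions, not from the original article): the outer loop searches a discrete set of "structures", here simply the degree of a one-parameter model standing in for symbolic structure, while the inner loop fits the numeric parameter by gradient descent:

```python
def fit_parameters(xs, ys, degree, lr=0.001, steps=2000):
    """Inner loop: gradient descent on the single weight of y ~ w * x**degree."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x**degree - y) * x**degree for x, y in zip(xs, ys))
        w -= lr * grad / len(xs)
    return w

def structure_search(xs, ys, degrees=(1, 2, 3)):
    """Outer loop: combinatorial search over structures (here: the degree),
    scoring each candidate by the loss achieved after inner parameter fitting."""
    best = None
    for d in degrees:
        w = fit_parameters(xs, ys, d)
        loss = sum((w * x**d - y) ** 2 for x, y in zip(xs, ys))
        if best is None or loss < best[2]:
            best = (d, w, loss)
    return best

xs = [1.0, 2.0, 3.0]
ys = [2.0, 8.0, 18.0]            # generated by y = 2 * x**2
d, w, loss = structure_search(xs, ys)
print(d, round(w, 2))            # recovers structure d=2 with weight w near 2
```

Real structure learning differs mainly in scale: the outer space is vast or infinite, so heuristic or greedy search replaces exhaustive enumeration, while the inner loop remains gradient-based.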
<p>While presenting a harder optimization task for machine learning, symbolic, structural parts of a model may also be supplied manually by domain experts, thus reducing the machine learning task to the optimization of the numeric parameters. Many SRL frameworks, so far, rely to a greater or lesser extent on such a humans-in-the-loop scenario.</p>
</sec>
</sec>
<sec id="s7">
<title>7. Integration</title>
<p>In view of the complementary benefits of symbolic and numeric models, it is natural to aim for combinations of both elements that optimally exploit the strengths of each. Emphasizing neural network frameworks as the prime representatives for numeric approaches, these efforts are currently mostly pursued under the name of <italic>neuro-symbolic integration</italic> (Sarker et al., <xref ref-type="bibr" rid="B52">2021</xref>). An important example in the context of this review is the integration of the I-SRL ProbLog framework with deep neural networks (not specifically GNNs). The underlying philosophy for the proposed <italic>DeepProbLog</italic> framework (Manhaeve et al., <xref ref-type="bibr" rid="B41">2018</xref>) is that neural frameworks excel at solving low-level &#x0201C;perceptual&#x0201D; reasoning tasks, whereas symbolic frameworks support higher-level reasoning. The integration therefore consists of two layers, where the lower (neural) layer provides inputs to the higher (symbolic) layer.</p>
<p>The unifying perspective on model semantics and model structure we have developed in Sections 3 and 4 gives rise to more homogeneous integration perspectives: a conditionally generative model that is constructed via the factorization (6), (7) consists of individual discriminative models (7) for each relation <italic>R</italic><sub><italic>k</italic></sub> as building blocks. In principle, different such building blocks can be constructed in different frameworks, and combined into a single model via (6). Moreover, this approach will be consistent in the sense that if all constructions of the component discriminative models use (20) as the objective, then the combination of these objectives is equivalent to maximizing the overall log-likelihood &#x003BC;(<italic>V</italic><sub><italic>n</italic></sub>, <bold>R</bold><sub><italic>in</italic></sub>)(<bold>R</bold><sub><italic>out</italic></sub>) directly for the resulting conditional generative model. Here we deliberately speak of &#x0201C;constructing&#x0201D; rather than &#x0201C;learning&#x0201D; component models in order to emphasize the possibility that some model components may be built by manual design, whereas others can be learned from data.</p>
<p>Piecing together a combined model from heterogeneous model components is only useful when the resulting model can then be used to perform inference tasks. This will limit the reasoning capabilities to tasks that can be broken down into a combination of tasks supported by the component models. A possible alternative is to compile all individual components into a representation in a common framework with high expressivity and flexible reasoning capabilities. As shown in Jaeger (<xref ref-type="bibr" rid="B29">2022</xref>), quite general GNN architectures for representing discriminative models can be compiled into an RBN representation, and integrated as components into a bigger generative model. While theoretically sound, it is still an open question whether this approach is practically feasible for GNN models of the size needed to obtain high accuracy on their specialized discriminative tasks.</p></sec>
<sec sec-type="conclusions" id="s8">
<title>8. Conclusion</title>
<p>We have given a broad overview of modeling, reasoning, and learning with graphs. The main objective of this review was to view the area from a broader perspective than more specialized existing surveys, while at the same time developing a coherent conceptual framework that emphasizes the commonalities between very diverse approaches to dealing with graph data. Our central definitions of models, reasoning types and learning tasks cover a wide range of different frameworks and approaches. While the uniformity we thereby obtain is sometimes a bit contrived (notably by casting logical concepts in probabilistic terms), it still may be useful to elucidate the common ground among quite disparate traditions and approaches, to provide a basis for further theoretical (comparative) analyses, and to facilitate the combination and integration of different frameworks.</p>
</sec>
<sec sec-type="author-contributions" id="s9">
<title>Author contributions</title>
<p>The author confirms being the sole contributor of this work and has approved it for publication.</p>
</sec>
</body>
<back>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>In logic terminology, these graphs would be called the models of <italic>KB</italic>. This is in direct conflict with our terminology, where it is rather the knowledge base that is considered a model; we therefore avoid using the term &#x0201C;model&#x0201D; in the sense of logic.</p></fn>
<fn id="fn0002"><p><sup>2</sup>Subject to the complexity-theoretic assumption that NETIME&#x02260;ETIME.</p></fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abboud</surname> <given-names>R.</given-names></name> <name><surname>Ceylan</surname> <given-names>I. I.</given-names></name> <name><surname>Grohe</surname> <given-names>M.</given-names></name> <name><surname>Lukasiewicz</surname> <given-names>T.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;The surprising power of graph neural networks with random node initialization,&#x0201D;</article-title> in <source>Proceedings of IJCAI 2021</source>. <pub-id pub-id-type="doi">10.24963/ijcai.2021/291</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barcel&#x000F3;</surname> <given-names>P.</given-names></name> <name><surname>Kostylev</surname> <given-names>E.</given-names></name> <name><surname>Monet</surname> <given-names>M.</given-names></name> <name><surname>P&#x000E9;rez</surname> <given-names>J.</given-names></name> <name><surname>Reutter</surname> <given-names>J.</given-names></name> <name><surname>Silva</surname> <given-names>J.-P.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;The logical expressiveness of graph neural networks,&#x0201D;</article-title> in <source>8th International Conference on Learning Representations (ICLR 2020)</source>.</citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Besag</surname> <given-names>J.</given-names></name></person-group> (<year>1975</year>). <article-title>Statistical analysis of non-lattice data</article-title>. <source>J. R. Stat. Soc. Ser. D</source> <volume>24</volume>, <fpage>179</fpage>&#x02013;<lpage>195</lpage>. <pub-id pub-id-type="doi">10.2307/2987782</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blockeel</surname> <given-names>H.</given-names></name> <name><surname>De Raedt</surname> <given-names>L.</given-names></name></person-group> (<year>1998</year>). <article-title>Top-down induction of first-order logical decision trees</article-title>. <source>Artif. Intell.</source> <volume>101</volume>, <fpage>285</fpage>&#x02013;<lpage>297</lpage>. <pub-id pub-id-type="doi">10.1016/S0004-3702(98)00034-4</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bonifati</surname> <given-names>A.</given-names></name> <name><surname>Holubov&#x000E1;</surname> <given-names>I.</given-names></name> <name><surname>Prat-P&#x000E9;rez</surname> <given-names>A.</given-names></name> <name><surname>Sakr</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>Graph generators: state of the art and open challenges</article-title>. <source>ACM Comput. Surveys</source> <volume>53</volume>, <fpage>1</fpage>&#x02013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1145/3379445</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breese</surname> <given-names>J. S.</given-names></name> <name><surname>Goldman</surname> <given-names>R. P.</given-names></name> <name><surname>Wellman</surname> <given-names>M. P.</given-names></name></person-group> (<year>1994</year>). <article-title>Introduction to the special section on knowledge-based construction of probabilistic decision models</article-title>. <source>IEEE Trans. Syst. Man Cybern</source>. <volume>24</volume>, <fpage>1580</fpage>&#x02013;<lpage>1592</lpage>. <pub-id pub-id-type="doi">10.1109/21.328909</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cozman</surname> <given-names>F. G.</given-names></name> <name><surname>Mau&#x000E1;</surname> <given-names>D. D.</given-names></name></person-group> (<year>2019</year>). <article-title>The finite model theory of Bayesian network specifications: Descriptive complexity and zero/one laws</article-title>. <source>Int. J. Approx. Reason</source>. <volume>110</volume>, <fpage>107</fpage>&#x02013;<lpage>126</lpage>. <pub-id pub-id-type="doi">10.1016/j.ijar.2019.04.003</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Dai</surname> <given-names>H.</given-names></name> <name><surname>Kozareva</surname> <given-names>Z.</given-names></name> <name><surname>Dai</surname> <given-names>B.</given-names></name> <name><surname>Smola</surname> <given-names>A.</given-names></name> <name><surname>Song</surname> <given-names>L.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Learning steady-states of iterative algorithms over graphs,&#x0201D;</article-title> in <source>International Conference on Machine Learning (PMLR)</source>, 1106&#x02013;1114. Available online at: <ext-link ext-link-type="uri" xlink:href="https://proceedings.mlr.press/v80/dai18a.html">https://proceedings.mlr.press/v80/dai18a.html</ext-link></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dai</surname> <given-names>H.</given-names></name> <name><surname>Nazi</surname> <given-names>A.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Dai</surname> <given-names>B.</given-names></name> <name><surname>Schuurmans</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Scalable deep generative modeling for sparse graphs,&#x0201D;</article-title> in <source>International Conference on Machine Learning</source>, <fpage>2302</fpage>&#x02013;<lpage>2312</lpage>.</citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Raedt</surname> <given-names>L.</given-names></name></person-group> (<year>1997</year>). <article-title>Logical settings for concept-learning</article-title>. <source>Artif. Intell</source>. <volume>95</volume>, <fpage>187</fpage>&#x02013;<lpage>201</lpage>. <pub-id pub-id-type="doi">10.1016/S0004-3702(97)00041-6</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Raedt</surname> <given-names>L.</given-names></name> <name><surname>Kimmig</surname> <given-names>A.</given-names></name> <name><surname>Toivonen</surname> <given-names>H.</given-names></name></person-group> (<year>2007</year>). <article-title>&#x0201C;Problog: a probabilistic prolog and its application in link discovery,&#x0201D;</article-title> in <source>IJCAI, Vol. 7</source>, <fpage>2462</fpage>&#x02013;<lpage>2467</lpage>.</citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elsken</surname> <given-names>T.</given-names></name> <name><surname>Metzen</surname> <given-names>J. H.</given-names></name> <name><surname>Hutter</surname> <given-names>F.</given-names></name></person-group> (<year>2019</year>). <article-title>Neural architecture search: a survey</article-title>. <source>J. Mach. Learn. Res</source>. <volume>20</volume>, <fpage>1997</fpage>&#x02013;<lpage>2017</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-05318-5_3</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Erd&#x00151;s</surname> <given-names>P.</given-names></name> <name><surname>R&#x000E9;nyi</surname> <given-names>A.</given-names></name></person-group> (<year>1960</year>). <article-title>On the evolution of random graphs</article-title>. <source>Publ. Math. Inst. Hung. Acad. Sci</source>. <volume>5</volume>, <fpage>17</fpage>&#x02013;<lpage>60</lpage>.</citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fagin</surname> <given-names>R.</given-names></name></person-group> (<year>1976</year>). <article-title>Probabilities on finite models</article-title>. <source>J. Symb. Logic</source> <volume>41</volume>, <fpage>50</fpage>&#x02013;<lpage>58</lpage>. <pub-id pub-id-type="doi">10.2307/2272945</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fierens</surname> <given-names>D.</given-names></name> <name><surname>den Broeck</surname> <given-names>G. V.</given-names></name> <name><surname>Thon</surname> <given-names>I.</given-names></name> <name><surname>Gutmann</surname> <given-names>B.</given-names></name> <name><surname>De Raedt</surname> <given-names>L.</given-names></name></person-group> (<year>2011</year>). <article-title>&#x0201C;Inference in probabilistic logic programs using weighted CNF&#x00027;s,&#x0201D;</article-title> in <source>Proceedings of UAI 2011</source>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friedman</surname> <given-names>N.</given-names></name> <name><surname>Getoor</surname> <given-names>L.</given-names></name> <name><surname>Koller</surname> <given-names>D.</given-names></name> <name><surname>Pfeffer</surname> <given-names>A.</given-names></name></person-group> (<year>1999</year>). <article-title>&#x0201C;Learning probabilistic relational models,&#x0201D;</article-title> in <source>Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99)</source>.</citation>
</ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>G&#x000E4;rtner</surname> <given-names>T.</given-names></name> <name><surname>Flach</surname> <given-names>P.</given-names></name> <name><surname>Wrobel</surname> <given-names>S.</given-names></name></person-group> (<year>2003</year>). <article-title>&#x0201C;On graph kernels: hardness results and efficient alternatives,&#x0201D;</article-title> in <source>Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003</source> (<publisher-loc>Springer</publisher-loc>), <fpage>129</fpage>&#x02013;<lpage>143</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-540-45167-9_11</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grandjean</surname> <given-names>E.</given-names></name></person-group> (<year>1983</year>). <article-title>Complexity of the first-order theory of almost all finite structures</article-title>. <source>Inform. Control</source> <volume>57</volume>, <fpage>180</fpage>&#x02013;<lpage>204</lpage>. <pub-id pub-id-type="doi">10.1016/S0019-9958(83)80043-6</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grover</surname> <given-names>A.</given-names></name> <name><surname>Leskovec</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;node2vec: scalable feature learning for networks,&#x0201D;</article-title> in <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, 855&#x02013;864. <pub-id pub-id-type="doi">10.1145/2939672.2939754</pub-id><pub-id pub-id-type="pmid">27853626</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Halpern</surname> <given-names>J. Y.</given-names></name> <name><surname>Vardi</surname> <given-names>M. Y.</given-names></name></person-group> (<year>1991</year>). <article-title>&#x0201C;Model checking vs. theorem proving: a manifesto,&#x0201D;</article-title> in <source>Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy</source>, 151&#x02013;176. <pub-id pub-id-type="doi">10.1016/B978-0-12-450010-5.50015-3</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hamilton</surname> <given-names>W. L.</given-names></name></person-group> (<year>2020</year>). <source>Graph Representation Learning, Vol. 46 of Synthesis Lectures on Artifical Intelligence and Machine Learning</source>. <publisher-loc>Morgan &#x00026; Claypool Publishers.</publisher-loc> <pub-id pub-id-type="doi">10.1007/978-3-031-01588-5</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hamilton</surname> <given-names>W. L.</given-names></name> <name><surname>Ying</surname> <given-names>Z.</given-names></name> <name><surname>Leskovec</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Inductive representation learning on large graphs,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017</source> (<publisher-loc>Long Beach, CA</publisher-loc>), <fpage>1024</fpage>&#x02013;<lpage>1034</lpage>.<pub-id pub-id-type="pmid">34111002</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Harrison</surname> <given-names>J.</given-names></name></person-group> (<year>1996</year>). <article-title>&#x0201C;HOL light: A tutorial introduction,&#x0201D;</article-title> in <source>International Conference on Formal Methods in Computer-Aided Design</source> (<publisher-loc>Springer</publisher-loc>), <fpage>265</fpage>&#x02013;<lpage>269</lpage>. <pub-id pub-id-type="doi">10.1007/BFb0031814</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Heckerman</surname> <given-names>D.</given-names></name> <name><surname>Meek</surname> <given-names>C.</given-names></name> <name><surname>Koller</surname> <given-names>D.</given-names></name></person-group> (<year>2007</year>). <article-title>&#x0201C;Probabilistic entity-relationship models, PRMs, and plate models,&#x0201D;</article-title> in <source>Introduction to Statistical Relational Learning</source>, eds L. Getoor and B. Taskar (<publisher-name>MIT Press</publisher-name>). <pub-id pub-id-type="doi">10.7551/mitpress/7432.003.0009</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holland</surname> <given-names>P. W.</given-names></name> <name><surname>Laskey</surname> <given-names>K. B.</given-names></name> <name><surname>Leinhardt</surname> <given-names>S.</given-names></name></person-group> (<year>1983</year>). <article-title>Stochastic blockmodels: first steps</article-title>. <source>Soc. Netw</source>. <volume>5</volume>, <fpage>109</fpage>&#x02013;<lpage>137</lpage>. <pub-id pub-id-type="doi">10.1016/0378-8733(83)90021-7</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Jaeger</surname> <given-names>M.</given-names></name></person-group> (<year>1997</year>). <article-title>&#x0201C;Relational Bayesian networks,&#x0201D;</article-title> in <source>Proceedings of the 13th Conference of Uncertainty in Artificial Intelligence (UAI-13)</source>, eds D. Geiger and P. P. Shenoy (<publisher-loc>Providence, RI</publisher-loc>: <publisher-name>Morgan Kaufmann</publisher-name>), <fpage>266</fpage>&#x02013;<lpage>273</lpage>.</citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jaeger</surname> <given-names>M.</given-names></name></person-group> (<year>1998</year>). <article-title>&#x0201C;Convergence results for relational Bayesian networks,&#x0201D;</article-title> in <source>Proceedings of the 13th Annual IEEE Symposium on Logic in Computer Science (LICS-98)</source>, ed V. Pratt (Indianapolis, IN: IEEE Technical Committee on Mathematical Foundations of Computing; IEEE Computer Society Press), <fpage>44</fpage>&#x02013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.1109/LICS.1998.705642</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jaeger</surname> <given-names>M.</given-names></name></person-group> (<year>2000</year>). <article-title>On the complexity of inference about probabilistic relational models</article-title>. <source>Artif. Intell</source>. <volume>117</volume>, <fpage>297</fpage>&#x02013;<lpage>308</lpage>. <pub-id pub-id-type="doi">10.1016/S0004-3702(99)00109-5</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="thesis"><person-group person-group-type="author"><name><surname>Jaeger</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;Learning and reasoning with graph data: neural and statistical-relational approaches,&#x0201D;</article-title> in <source>International Research School in Artificial Intelligence in Bergen (AIB 2022), Vol. 99 of Open Access Series in Informatics (OASIcs)</source>, eds C. Bourgaux, A. Ozaki, and R. Pe&#x000F1;aloza (Schloss Dagstuhl-Leibniz-Zentrum f&#x000FC;r Informatik), <volume>5</volume>:<fpage>1</fpage>&#x02013;<lpage>5</lpage>:42.</citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kersting</surname> <given-names>K.</given-names></name> <name><surname>De Raedt</surname> <given-names>L.</given-names></name></person-group> (<year>2001</year>). <article-title>&#x0201C;Towards combining inductive logic programming and Bayesian networks,&#x0201D;</article-title> in <source>Proceedings of the Eleventh International Conference on Inductive Logic Programming (ILP-2001)</source>. <pub-id pub-id-type="doi">10.1007/3-540-44797-0_10</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kipf</surname> <given-names>T. N.</given-names></name> <name><surname>Welling</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>Variational graph auto-encoders</article-title>. <source>arXiv [Preprint].</source> arXiv: 1611.07308. <pub-id pub-id-type="doi">10.48550/arXiv.1611.07308</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Kipf</surname> <given-names>T. N.</given-names></name> <name><surname>Welling</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Semi-supervised classification with graph convolutional networks,&#x0201D;</article-title> in <source>International Conference on Learning Representations</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://openreview.net/forum?id=SJU4ayYgl">https://openreview.net/forum?id=SJU4ayYgl</ext-link></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koponen</surname> <given-names>V.</given-names></name> <name><surname>Weitk&#x000E4;mper</surname> <given-names>F.</given-names></name></person-group> (<year>2023</year>). <article-title>Asymptotic elimination of partially continuous aggregation functions in directed graphical models</article-title>. <source>Infm. Comput.</source> <volume>293</volume>, <fpage>105061</fpage>. <pub-id pub-id-type="doi">10.1016/j.ic.2023.105061</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koren</surname> <given-names>Y.</given-names></name> <name><surname>Bell</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Advances in collaborative filtering,&#x0201D;</article-title> in <source>Recommender Systems Handbook</source>, eds F. Ricci, L. Rokach, and B. Shapira (Boston, MA: Springer), <fpage>77</fpage>&#x02013;<lpage>118</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-4899-7637-6_3</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kriege</surname> <given-names>N. M.</given-names></name> <name><surname>Johansson</surname> <given-names>F. D.</given-names></name> <name><surname>Morris</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <article-title>A survey on graph kernels</article-title>. <source>Appl. Netw. Sci</source>. <volume>5</volume>, <fpage>1</fpage>&#x02013;<lpage>42</lpage>. <pub-id pub-id-type="doi">10.1007/s41109-019-0195-3</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kumar</surname> <given-names>A.</given-names></name> <name><surname>Singh</surname> <given-names>S. S.</given-names></name> <name><surname>Singh</surname> <given-names>K.</given-names></name> <name><surname>Biswas</surname> <given-names>B.</given-names></name></person-group> (<year>2020</year>). <article-title>Link prediction techniques, applications, and performance: a survey</article-title>. <source>Phys. A Stat. Mech. Appl</source>. <volume>553</volume>:<fpage>124289</fpage>. <pub-id pub-id-type="doi">10.1016/j.physa.2020.124289</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Laskey</surname> <given-names>K. B.</given-names></name></person-group> (<year>2008</year>). <article-title>MEBN: a language for first-order Bayesian knowledge bases</article-title>. <source>Artif. Intell</source>. <volume>172</volume>, <fpage>140</fpage>&#x02013;<lpage>178</lpage>. <pub-id pub-id-type="doi">10.1016/j.artint.2007.09.006</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Laskey</surname> <given-names>K. B.</given-names></name> <name><surname>Mahoney</surname> <given-names>S. M.</given-names></name></person-group> (<year>1997</year>). <article-title>&#x0201C;Network fragments: representing knowledge for constructing probabilistic models,&#x0201D;</article-title> in <source>Proceedings of the 13th Annual Conference on Uncertainty in Artificial Intelligence (UAI-97)</source> (<publisher-loc>San Francisco, CA</publisher-loc>: <publisher-name>Morgan Kaufmann Publishers</publisher-name>), <fpage>334</fpage>&#x02013;<lpage>341</lpage>.<pub-id pub-id-type="pmid">33093586</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leskovec</surname> <given-names>J.</given-names></name> <name><surname>Kleinberg</surname> <given-names>J.</given-names></name> <name><surname>Faloutsos</surname> <given-names>C.</given-names></name></person-group> (<year>2007</year>). <article-title>Graph evolution: densification and shrinking diameters</article-title>. <source>ACM Trans. Knowl. Discov Data</source> <volume>1</volume>:<fpage>2</fpage>-es. <pub-id pub-id-type="doi">10.1145/1217299.1217301</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Vinyals</surname> <given-names>O.</given-names></name> <name><surname>Dyer</surname> <given-names>C.</given-names></name> <name><surname>Pascanu</surname> <given-names>R.</given-names></name> <name><surname>Battaglia</surname> <given-names>P.</given-names></name></person-group> (<year>2018</year>). <article-title>Learning deep generative models of graphs</article-title>. <source>arXiv [Preprint].</source> arXiv: 1803.03324. <pub-id pub-id-type="doi">10.48550/arXiv.1803.03324</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Manhaeve</surname> <given-names>R.</given-names></name> <name><surname>Dumancic</surname> <given-names>S.</given-names></name> <name><surname>Kimmig</surname> <given-names>A.</given-names></name> <name><surname>Demeester</surname> <given-names>T.</given-names></name> <name><surname>De Raedt</surname> <given-names>L.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;DeepProbLog: Neural probabilistic logic programming,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 31.</source> p. 3749&#x02013;3759. Available online at: <ext-link ext-link-type="uri" xlink:href="https://proceedings.neurips.cc/paper_files/paper/2018/file/dc5d637ed5e62c36ecb73b654b05ba2a-Paper.pdf">https://proceedings.neurips.cc/paper_files/paper/2018/file/dc5d637ed5e62c36ecb73b654b05ba2a-Paper.pdf</ext-link><pub-id pub-id-type="pmid">33693375</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morris</surname> <given-names>C.</given-names></name> <name><surname>Ritzert</surname> <given-names>M.</given-names></name> <name><surname>Fey</surname> <given-names>M.</given-names></name> <name><surname>Hamilton</surname> <given-names>W. L.</given-names></name> <name><surname>Lenssen</surname> <given-names>J. E.</given-names></name> <name><surname>Rattan</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>&#x0201C;Weisfeiler and leman go neural: Higher-order graph neural networks,&#x0201D;</article-title> in <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>, 4602&#x02013;4609. <pub-id pub-id-type="doi">10.1609/aaai.v33i01.33014602</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Muggleton</surname> <given-names>S.</given-names></name></person-group> (<year>1995</year>). <article-title>Inverse entailment and Progol</article-title>. <source>New Generat. Comput.</source> <volume>13</volume>, <fpage>245</fpage>&#x02013;<lpage>286</lpage>.</citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ngo</surname> <given-names>L.</given-names></name> <name><surname>Haddawy</surname> <given-names>P.</given-names></name></person-group> (<year>1995</year>). <article-title>&#x0201C;Probabilistic logic programming and Bayesian networks,&#x0201D;</article-title> in <source>Algorithms, Concurrency and Knowledge (Proceedings ACSC95), Springer Lecture Notes in Computer Science 1023</source>, 286&#x02013;300. <pub-id pub-id-type="doi">10.1007/3-540-60688-2_51</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Niepert</surname> <given-names>M.</given-names></name> <name><surname>Ahmed</surname> <given-names>M.</given-names></name> <name><surname>Kutzkov</surname> <given-names>K.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Learning convolutional neural networks for graphs,&#x0201D;</article-title> in <source>International Conference on Machine Learning</source>, <fpage>2014</fpage>&#x02013;<lpage>2023</lpage>.</citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pan</surname> <given-names>L.</given-names></name> <name><surname>Shi</surname> <given-names>C.</given-names></name> <name><surname>Dokmani&#x00107;</surname> <given-names>I.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Neural link prediction with walk pooling,&#x0201D;</article-title> in <source>International Conference on Learning Representations</source>.</citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Perozzi</surname> <given-names>B.</given-names></name> <name><surname>Al-Rfou</surname> <given-names>R.</given-names></name> <name><surname>Skiena</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>&#x0201C;Deepwalk: online learning of social representations,&#x0201D;</article-title> in <source>Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, 701&#x02013;710. <pub-id pub-id-type="doi">10.1145/2623330.2623732</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Poole</surname> <given-names>D.</given-names></name></person-group> (<year>1997</year>). <article-title>The independent choice logic for modelling multiple agents under uncertainty</article-title>. <source>Artif. Intell.</source> <volume>94</volume>, <fpage>7</fpage>&#x02013;<lpage>56</lpage>. <pub-id pub-id-type="doi">10.1016/S0004-3702(97)00027-1</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Poole</surname> <given-names>D.</given-names></name></person-group> (<year>2003</year>). <article-title>&#x0201C;First-order probabilistic inference,&#x0201D;</article-title> in <source>Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03)</source>.</citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Quinlan</surname> <given-names>J. R.</given-names></name> <name><surname>Cameron-Jones</surname> <given-names>R. M.</given-names></name></person-group> (<year>1993</year>). <article-title>&#x0201C;FOIL: A midterm report,&#x0201D;</article-title> in <source>Machine Learning: ECML-93: European Conference on Machine Learning Vienna, Austria, April</source> 5&#x02013;7, <italic>1993 Proceedings 6</italic> (Springer), <fpage>1</fpage>&#x02013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1007/3-540-56602-3_124</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Richardson</surname> <given-names>M.</given-names></name> <name><surname>Domingos</surname> <given-names>P.</given-names></name></person-group> (<year>2006</year>). <article-title>Markov logic networks</article-title>. <source>Mach. Learn</source>. <volume>62</volume>, <fpage>107</fpage>&#x02013;<lpage>136</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-006-5833-1</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sarker</surname> <given-names>M. K.</given-names></name> <name><surname>Zhou</surname> <given-names>L.</given-names></name> <name><surname>Eberhart</surname> <given-names>A.</given-names></name> <name><surname>Hitzler</surname> <given-names>P.</given-names></name></person-group> (<year>2021</year>). <article-title>Neuro-symbolic artificial intelligence: current trends</article-title>. <source>arXiv preprint arXiv:2105.05330</source>. <pub-id pub-id-type="doi">10.3233/AIC-210084</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sato</surname> <given-names>R.</given-names></name></person-group> (<year>2020</year>). <article-title>A survey on the expressive power of graph neural networks</article-title>. <source>arXiv preprint arXiv:2003.04078</source>.</citation>
</ref>
<ref id="B54">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sato</surname> <given-names>R.</given-names></name> <name><surname>Yamada</surname> <given-names>M.</given-names></name> <name><surname>Kashima</surname> <given-names>H.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Random features strengthen graph neural networks,&#x0201D;</article-title> in <source>Proceedings of the 2021 SIAM International Conference on Data Mining (SDM)</source> (<publisher-loc>SIAM</publisher-loc>), <fpage>333</fpage>&#x02013;<lpage>341</lpage>. <pub-id pub-id-type="doi">10.1137/1.9781611976700.38</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sato</surname> <given-names>T.</given-names></name></person-group> (<year>1995</year>). <article-title>&#x0201C;A statistical learning method for logic programs with distribution semantics,&#x0201D;</article-title> in <source>Proceedings of the 12th International Conference on Logic Programming (ICLP&#x00027;95)</source>, <fpage>715</fpage>&#x02013;<lpage>729</lpage>.</citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scarselli</surname> <given-names>F.</given-names></name> <name><surname>Gori</surname> <given-names>M.</given-names></name> <name><surname>Tsoi</surname> <given-names>A. C.</given-names></name> <name><surname>Hagenbuchner</surname> <given-names>M.</given-names></name> <name><surname>Monfardini</surname> <given-names>G.</given-names></name></person-group> (<year>2008</year>). <article-title>The graph neural network model</article-title>. <source>IEEE Trans. Neur. Netw.</source> <volume>20</volume>, <fpage>61</fpage>&#x02013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1109/TNN.2008.2005605</pub-id><pub-id pub-id-type="pmid">19068426</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shervashidze</surname> <given-names>N.</given-names></name> <name><surname>Vishwanathan</surname> <given-names>S.</given-names></name> <name><surname>Petri</surname> <given-names>T.</given-names></name> <name><surname>Mehlhorn</surname> <given-names>K.</given-names></name> <name><surname>Borgwardt</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Efficient graphlet kernels for large graph comparison,&#x0201D;</article-title> in <source>Artificial Intelligence and Statistics</source>, <fpage>488</fpage>&#x02013;<lpage>495</lpage>.</citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shervashidze</surname> <given-names>N.</given-names></name> <name><surname>Schweitzer</surname> <given-names>P.</given-names></name> <name><surname>Van Leeuwen</surname> <given-names>E. J.</given-names></name> <name><surname>Mehlhorn</surname> <given-names>K.</given-names></name> <name><surname>Borgwardt</surname> <given-names>K. M.</given-names></name></person-group> (<year>2011</year>). <article-title>Weisfeiler-Lehman graph kernels</article-title>. <source>J. Mach. Learn. Res</source>. <volume>12</volume>, <fpage>2539</fpage>&#x02013;<lpage>2561</lpage>. <pub-id pub-id-type="doi">10.5555/1953048.2078187</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Simonovsky</surname> <given-names>M.</given-names></name> <name><surname>Komodakis</surname> <given-names>N.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;GraphVAE: towards generation of small graphs using variational autoencoders,&#x0201D;</article-title> in <source>International Conference on Artificial Neural Networks</source> (<publisher-loc>Springer</publisher-loc>), <fpage>412</fpage>&#x02013;<lpage>422</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-01418-6_41</pub-id></citation></ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Srinivasan</surname> <given-names>A.</given-names></name> <name><surname>Muggleton</surname> <given-names>S. H.</given-names></name> <name><surname>Sternberg</surname> <given-names>M. J.</given-names></name> <name><surname>King</surname> <given-names>R. D.</given-names></name></person-group> (<year>1996</year>). <article-title>Theories for mutagenicity: a study in first-order and feature-based induction</article-title>. <source>Artif. Intell</source>. <volume>85</volume>, <fpage>277</fpage>&#x02013;<lpage>299</lpage>. <pub-id pub-id-type="doi">10.1016/0004-3702(95)00122-0</pub-id></citation>
</ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van den Broeck</surname> <given-names>G.</given-names></name> <name><surname>Davis</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;Conditioning in first-order knowledge compilation and lifted probabilistic inference,&#x0201D;</article-title> in <source>Twenty-Sixth AAAI Conference on Artificial Intelligence</source>.</citation>
</ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van den Broeck</surname> <given-names>G.</given-names></name></person-group> (<year>2011</year>). <article-title>&#x0201C;On the completeness of first-order knowledge compilation for lifted probabilistic inference,&#x0201D;</article-title> in <source>Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS)</source>.</citation>
</ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Veli&#x0010D;kovi&#x00107;</surname> <given-names>P.</given-names></name> <name><surname>Cucurull</surname> <given-names>G.</given-names></name> <name><surname>Casanova</surname> <given-names>A.</given-names></name> <name><surname>Romero</surname> <given-names>A.</given-names></name> <name><surname>Li&#x000F2;</surname> <given-names>P.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Graph attention networks,&#x0201D;</article-title> in <source>International Conference on Learning Representations</source>.</citation>
</ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vignac</surname> <given-names>C.</given-names></name> <name><surname>Loukas</surname> <given-names>A.</given-names></name> <name><surname>Frossard</surname> <given-names>P.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Building powerful and equivariant graph neural networks with structural message-passing,&#x0201D;</article-title> in <source>NeurIPS</source>.</citation>
</ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wagstaff</surname> <given-names>E.</given-names></name> <name><surname>Fuchs</surname> <given-names>F.</given-names></name> <name><surname>Engelcke</surname> <given-names>M.</given-names></name> <name><surname>Posner</surname> <given-names>I.</given-names></name> <name><surname>Osborne</surname> <given-names>M. A.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;On the limitations of representing functions on sets,&#x0201D;</article-title> in <source>International Conference on Machine Learning</source>, <fpage>6487</fpage>&#x02013;<lpage>6494</lpage>.</citation>
</ref>
<ref id="B66">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Weidenbach</surname> <given-names>C.</given-names></name> <name><surname>Dimova</surname> <given-names>D.</given-names></name> <name><surname>Fietzke</surname> <given-names>A.</given-names></name> <name><surname>Kumar</surname> <given-names>R.</given-names></name> <name><surname>Suda</surname> <given-names>M.</given-names></name> <name><surname>Wischnewski</surname> <given-names>P.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;SPASS version 3.5,&#x0201D;</article-title> in <source>International Conference on Automated Deduction</source> (<publisher-loc>Springer</publisher-loc>), <fpage>140</fpage>&#x02013;<lpage>145</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-02959-2_10</pub-id></citation>
</ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Welling</surname> <given-names>M.</given-names></name> <name><surname>Kipf</surname> <given-names>T. N.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Semi-supervised classification with graph convolutional networks,&#x0201D;</article-title> in <source>International Conference on Learning Representations</source>.</citation>
</ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>K.</given-names></name> <name><surname>Hu</surname> <given-names>W.</given-names></name> <name><surname>Leskovec</surname> <given-names>J.</given-names></name> <name><surname>Jegelka</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;How powerful are graph neural networks?,&#x0201D;</article-title> in <source>International Conference on Learning Representations</source>.</citation>
</ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>You</surname> <given-names>J.</given-names></name> <name><surname>Ying</surname> <given-names>R.</given-names></name> <name><surname>Ren</surname> <given-names>X.</given-names></name> <name><surname>Hamilton</surname> <given-names>W.</given-names></name> <name><surname>Leskovec</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;GraphRNN: generating realistic graphs with deep auto-regressive models,&#x0201D;</article-title> in <source>International Conference on Machine Learning</source>, <fpage>5708</fpage>&#x02013;<lpage>5717</lpage>.</citation>
</ref>
<ref id="B70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>H.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name> <name><surname>Ji</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;XGNN: towards model-level explanations of graph neural networks,&#x0201D;</article-title> in <source>Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &#x00026; Data Mining</source>, <fpage>430</fpage>&#x02013;<lpage>438</lpage>. <pub-id pub-id-type="doi">10.1145/3394486.3403085</pub-id></citation>
</ref>
<ref id="B71">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zaheer</surname> <given-names>M.</given-names></name> <name><surname>Kottur</surname> <given-names>S.</given-names></name> <name><surname>Ravanbakhsh</surname> <given-names>S.</given-names></name> <name><surname>Poczos</surname> <given-names>B.</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R. R.</given-names></name> <name><surname>Smola</surname> <given-names>A. J.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Deep sets,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 30</source>, eds I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (<publisher-name>Curran Associates, Inc.</publisher-name>).</citation>
</ref>
</ref-list>
</back>
</article> 