<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdata.2019.00002</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Deep Representation Learning for Social Network Analysis</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Tan</surname> <given-names>Qiaoyu</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/708642/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Ninghao</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/687695/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Hu</surname> <given-names>Xia</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/277895/overview"/>
</contrib>
</contrib-group>
<aff><institution>Department of Computer Science and Engineering, Texas A&#x00026;M University</institution>, <addr-line>College Station, TX</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Yuxiao Dong, Microsoft Research, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Chuan-Ju Wang, Academia Sinica, Taiwan; Donghui Hu, Hefei University of Technology, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Xia Hu <email>xiahu&#x00040;tamu.edu</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Data Mining and Management, a section of the journal Frontiers in Big Data</p></fn></author-notes>
<pub-date pub-type="epub">
<day>03</day>
<month>04</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>2</volume>
<elocation-id>2</elocation-id>
<history>
<date date-type="received">
<day>03</day>
<month>12</month>
<year>2018</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>03</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2019 Tan, Liu and Hu.</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Tan, Liu and Hu</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>Social network analysis is an important problem in data mining. A fundamental step for analyzing social networks is to encode network data into low-dimensional representations, i.e., network embeddings, so that the network topology structure and other attribute information can be effectively preserved. Network representation learning facilitates further applications such as classification, link prediction, anomaly detection, and clustering. In addition, techniques based on deep neural networks have attracted great interest over the past few years. In this survey, we conduct a comprehensive review of the current literature on network representation learning with neural network models. First, we introduce the basic models for learning node representations in homogeneous networks. We also introduce some extensions of these basic models that tackle more complex scenarios, such as analyzing attributed networks, heterogeneous networks, and dynamic networks. We then introduce techniques for embedding subgraphs and present the applications of network representation learning. Finally, we discuss some promising research directions for future work.</p></abstract>
<kwd-group>
<kwd>deep learning</kwd>
<kwd>social networks</kwd>
<kwd>deep social network analysis</kwd>
<kwd>representation learning</kwd>
<kwd>network embedding</kwd>
</kwd-group>
<counts>
<fig-count count="4"/>
<table-count count="0"/>
<equation-count count="11"/>
<ref-count count="84"/>
<page-count count="10"/>
<word-count count="8387"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Social networks, such as Facebook, Twitter, and LinkedIn, have greatly facilitated communication between web users around the world. Analyzing social networks helps summarize the interests and opinions of users (nodes), discover patterns from the interactions (links) between users, and mine the events that take place on online platforms. The information obtained by analyzing social networks can be especially valuable for many applications. Typical examples include online advertisement targeting (Li et al., <xref ref-type="bibr" rid="B43">2015</xref>), personalized recommendation (Song et al., <xref ref-type="bibr" rid="B66">2006</xref>), viral marketing (Leskovec et al., <xref ref-type="bibr" rid="B39">2007</xref>; Chen et al., <xref ref-type="bibr" rid="B16">2010</xref>), social healthcare (Tang and Yang, <xref ref-type="bibr" rid="B71">2012</xref>), social influence analysis (Peng et al., <xref ref-type="bibr" rid="B59">2017</xref>), and academic network analysis (Dietz et al., <xref ref-type="bibr" rid="B19">2007</xref>; Guo et al., <xref ref-type="bibr" rid="B28">2014</xref>).</p>
<p>One central problem in social network analysis is how to extract useful features from non-Euclidean, structured networks, so that downstream machine learning models can be deployed for specific analysis tasks. For example, when recommending new friends to a user in a social network, the key challenge is how to embed network users into a low-dimensional space so that the closeness between users can be easily measured with distance metrics. To process structural information in networks, most previous efforts mainly rely on hand-crafted features, such as kernel functions (Vishwanathan et al., <xref ref-type="bibr" rid="B75">2010</xref>), graph statistics (e.g., degrees or clustering coefficients) (Bhagat et al., <xref ref-type="bibr" rid="B8">2011</xref>), or other carefully engineered features (Liben-Nowell and Kleinberg, <xref ref-type="bibr" rid="B46">2007</xref>). However, such feature engineering processes can be very time-consuming and expensive, rendering them impractical for many real-world applications. An alternative way to avoid this limitation is to automatically learn feature representations that capture various information sources in networks (Bengio et al., <xref ref-type="bibr" rid="B6">2013</xref>; Liao et al., <xref ref-type="bibr" rid="B45">2018</xref>). The goal is to learn a transformation function that maps nodes, subgraphs, or even the whole network to vectors in a low-dimensional feature space, where the spatial relations between the vectors reflect the structures or contents of the original network. Given these feature vectors, subsequent machine learning models such as classification, clustering, and outlier detection models can be directly applied to target applications.</p>
<p>Along with the substantial performance improvements gained by deep learning on image recognition, text mining, and natural language processing tasks (Bengio, <xref ref-type="bibr" rid="B5">2009</xref>), developing network representation methods with neural network models has received increasing attention in recent years. In this review, we provide a comprehensive overview of recent advances in network representation learning using neural network models. After introducing the notations and problem definitions, we first review the basic representation learning models for node embedding in homogeneous networks. Specifically, based on the type of representation generation module, we divide the existing approaches into three categories: embedding look-up based, autoencoder based, and graph convolution based. We then provide an overview of the approaches that learn representations for subgraphs in networks, which to some extent rely on the techniques of node representation learning. After that, we list some applications of network representation models. Finally, we discuss some promising research directions for future work.</p></sec>
<sec id="s2">
<title>2. Notations and Problem Definitions</title>
<p>In this section, we define some important terminology that will be used in later sections, and then provide the formal definition of the network representation learning problem. In general, we use boldface uppercase letters (e.g., <bold>A</bold>) to denote matrices, boldface lowercase letters (e.g., <bold>a</bold>) to denote vectors, and lowercase letters (e.g., <italic>a</italic>) to denote scalars. The (<italic>i, j</italic>) entry, the <italic>i</italic>-th row, and the <italic>j</italic>-th column of a matrix <bold>A</bold> are denoted as <bold>A</bold><sub><italic>ij</italic></sub>, <bold>A</bold><sub><italic>i</italic>&#x0002A;</sub>, and <bold>A</bold><sub>&#x0002A;<italic>j</italic></sub>, respectively.</p>
<p><italic>Definition 1 (Network)</italic>. Let <inline-formula><mml:math id="M1"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">G</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">E</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>Y</mml:mtext></mml:mstyle></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> be a network, where the <italic>i</italic>-th node (or vertex) is denoted as <inline-formula><mml:math id="M2"><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M3"><mml:msub><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">E</mml:mi></mml:mrow></mml:math></inline-formula> denotes the edge between node <italic>v</italic><sub><italic>i</italic></sub> and <italic>v</italic><sub><italic>j</italic></sub>. <bold>X</bold> and <bold>Y</bold> are node attributes and labels, if available. Besides, we let <bold>A</bold> &#x02208; &#x0211D;<sup><italic>N</italic>&#x000D7;<italic>N</italic></sup> denote the associated adjacency matrix of <inline-formula><mml:math id="M4"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">G</mml:mi></mml:mrow></mml:math></inline-formula>. 
<bold>A</bold><sub><italic>ij</italic></sub> is the weight of <italic>e</italic><sub><italic>i, j</italic></sub>, where <bold>A</bold><sub><italic>ij</italic></sub> &#x0003E; 0 indicates that the two nodes are connected, and otherwise <bold>A</bold><sub><italic>ij</italic></sub> &#x0003D; 0. For undirected graphs, <bold>A</bold><sub><italic>ij</italic></sub> &#x0003D; <bold>A</bold><sub><italic>ji</italic></sub>.</p>
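<p>As a concrete illustration of Definition 1, the adjacency matrix of a small, hypothetical undirected weighted network can be constructed as follows (the node count and edge weights here are invented for this sketch):</p>

```python
import numpy as np

# Hypothetical undirected network with N = 4 nodes.
# Each tuple (i, j, w) is an edge e_ij with weight w = A_ij.
N = 4
edges = [(0, 1, 1.0), (0, 2, 2.0), (2, 3, 0.5)]

A = np.zeros((N, N))
for i, j, w in edges:
    A[i, j] = w
    A[j, i] = w  # A_ij = A_ji for undirected graphs

assert np.allclose(A, A.T)  # symmetry holds
assert A[1, 3] == 0.0       # A_ij = 0: the two nodes are not connected
```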
<p>In many scenarios, the nodes and edges in <inline-formula><mml:math id="M5"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">G</mml:mi></mml:mrow></mml:math></inline-formula> can also be associated with the type information. Let <inline-formula><mml:math id="M6"><mml:msub><mml:mrow><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo><mml:msup><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> be a node-type mapping function and <inline-formula><mml:math id="M7"><mml:msub><mml:mrow><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">E</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo><mml:msup><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> be an edge-type mapping function, where <italic>T</italic><sup><italic>v</italic></sup> and <italic>T</italic><sup><italic>e</italic></sup> denote the set of node and edge types, respectively. 
Here, each node <inline-formula><mml:math id="M8"><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:math></inline-formula> has one specific type, e.g., <inline-formula><mml:math id="M9"><mml:msub><mml:mrow><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. Similarly, for each edge <italic>e</italic><sub><italic>ij</italic></sub>, <inline-formula><mml:math id="M10"><mml:msub><mml:mrow><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>.</p>
<p><italic>Definition 2 (Homogeneous Network)</italic>. A homogeneous network is a network in which |<italic>T</italic><sup><italic>v</italic></sup>| &#x0003D; |<italic>T</italic><sup><italic>e</italic></sup>| &#x0003D; 1. That is, all nodes and edges in the network belong to a single type.</p>
<p><italic>Definition 3 (Heterogeneous Network)</italic>. A heterogeneous network is a network with |<italic>T</italic><sup><italic>v</italic></sup>| &#x0002B; |<italic>T</italic><sup><italic>e</italic></sup>| &#x0003E; 2. There are at least two different types of nodes or edges in heterogeneous networks.</p>
<p>Given a network <inline-formula><mml:math id="M11"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">G</mml:mi></mml:mrow></mml:math></inline-formula>, the task of network representation learning is to train a mapping function <italic>f</italic> that maps certain components in <inline-formula><mml:math id="M12"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">G</mml:mi></mml:mrow></mml:math></inline-formula>, such as nodes or subgraphs, into a latent space. Let <italic>D</italic> be the dimension of the latent space and usually <inline-formula><mml:math id="M13"><mml:mi>D</mml:mi><mml:mo>&#x0226A;</mml:mo><mml:mo>|</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow><mml:mo>|</mml:mo></mml:math></inline-formula>. In this work, we focus on the problem of node representation learning and subgraph representation learning.</p>
<p><italic>Definition 4 (Node Representation Learning)</italic>. Suppose <bold>z</bold> &#x02208; &#x0211D;<sup><italic>D</italic></sup> denotes the latent vector of node <italic>v</italic>, node representation learning aims to build a mapping function <italic>f</italic> so that <bold>z</bold> &#x0003D; <italic>f</italic>(<italic>v</italic>). It is expected that nodes with similar roles or characteristics, which are defined according to specific application domains, are mapped close to each other in the latent space.</p>
<p><italic>Definition 5 (Subgraph Representation Learning)</italic>. Let <italic>g</italic> denote a subgraph of <inline-formula><mml:math id="M14"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">G</mml:mi></mml:mrow></mml:math></inline-formula>. The nodes and edges in <italic>g</italic> are denoted as <inline-formula><mml:math id="M15"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="M16"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">E</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, respectively, and we have <inline-formula><mml:math id="M17"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02282;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M18"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">E</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02282;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">E</mml:mi></mml:mrow></mml:math></inline-formula>. Subgraph representation learning aims to learn a mapping function <italic>f</italic> so that <bold>z</bold> &#x0003D; <italic>f</italic>(<italic>g</italic>), where in this case <bold>z</bold> &#x02208; &#x0211D;<sup><italic>D</italic></sup> corresponds to the latent vector of <italic>g</italic>.</p>
<p><xref ref-type="fig" rid="F1">Figure 1</xref> shows a toy example of network embedding. There are three subgraphs in this network distinguished with different colors: <inline-formula><mml:math id="M19"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M20"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M21"><mml:msub><mml:mrow><mml:mrow><mml:mi 
mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>. Given a network as input, the example below generates one representation for each node, as well as for each of the three subgraphs.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>A toy example of node representation learning and subgraph representation learning (best viewed in color). There are three subgraphs in the input network denoted by different colors. The target of node embedding is to generate one representation for each individual node, while subgraph embedding is to learn one representation for an entire subgraph.</p></caption>
<graphic xlink:href="fdata-02-00002-g0001.tif"/>
</fig></sec>
<sec id="s3">
<title>3. Neural Network Based Models</title>
<p>It has been demonstrated that neural networks have powerful capabilities for capturing complex patterns in data, and they have achieved substantial success in fields such as computer vision, audio recognition, and natural language processing. Recently, some efforts have been made to extend neural network models to learn representations from network data. Based on the type of base neural network that is applied, we categorize them into three subgroups: look-up table based models, autoencoder based models, and graph convolutional network (GCN) based models. In this section, we first give an overview of network representation learning from the perspective of <italic>encoding</italic> and <italic>decoding</italic>. We then discuss the details of some well-known network embedding models and how they fulfill these two steps. Note that we only discuss representation learning for nodes here; models dealing with subgraphs will be introduced in later sections.</p>
<sec>
<title>3.1. Framework Overview From the Encoder-Decoder Perspective</title>
<p>To elaborate on the diversity of neural network architectures, we argue that different techniques can be characterized by their <italic>encoding</italic> and <italic>decoding</italic> schemes, as well as the <italic>target network structures</italic> they preserve in the low-dimensional feature space. Specifically, existing methods can be reduced to solving the following optimization problem:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M22"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:mtext>&#x003A8;</mml:mtext></mml:mrow></mml:munder></mml:mstyle><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003D5;</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mtext>&#x003A6;</mml:mtext></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>&#x003D5;</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:mtext>&#x003A8;</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003A6;<sub><italic>tar</italic></sub> is the set of target relations that the embedding algorithm aims to preserve, and <inline-formula><mml:math id="M23"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the nodes involved in &#x003D5;. <inline-formula><mml:math id="M24"><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the <italic>encoding</italic> function that maps nodes into representation vectors, and &#x003C8;<sub><italic>dec</italic></sub> is a <italic>decoding</italic> function that reconstructs the original network structure from the representation space. &#x003A8; denotes the trainable parameters in the encoder and decoder. By minimizing the loss function above, the model parameters are trained so that the desired network structures &#x003A6;<sub><italic>tar</italic></sub> are preserved. As we will show in subsequent sections, the primary distinctions between various network representation methods lie in how they define these three components.</p></sec>
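<p>To make the three components concrete, the following sketch instantiates the optimization problem above with the simplest possible choices: &#x003C8;<sub><italic>enc</italic></sub> as an embedding look-up, &#x003C8;<sub><italic>dec</italic></sub> as an inner product, and a squared reconstruction loss over a hypothetical set of target edges. These choices are illustrative assumptions for exposition, not a specific published model:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 5, 3                                # number of nodes, embedding dimension
Z = rng.normal(scale=0.1, size=(N, D))     # trainable parameters Psi: the look-up table

def psi_enc(i):
    """Encoder: map a node index to its representation vector."""
    return Z[i]

def psi_dec(z_i, z_j):
    """Decoder: reconstruct node proximity from the latent space."""
    return z_i @ z_j

def loss(target_edges):
    """Sum of per-relation losses over the target relations Phi_tar."""
    return sum((psi_dec(psi_enc(i), psi_enc(j)) - w) ** 2
               for i, j, w in target_edges)

phi_tar = [(0, 1, 1.0), (1, 2, 1.0), (3, 4, 1.0)]  # hypothetical target relations

# One step of gradient descent on Psi (numerical gradient, for illustration only).
eps, lr = 1e-5, 0.1
grad = np.zeros_like(Z)
for idx in np.ndindex(*Z.shape):
    Z[idx] += eps; hi = loss(phi_tar)
    Z[idx] -= 2 * eps; lo = loss(phi_tar)
    Z[idx] += eps
    grad[idx] = (hi - lo) / (2 * eps)

before = loss(phi_tar)
Z -= lr * grad          # training the parameters decreases the loss
```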
<sec>
<title>3.2. Models With Embedding Look-Up Tables</title>
<p>Instead of using multiple layers of nonlinear transformations, network representation learning can be achieved simply by using look-up tables that directly map a node index to its corresponding representation vector. Specifically, a look-up table can be implemented as a matrix, where each row corresponds to the representation of one node. The diversity of different models mainly lies in the definition of the target relations in the network data that we hope to preserve. In the rest of this subsection, we first introduce DeepWalk (Perozzi et al., <xref ref-type="bibr" rid="B61">2014</xref>) to discuss the basic concepts and techniques in network embedding, and then extend the discussion to more complex and practical scenarios.</p>
<sec>
<title>3.2.1. Skip-Gram Based Models</title>
<p>As a pioneering network representation model, DeepWalk treats nodes as words, samples random walks as sentences and utilizes the skip-gram model (Mikolov et al., <xref ref-type="bibr" rid="B53">2013</xref>) to learn the representations of nodes as shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. In this case, the encoder &#x003C8;<sub><italic>enc</italic></sub> is implemented as two embedding look-up tables <bold>Z</bold> &#x02208; &#x0211D;<sup><italic>N</italic>&#x000D7;<italic>D</italic></sup> and <bold>Z</bold><sup><italic>c</italic></sup> &#x02208; &#x0211D;<sup><italic>N</italic>&#x000D7;<italic>D</italic></sup>, respectively for target embeddings and context embeddings. The network information &#x003D5; &#x02208; &#x003A6;<sub><italic>tar</italic></sub> that we try to preserve is defined as the node-context pairs <inline-formula><mml:math id="M25"><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> observed in the random walks, where <inline-formula><mml:math id="M26"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the context nodes (or neighborhood) of <italic>v</italic><sub><italic>i</italic></sub>. The objective is to maximize the probability of observing a node&#x00027;s neighborhood conditioned on embeddings:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M27"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:mo class="qopname">log</mml:mo><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <bold>e</bold><sub><italic>i</italic></sub> is a one-hot row vector of length <italic>N</italic> that picks the <italic>i</italic>-th row of <bold>Z</bold>. Letting <bold>z</bold><sub><italic>i</italic></sub> &#x0003D; <bold>e</bold><sub><italic>i</italic></sub><bold>Z</bold> and <inline-formula><mml:math id="M28"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, the conditional probability above can be formulated as</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M29"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo class="qopname">exp</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:msubsup></mml:mstyle><mml:mo class="qopname">exp</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mstyle 
mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>so that &#x003C8;<sub><italic>dec</italic></sub> can be regarded as link reconstruction based on the normalized proximity between different nodes. In practice, computing this probability is expensive due to the summation over every node in the network, but hierarchical softmax or negative sampling can be applied to reduce the time complexity.</p>
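As an illustrative sketch of the above (the graph size, embedding dimension, and number of negative samples are assumptions for demonstration, not values from the reviewed papers), the softmax of Equation (3) and its negative-sampling surrogate can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 16                              # number of nodes, embedding size (illustrative)
Z = rng.normal(scale=0.1, size=(N, d))      # node ("input") embeddings
Zc = rng.normal(scale=0.1, size=(N, d))     # context ("output") embeddings

def softmax_prob(i, j):
    """Exact p(Z_j^c | Z_i) from Equation (3): softmax over all nodes."""
    scores = Zc @ Z[i]                      # scores Z_k^c Z_i^T for every node k
    scores = scores - scores.max()          # shift for numerical stability
    p = np.exp(scores)
    return p[j] / p.sum()

def neg_sampling_loss(i, j, num_neg=5):
    """Negative-sampling surrogate: one positive pair plus sampled negatives."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = np.log(sigmoid(Zc[j] @ Z[i]))     # push the observed context pair together
    negs = rng.integers(0, N, size=num_neg)
    neg = np.log(sigmoid(-(Zc[negs] @ Z[i]))).sum()  # push random pairs apart
    return -(pos + neg)                     # minimized during training
```

The exact softmax touches all <italic>N</italic> nodes per pair, while the surrogate only touches `num_neg + 1`, which is what makes large networks tractable.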
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Building blocks of models with embedding look-up tables. There are two key components in these works: <italic>sampling</italic> and <italic>modeling</italic>. The primary distinction between methods in this line of work lies in how the two components are defined.</p></caption>
<graphic xlink:href="fdata-02-00002-g0002.tif"/>
</fig>
<p>There are also some approaches developed based on similar ideas. LINE (Tang et al., <xref ref-type="bibr" rid="B69">2015</xref>) defines first-order and second-order proximity for learning node embeddings, where the latter can be seen as a special case of DeepWalk with the context window length set to 1. Meanwhile, node2vec (Grover and Leskovec, <xref ref-type="bibr" rid="B26">2016</xref>) applies biased random walk strategies, which provide a trade-off between breadth-first search (BFS) and depth-first search (DFS) in networks. Planetoid (Yang et al., <xref ref-type="bibr" rid="B80">2016</xref>) extends skip-gram models to semi-supervised learning, predicting the class labels of nodes along with the context in the input network data. In addition, it has been shown that there exists a close relationship between skip-gram models and matrix factorization algorithms (Levy and Goldberg, <xref ref-type="bibr" rid="B40">2014</xref>; Qiu et al., <xref ref-type="bibr" rid="B62">2018</xref>). Therefore, network embedding models that utilize matrix factorization techniques, such as LE (Belkin and Niyogi, <xref ref-type="bibr" rid="B4">2002</xref>), Grarep (Cao et al., <xref ref-type="bibr" rid="B13">2015</xref>), and HOPE (Ou et al., <xref ref-type="bibr" rid="B58">2016</xref>), may also be implemented in a similar manner. Random sampling-based approaches allow a flexible, stochastic measure of node similarity, which not only yields higher performance in many applications but also scales better to large datasets.</p></sec>
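To make the sampling component concrete, the following is a minimal sketch of DeepWalk-style truncated random walks on a toy graph (the graph, walk length, and number of walks are illustrative assumptions); node2vec would additionally bias the neighbor choice with its return and in-out parameters:

```python
import random

def random_walks(adj, walk_length=5, walks_per_node=2, seed=0):
    """DeepWalk-style sampling: truncated uniform random walks from every node.

    adj: dict mapping each node to a list of its neighbors.
    Returns a list of node sequences to be fed into a skip-gram model.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:            # dead end: stop the walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy 4-node graph (an assumed example, not from the paper)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj)
```

Each returned sequence plays the role that a sentence plays in word2vec: nodes co-occurring within a context window of a walk become the (node, context) training pairs.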
<sec>
<title>3.2.2. Attributed Network Embedding Models</title>
<p>Social networks are rich in side information, where nodes can be associated with various attributes that characterize their properties. Inspired by the idea of inductive matrix completion (Natarajan and Dhillon, <xref ref-type="bibr" rid="B56">2014</xref>), TADW (Yang et al., <xref ref-type="bibr" rid="B79">2015</xref>) extends the framework of DeepWalk by incorporating features of vertices into network representation learning. Beyond sampling from plain networks, FeatWalk (Huang et al., <xref ref-type="bibr" rid="B34">2019</xref>) proposes a feature-based random walk strategy that generates node sequences by considering node similarity on attributes. With random walks based on both topological and attribute information, the skip-gram model is then applied to learn node representations.</p></sec>
<sec>
<title>3.2.3. Heterogeneous Network Embedding Models</title>
<p>Nodes in networks could be of different types, which poses the challenge of how to preserve the relations among them. HERec (Shi et al., <xref ref-type="bibr" rid="B64">2019</xref>) and metapath2vec&#x0002B;&#x0002B; (Dong et al., <xref ref-type="bibr" rid="B22">2017</xref>) propose meta-path-based random walk schemes to discover the context across different types of nodes. The skip-gram architecture in metapath2vec&#x0002B;&#x0002B; is also modified, so that the normalization term in the softmax only considers nodes of the same type. In a more complex scenario where both nodes and attributes are of different types, HNE (Chang et al., <xref ref-type="bibr" rid="B15">2015</xref>) combines feed-forward neural networks and embedding models into a unified framework. Suppose <bold>z</bold><sup><italic>a</italic></sup> and <bold>z</bold><sup><italic>b</italic></sup> denote the latent vectors of two different types of nodes; HNE defines two additional transformation matrices <bold>U</bold> and <bold>V</bold> to map <bold>z</bold><sup><italic>a</italic></sup> and <bold>z</bold><sup><italic>b</italic></sup>, respectively, to the joint space. 
Let <inline-formula><mml:math id="M30"><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="M31"><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, intra-type node similarity and inter-type node similarity are defined as</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M32"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>s</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msubsup><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msubsup><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mi>s</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msubsup><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msubsup><mml:mstyle mathvariant="bold"><mml:mtext>V</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mi>s</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msubsup><mml:mstyle mathvariant="bold"><mml:mtext>V</mml:mtext></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msubsup><mml:mstyle mathvariant="bold"><mml:mtext>V</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where the goal is to preserve these different types of similarity during training. As for obtaining <bold>z</bold><sup><italic>a</italic></sup> and <bold>z</bold><sup><italic>b</italic></sup>, HNE applies different feed-forward neural networks to map the raw input (e.g., images and texts) to latent spaces, thus enabling an end-to-end training framework. Specifically, the authors use a CNN to process images and a fully-connected neural network to process texts.</p></sec>
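A minimal sketch of the bilinear similarities in Equation (4), with illustrative (assumed) dimensions for the type-specific and joint spaces:

```python
import numpy as np

rng = np.random.default_rng(1)
d_a, d_b, d = 8, 6, 4              # type-a, type-b, and joint dimensions (illustrative)
U = rng.normal(size=(d_a, d))      # maps type-a embeddings into the joint space
V = rng.normal(size=(d_b, d))      # maps type-b embeddings into the joint space

def intra_sim(z_i, z_j):
    """s(v_i, v_j) = (z_i U)(z_j U)^T for two type-a nodes, as in Equation (4)."""
    return float((z_i @ U) @ (z_j @ U))

def inter_sim(z_i, z_k):
    """s(v_i, v_k) = (z_i U)(z_k V)^T across types a and b, as in Equation (4)."""
    return float((z_i @ U) @ (z_k @ V))

z_a = rng.normal(size=d_a)         # latent vector of a type-a node
z_b = rng.normal(size=d_b)         # latent vector of a type-b node
score = inter_sim(z_a, z_b)
```

Because both node types are projected into one joint space before taking inner products, similarities between nodes of different types become directly comparable.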
<sec>
<title>3.2.4. Dynamic Embedding Models</title>
<p>Real-world social networks are not static and evolve over time with the addition and deletion of nodes and links. To deal with this challenge, DNE (Du L. et al., <xref ref-type="bibr" rid="B23">2018</xref>) presents a decomposable objective to learn the representation of each node separately, where the impact of network changes on existing nodes is measurable and the most affected nodes are chosen for updates as the learning process proceeds. In addition, DANE (Li J. et al., <xref ref-type="bibr" rid="B42">2017</xref>) leverages matrix perturbation theory to tackle online embedding updates.</p></sec></sec>
<sec>
<title>3.3. Autoencoder Techniques</title>
<p>In this section, we discuss network representation models based on the autoencoder architecture (Hinton and Salakhutdinov, <xref ref-type="bibr" rid="B32">2006</xref>; Bengio et al., <xref ref-type="bibr" rid="B6">2013</xref>). As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, an autoencoder consists of two neural network modules: an encoder and a decoder. The encoder &#x003C8;<sub><italic>enc</italic></sub> maps the features of each node into a latent space, and the decoder &#x003C8;<sub><italic>dec</italic></sub> reconstructs the information about the network from the latent space. Usually the hidden representation layer is smaller than the input/output layers, forcing it to create a compressed representation that captures the non-linear structure of the network. Formally, following Equation (1), the objective of an autoencoder is to minimize the reconstruction error between the input and the output decoded from the low-dimensional representations.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>An example of autoencoder-based network representation algorithms. Rows of the proximity matrix <inline-formula><mml:math id="M33"><mml:mstyle mathvariant="bold"><mml:mtext>S</mml:mtext></mml:mstyle><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow><mml:mo>|</mml:mo><mml:mo>&#x000D7;</mml:mo><mml:mo>|</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> are fed into the autoencoder to learn and generate embeddings <inline-formula><mml:math id="M34"><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow><mml:mo>|</mml:mo><mml:mo>&#x000D7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> at the hidden layer.</p></caption>
<graphic xlink:href="fdata-02-00002-g0003.tif"/>
</fig>
<sec>
<title>3.3.1. Deep Neural Graph Representation (DNGR)</title>
<p>DNGR (Cao et al., <xref ref-type="bibr" rid="B14">2016</xref>) attempts to preserve a node&#x00027;s local neighborhood information using a stacked denoising autoencoder. Specifically, assume <bold>S</bold> is the PPMI matrix (Bullinaria and Levy, <xref ref-type="bibr" rid="B11">2007</xref>) constructed from <bold>A</bold>, then DNGR minimizes the following loss:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M35"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:mo stretchy="false">|</mml:mo><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>S</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>*</mml:mo></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:msubsup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mtext>&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mi>s</mml:mi><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mo>.</mml:mo><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle 
mathvariant="bold"><mml:mtext>S</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>*</mml:mo></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M36"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>S</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>*</mml:mo></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> denotes the associated neighborhood information of <italic>v</italic><sub><italic>i</italic></sub>. In this case, &#x003A6;<sub><italic>tar</italic></sub> &#x0003D; {<bold>S</bold><sub><italic>i</italic>&#x0002A;</sub>}<sub><italic>v</italic><sub><italic>i</italic></sub>&#x02208;<italic>V</italic></sub>, and DNGR aims to reconstruct the PPMI matrix. <bold>z</bold><sub><italic>i</italic></sub> is the embedding of node <italic>v</italic><sub><italic>i</italic></sub> in the hidden layer.</p></sec>
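As a hedged illustration of the preprocessing step, the PPMI transform of a co-occurrence matrix can be computed as below; note that DNGR builds its co-occurrence statistics via a random surfing model rather than from the raw adjacency matrix, and the toy input in the usage is an assumption:

```python
import numpy as np

def ppmi(M, eps=1e-12):
    """Positive pointwise mutual information of a co-occurrence matrix M.

    PPMI_ij = max(0, log(P(i, j) / (P(i) P(j)))), a standard choice for the
    proximity matrix S fed into a stacked (denoising) autoencoder.
    """
    M = np.asarray(M, dtype=float)
    joint = M / M.sum()                          # P(i, j)
    row = joint.sum(axis=1, keepdims=True)       # marginal P(i), shape (n, 1)
    col = joint.sum(axis=0, keepdims=True)       # marginal P(j), shape (1, n)
    pmi = np.log((joint + eps) / (row @ col + eps))
    return np.maximum(pmi, 0.0)                  # clamp negative PMI to zero

# Toy co-occurrence counts (illustrative)
S = ppmi(np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]]))
```

Rows of the resulting matrix <bold>S</bold> are then corrupted and fed to the denoising autoencoder, whose hidden layer yields the embeddings.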
<sec>
<title>3.3.2. Structural Deep Network Embedding (SDNE)</title>
<p>SDNE (Wang et al., <xref ref-type="bibr" rid="B76">2016</xref>) is another autoencoder-based model for network representation learning. The objective function of SDNE is:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M37"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>V</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mo stretchy="false">|</mml:mo><mml:mo stretchy="false">|</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>S</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>*</mml:mo></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02299;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:msubsup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:mi>V</mml:mi><mml:mo 
stretchy="false">|</mml:mo></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>S</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:msubsup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mtext>&#x003A8;</mml:mtext></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>S</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>*</mml:mo></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>S</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The first term is an autoencoder as in Equation (5), except that the reconstruction error is weighted so that more emphasis is put on recovering the non-zero entries in <bold>S</bold><sub><italic>i</italic>&#x0002A;</sub>. The second term is motivated by Laplacian Eigenmaps and encourages nearby nodes to have similar embeddings. In addition, SDNE differs from DNGR in the definition of <bold>S</bold>: DNGR defines <bold>S</bold> as the PPMI matrix, while SDNE sets <bold>S</bold> as the adjacency matrix.</p>
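The two terms of Equation (6) can be sketched as a NumPy loss evaluation (the decoder, the weight &#x003B2; inside <bold>b</bold><sub><italic>i</italic></sub>, and the trade-off &#x003B1; between the two terms are illustrative assumptions; in SDNE the decoder is a trained neural network and the scalars are tuned hyperparameters):

```python
import numpy as np

def sdne_loss(S, Z, decode, beta=5.0, alpha=1.0):
    """Evaluate an SDNE-style objective (Equation 6) for given embeddings.

    S: adjacency matrix used as the proximity matrix.
    Z: node embeddings, one row per node.
    decode: the decoder psi_dec mapping embeddings back to |V|-dim vectors.
    beta (> 1) up-weights reconstruction of non-zero entries of S;
    alpha scales the Laplacian-Eigenmaps-style second term.
    """
    B = np.where(S > 0, beta, 1.0)                 # penalty weights b_i
    recon = (((decode(Z) - S) * B) ** 2).sum()     # weighted reconstruction error
    # Second term: sum_ij S_ij ||z_i - z_j||^2, pulling linked nodes together
    diff = Z[:, None, :] - Z[None, :, :]
    first_order = (S * (diff ** 2).sum(-1)).sum()
    return recon + alpha * first_order
```

Setting `alpha=0` recovers the purely reconstruction-based objective, which makes it easy to see how much the first-order term contributes for a given embedding.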
<p>It is worth noting that, unlike Equation (2), which uses a one-hot indicator vector for embedding look-up, DNGR and SDNE transform each node&#x00027;s information into an embedding by training neural network modules. This distinction allows autoencoder-based methods to model a node&#x00027;s neighborhood structure and features directly, which is not straightforward for random walk approaches. It also makes it easy to incorporate richer information sources (e.g., node attributes) into representation learning, as will be introduced below. However, autoencoder-based methods may suffer from scalability issues, as the input dimension is <inline-formula><mml:math id="M38"><mml:mo>|</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow><mml:mo>|</mml:mo></mml:math></inline-formula>, which can incur significant time costs on massive real-world datasets.</p></sec>
<sec>
<title>3.3.3. Autoencoder-Based Attributed Network Embedding</title>
<p>The structure of autoencoders facilitates the incorporation of multiple information sources toward joint representation learning. Instead of only mapping nodes to the latent space, CAN (Meng et al., <xref ref-type="bibr" rid="B52">2019</xref>) proposes to learn the representation of nodes and attributes in the same latent space by using variational autoencoders (VAEs) (Doersch, <xref ref-type="bibr" rid="B21">2016</xref>), in order to capture the affinities between nodes and attributes. DANE (Gao and Huang, <xref ref-type="bibr" rid="B25">2018</xref>) utilizes the correlation between topological and attribute information of nodes by building two autoencoders for each information source, and then encourages the two sets of latent representations to be consistent and complementary. Li H. et al. (<xref ref-type="bibr" rid="B41">2017</xref>) adopts another strategy, where topological feature vector and content information vector (learned by doc2vec Le and Mikolov, <xref ref-type="bibr" rid="B38">2014</xref>) are directly concatenated and put into a VAE to capture the nonlinear relationship between them.</p></sec></sec>
<sec>
<title>3.4. Graph Convolutional Approaches</title>
<p>Inspired by the significant performance improvements of convolutional neural networks (CNNs) in image recognition, recent years have witnessed a surge of work adapting convolutional modules to learn representations of network data. The intuition is to generate a node&#x00027;s embedding by aggregating information from its local neighborhood, as shown in <xref ref-type="fig" rid="F4">Figure 4</xref>. Different from autoencoder-based approaches, the encoding function of graph convolutional approaches leverages a node&#x00027;s local neighborhood as well as its attribute information. Over the past few years, several efforts (Bruna et al., <xref ref-type="bibr" rid="B10">2013</xref>; Henaff et al., <xref ref-type="bibr" rid="B31">2015</xref>; Defferrard et al., <xref ref-type="bibr" rid="B18">2016</xref>; Hamilton W. et al., <xref ref-type="bibr" rid="B29">2017</xref>) have been made to extend traditional convolutional networks to network data for generating network embeddings. The convolutional filters of these approaches are either spatial or spectral: spatial filters operate directly on the adjacency matrix, whereas spectral filters operate on the spectrum of the graph Laplacian (Defferrard et al., <xref ref-type="bibr" rid="B18">2016</xref>).</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>An overview of graph convolutional networks. The dashed rectangles denote node attributes. The representation of each individual node (e.g., node C) is aggregated from its immediate neighbors (e.g., node A, B, D, E), concatenated with the lower-layer representation of itself.</p></caption>
<graphic xlink:href="fdata-02-00002-g0004.tif"/>
</fig>
<sec>
<title>3.4.1. Graph Convolutional Networks (GCN)</title>
<p>GCN (Bronstein et al., <xref ref-type="bibr" rid="B9">2017</xref>) is a well-known semi-supervised graph convolutional network model. It defines a convolutional operator on networks, iteratively aggregating the embeddings of a node&#x00027;s neighbors and combining the aggregated embedding with the node&#x00027;s own embedding from the previous iteration to generate its new representation. The layer-wise propagation rule of the encoding function &#x003C8;<sub><italic>enc</italic></sub> is defined as:</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M39"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>H</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>D</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>A</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>D</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>H</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>W</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <bold>H</bold><sup><italic>k</italic>&#x02212;1</sup> denotes the learned embeddings in layer <italic>k</italic> &#x02212; 1, and <bold>H</bold><sup>0</sup> &#x0003D; <bold>X</bold>. <inline-formula><mml:math id="M40"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>A</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>I</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>G</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>A</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> is the adjacency matrix with added self-connections. <bold>I</bold><sub><italic>G</italic></sub> is the identity matrix, <inline-formula><mml:math id="M41"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>D</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>A</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. <bold>W</bold><sup><italic>k</italic>&#x02212;1</sup> is a layer-wise trainable weight matrix. &#x003C3;(&#x000B7;) denotes an activation function such as ReLU. The loss function for supervised training is to evaluate the cross-entropy error over all labeled nodes:</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M42"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Y</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo class="qopname">ln</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Y</mml:mtext></mml:mstyle></mml:mrow><mml:mo class="qopname">^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:mi>s</mml:mi><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mo>.</mml:mo><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Y</mml:mtext></mml:mstyle></mml:mrow><mml:mo class="qopname">^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>&#x000A0;A</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M43"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Y</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>F</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the predictive matrix with <italic>F</italic> candidate labels. <italic>&#x003C8;</italic><sub><italic>dec</italic></sub>(&#x000B7;) can be viewed as a fully-connected network with the softmax activation function to map representations to predicted labels. Note that unlike autoencoders that explicitly treat each node&#x00027;s neighborhood as features or reconstruction goals as in Equations (5) or (6), GCN implicitly applies the local neighborhood links on each encoding layer as pathways to aggregate embeddings from neighbors, so that higher order network structures are utilized. Since Equation (8) is a supervised loss function, &#x003A6;<sub><italic>tar</italic></sub> is not applicable here. However, the loss function can also be formulated in unsupervised manners, similar to the skip-gram model (Kipf and Welling, <xref ref-type="bibr" rid="B37">2016</xref>; Hamilton W. et al., <xref ref-type="bibr" rid="B29">2017</xref>). GCN may suffer from the scalability problem when the size of <bold>A</bold> is large. The corresponding training algorithms have been proposed to tackle this challenge (Ying et al., <xref ref-type="bibr" rid="B81">2018a</xref>), where the network data is processed in small batches and we can sample a node&#x00027;s local neighbors instead of using all of them.</p></sec>
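A single propagation step of Equation (7) can be sketched in NumPy as follows (the toy graph and identity weights in the usage are illustrative assumptions, and a real GCN would learn <bold>W</bold> by backpropagating the loss of Equation 8):

```python
import numpy as np

def gcn_layer(A, H, W, act=lambda x: np.maximum(x, 0.0)):
    """One GCN propagation step: H^k = sigma(D^-1/2 (I + A) D^-1/2 H^{k-1} W^{k-1}).

    A: adjacency matrix; H: previous-layer embeddings (H^0 = X);
    W: trainable weight matrix; act: activation function (ReLU by default).
    """
    A_hat = A + np.eye(A.shape[0])              # add self-connections: I + A
    d = A_hat.sum(axis=1)                       # degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # D^-1/2
    return act(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Usage on a toy 2-node graph with identity features and weights
A = np.array([[0., 1.], [1., 0.]])
H1 = gcn_layer(A, np.eye(2), np.eye(2))
```

Stacking <italic>k</italic> such layers lets each node's representation draw on its <italic>k</italic>-hop neighborhood, which is the sense in which higher-order structure is utilized.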
<sec>
<title>3.4.2. Inductive Training With GCN</title>
<p>Most of the basic models reviewed so far generate network representations in a transductive manner. GraphSAGE (Hamilton W. et al., <xref ref-type="bibr" rid="B29">2017</xref>) emphasizes the inductive capability of GCN. Inductive learning is essential for high-throughput machine learning systems, especially those operating on evolving networks that constantly encounter unseen nodes (Yang et al., <xref ref-type="bibr" rid="B80">2016</xref>; Guo et al., <xref ref-type="bibr" rid="B27">2018</xref>). The core representation update scheme of GraphSAGE is similar to that of the conventional GCN, except that the operation over the whole network is replaced by sample-based representation aggregators:</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M44"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>W</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msup><mml:mo>&#x000B7;</mml:mo><mml:mtext class="textsc" mathvariant="normal">CONCAT</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext class="textsc" mathvariant="normal">AGGREGATE</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x02200;</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M45"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is the hidden representation of node <italic>v</italic><sub><italic>i</italic></sub> in the <italic>k</italic>-th layer. CONCAT denotes the concatenation operator, and AGGREGATE<sub><italic>k</italic></sub> represents the neighborhood aggregation function of the <italic>k</italic>-th layer (e.g., an element-wise mean or max operator). <inline-formula><mml:math id="M46"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the neighbors of <italic>v</italic><sub><italic>i</italic></sub>. Compared with Equation (7), GraphSAGE only aggregates feature vectors from a sampled subset of neighbors, making it scalable to large-scale data. Given the attribute features and neighborhood relations of an unseen node, GraphSAGE can generate the embedding of this node by leveraging its local neighbors as well as its attributes via forward propagation.</p></sec>
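A minimal sketch of the sample-based update in Equation (9), using an element-wise mean as AGGREGATE and ReLU as &#x003C3;; the toy neighbor lists and weight shapes are made-up inputs, and a real implementation would batch this computation:

```python
import numpy as np

def graphsage_layer(h_prev, neighbors, W, rng, sample_size=2):
    """One GraphSAGE layer: for each node, mean-aggregate a *sampled*
    subset of neighbor embeddings, concatenate with the node's own
    embedding, transform, and apply ReLU (Equation 9)."""
    out = []
    for i, nbrs in enumerate(neighbors):
        k = min(sample_size, len(nbrs))
        sampled = rng.choice(nbrs, size=k, replace=False)
        agg = h_prev[sampled].mean(axis=0)           # AGGREGATE_k (mean)
        concat = np.concatenate([h_prev[i], agg])    # CONCAT
        out.append(np.maximum(W @ concat, 0.0))      # sigma = ReLU
    return np.stack(out)

rng = np.random.default_rng(1)
h0 = rng.normal(size=(5, 4))                 # initial features of 5 nodes
neighbors = [[1, 2], [0, 2, 3], [0, 1, 4], [1], [2]]   # adjacency lists
W = rng.normal(size=(8, 2 * 4))              # maps concat (4 + 4) -> 8 dims
h1 = graphsage_layer(h0, neighbors, W, rng)  # new representations, shape (5, 8)
```

Because the update for an unseen node depends only on its own features and a neighbor sample, the same trained weights can embed nodes that were absent at training time.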
<sec>
<title>3.4.3. Graph Attention Mechanisms</title>
<p>Attention mechanisms have become a standard technique in many sequence-based tasks, enabling models to focus on the most relevant parts of the input when making decisions. Attention can likewise be used to aggregate the most informative features from a node&#x00027;s local neighbors. GAT (Velickovic et al., <xref ref-type="bibr" rid="B74">2017</xref>) extends the GCN framework by replacing the standard aggregation function with an attention layer that aggregates messages from the most important neighbors. Thekumparampil et al. (<xref ref-type="bibr" rid="B72">2018</xref>) also propose removing all intermediate fully-connected layers in the conventional GCN and replacing the propagation layers with attention layers. This allows the model to learn a dynamic and adaptive local summary of each neighborhood, greatly reduces the number of parameters, and yields more accurate predictions.</p></sec></sec></sec>
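The following is a rough single-head sketch of such a graph attention layer: each edge is scored with a LeakyReLU of a learned attention vector applied to the concatenated transformed features, scores are softmax-normalized over each node's neighborhood (plus a self-loop), and neighbor features are aggregated with those weights. All shapes and inputs below are illustrative assumptions, not the reference GAT implementation:

```python
import numpy as np

def gat_layer(H, A, W, a, alpha_slope=0.2):
    """Single-head graph attention layer sketch: score each edge with
    LeakyReLU(a^T [W h_i || W h_j]), softmax-normalize the scores over
    each node's neighborhood (plus a self-loop), and aggregate."""
    Wh = H @ W.T                          # transformed features, (N, F')
    A_self = A + np.eye(A.shape[0])       # attend to neighbors and self
    out = np.zeros_like(Wh)
    for i in range(A.shape[0]):
        nbrs = np.where(A_self[i] > 0)[0]
        e = np.array([a @ np.concatenate([Wh[i], Wh[j]]) for j in nbrs])
        e = np.where(e > 0, e, alpha_slope * e)     # LeakyReLU
        att = np.exp(e - e.max())
        att = att / att.sum()                       # attention coefficients
        out[i] = att @ Wh[nbrs]                     # weighted aggregation
    return out

rng = np.random.default_rng(2)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # toy star graph
H = rng.normal(size=(3, 4))      # input node features
W = rng.normal(size=(2, 4))      # F' = 2 (assumed shapes)
a = rng.normal(size=(4,))        # attention vector over concat(2 + 2)
H_out = gat_layer(H, A, W, a)    # attended representations, shape (3, 2)
```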
<sec id="s4">
<title>4. Subgraph Embedding</title>
<p>Besides learning representations for individual nodes, recent years have witnessed a growing branch of research that learns a representation for a set of nodes and edges as a whole; the goal is thus to represent a subgraph with a low-dimensional vector. Many traditional methods operating on subgraphs rely on graph kernels (Haussler, <xref ref-type="bibr" rid="B30">1999</xref>), which decompose a network into atomic substructures such as graphlets, subtree patterns, and paths, and treat these substructures as features from which an embedding is obtained through further transformation. In this section, however, we focus on methods that automatically learn subgraph embeddings with deep models; readers interested in graph kernels are referred to Vishwanathan et al. (<xref ref-type="bibr" rid="B75">2010</xref>).</p>
<p>According to the literature, most existing methods are built on the techniques used for node embedding, as introduced in section 3. In subgraph representation problems, however, the label information is associated with particular subgraphs instead of individual nodes or links. In this review, we divide subgraph representation learning approaches into two categories based on how they aggregate node-level embeddings within each subgraph, and discuss each category in detail below.</p>
<sec>
<title>4.1. Flat Aggregation</title>
<p>Assume <inline-formula><mml:math id="M47"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> denotes the set of nodes in a particular subgraph and <inline-formula><mml:math id="M48"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> represents the subgraph&#x00027;s embedding, <inline-formula><mml:math id="M49"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> could be obtained by aggregating the embeddings of all individual nodes in the subgraph:</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="M50"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C8;</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>g</mml:mi><mml:mi>g</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003C8;<sub><italic>aggr</italic></sub> denotes the aggregation function. Methods based on such flat aggregation usually define a &#x003C8;<sub><italic>aggr</italic></sub> that captures simple correlations among nodes. For example, Niepert et al. (<xref ref-type="bibr" rid="B57">2016</xref>) directly concatenates node embeddings and applies a standard convolutional neural network as the aggregation function to generate the graph representation. Dai et al. (<xref ref-type="bibr" rid="B17">2016</xref>) defines &#x003C8;<sub><italic>aggr</italic></sub> as a simple element-wise summation and learns the graph embedding by summing the embeddings of all individual nodes.</p>
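A flat aggregation of this kind reduces to a single pooling call over the subgraph's node embeddings. Below is a minimal sketch with element-wise sum, mean, or max as &#x003C8;<sub><italic>aggr</italic></sub>; the embedding matrix is a made-up example:

```python
import numpy as np

def aggregate_subgraph(Z, node_ids, mode="sum"):
    """Flat aggregation psi_aggr: collapse the embeddings of one
    subgraph's nodes into a single vector."""
    Zs = Z[node_ids]
    if mode == "sum":
        return Zs.sum(axis=0)
    if mode == "mean":
        return Zs.mean(axis=0)
    return Zs.max(axis=0)

Z = np.arange(12.0).reshape(4, 3)      # 4 node embeddings of dimension 3
z_S = aggregate_subgraph(Z, [0, 2])    # subgraph {v_0, v_2}
# rows 0 and 2 are [0, 1, 2] and [6, 7, 8], so z_S = [6, 8, 10]
```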
<p>In addition, some methods apply recurrent neural networks (RNNs) for representing graphs. Some typical methods first sample a number of graph sequences from the input network, and then apply RNN-based autoencoders to generate an embedding for each graph sequence. The final graph representation is obtained by either averaging (Jin et al., <xref ref-type="bibr" rid="B36">2018</xref>) or concatenating (Taheri et al., <xref ref-type="bibr" rid="B68">2018</xref>) these graph sequence embeddings.</p></sec>
<sec>
<title>4.2. Hierarchical Aggregation</title>
<p>In contrast to flat aggregation, the motivation behind <italic>hierarchical</italic> aggregation is to preserve the hierarchical structure that may be present in the subgraph by aggregating neighborhood information in a hierarchical fashion. Bruna et al. (<xref ref-type="bibr" rid="B10">2013</xref>) and Defferrard et al. (<xref ref-type="bibr" rid="B18">2016</xref>) exploit such hierarchical network structure by combining convolutional neural networks with graph coarsening. The main idea is to stack multiple graph coarsening and convolutional layers. In each layer, a graph clustering algorithm first groups the nodes, and the node embeddings within each cluster are then merged by element-wise max-pooling. After clustering, a new coarse network is generated by stacking the cluster embeddings together; this network is again fed into convolutional layers, and the process repeats. The clusters in each layer can be viewed as subgraphs, and the clustering algorithms learn the assignment matrices of subgraphs, so that the hierarchical structure of the network is propagated through the layers. Although these methods work well in certain applications, they follow a two-stage scheme in which the clustering and embedding stages may not reinforce each other.</p>
<p>To overcome this limitation, DiffPool (Ying et al., <xref ref-type="bibr" rid="B82">2018b</xref>) proposes an end-to-end model that does not depend on a deterministic clustering subroutine. Its layer-wise propagation rule is formulated as follows:</p>
<disp-formula id="E11"><label>(11)</label><mml:math id="M51"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>M</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>C</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>A</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>C</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>A</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>C</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M52"><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> denotes node embeddings, <inline-formula><mml:math id="M53"><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>C</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula> is the cluster assignment matrix learned from the previous layer. 
The goal of the left equation is to generate the (<italic>k</italic> &#x0002B; 1)-th coarser network embedding <bold>M</bold><sup>(<italic>k</italic>&#x0002B;1)</sup> by aggregating node embeddings according to cluster assignment <bold>C</bold><sup>(<italic>k</italic>)</sup>; while the right equation is to learn a new coarsened adjacency matrix <inline-formula><mml:math id="M54"><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>A</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula> from the previous adjacency matrix <bold>A</bold><sup>(<italic>k</italic>)</sup>, which stores the similarity between each pair of clusters. Here, instead of applying deterministic clustering algorithm to learn <bold>C</bold><sup>(<italic>k</italic>)</sup>, they adopt graph neural networks (GNNs) to learn it. Specifically, they use two separate GNNs on the input embedding matrix <bold>M</bold><sup>(<italic>k</italic>)</sup> and coarsened adjacency matrix <bold>A</bold><sup>(<italic>k</italic>)</sup> to generate assignment matrix <bold>C</bold><sup>(<italic>k</italic>)</sup> and embedding matrix <bold>Z</bold><sup>(<italic>k</italic>)</sup>, respectively. 
Formally, <inline-formula><mml:math id="M55"><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>Z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle class="text"><mml:mtext class="textsc" mathvariant="normal">GNN</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>A</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>M</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M56"><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>C</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle class="text"><mml:mtext class="textsc" 
mathvariant="normal">GNN</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>A</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>M</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>. The two steps could reinforce each other to improve the performance. DiffPool may suffer from computational issues brought by the computation of soft clustering assignment, which is further addressed in Cangea et al. (<xref ref-type="bibr" rid="B12">2018</xref>).</p></sec></sec>
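One DiffPool coarsening step from Equation (11) can be sketched as below; for brevity, plain assignment logits stand in for the output of GNN<sub><italic>k,pool</italic></sub>, and the random graph and embeddings are illustrative assumptions:

```python
import numpy as np

def diffpool_layer(A, Z, C_logits):
    """One DiffPool coarsening step (Equation 11): soft cluster
    assignment C = row-softmax(logits), pooled embeddings M' = C^T Z,
    and coarsened adjacency A' = C^T A C."""
    e = np.exp(C_logits - C_logits.max(axis=1, keepdims=True))
    C = e / e.sum(axis=1, keepdims=True)    # (N_k, N_{k+1}), rows sum to 1
    M_next = C.T @ Z                        # (N_{k+1}, D)
    A_next = C.T @ A @ C                    # (N_{k+1}, N_{k+1})
    return M_next, A_next, C

rng = np.random.default_rng(3)
A = (rng.random((6, 6)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T              # symmetric toy adjacency
Z = rng.normal(size=(6, 4))                 # node embeddings from GNN_embed
C_logits = rng.normal(size=(6, 2))          # stand-in for GNN_pool output
M1, A1, C = diffpool_layer(A, Z, C_logits)  # pool 6 nodes into 2 clusters
```

Because the assignment is soft and differentiable, gradients from a downstream graph-level loss flow back through both the pooling and the embedding computation, which is what lets the two steps reinforce each other.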
<sec id="s5">
<title>5. Applications</title>
<p>The representations learned from networks can be easily applied to downstream machine learning models for further analysis on social networks. Some common applications include node classification, link prediction, anomaly detection, and clustering.</p>
<sec>
<title>5.1. Node Classification</title>
<p>In social networks, people are often associated with semantic labels with respect to certain aspects about them, such as affiliations, interests, or beliefs. In real-world scenarios, however, people are usually only partially or sparsely labeled, since labeling is expensive and time-consuming. The goal of node classification is to predict the labels of unlabeled nodes in a network by leveraging their connections with the labeled ones under the network structure. According to Bhagat et al. (<xref ref-type="bibr" rid="B8">2011</xref>), existing methods can be classified into two categories: random walk-based and feature extraction-based methods. The former propagate labels via random walks (Baluja et al., <xref ref-type="bibr" rid="B3">2008</xref>), while the latter extract features from a node&#x00027;s surrounding information and network statistics.</p>
<p>In general, the network representation approach follows the second principle. A number of existing network representation models, like Yang et al. (<xref ref-type="bibr" rid="B79">2015</xref>), Wang et al. (<xref ref-type="bibr" rid="B76">2016</xref>), and Liao et al. (<xref ref-type="bibr" rid="B45">2018</xref>), focus on extracting node features from the network using representation learning techniques, and then apply machine learning classifiers like support vector machine, naive Bayes classifiers, and logistic regression for prediction. In contrast to separating the steps of node embedding and node classification, some recent work (Dai et al., <xref ref-type="bibr" rid="B17">2016</xref>; Hamilton W. et al., <xref ref-type="bibr" rid="B29">2017</xref>; Monti et al., <xref ref-type="bibr" rid="B55">2017</xref>) designs an end-to-end framework to combine the two tasks, so that the discriminative information inferred from labels can directly benefit the learning of network embedding.</p></sec>
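As an illustration of the two-step pipeline (embed, then classify), the sketch below fits a simple nearest-centroid classifier, a lightweight stand-in for the SVM or logistic-regression classifiers mentioned above, on a handful of labeled node embeddings; the synthetic embeddings and label split are made up for the example:

```python
import numpy as np

def fit_centroids(Z_train, y_train):
    """Fit one centroid per class on the labeled node embeddings."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(Z, classes, centroids):
    """Assign each node the label of its nearest class centroid."""
    d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

rng = np.random.default_rng(4)
# Synthetic embeddings of two well-separated communities.
Z = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(3.0, 0.1, (10, 2))])
y = np.array([0] * 10 + [1] * 10)
labeled = np.array([0, 1, 10, 11])       # only four nodes carry labels
classes, cent = fit_centroids(Z[labeled], y[labeled])
y_pred = predict(Z, classes, cent)       # labels for every node
```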
<sec>
<title>5.2. Link Prediction</title>
<p>Social networks are not necessarily complete, as some links might be missing. For example, a friendship link between two users can be missing even if they actually know each other in the real world. The goal of link prediction is to infer the existence of new interactions or emerging links between users in the future, based on the observed links and the network evolution mechanism (Liben-Nowell and Kleinberg, <xref ref-type="bibr" rid="B46">2007</xref>; Al Hasan and Zaki, <xref ref-type="bibr" rid="B2">2011</xref>; L&#x000FC; and Zhou, <xref ref-type="bibr" rid="B51">2011</xref>). In network embedding, an effective model is expected to preserve both the network structure and the inherent dynamics of the network in the low-dimensional space. In general, the majority of previous work focuses on predicting missing links between users under homogeneous network settings (Grover and Leskovec, <xref ref-type="bibr" rid="B26">2016</xref>; Ou et al., <xref ref-type="bibr" rid="B58">2016</xref>; Zhou et al., <xref ref-type="bibr" rid="B84">2017</xref>), and some efforts also attempt to predict missing links in heterogeneous networks (Liu Z. et al., <xref ref-type="bibr" rid="B49">2017</xref>, <xref ref-type="bibr" rid="B50">2018</xref>). Although beyond the scope of this survey, applying network embedding to build recommender systems (Ying et al., <xref ref-type="bibr" rid="B81">2018a</xref>) may also be a direction worth exploring.</p></sec>
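A common decoder for this task scores a candidate link by the inner product of the two node embeddings, squashed through a sigmoid; the embeddings below are hypothetical, and the appropriate decoder in practice depends on the embedding model:

```python
import numpy as np

def link_score(Z, i, j):
    """Probability of a link (v_i, v_j) as the sigmoid of the inner
    product of the two node embeddings."""
    return 1.0 / (1.0 + np.exp(-(Z[i] @ Z[j])))

# Hypothetical embeddings: nodes 0 and 1 are aligned, node 2 opposes them.
Z = np.array([[ 1.0,  0.5],
              [ 0.9,  0.6],
              [-1.0, -0.5]])
p_01 = link_score(Z, 0, 1)   # aligned embeddings -> score above 0.5
p_02 = link_score(Z, 0, 2)   # opposed embeddings -> score below 0.5
```

Ranking all unobserved pairs by such a score yields the candidate links to recommend.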
<sec>
<title>5.3. Anomaly Detection</title>
<p>Another challenging task in social network analysis is anomaly detection. Malicious activities in social networks, such as spamming, fraud, and phishing, can be interpreted as rare or unexpected behaviors that deviate from the majority of normal users. While numerous algorithms have been proposed for spotting anomalies and outliers in networks (Savage et al., <xref ref-type="bibr" rid="B63">2014</xref>; Akoglu et al., <xref ref-type="bibr" rid="B1">2015</xref>; Liu N. et al., <xref ref-type="bibr" rid="B47">2017</xref>), anomaly detection methods based on network embedding techniques have recently received increased attention (Hu et al., <xref ref-type="bibr" rid="B33">2016</xref>; Liang et al., <xref ref-type="bibr" rid="B44">2018</xref>; Peng et al., <xref ref-type="bibr" rid="B60">2018</xref>). The discrete, structural information in networks is merged and projected into a continuous latent space, which facilitates the application of various statistical or geometric algorithms for measuring the degree of isolation or outlierness of network components. In addition, in contrast to detecting malicious activities in a static setting, Sricharan and Das (<xref ref-type="bibr" rid="B67">2014</xref>) and Yu et al. (<xref ref-type="bibr" rid="B83">2018</xref>) have studied the problem in dynamic networks.</p></sec>
<sec>
<title>5.4. Node Clustering</title>
<p>In addition to the above applications, node clustering is another important network analysis problem. The target of node clustering is to partition a network into a set of clusters (or subgraphs), such that nodes in the same cluster are more similar to each other than to those in other clusters. In social networks, such clusters commonly manifest as communities, such as groups of people that belong to similar affiliations or share similar interests. Most previous work clusters networks using various metrics of proximity or connection strength between nodes. For example, Shi and Malik (<xref ref-type="bibr" rid="B65">2000</xref>) and Ding et al. (<xref ref-type="bibr" rid="B20">2001</xref>) seek to maximize the number of connections within clusters while minimizing the connections between clusters. Recently, many efforts have resorted to network representation techniques for node clustering. Some methods treat embedding and clustering as disjoint tasks: they first embed nodes into low-dimensional vectors and then apply traditional clustering algorithms to produce clusters (Tian et al., <xref ref-type="bibr" rid="B73">2014</xref>; Cao et al., <xref ref-type="bibr" rid="B13">2015</xref>; Wang et al., <xref ref-type="bibr" rid="B77">2017</xref>). Other methods, such as Tang et al. (<xref ref-type="bibr" rid="B70">2016</xref>) and Wei et al. (<xref ref-type="bibr" rid="B78">2017</xref>), consider the optimization problems of clustering and network embedding in a unified objective function and generate cluster-induced node embeddings.</p></sec></sec>
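A minimal sketch of the disjoint two-step variant (embed, then cluster): Lloyd's k-means with k = 2 and farthest-point initialization, standing in for any off-the-shelf clustering algorithm run on learned embeddings; the synthetic two-community embeddings are an illustrative assumption:

```python
import numpy as np

def two_means(Z, n_iter=10):
    """Lloyd's k-means with k = 2, initialized with Z[0] and the point
    farthest from it, then alternating assignment and centroid updates."""
    c1 = Z[((Z - Z[0]) ** 2).sum(axis=1).argmax()]
    centroids = np.stack([Z[0], c1])
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        centroids = np.stack([Z[assign == c].mean(axis=0) for c in range(2)])
    return assign, centroids

rng = np.random.default_rng(5)
# Synthetic embeddings of two communities of 15 nodes each.
Z = np.vstack([rng.normal(-2.0, 0.2, (15, 2)), rng.normal(2.0, 0.2, (15, 2))])
assign, _ = two_means(Z)   # nodes of one community share a cluster id
```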
<sec id="s6">
<title>6. Conclusion and Future Directions</title>
<p>In recent years there has been a surge in leveraging representation learning techniques for network analysis. In this review, we have provided an overview of the recent efforts on this topic. Specifically, we summarize existing techniques into three subgroups based on the type of their core learning modules: representation look-up tables, autoencoders, and graph convolutional networks. Although many techniques have been developed for a wide spectrum of social network analysis problems in the past few years, we believe there still remain many promising directions worth further exploration.</p>
<sec>
<title>6.1. Dynamic Networks</title>
<p>Social networks are inherently dynamic in real-life scenarios: the node set, the underlying network structure, and the attribute information may all evolve over time. In real-world social networks such as Facebook, these elements correspond to users, connections, and personal profiles, respectively. This property prevents existing static learning techniques from working properly. Although several methods have been proposed to tackle dynamic networks, they often rely on restrictive assumptions, such as a fixed node set, and only deal with dynamics caused by edge deletion and addition (Li J. et al., <xref ref-type="bibr" rid="B42">2017</xref>). Furthermore, changes in attribute information are rarely considered in existing work. Therefore, how to design effective and efficient network embedding techniques for truly dynamic networks remains an open question.</p></sec>
<sec>
<title>6.2. Hierarchical Network Structure</title>
<p>Most existing techniques focus on designing advanced encoding or decoding functions to capture pairwise relationships between nodes. Nevertheless, pairwise relations only provide insights about local neighborhoods and might not reveal the global hierarchical network structure, which is crucial for more complex networks (Benson et al., <xref ref-type="bibr" rid="B7">2016</xref>). How to design network embedding methods capable of preserving the hierarchical structure of networks is a promising direction for future work.</p></sec>
<sec>
<title>6.3. Heterogeneous Networks</title>
<p>Existing network embedding methods mainly deal with homogeneous networks. However, many relational systems in real-life scenarios can be abstracted as heterogeneous networks with multiple types of nodes or edges, in which case it is hard to evaluate the semantic proximity between different network elements in the low-dimensional space. While some work has investigated the use of metapaths (Dong et al., <xref ref-type="bibr" rid="B22">2017</xref>; Huang and Mamoulis, <xref ref-type="bibr" rid="B35">2017</xref>) to approximate semantic similarity for heterogeneous network embedding, many tasks on heterogeneous networks have not been fully explored. Learning embeddings for heterogeneous networks is still at an early stage, and more comprehensive techniques are required to fully capture the relations between different types of network elements, toward modeling more complex real systems.</p></sec>
<sec>
<title>6.4. Scalability</title>
<p>Although deep learning-based network embedding methods have achieved substantial performance thanks to their great capacity, they still suffer from efficiency problems. These problems become more severe when dealing with real-life massive datasets with billions of nodes and edges. Designing deep representation learning frameworks that scale to real network datasets is another driving factor to advance research in this domain. Additionally, similar to the use of GPUs for traditional deep models built on grid-structured data, developing computational paradigms for large-scale network processing could be an alternative path toward efficiency improvement (Bronstein et al., <xref ref-type="bibr" rid="B9">2017</xref>).</p></sec>
<sec>
<title>6.5. Interpretability</title>
<p>Despite their superior performance, a fundamental limitation of deep models is their lack of interpretability (Liu N. et al., <xref ref-type="bibr" rid="B48">2018</xref>). Individual dimensions of the embedding space usually have no specific meaning, so it is difficult to comprehend the underlying factors that have been preserved in the latent space. Since the interpretability of machine learning models is currently receiving increased attention (Du M. et al., <xref ref-type="bibr" rid="B24">2018</xref>; Montavon et al., <xref ref-type="bibr" rid="B54">2018</xref>), it is also important to explore how to understand the learned representations, how to develop interpretable network representation learning models, and how to utilize interpretation to improve representation models. Answering these questions would help learn more meaningful and task-specific embeddings for various social network analysis problems.</p></sec></sec>
<sec id="s7">
<title>Author Contributions</title>
<p>All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.</p>
<sec>
<title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec></sec>
</body>
<back>
<ack><p>This work is, in part, supported by NSF (&#x00023;IIS-1657196, &#x00023;IIS-1750074 and &#x00023;IIS-1718840).</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Akoglu</surname> <given-names>L.</given-names></name> <name><surname>Tong</surname> <given-names>H.</given-names></name> <name><surname>Koutra</surname> <given-names>D.</given-names></name></person-group> (<year>2015</year>). <article-title>Graph based anomaly detection and description: a survey</article-title>. <source>Data Min. Knowl. Discov.</source> <volume>29</volume>, <fpage>626</fpage>&#x02013;<lpage>688</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-014-0365-y</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Al Hasan</surname> <given-names>M.</given-names></name> <name><surname>Zaki</surname> <given-names>M. J.</given-names></name></person-group> (<year>2011</year>). <article-title>A survey of link prediction in social networks</article-title>, in <source>Social Network Data Analytics</source>, ed <person-group person-group-type="editor"><name><surname>Aggarwal</surname> <given-names>C. C.</given-names></name></person-group> (<publisher-loc>Boston, MA</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>243</fpage>&#x02013;<lpage>275</lpage>.</citation></ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Baluja</surname> <given-names>S.</given-names></name> <name><surname>Seth</surname> <given-names>R.</given-names></name> <name><surname>Sivakumar</surname> <given-names>D.</given-names></name> <name><surname>Jing</surname> <given-names>Y.</given-names></name> <name><surname>Yagnik</surname> <given-names>J.</given-names></name> <name><surname>Kumar</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>Video suggestion and discovery for youtube: taking random walks through the view graph</article-title>, in <source>International Conference on World Wide Web</source> (<publisher-loc>Beijing</publisher-loc>), <fpage>895</fpage>&#x02013;<lpage>904</lpage>.</citation></ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Belkin</surname> <given-names>M.</given-names></name> <name><surname>Niyogi</surname> <given-names>P.</given-names></name></person-group> (<year>2002</year>). <article-title>Laplacian eigenmaps and spectral techniques for embedding and clustering</article-title>, in <source>Advances in Neural Information Processing Systems</source> (<publisher-loc>Cambridge, MA</publisher-loc>), <fpage>585</fpage>&#x02013;<lpage>591</lpage>.</citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bengio</surname> <given-names>Y.</given-names></name></person-group> (<year>2009</year>). <article-title>Learning deep architectures for AI</article-title>. <source>Found. Trends&#x000AE; in Mach. Learn.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>127</lpage>. <pub-id pub-id-type="doi">10.1561/2200000006</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Courville</surname> <given-names>A.</given-names></name> <name><surname>Vincent</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). <article-title>Representation learning: a review and new perspectives</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>35</volume>, <fpage>1798</fpage>&#x02013;<lpage>1828</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2013.50</pub-id><pub-id pub-id-type="pmid">23787338</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benson</surname> <given-names>A. R.</given-names></name> <name><surname>Gleich</surname> <given-names>D. F.</given-names></name> <name><surname>Leskovec</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>Higher-order organization of complex networks</article-title>. <source>Science</source> <volume>353</volume>, <fpage>163</fpage>&#x02013;<lpage>166</lpage>. <pub-id pub-id-type="doi">10.1126/science.aad9029</pub-id><pub-id pub-id-type="pmid">27387949</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bhagat</surname> <given-names>S.</given-names></name> <name><surname>Cormode</surname> <given-names>G.</given-names></name> <name><surname>Muthukrishnan</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>Node classification in social networks</article-title>, in <source>Social Network Data Analytics</source>, ed <person-group person-group-type="editor"><name><surname>Aggarwal</surname> <given-names>C.</given-names></name></person-group> (<publisher-loc>Boston, MA</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>115</fpage>&#x02013;<lpage>148</lpage>.</citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bronstein</surname> <given-names>M. M.</given-names></name> <name><surname>Bruna</surname> <given-names>J.</given-names></name> <name><surname>LeCun</surname> <given-names>Y.</given-names></name> <name><surname>Szlam</surname> <given-names>A.</given-names></name> <name><surname>Vandergheynst</surname> <given-names>P.</given-names></name></person-group> (<year>2017</year>). <article-title>Geometric deep learning: going beyond euclidean data</article-title>. <source>IEEE Signal Process. Mag.</source> <volume>34</volume>, <fpage>18</fpage>&#x02013;<lpage>42</lpage>. <pub-id pub-id-type="doi">10.1109/MSP.2017.2693418</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bruna</surname> <given-names>J.</given-names></name> <name><surname>Zaremba</surname> <given-names>W.</given-names></name> <name><surname>Szlam</surname> <given-names>A.</given-names></name> <name><surname>LeCun</surname> <given-names>Y.</given-names></name></person-group> (<year>2013</year>). <article-title>Spectral networks and locally connected networks on graphs</article-title>, in <source>Proceedings of International Conference on Learning Representation</source> (<publisher-loc>Banff, AB</publisher-loc>).</citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bullinaria</surname> <given-names>J. A.</given-names></name> <name><surname>Levy</surname> <given-names>J. P.</given-names></name></person-group> (<year>2007</year>). <article-title>Extracting semantic representations from word co-occurrence statistics: a computational study</article-title>. <source>Behav. Res. Methods</source> <volume>39</volume>, <fpage>510</fpage>&#x02013;<lpage>526</lpage>. <pub-id pub-id-type="doi">10.3758/BF03193020</pub-id><pub-id pub-id-type="pmid">17958162</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cangea</surname> <given-names>C.</given-names></name> <name><surname>Veli&#x0010D;kovi&#x00107;</surname> <given-names>P.</given-names></name> <name><surname>Jovanovi&#x00107;</surname> <given-names>N.</given-names></name> <name><surname>Kipf</surname> <given-names>T.</given-names></name> <name><surname>Li&#x000F2;</surname> <given-names>P.</given-names></name></person-group> (<year>2018</year>). <article-title>Towards sparse hierarchical graph classifiers</article-title>, in <source>Workshop Proceedings of International Conference on Learning Representations</source> (<publisher-loc>Scottsdale, AZ</publisher-loc>).</citation></ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cao</surname> <given-names>S.</given-names></name> <name><surname>Lu</surname> <given-names>W.</given-names></name> <name><surname>Xu</surname> <given-names>Q.</given-names></name></person-group> (<year>2015</year>). <article-title>Grarep: learning graph representations with global structural information</article-title>, in <source>ACM International Conference on Information and Knowledge Management</source> (<publisher-loc>New York, NY</publisher-loc>), <fpage>891</fpage>&#x02013;<lpage>900</lpage>.</citation></ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cao</surname> <given-names>S.</given-names></name> <name><surname>Lu</surname> <given-names>W.</given-names></name> <name><surname>Xu</surname> <given-names>Q.</given-names></name></person-group> (<year>2016</year>). <article-title>Deep neural networks for learning graph representations</article-title>, in <source>AAAI Conference on Artificial Intelligence</source> (<publisher-loc>Phoenix, AZ</publisher-loc>), <fpage>1145</fpage>&#x02013;<lpage>1152</lpage>.</citation></ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chang</surname> <given-names>S.</given-names></name> <name><surname>Han</surname> <given-names>W.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name> <name><surname>Qi</surname> <given-names>G.-J.</given-names></name> <name><surname>Aggarwal</surname> <given-names>C. C.</given-names></name> <name><surname>Huang</surname> <given-names>T. S.</given-names></name></person-group> (<year>2015</year>). <article-title>Heterogeneous network embedding via deep architectures</article-title>, in <source>ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> (<publisher-loc>Sydney, NSW</publisher-loc>), <fpage>119</fpage>&#x02013;<lpage>128</lpage>.</citation></ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>W.</given-names></name> <name><surname>Wang</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name></person-group> (<year>2010</year>). <article-title>Scalable influence maximization for prevalent viral marketing in large-scale social networks</article-title>, in <source>ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> (<publisher-loc>Washington, DC</publisher-loc>), <fpage>1029</fpage>&#x02013;<lpage>1038</lpage>.</citation></ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dai</surname> <given-names>H.</given-names></name> <name><surname>Dai</surname> <given-names>B.</given-names></name> <name><surname>Song</surname> <given-names>L.</given-names></name></person-group> (<year>2016</year>). <article-title>Discriminative embeddings of latent variable models for structured data</article-title>, in <source>International Conference on Machine Learning</source> (<publisher-loc>New York, NY</publisher-loc>), <fpage>2702</fpage>&#x02013;<lpage>2711</lpage>.</citation></ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Defferrard</surname> <given-names>M.</given-names></name> <name><surname>Bresson</surname> <given-names>X.</given-names></name> <name><surname>Vandergheynst</surname> <given-names>P.</given-names></name></person-group> (<year>2016</year>). <article-title>Convolutional neural networks on graphs with fast localized spectral filtering</article-title>, in <source>Advances in Neural Information Processing Systems</source> (<publisher-loc>Barcelona</publisher-loc>), <fpage>3844</fpage>&#x02013;<lpage>3852</lpage>.</citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dietz</surname> <given-names>L.</given-names></name> <name><surname>Bickel</surname> <given-names>S.</given-names></name> <name><surname>Scheffer</surname> <given-names>T.</given-names></name></person-group> (<year>2007</year>). <article-title>Unsupervised prediction of citation influences</article-title>, in <source>International Conference on Machine Learning</source> (<publisher-loc>Corvallis, OR</publisher-loc>), <fpage>233</fpage>&#x02013;<lpage>240</lpage>.</citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ding</surname> <given-names>C. H.</given-names></name> <name><surname>He</surname> <given-names>X.</given-names></name> <name><surname>Zha</surname> <given-names>H.</given-names></name> <name><surname>Gu</surname> <given-names>M.</given-names></name> <name><surname>Simon</surname> <given-names>H. D.</given-names></name></person-group> (<year>2001</year>). <article-title>A min-max cut algorithm for graph partitioning and data clustering</article-title>, in <source>IEEE International Conference on Data Mining</source> (<publisher-loc>San Jose, CA</publisher-loc>), <fpage>107</fpage>&#x02013;<lpage>114</lpage>.</citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Doersch</surname> <given-names>C.</given-names></name></person-group> (<year>2016</year>). <article-title>Tutorial on variational autoencoders</article-title>. <source>arXiv preprint arXiv:1606.05908</source>.</citation></ref>
<ref id="B22">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dong</surname> <given-names>Y.</given-names></name> <name><surname>Chawla</surname> <given-names>N. V.</given-names></name> <name><surname>Swami</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <article-title>metapath2vec: Scalable representation learning for heterogeneous networks</article-title>, in <source>ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> (<publisher-loc>Halifax, NS</publisher-loc>), <fpage>135</fpage>&#x02013;<lpage>144</lpage>.</citation></ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Du</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Song</surname> <given-names>G.</given-names></name> <name><surname>Lu</surname> <given-names>Z.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Dynamic network embedding: An extended approach for skip-gram based network embedding</article-title>, in <source>International Joint Conference on Artificial Intelligence</source> (<publisher-loc>Stockholm</publisher-loc>), <fpage>2086</fpage>&#x02013;<lpage>2092</lpage>.</citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Du</surname> <given-names>M.</given-names></name> <name><surname>Liu</surname> <given-names>N.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name></person-group> (<year>2018</year>). <article-title>Techniques for interpretable machine learning</article-title>. <source>arXiv preprint arXiv:1808.00033</source>.</citation></ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gao</surname> <given-names>H.</given-names></name> <name><surname>Huang</surname> <given-names>H.</given-names></name></person-group> (<year>2018</year>). <article-title>Deep attributed network embedding</article-title>, in <source>International Joint Conference on Artificial Intelligence</source> (<publisher-loc>Stockholm</publisher-loc>).</citation></ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Grover</surname> <given-names>A.</given-names></name> <name><surname>Leskovec</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>node2vec: Scalable feature learning for networks</article-title>, in <source>ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> (<publisher-loc>San Francisco, CA</publisher-loc>), <fpage>855</fpage>&#x02013;<lpage>864</lpage>.</citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>J.</given-names></name> <name><surname>Xu</surname> <given-names>L.</given-names></name> <name><surname>Chen</surname> <given-names>E.</given-names></name></person-group> (<year>2018</year>). <article-title>Spine: Structural identity preserved inductive network embedding</article-title>. <source>arXiv preprint arXiv:1802.03984</source>.</citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>Z.</given-names></name> <name><surname>Zhang</surname> <given-names>Z. M.</given-names></name> <name><surname>Zhu</surname> <given-names>S.</given-names></name> <name><surname>Chi</surname> <given-names>Y.</given-names></name> <name><surname>Gong</surname> <given-names>Y.</given-names></name></person-group> (<year>2014</year>). <article-title>A two-level topic model towards knowledge discovery from citation networks</article-title>. <source>IEEE Trans. Knowl. Data Eng.</source> <volume>26</volume>, <fpage>780</fpage>&#x02013;<lpage>794</lpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2013.56</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hamilton</surname> <given-names>W.</given-names></name> <name><surname>Ying</surname> <given-names>Z.</given-names></name> <name><surname>Leskovec</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Inductive representation learning on large graphs</article-title>, in <source>Advances in Neural Information Processing Systems</source> (<publisher-loc>Long Beach, CA</publisher-loc>), <fpage>1024</fpage>&#x02013;<lpage>1034</lpage>.</citation></ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Haussler</surname> <given-names>D.</given-names></name></person-group> (<year>1999</year>). <source>Convolution Kernels on Discrete Structures.</source> <publisher-name>Technical Report, Department of Computer Science, University of California at Santa Cruz</publisher-name>.</citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Henaff</surname> <given-names>M.</given-names></name> <name><surname>Bruna</surname> <given-names>J.</given-names></name> <name><surname>LeCun</surname> <given-names>Y.</given-names></name></person-group> (<year>2015</year>). <article-title>Deep convolutional networks on graph-structured data</article-title>. <source>arXiv preprint arXiv:1506.05163</source>.</citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hinton</surname> <given-names>G. E.</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R. R.</given-names></name></person-group> (<year>2006</year>). <article-title>Reducing the dimensionality of data with neural networks</article-title>. <source>Science</source> <volume>313</volume>, <fpage>504</fpage>&#x02013;<lpage>507</lpage>. <pub-id pub-id-type="doi">10.1126/science.1127647</pub-id><pub-id pub-id-type="pmid">16873662</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>R.</given-names></name> <name><surname>Aggarwal</surname> <given-names>C. C.</given-names></name> <name><surname>Ma</surname> <given-names>S.</given-names></name> <name><surname>Huai</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>An embedding approach to anomaly detection</article-title>, in <source>IEEE International Conference on Data Engineering</source> (<publisher-loc>Helsinki</publisher-loc>), <fpage>385</fpage>&#x02013;<lpage>396</lpage>.</citation></ref>
<ref id="B34">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Song</surname> <given-names>Q.</given-names></name> <name><surname>Yang</surname> <given-names>F.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name></person-group> (<year>2019</year>). <article-title>Large-scale heterogeneous feature embedding</article-title>, in <source>AAAI Conference on Artificial Intelligence</source> (<publisher-loc>Honolulu, HI</publisher-loc>).</citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>Z.</given-names></name> <name><surname>Mamoulis</surname> <given-names>N.</given-names></name></person-group> (<year>2017</year>). <article-title>Heterogeneous information network embedding for meta path based proximity</article-title>. <source>arXiv preprint arXiv:1701.05291</source>.</citation></ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Jin</surname> <given-names>H.</given-names></name> <name><surname>Song</surname> <given-names>Q.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name></person-group> (<year>2018</year>). <article-title>Discriminative graph autoencoder</article-title>, in <source>2018 IEEE International Conference on Big Knowledge (ICBK)</source> (<publisher-loc>Singapore</publisher-loc>).</citation></ref>
<ref id="B37">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kipf</surname> <given-names>T. N.</given-names></name> <name><surname>Welling</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>Variational graph auto-encoders</article-title>, in <source>Proceedings of NeurIPS Bayesian Deep Learning Workshop</source> (<publisher-loc>Barcelona</publisher-loc>).</citation></ref>
<ref id="B38">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Le</surname> <given-names>Q.</given-names></name> <name><surname>Mikolov</surname> <given-names>T.</given-names></name></person-group> (<year>2014</year>). <article-title>Distributed representations of sentences and documents</article-title>, in <source>International Conference on Machine Learning</source> (<publisher-loc>Beijing</publisher-loc>), <fpage>1188</fpage>&#x02013;<lpage>1196</lpage>.</citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leskovec</surname> <given-names>J.</given-names></name> <name><surname>Adamic</surname> <given-names>L. A.</given-names></name> <name><surname>Huberman</surname> <given-names>B. A.</given-names></name></person-group> (<year>2007</year>). <article-title>The dynamics of viral marketing</article-title>. <source>ACM Trans. Web</source> <volume>1</volume>:<fpage>5</fpage>. <pub-id pub-id-type="doi">10.1145/1232722.1232727</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Levy</surname> <given-names>O.</given-names></name> <name><surname>Goldberg</surname> <given-names>Y.</given-names></name></person-group> (<year>2014</year>). <article-title>Neural word embedding as implicit matrix factorization</article-title>, in <source>Advances in Neural Information Processing Systems</source> (<publisher-loc>Montreal, QC</publisher-loc>), <fpage>2177</fpage>&#x02013;<lpage>2185</lpage>.</citation></ref>
<ref id="B41">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>H.</given-names></name> <name><surname>Yang</surname> <given-names>Z.</given-names></name> <name><surname>Odagaki</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Variation autoencoder based network representation learning for classification</article-title>, in <source>Proceedings of ACL 2017, Student Research Workshop</source> (<publisher-loc>Vancouver, BC</publisher-loc>).</citation></ref>
<ref id="B42">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Dani</surname> <given-names>H.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name> <name><surname>Chang</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>H.</given-names></name></person-group> (<year>2017</year>). <article-title>Attributed network embedding for learning in a dynamic environment</article-title>, in <source>ACM Conference on Information and Knowledge Management</source> (<publisher-loc>Singapore</publisher-loc>), <fpage>387</fpage>&#x02013;<lpage>396</lpage>.</citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Tan</surname> <given-names>K.-L.</given-names></name></person-group> (<year>2015</year>). <article-title>Real-time targeted influence maximization for online advertisements</article-title>. <source>Proc. VLDB Endowm.</source> <volume>8</volume>, <fpage>1070</fpage>&#x02013;<lpage>1081</lpage>. <pub-id pub-id-type="doi">10.14778/2794367.2794376</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liang</surname> <given-names>J.</given-names></name> <name><surname>Jacobs</surname> <given-names>P.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name> <name><surname>Parthasarathy</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>Semi-supervised embedding in attributed networks with outliers</article-title>, in <source>SIAM International Conference on Data Mining</source> (<publisher-loc>San Diego, CA</publisher-loc>), <fpage>153</fpage>&#x02013;<lpage>161</lpage>.</citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liao</surname> <given-names>L.</given-names></name> <name><surname>He</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Chua</surname> <given-names>T.-S.</given-names></name></person-group> (<year>2018</year>). <article-title>Attributed social network embedding</article-title>. <source>IEEE Trans. Knowl. Data Eng</source>. <volume>30</volume>, <fpage>2257</fpage>&#x02013;<lpage>2270</lpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2018.2819980</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liben-Nowell</surname> <given-names>D.</given-names></name> <name><surname>Kleinberg</surname> <given-names>J.</given-names></name></person-group> (<year>2007</year>). <article-title>The link-prediction problem for social networks</article-title>. <source>J. Am. Soc. Inform. Sci. Technol.</source> <volume>58</volume>, <fpage>1019</fpage>&#x02013;<lpage>1031</lpage>. <pub-id pub-id-type="doi">10.1002/asi.20591</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>N.</given-names></name> <name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name></person-group> (<year>2017</year>). <article-title>Accelerated local anomaly detection via resolving attributed networks</article-title>, in <source>International Joint Conference on Artificial Intelligence</source> (<publisher-loc>Melbourne, VIC</publisher-loc>), <fpage>2337</fpage>&#x02013;<lpage>2343</lpage>.</citation></ref>
<ref id="B48">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>N.</given-names></name> <name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Hu</surname> <given-names>X.</given-names></name></person-group> (<year>2018</year>). <article-title>On interpretation of network embedding via taxonomy induction</article-title>, in <source>ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> (<publisher-loc>London</publisher-loc>).</citation></ref>
<ref id="B49">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Zheng</surname> <given-names>V. W.</given-names></name> <name><surname>Zhao</surname> <given-names>Z.</given-names></name> <name><surname>Zhu</surname> <given-names>F.</given-names></name> <name><surname>Chang</surname> <given-names>K. C.-C.</given-names></name> <name><surname>Wu</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Semantic proximity search on heterogeneous graph by proximity embedding</article-title>, in <source>AAAI Conference on Artificial Intelligence</source> (<publisher-loc>San Francisco, CA</publisher-loc>), <fpage>154</fpage>&#x02013;<lpage>160</lpage>.</citation></ref>
<ref id="B50">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Zheng</surname> <given-names>V. W.</given-names></name> <name><surname>Zhao</surname> <given-names>Z.</given-names></name> <name><surname>Zhu</surname> <given-names>F.</given-names></name> <name><surname>Chang</surname> <given-names>K. C.-C.</given-names></name> <name><surname>Wu</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Distance-aware dag embedding for proximity search on heterogeneous graphs</article-title>, in <source>AAAI Conference on Artificial Intelligence</source> (<publisher-loc>New Orleans, LA</publisher-loc>).</citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>L&#x000FC;</surname> <given-names>L.</given-names></name> <name><surname>Zhou</surname> <given-names>T.</given-names></name></person-group> (<year>2011</year>). <article-title>Link prediction in complex networks: a survey</article-title>. <source>Phys. A Stat. Mech. Appl.</source> <volume>390</volume>, <fpage>1150</fpage>&#x02013;<lpage>1170</lpage>. <pub-id pub-id-type="doi">10.1016/j.physa.2010.11.027</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Meng</surname> <given-names>Z.</given-names></name> <name><surname>Liang</surname> <given-names>S.</given-names></name> <name><surname>Bao</surname> <given-names>H.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name></person-group> (<year>2019</year>). <article-title>Co-embedding attributed networks</article-title>, in <source>ACM International Conference on Web Search and Data Mining</source> (<publisher-loc>Melbourne, VIC</publisher-loc>).</citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mikolov</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>K.</given-names></name> <name><surname>Corrado</surname> <given-names>G.</given-names></name> <name><surname>Dean</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>Efficient estimation of word representations in vector space</article-title>. <source>arXiv preprint arXiv:1301.3781</source>.</citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Montavon</surname> <given-names>G.</given-names></name> <name><surname>Samek</surname> <given-names>W.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K.-R.</given-names></name></person-group> (<year>2018</year>). <article-title>Methods for interpreting and understanding deep neural networks</article-title>. <source>Digital Signal Process</source>. <volume>73</volume>, <fpage>1</fpage>&#x02013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1016/j.dsp.2017.10.011</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Monti</surname> <given-names>F.</given-names></name> <name><surname>Boscaini</surname> <given-names>D.</given-names></name> <name><surname>Masci</surname> <given-names>J.</given-names></name> <name><surname>Rodola</surname> <given-names>E.</given-names></name> <name><surname>Svoboda</surname> <given-names>J.</given-names></name> <name><surname>Bronstein</surname> <given-names>M. M.</given-names></name></person-group> (<year>2017</year>). <article-title>Geometric deep learning on graphs and manifolds using mixture model CNNs</article-title>, in <source>The IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Honolulu, HI</publisher-loc>).</citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Natarajan</surname> <given-names>N.</given-names></name> <name><surname>Dhillon</surname> <given-names>I. S.</given-names></name></person-group> (<year>2014</year>). <article-title>Inductive matrix completion for predicting gene&#x02013;disease associations</article-title>. <source>Bioinformatics</source> <volume>30</volume>, <fpage>i60</fpage>&#x02013;<lpage>i68</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu269</pub-id><pub-id pub-id-type="pmid">24932006</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Niepert</surname> <given-names>M.</given-names></name> <name><surname>Ahmed</surname> <given-names>M.</given-names></name> <name><surname>Kutzkov</surname> <given-names>K.</given-names></name></person-group> (<year>2016</year>). <article-title>Learning convolutional neural networks for graphs</article-title>, in <source>International Conference on Machine Learning</source> (<publisher-loc>New York, NY</publisher-loc>), <fpage>2014</fpage>&#x02013;<lpage>2023</lpage>.</citation></ref>
<ref id="B58">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ou</surname> <given-names>M.</given-names></name> <name><surname>Cui</surname> <given-names>P.</given-names></name> <name><surname>Pei</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Zhu</surname> <given-names>W.</given-names></name></person-group> (<year>2016</year>). <article-title>Asymmetric transitivity preserving graph embedding</article-title>, in <source>ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> (<publisher-loc>San Francisco, CA</publisher-loc>), <fpage>1105</fpage>&#x02013;<lpage>1114</lpage>.</citation></ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peng</surname> <given-names>S.</given-names></name> <name><surname>Wang</surname> <given-names>G.</given-names></name> <name><surname>Xie</surname> <given-names>D.</given-names></name></person-group> (<year>2017</year>). <article-title>Social influence analysis in social networking big data: opportunities and challenges</article-title>. <source>IEEE Netw.</source> <volume>31</volume>, <fpage>11</fpage>&#x02013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1109/MNET.2016.1500104NM</pub-id></citation></ref>
<ref id="B60">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Peng</surname> <given-names>Z.</given-names></name> <name><surname>Luo</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>H.</given-names></name> <name><surname>Zheng</surname> <given-names>Q.</given-names></name></person-group> (<year>2018</year>). <article-title>Anomalous: a joint modeling approach for anomaly detection on attributed networks</article-title>, in <source>International Joint Conference on Artificial Intelligence</source>, <fpage>3513</fpage>&#x02013;<lpage>3519</lpage>.</citation></ref>
<ref id="B61">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Perozzi</surname> <given-names>B.</given-names></name> <name><surname>Al-Rfou</surname> <given-names>R.</given-names></name> <name><surname>Skiena</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>Deepwalk: Online learning of social representations</article-title>, in <source>ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> (<publisher-loc>New York, NY</publisher-loc>), <fpage>701</fpage>&#x02013;<lpage>710</lpage>.</citation></ref>
<ref id="B62">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Qiu</surname> <given-names>J.</given-names></name> <name><surname>Dong</surname> <given-names>Y.</given-names></name> <name><surname>Ma</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>K.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Network embedding as matrix factorization: unifying deepwalk, LINE, PTE, and node2vec</article-title>, in <source>ACM International Conference on Web Search and Data Mining</source> (<publisher-loc>Los Angeles, CA</publisher-loc>), <fpage>459</fpage>&#x02013;<lpage>467</lpage>.</citation></ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Savage</surname> <given-names>D.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Yu</surname> <given-names>X.</given-names></name> <name><surname>Chou</surname> <given-names>P.</given-names></name> <name><surname>Wang</surname> <given-names>Q.</given-names></name></person-group> (<year>2014</year>). <article-title>Anomaly detection in online social networks</article-title>. <source>Soc. Netw.</source> <volume>39</volume>, <fpage>62</fpage>&#x02013;<lpage>70</lpage>. <pub-id pub-id-type="doi">10.1016/j.socnet.2014.05.002</pub-id></citation></ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shi</surname> <given-names>C.</given-names></name> <name><surname>Hu</surname> <given-names>B.</given-names></name> <name><surname>Zhao</surname> <given-names>W. X.</given-names></name> <name><surname>Yu</surname> <given-names>P. S.</given-names></name></person-group> (<year>2019</year>). <article-title>Heterogeneous information network embedding for recommendation</article-title>. <source>IEEE Trans. Knowl. Data Eng</source>. <volume>31</volume>, <fpage>357</fpage>&#x02013;<lpage>370</lpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2018.2833443</pub-id></citation></ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shi</surname> <given-names>J.</given-names></name> <name><surname>Malik</surname> <given-names>J.</given-names></name></person-group> (<year>2000</year>). <article-title>Normalized cuts and image segmentation</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>22</volume>, <fpage>888</fpage>&#x02013;<lpage>905</lpage>. <pub-id pub-id-type="doi">10.1109/34.868688</pub-id></citation></ref>
<ref id="B66">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>X.</given-names></name> <name><surname>Tseng</surname> <given-names>B. L.</given-names></name> <name><surname>Lin</surname> <given-names>C.-Y.</given-names></name> <name><surname>Sun</surname> <given-names>M.-T.</given-names></name></person-group> (<year>2006</year>). <article-title>Personalized recommendation driven by information flow</article-title>, in <source>International ACM SIGIR Conference on Research and Development in Information Retrieval</source> (<publisher-loc>Seattle, WA</publisher-loc>), <fpage>509</fpage>&#x02013;<lpage>516</lpage>.</citation></ref>
<ref id="B67">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sricharan</surname> <given-names>K.</given-names></name> <name><surname>Das</surname> <given-names>K.</given-names></name></person-group> (<year>2014</year>). <article-title>Localizing anomalous changes in time-evolving graphs</article-title>, in <source>ACM SIGMOD International Conference on Management of Data</source> (<publisher-loc>Snowbird, UT</publisher-loc>), <fpage>1347</fpage>&#x02013;<lpage>1358</lpage>.</citation></ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taheri</surname> <given-names>A.</given-names></name> <name><surname>Gimpel</surname> <given-names>K.</given-names></name> <name><surname>Berger-Wolf</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>Learning graph representations with recurrent neural network autoencoders</article-title>. <source>arXiv preprint arXiv:1805.07683</source>.</citation></ref>
<ref id="B69">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tang</surname> <given-names>J.</given-names></name> <name><surname>Qu</surname> <given-names>M.</given-names></name> <name><surname>Wang</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>M.</given-names></name> <name><surname>Yan</surname> <given-names>J.</given-names></name> <name><surname>Mei</surname> <given-names>Q.</given-names></name></person-group> (<year>2015</year>). <article-title>Line: Large-scale information network embedding</article-title>, in <source>International Conference on World Wide Web</source> (<publisher-loc>Florence</publisher-loc>), <fpage>1067</fpage>&#x02013;<lpage>1077</lpage>.</citation></ref>
<ref id="B70">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tang</surname> <given-names>M.</given-names></name> <name><surname>Nie</surname> <given-names>F.</given-names></name> <name><surname>Jain</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>Capped LP-norm graph embedding for photo clustering</article-title>, in <source>ACM Multimedia Conference</source> (<publisher-loc>Amsterdam</publisher-loc>), <fpage>431</fpage>&#x02013;<lpage>435</lpage>.</citation></ref>
<ref id="B71">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname> <given-names>X.</given-names></name> <name><surname>Yang</surname> <given-names>C. C.</given-names></name></person-group> (<year>2012</year>). <article-title>Ranking user influence in healthcare social media</article-title>. <source>ACM Trans. Intell. Syst. Technol.</source> <volume>3</volume>:<fpage>73</fpage>. <pub-id pub-id-type="doi">10.1145/2337542.2337558</pub-id></citation></ref>
<ref id="B72">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thekumparampil</surname> <given-names>K. K.</given-names></name> <name><surname>Wang</surname> <given-names>C.</given-names></name> <name><surname>Oh</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>L.-J.</given-names></name></person-group> (<year>2018</year>). <article-title>Attention-based graph neural network for semi-supervised learning</article-title>. <source>arXiv preprint arXiv:1803.03735</source>.</citation></ref>
<ref id="B73">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tian</surname> <given-names>F.</given-names></name> <name><surname>Gao</surname> <given-names>B.</given-names></name> <name><surname>Cui</surname> <given-names>Q.</given-names></name> <name><surname>Chen</surname> <given-names>E.</given-names></name> <name><surname>Liu</surname> <given-names>T.-Y.</given-names></name></person-group> (<year>2014</year>). <article-title>Learning deep representations for graph clustering</article-title>, in <source>AAAI Conference on Artificial Intelligence</source> (<publisher-loc>Quebec City, QC</publisher-loc>), <fpage>1293</fpage>&#x02013;<lpage>1299</lpage>.</citation></ref>
<ref id="B74">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Velickovic</surname> <given-names>P.</given-names></name> <name><surname>Cucurull</surname> <given-names>G.</given-names></name> <name><surname>Casanova</surname> <given-names>A.</given-names></name> <name><surname>Romero</surname> <given-names>A.</given-names></name> <name><surname>Lio</surname> <given-names>P.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name></person-group> (<year>2017</year>). <article-title>Graph attention networks</article-title>, in <source>Proceedings of International Conference on Learning Representation</source> (<publisher-loc>Vancouver, BC</publisher-loc>).</citation></ref>
<ref id="B75">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vishwanathan</surname> <given-names>S. V. N.</given-names></name> <name><surname>Schraudolph</surname> <given-names>N. N.</given-names></name> <name><surname>Kondor</surname> <given-names>R.</given-names></name> <name><surname>Borgwardt</surname> <given-names>K. M.</given-names></name></person-group> (<year>2010</year>). <article-title>Graph kernels</article-title>. <source>J. Mach. Learn. Res.</source> <volume>11</volume>, <fpage>1201</fpage>&#x02013;<lpage>1242</lpage>.</citation></ref>
<ref id="B76">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>D.</given-names></name> <name><surname>Cui</surname> <given-names>P.</given-names></name> <name><surname>Zhu</surname> <given-names>W.</given-names></name></person-group> (<year>2016</year>). <article-title>Structural deep network embedding</article-title>, in <source>ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> (<publisher-loc>San Francisco, CA</publisher-loc>), <fpage>1225</fpage>&#x02013;<lpage>1234</lpage>.</citation></ref>
<ref id="B77">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Cui</surname> <given-names>P.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Pei</surname> <given-names>J.</given-names></name> <name><surname>Zhu</surname> <given-names>W.</given-names></name> <name><surname>Yang</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>Community preserving network embedding</article-title>, in <source>AAAI Conference on Artificial Intelligence</source> (<publisher-loc>San Francisco, CA</publisher-loc>), <fpage>203</fpage>&#x02013;<lpage>209</lpage>.</citation></ref>
<ref id="B78">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wei</surname> <given-names>X.</given-names></name> <name><surname>Xu</surname> <given-names>L.</given-names></name> <name><surname>Cao</surname> <given-names>B.</given-names></name> <name><surname>Yu</surname> <given-names>P. S.</given-names></name></person-group> (<year>2017</year>). <article-title>Cross view link prediction by learning noise-resilient representation consensus</article-title>, in <source>International Conference on World Wide Web</source> (<publisher-loc>Perth, WA</publisher-loc>), <fpage>1611</fpage>&#x02013;<lpage>1619</lpage>.</citation></ref>
<ref id="B79">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Zhao</surname> <given-names>D.</given-names></name> <name><surname>Sun</surname> <given-names>M.</given-names></name> <name><surname>Chang</surname> <given-names>E. Y.</given-names></name></person-group> (<year>2015</year>). <article-title>Network representation learning with rich text information</article-title>, in <source>International Joint Conference on Artificial Intelligence</source> (<publisher-loc>Buenos Aires</publisher-loc>), <fpage>2111</fpage>&#x02013;<lpage>2117</lpage>.</citation></ref>
<ref id="B80">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>Z.</given-names></name> <name><surname>Cohen</surname> <given-names>W.</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>Revisiting semi-supervised learning with graph embeddings</article-title>, in <source>International Conference on Machine Learning</source> (<publisher-loc>New York, NY</publisher-loc>), <fpage>40</fpage>&#x02013;<lpage>48</lpage>.</citation></ref>
<ref id="B81">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ying</surname> <given-names>R.</given-names></name> <name><surname>He</surname> <given-names>R.</given-names></name> <name><surname>Chen</surname> <given-names>K.</given-names></name> <name><surname>Eksombatchai</surname> <given-names>P.</given-names></name> <name><surname>Hamilton</surname> <given-names>W. L.</given-names></name> <name><surname>Leskovec</surname> <given-names>J.</given-names></name></person-group> (<year>2018a</year>). <article-title>Graph convolutional neural networks for web-scale recommender systems</article-title>, in <source>ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> (<publisher-loc>London</publisher-loc>), <fpage>974</fpage>&#x02013;<lpage>983</lpage>.</citation></ref>
<ref id="B82">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ying</surname> <given-names>R.</given-names></name> <name><surname>You</surname> <given-names>J.</given-names></name> <name><surname>Morris</surname> <given-names>C.</given-names></name> <name><surname>Ren</surname> <given-names>X.</given-names></name> <name><surname>Hamilton</surname> <given-names>W. L.</given-names></name> <name><surname>Leskovec</surname> <given-names>J.</given-names></name></person-group> (<year>2018b</year>). <article-title>Hierarchical graph representation learning with differentiable pooling</article-title>. <source>arXiv preprint arXiv:1806.08804</source>.</citation></ref>
<ref id="B83">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>W.</given-names></name> <name><surname>Cheng</surname> <given-names>W.</given-names></name> <name><surname>Aggarwal</surname> <given-names>C. C.</given-names></name> <name><surname>Zhang</surname> <given-names>K.</given-names></name> <name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name></person-group> (<year>2018</year>). <article-title>Netwalk: A flexible deep embedding approach for anomaly detection in dynamic networks</article-title>, in <source>ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> (<publisher-loc>London</publisher-loc>), <fpage>2672</fpage>&#x02013;<lpage>2681</lpage>.</citation></ref>
<ref id="B84">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Gao</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Scalable graph embedding for asymmetric proximity</article-title>, in <source>AAAI Conference on Artificial Intelligence</source> (<publisher-loc>San Francisco, CA</publisher-loc>), <fpage>2942</fpage>&#x02013;<lpage>2948</lpage>.</citation></ref>
</ref-list>
</back>
</article>