Deep Representation Learning for Social Network Analysis

Social network analysis is an important problem in data mining. A fundamental step in analyzing social networks is to encode network data into low-dimensional representations, i.e., network embeddings, so that the network topology and other attribute information are effectively preserved. Network representation learning facilitates downstream applications such as classification, link prediction, anomaly detection, and clustering. In addition, techniques based on deep neural networks have attracted great interest over the past few years. In this survey, we conduct a comprehensive review of the current literature on network representation learning with neural network models. First, we introduce the basic models for learning node representations in homogeneous networks. We also introduce some extensions of the base models that tackle more complex scenarios, such as attributed networks, heterogeneous networks, and dynamic networks. We then introduce techniques for embedding subgraphs and present the applications of network representation learning. Finally, we discuss some promising research directions for future work.


INTRODUCTION
Social networks, such as Facebook, Twitter, and LinkedIn, have greatly facilitated communication between web users around the world. The analysis of social networks helps summarize the interests and opinions of users (nodes), discover patterns from the interactions (links) between users, and mine the events that take place on online platforms. The information obtained by analyzing social networks can be especially valuable for many applications. Typical examples include online advertisement targeting (Li et al., 2015), personalized recommendation (Song et al., 2006), viral marketing (Chen et al., 2010; Leskovec et al., 2007), social healthcare (Tang and Yang, 2012), social influence analysis (Peng et al., 2017), and academic network analysis (Dietz et al., 2007; Guo et al., 2014).
One central problem in social network analysis is how to extract useful features from non-Euclidean structured networks, so that downstream machine learning models can be deployed for specific analysis tasks. For example, when recommending new friends to a user in a social network, the key challenge might be how to embed network users into a low-dimensional space so that the closeness between users can be easily measured with distance metrics. To process the structural information in networks, most previous efforts rely on hand-crafted features, such as kernel functions (Vishwanathan et al., 2010), graph statistics (e.g., degrees or clustering coefficients) (Bhagat et al., 2011), or other carefully engineered features (Liben-Nowell and Kleinberg, 2007). However, such a feature engineering process can be very time-consuming and expensive, making it impractical for many real-world applications. An alternative that avoids this limitation is to automatically learn feature representations that capture various information sources in networks (Bengio et al., 2013; Liao et al., 2018). The goal is to learn a transformation function that maps nodes, subgraphs, or even the whole network to vectors in a low-dimensional feature space, where the spatial relations between the vectors reflect the structures or contents of the original network. Given these feature vectors, subsequent machine learning models, such as classification, clustering, and outlier detection models, can be applied directly to the target applications.
Along with the substantial performance improvement gained by deep learning on image recognition, text mining, and natural language processing tasks (Bengio et al., 2009), developing network representation methods with neural network models has received increasing attention in recent years. In this survey, we provide a comprehensive overview of recent advancements in network representation learning using neural network models. After introducing the notations and problem definitions, we first review the basic representation learning models for node embedding in homogeneous networks. Specifically, based on the type of representation generation module, we divide the existing approaches into three categories: embedding look-up based, autoencoder based, and graph convolution based. Then, we give an overview of approaches that learn representations for subgraphs in networks, which to some extent rely on the techniques of node representation learning. After that, we list some applications of network representation models. Finally, we discuss some promising research directions for future work.

NOTATIONS AND PROBLEM DEFINITIONS
In this section, we define some important terminology that will be used in later sections, and then give the formal definition of the network representation learning problem. In general, we use boldface uppercase letters (e.g., $\mathbf{A}$) to denote matrices, boldface lowercase letters (e.g., $\mathbf{a}$) to denote vectors, and lowercase letters (e.g., $a$) to denote scalars. The $(i, j)$ entry, the $i$-th row, and the $j$-th column of a matrix $\mathbf{A}$ are denoted as $A_{ij}$, $\mathbf{A}_{i*}$, and $\mathbf{A}_{*j}$, respectively.
Definition 1 (Network). Let $G = \{V, E, \mathbf{X}, \mathbf{Y}\}$ be a network, where the $i$-th node (or vertex) is denoted as $v_i \in V$ and $e_{i,j} \in E$ denotes the edge between nodes $v_i$ and $v_j$. $\mathbf{X}$ and $\mathbf{Y}$ are node attributes and labels, if available. Besides, we let $\mathbf{A} \in \mathbb{R}^{N \times N}$ denote the associated adjacency matrix of $G$, where $N = |V|$. $A_{ij}$ is the weight of $e_{i,j}$, where $A_{ij} > 0$ indicates that the two nodes are connected, and otherwise $A_{ij} = 0$. For undirected graphs, $A_{ij} = A_{ji}$.
In many scenarios, the nodes and edges in $G$ can also be associated with type information. Let $\tau_v : V \to T_v$ be a node-type mapping function and $\tau_e : E \to T_e$ be an edge-type mapping function, where $T_v$ and $T_e$ denote the sets of node and edge types, respectively. Here, each node $v_i \in V$ has one specific type, i.e., $\tau_v(v_i) \in T_v$.
Definition 2 (Homogeneous Network). A homogeneous network is a network in which all nodes and edges in $G$ belong to one single type.

Definition 3 (Heterogeneous Network). A heterogeneous network is a network in which there are at least two different types of nodes or edges.
Given a network $G$, the task of network representation learning is to train a mapping function $f$ that maps certain components in $G$, such as nodes or subgraphs, into a latent space. Let $D$ be the dimension of the latent space.
Definition 4 (Node Representation Learning). Suppose $\mathbf{z} \in \mathbb{R}^D$ denotes the latent vector of node $v$. Node representation learning aims to build a mapping function $f$ so that $\mathbf{z} = f(v)$. It is expected that nodes with similar roles or characteristics, which are defined according to the specific application domain, are mapped close to each other in the latent space.
Definition 5 (Subgraph Representation Learning). Let $g$ denote a subgraph of $G$. The nodes and edges in $g$ are denoted as $V_S$ and $E_S$, respectively, and we have $V_S \subset V$ and $E_S \subset E$. Subgraph representation learning aims to learn a mapping function $f$ so that $\mathbf{z} = f(g)$, where in this case $\mathbf{z} \in \mathbb{R}^D$ corresponds to the latent vector of $g$.
Figure 1 shows a toy example of network embedding. There are three subgraphs in this network, distinguished by different colors. Given a network as input, the example generates one representation for each node, as well as one for each of the three subgraphs.

NEURAL NETWORK BASED MODELS
Neural networks have been demonstrated to have powerful capabilities in capturing complex patterns in data, and have achieved substantial success in fields such as computer vision, audio recognition, and natural language processing. Recently, some efforts have been made to extend neural network models to learn representations from network data. Based on the type of base neural network that is applied, we categorize them into three subgroups: look-up table based models, autoencoder based models, and GCN based models. In this section, we first give an overview of network representation learning from the perspective of encoding and decoding. Then we discuss the details of some well-known network embedding models and how they fulfill the two steps. We only discuss representation learning for nodes here; the models dealing with subgraphs will be introduced in later sections.

Framework Overview from the Encoder-Decoder Perspective
To organize the diversity of neural network architectures, we argue that different techniques can be characterized by their encoding and decoding schemes, as well as by the target network structure that the low-dimensional feature space is constrained to preserve. Specifically, existing methods can be reduced to solving the following optimization problem:
$$\min_{\Psi} \sum_{\phi \in \Phi_{tar}} L\big(\psi_{dec}(\psi_{enc}(V_\phi)),\ \phi \mid \Psi\big), \tag{1}$$
where $L$ is the loss function, $\Phi_{tar}$ is the set of target relations that the embedding algorithm expects to preserve, and $V_\phi$ denotes the nodes involved in $\phi$. $\psi_{enc} : V \to \mathbb{R}^D$ is the encoding function that maps nodes into representation vectors, and $\psi_{dec}$ is a decoding function that reconstructs the original network structure from the representation space. $\Psi$ denotes the trainable parameters in the encoders and decoders. By minimizing the loss function above, the model parameters are trained so that the desired network structure $\Phi_{tar}$ is preserved. As we will show in subsequent sections, from the perspective of this framework, the primary distinctions between network representation methods lie in how they define the three components: the target relations, the encoder, and the decoder.

Models with Embedding Look-up Tables
Instead of using multiple layers of nonlinear transformation, network representation learning can be achieved simply by using look-up tables, which directly map a node index to its corresponding representation vector. Specifically, a look-up table can be implemented as a matrix, where each row corresponds to the representation of one node. The diversity of models mainly lies in the definition of the target relations in the network data that we hope to preserve. In the rest of this subsection, we first introduce DeepWalk (Perozzi et al., 2014) to discuss the basic concepts and techniques in network embedding, and then extend the discussion to more complex and practical scenarios.
Skip-Gram Based Models. As a pioneering network representation model, DeepWalk treats nodes as words, samples random walks as sentences, and utilizes the skip-gram model (Mikolov et al., 2013) to learn the representations of nodes, as shown in Figure 2. In this case, the encoder $\psi_{enc}$ is implemented as two embedding look-up tables $\mathbf{Z} \in \mathbb{R}^{N \times D}$ and $\mathbf{Z}^c \in \mathbb{R}^{N \times D}$, for target embeddings and context embeddings, respectively. The network information $\phi \in \Phi_{tar}$ that we try to preserve is defined as the node-context pairs $(v_i, N(v_i))$ observed in the random walks, where $N(v_i)$ denotes the context nodes (or neighborhood) of $v_i$. The objective is to maximize the probability of observing a node's neighborhood conditioned on the embeddings, i.e., to minimize the negative log-likelihood:
$$L = -\sum_{\phi \in \Phi_{tar}} \log p(N(v_i) \mid v_i) = -\sum_{\phi \in \Phi_{tar}} \sum_{v_j \in N(v_i)} \log p(\mathbf{e}_j \mid \mathbf{e}_i), \tag{2}$$
where $\mathbf{e}_i$ is a one-hot row vector of length $N$ that picks the $i$-th row of $\mathbf{Z}$. Let $\mathbf{z}_i = \mathbf{e}_i \mathbf{Z}$ and $\mathbf{z}^c_j = \mathbf{e}_j \mathbf{Z}^c$; the conditional probability above is formulated as
$$p(\mathbf{e}_j \mid \mathbf{e}_i) = \psi_{dec}(\mathbf{z}_i, \mathbf{z}^c_j) = \frac{\exp(\mathbf{z}^c_j \mathbf{z}_i^{\top})}{\sum_{k=1}^{N} \exp(\mathbf{z}^c_k \mathbf{z}_i^{\top})}, \tag{3}$$
so that $\psi_{dec}$ can be regarded as link reconstruction based on the normalized proximity between different nodes. In practice, computing this probability is expensive because of the summation over every node in the network, but hierarchical softmax or negative sampling can be applied to reduce the time complexity.
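The pipeline described above can be sketched in a few lines. The following is a minimal illustration, not the reference DeepWalk implementation: it samples a uniform random walk, extracts (target, context) pairs within a window, and evaluates the full-softmax conditional probability from two look-up tables. The function names, the toy graph, the window size, and the random initialization are all assumptions made for the example; training by hierarchical softmax or negative sampling is omitted.

```python
import random
import numpy as np

def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes over an adjacency list."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:              # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

def context_pairs(walk, window):
    """(target, context) pairs for positions at most `window` steps apart in a walk."""
    pairs = []
    for i, target in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((target, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

def context_probability(Z, Zc, i):
    """Full-softmax p(v_j | v_i) for every j, from target table Z and context table Zc."""
    scores = Zc @ Z[i]                 # inner products between z_i and every context vector
    scores = scores - scores.max()     # stabilise the exponentials
    expd = np.exp(scores)
    return expd / expd.sum()

# Toy undirected graph (hypothetical): the 4-cycle 0-1-2-3-0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
walk = random_walk(adj, start=0, length=6, rng=random.Random(0))
pairs = context_pairs(walk, window=2)

N, D = 4, 8
np_rng = np.random.default_rng(0)
Z = np_rng.normal(scale=0.1, size=(N, D))    # target embeddings (look-up table)
Zc = np_rng.normal(scale=0.1, size=(N, D))   # context embeddings (look-up table)
p = context_probability(Z, Zc, i=0)
```

The denominator in `context_probability` sums over all `N` nodes, which is exactly the cost that hierarchical softmax and negative sampling are meant to avoid on large networks.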
Several other approaches have been developed based on similar ideas. LINE (Tang et al., 2015) defines first-order and second-order proximity for learning node embeddings, where the latter can be seen as a special case of DeepWalk with the context window length set to 1. Meanwhile, node2vec (Grover and Leskovec, 2016) applies different random walk strategies, which provide a trade-off between breadth-first search (BFS) and depth-first search (DFS) when exploring the network. Planetoid (Yang et al., 2016) extends skip-gram models to semi-supervised learning by predicting the class labels of nodes along with the context in the input network data. In addition, it has been shown that there exists a close relationship between skip-gram models and matrix factorization algorithms (Qiu et al., 2018; Levy and Goldberg, 2014). Therefore, network embedding models that utilize matrix factorization techniques, such as LE (Belkin and Niyogi, 2002), GraRep (Cao et al., 2015), and HOPE (Ou et al., 2016), may also be implemented in a similar manner. Random sampling based approaches allow a flexible and stochastic measure of node similarity, which lets them achieve higher performance in many applications and scale better to large datasets.
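To make the BFS/DFS trade-off concrete, the sketch below implements a second-order biased walk in the style of node2vec. The return parameter `p` and the in-out parameter `q` come from the node2vec paper rather than from the text above, and the function names and toy graph are illustrative assumptions. Precomputed alias tables, which practical implementations typically use for speed, are left out.

```python
import random

def transition_weights(adj, prev, cur, p, q):
    """Unnormalised weights for moving from `cur` to each neighbor, having arrived from `prev`."""
    weights = []
    for x in adj[cur]:
        if x == prev:              # step back to the previous node
            weights.append(1.0 / p)
        elif x in adj[prev]:       # stay close to the previous node (BFS-like)
            weights.append(1.0)
        else:                      # move further away (DFS-like)
            weights.append(1.0 / q)
    return weights

def biased_walk(adj, start, length, p, q, rng):
    """Second-order random walk; the first step is uniform because no previous node exists."""
    walk = [start]
    if length > 1 and adj[start]:
        walk.append(rng.choice(adj[start]))
    while 1 < len(walk) < length:
        prev, cur = walk[-2], walk[-1]
        weights = transition_weights(adj, prev, cur, p, q)
        walk.append(rng.choices(adj[cur], weights=weights, k=1)[0])
    return walk

# Toy undirected graph (hypothetical): triangle 0-1-2 with a tail 2-3-4.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
w = transition_weights(adj, prev=0, cur=2, p=0.5, q=2.0)
walk = biased_walk(adj, start=0, length=10, p=0.5, q=2.0, rng=random.Random(1))
```

With a large `q` the walk tends to stay near its starting region, resembling BFS, while a small `q` pushes it outward, resembling DFS.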
Attributed Network Embedding Models. Social networks are rich in side information, where nodes can be associated with various attributes that characterize their properties. Inspired by the idea of inductive matrix completion (Natarajan and Dhillon, 2014), TADW (Yang et al., 2015) extends the framework of DeepWalk by incorporating the features of vertices into network representation learning. Besides sampling from plain networks, FeatWalk (Huang et al., 2019) proposes a feature-based random walk strategy that generates node sequences by considering node similarity on attributes. With random walks based on both topological and attribute information, the skip-gram model is then applied to learn node representations.
Heterogeneous Network Embedding Models. Nodes in networks can be of different types, which poses the challenge of how to preserve the relations among them. HERec (Shi et al., 2019) and metapath2vec++ (Dong et al., 2017) propose meta-path based random walk schemes to discover the context across different types of nodes. The skip-gram architecture in metapath2vec++ is also modified, so that the normalization term in the softmax only considers nodes of the same type as the target node. In a more complex scenario where we have both nodes and attributes of different types, HNE (Chang et al., 2015) combines feed-forward neural networks and embedding models into a unified framework. Suppose $\mathbf{z}_a$ and $\mathbf{z}_b$ denote the latent vectors of two different types of nodes. HNE defines two additional transformation matrices $\mathbf{U}$ and $\mathbf{V}$ to map $\mathbf{z}_a$ and $\mathbf{z}_b$, respectively, to the joint space. Let $v_i, v_j \in V_a$ and $v_k, v_l \in V_b$. Treating the latent vectors as column vectors, intra-type and inter-type node similarity are defined as
$$s(v_i, v_j) = \mathbf{z}_i^{\top} \mathbf{U}^{\top} \mathbf{U} \mathbf{z}_j, \quad s(v_i, v_k) = \mathbf{z}_i^{\top} \mathbf{U}^{\top} \mathbf{V} \mathbf{z}_k, \quad s(v_k, v_l) = \mathbf{z}_k^{\top} \mathbf{V}^{\top} \mathbf{V} \mathbf{z}_l, \tag{4}$$
where we hope to preserve the various types of similarity during training. To obtain $\mathbf{z}_a$ and $\mathbf{z}_b$, HNE applies different feed-forward neural networks to map raw inputs (e.g., images and texts) to latent spaces, thus enabling an end-to-end training framework. Specifically, the authors use a CNN to process images and a fully-connected neural network to process texts.
Dynamic Embedding Models. Real-world social networks are not static; they evolve over time with the addition and deletion of nodes and links. To deal with this challenge, DNE (Du et al., 2018a) presents a decomposable objective to learn the representation of each node separately, where the impact of network changes on existing nodes is measurable and the most affected nodes are chosen for update as the learning process proceeds. In addition, DANE (Li et al., 2017b) leverages matrix perturbation theory to tackle online embedding updates.

Autoencoder Techniques
In this section, we discuss network representation models based on the autoencoder architecture (Hinton and Salakhutdinov, 2006; Bengio et al., 2013). As shown in Figure 3, an autoencoder consists of two neural network modules: an encoder and a decoder. The encoder $\psi_{enc}$ maps the features of each node into a latent space, and the decoder $\psi_{dec}$ reconstructs the information about the network from the latent space. Usually the hidden representation layer is smaller than the input/output layer, forcing it to create a compressed representation that captures the non-linear structure of the network. Formally, following Equation 1, the objective of an autoencoder is to minimize the reconstruction error between the input and the output decoded from the low-dimensional representations.
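Deep Neural Graph Representation (DNGR). DNGR (Cao et al., 2016) attempts to preserve a node's local neighborhood information using a stacked denoising autoencoder. Specifically, assume $\mathbf{S}$ is the PPMI matrix (Bullinaria and Levy, 2007) constructed from $\mathbf{A}$; then DNGR minimizes the following loss:
$$\min_{\Psi} \sum_{v_i \in V} \big\| \psi_{dec}(\mathbf{z}_i) - \mathbf{S}_{i*} \big\|_2^2, \quad \text{s.t. } \mathbf{z}_i = \psi_{enc}(\mathbf{S}_{i*}), \tag{5}$$
where $\mathbf{S}_{i*} \in \mathbb{R}^{|V|}$ denotes the associated neighborhood information of $v_i$. In this case, $\Phi_{tar} = \{\mathbf{S}_{i*}\}_{v_i \in V}$ and DNGR aims to reconstruct the PPMI matrix. $\mathbf{z}_i$ is the embedding of node $v_i$ in the hidden layer.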
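As a small illustration of the reconstruction target used by DNGR, the sketch below computes a positive pointwise mutual information (PPMI) matrix from a non-negative co-occurrence matrix. Using the adjacency matrix itself as the co-occurrence counts is a simplifying assumption made for this example; the original method derives co-occurrence statistics from a random surfing procedure, which is omitted here. The function name `ppmi` is hypothetical.

```python
import numpy as np

def ppmi(C):
    """Positive pointwise mutual information of a non-negative co-occurrence matrix."""
    C = np.asarray(C, dtype=float)
    total = C.sum()
    row = C.sum(axis=1, keepdims=True)       # marginal counts per row
    col = C.sum(axis=0, keepdims=True)       # marginal counts per column
    expected = row @ col / total             # counts expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(C / expected)
    pmi[~np.isfinite(pmi)] = 0.0             # zero counts carry no evidence
    return np.maximum(pmi, 0.0)              # keep only positive associations

# Toy path graph 0-1-2 (hypothetical), with the adjacency matrix used as counts.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
S = ppmi(A)
```

Each row `S[i]` then plays the role of the neighborhood vector that the autoencoder takes as input and tries to reconstruct.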
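Structural Deep Network Embedding (SDNE). SDNE (Wang et al., 2016) is another autoencoder-based model for network representation learning. Its objective function consists of two terms:
$$\min_{\Psi} \sum_{v_i \in V} \big\| (\psi_{dec}(\mathbf{z}_i) - \mathbf{S}_{i*}) \odot \mathbf{b}_i \big\|_2^2 + \lambda \sum_{i,j} A_{ij} \big\| \mathbf{z}_i - \mathbf{z}_j \big\|_2^2, \quad \text{s.t. } \mathbf{z}_i = \psi_{enc}(\mathbf{S}_{i*}), \tag{6}$$
where $\odot$ is the element-wise product, $\mathbf{b}_i$ is a weight vector, and $\lambda$ balances the two terms. The first term is an autoencoder as in Equation 5, except that the reconstruction error is weighted, so that more emphasis is put on recovering the non-zero entries in $\mathbf{S}_{i*}$. The second term is motivated by Laplacian Eigenmaps and encourages nearby nodes to have similar embeddings. Besides, SDNE differs from DNGR in the definition of $\mathbf{S}$: DNGR defines $\mathbf{S}$ as the PPMI matrix, while SDNE sets $\mathbf{S}$ to the adjacency matrix.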
It is worth noting that, unlike Equation 2, which uses a one-hot indicator vector for embedding look-up, DNGR and SDNE transform each node's information into an embedding by training neural network modules. This distinction allows autoencoder-based methods to directly model a node's neighborhood structure and features, which is not straightforward for random walk approaches. It is therefore easy to incorporate richer information sources (e.g., node attributes) into representation learning, as introduced below. However, autoencoder-based methods may suffer from scalability issues because the input dimension is $|V|$, which may result in significant time costs on massive real-world datasets.
Autoencoder-Based Attributed Network Embedding. The structure of autoencoders facilitates the incorporation of multiple information sources for joint representation learning. Instead of only mapping nodes to the latent space, CAN (Meng et al., 2019) proposes to learn the representations of nodes and attributes in the same latent space by using variational autoencoders (VAEs) (Doersch, 2016), in order to capture the affinities between nodes and attributes. DANE (Gao and Huang, 2018) utilizes the correlation between the topological and attribute information of nodes by building one autoencoder for each of the two information sources, and then encourages the two sets of latent representations to be consistent and complementary. Li et al. (2017a) adopt another strategy, where the topological feature vector and the content information vector (learned by doc2vec (Le and Mikolov, 2014)) are directly concatenated and fed into a VAE to capture the nonlinear relationship between them.

Graph Convolutional Approaches
Inspired by the significant performance improvement of convolutional neural networks (CNNs) in image recognition, recent years have witnessed a surge in adapting convolutional modules to learn representations of network data. The underlying intuition is to generate a node's embedding by aggregating information from its local neighborhood, as shown in Figure 4. Different from autoencoder-based approaches, the encoding function of graph convolutional approaches leverages a node's local neighborhood as well as its attribute information. Several efforts (Bruna et al., 2013; Henaff et al., 2015; Defferrard et al., 2016; Hamilton et al., 2017a) have been made in the past few years to extend traditional convolutional networks to network data in order to generate network embeddings. The convolutional filters of these approaches are either spatial filters or spectral filters. Spatial filters operate directly on the adjacency matrix, whereas spectral filters operate on the spectrum of the graph Laplacian (Defferrard et al., 2016).
Graph Convolutional Networks (GCN). GCN (Bronstein et al., 2017) is a well-known semi-supervised graph convolutional network. It defines a convolutional operator on the network, iteratively aggregates the embeddings of a node's neighbors, and uses the aggregated embedding together with the node's own embedding from the previous iteration to generate the node's new representation. The layer-wise propagation rule of the encoding function $\psi_{enc}$ is defined as:
$$\mathbf{H}^{k} = \sigma\big(\hat{\mathbf{D}}^{-\frac{1}{2}} \hat{\mathbf{A}} \hat{\mathbf{D}}^{-\frac{1}{2}} \mathbf{H}^{k-1} \mathbf{W}^{k-1}\big), \tag{7}$$
where $\mathbf{H}^{k-1}$ denotes the learned embeddings in layer $k-1$, and $\mathbf{H}^{0} = \mathbf{X}$. $\hat{\mathbf{A}} = \mathbf{I}_G + \mathbf{A}$ is the adjacency matrix with added self-connections, $\mathbf{I}_G$ is the identity matrix, and $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$. $\mathbf{W}^{k-1}$ is a layer-wise trainable weight matrix, and $\sigma(\cdot)$ denotes an activation function such as ReLU. The loss function for supervised training is the cross-entropy error over all labeled nodes:
$$L = -\sum_{v_i \in V_L} \sum_{f=1}^{F} Y_{if} \ln \hat{Y}_{if}, \tag{8}$$
where $V_L$ denotes the set of labeled nodes and $\hat{\mathbf{Y}} \in \mathbb{R}^{N \times F}$ is the predictive matrix with $F$ candidate labels. $\psi_{dec}(\cdot)$ can be viewed as a fully-connected network with the softmax activation function that maps representations to predicted labels. Note that unlike autoencoders, which explicitly treat each node's neighborhood as features or reconstruction goals as in Equation 5 or Equation 6, GCN implicitly uses the local neighborhood links in each encoding layer as pathways to aggregate embeddings from neighbors, so that higher-order network structures are utilized. Since Equation 8 is a supervised loss function, $\Phi_{tar}$ is not applicable here. However, the loss function can also be formulated in an unsupervised manner, similar to the skip-gram model (Hamilton et al., 2017a; Kipf and Welling, 2016). GCN may suffer from scalability problems when $\mathbf{A}$ is large. Training algorithms have been proposed to tackle this challenge (Ying et al., 2018a), in which the network data is processed in small batches and a node's local neighbors are sampled instead of using all of them.
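A single propagation step and the supervised loss can be sketched with dense matrices as follows. This is a minimal forward-pass illustration under stated assumptions: the symmetric normalization of the self-loop-augmented adjacency matrix is the commonly used GCN form, the weights are random rather than trained, and the toy graph, labels, and function names are made up for the example.

```python
import numpy as np

def normalized_adjacency(A):
    """D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))   # degrees are >= 1 thanks to self-loops
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One propagation step with ReLU: aggregate neighbor embeddings, then project."""
    return np.maximum(A_norm @ H @ W, 0.0)

def softmax_rows(X):
    X = X - X.max(axis=1, keepdims=True)
    E = np.exp(X)
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(Y_hat, Y, labeled):
    """Cross-entropy summed over the labeled nodes only."""
    return float(-np.sum(Y[labeled] * np.log(Y_hat[labeled] + 1e-12)))

# Toy path graph 0-1-2 with one-hot features; nodes 0 and 2 are labeled (hypothetical data).
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
X = np.eye(3)
Y = np.array([[1., 0.], [0., 0.], [0., 1.]])
rng = np.random.default_rng(0)
W0, W1 = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))

A_norm = normalized_adjacency(A)
H1 = gcn_layer(A_norm, X, W0)                 # hidden embeddings
Y_hat = softmax_rows(A_norm @ H1 @ W1)        # output layer with softmax decoder
loss = cross_entropy(Y_hat, Y, labeled=[0, 2])
```

Because `A_norm` is an N-by-N matrix, every layer touches the whole graph, which is the scalability concern that mini-batch and neighbor-sampling training schemes address.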
Inductive Training With GCN. So far, most of the basic models we have reviewed generate network representations in a transductive manner. GraphSAGE (Hamilton et al., 2017a) emphasizes the inductive capability of GCN. Inductive learning is essential for high-throughput machine learning systems, especially when operating on evolving networks that constantly encounter unseen nodes (Yang et al., 2016; Guo et al., 2018). The core representation update scheme of GraphSAGE is similar to that of the traditional GCN, except that the operation on the whole network is replaced by sample-based representation aggregators:
$$\mathbf{h}_i^{k} = \sigma\Big(\mathbf{W}^{k} \cdot \mathrm{CONCAT}\big(\mathbf{h}_i^{k-1},\ \mathrm{AGGREGATE}_k(\{\mathbf{h}_j^{k-1},\ \forall v_j \in N(v_i)\})\big)\Big), \tag{9}$$
where $\mathbf{h}_i^{k}$ is the hidden representation of node $v_i$ in the $k$-th layer. CONCAT denotes the concatenation operator, and $\mathrm{AGGREGATE}_k$ represents the neighborhood aggregation function of the $k$-th layer (e.g., an element-wise mean or max operator). $N(v_i)$ denotes the neighbors of $v_i$. Compared with Equation 7, GraphSAGE only needs to aggregate feature vectors from a sampled subset of neighbors, making it scalable to large-scale data. Given the attribute features and neighborhood relations of an unseen node, GraphSAGE can generate the embedding of this node by leveraging its local neighbors as well as its attributes via forward propagation.
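The sample-and-aggregate update can be illustrated with a mean aggregator. The sketch below is a simplified, untrained forward pass under assumptions made for the example: the function name `sage_layer`, the fixed sample size `num_samples`, the row-vector convention for the weight matrix, and the toy star graph are all hypothetical, and the output normalization used in some implementations is left out.

```python
import random
import numpy as np

def sage_layer(H, adj, W, num_samples, rng):
    """One GraphSAGE-style layer: mean over sampled neighbors, concatenate, project, ReLU."""
    N, D = H.shape
    out = np.zeros((N, W.shape[1]))
    for i in range(N):
        neigh = adj[i]
        if len(neigh) > num_samples:               # sample a fixed-size neighbor subset
            neigh = rng.sample(neigh, num_samples)
        agg = H[neigh].mean(axis=0) if neigh else np.zeros(D)
        out[i] = np.maximum(np.concatenate([H[i], agg]) @ W, 0.0)
    return out

# Toy star graph centred at node 0 (hypothetical data).
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
np_rng = np.random.default_rng(0)
H = np_rng.normal(size=(4, 5))        # input features, D = 5
W = np_rng.normal(size=(10, 3))       # maps the 2D-dimensional concatenation to 3 dimensions
out = sage_layer(H, adj, W, num_samples=2, rng=random.Random(0))

# A node with at most `num_samples` neighbors is processed deterministically.
expected_1 = np.maximum(np.concatenate([H[1], H[0]]) @ W, 0.0)
```

Since the layer only needs a node's own features and those of its sampled neighbors, the same trained weights can be applied to a node that was not present during training, which is what makes the scheme inductive.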
Graph Attention Mechanisms. Attention mechanisms have become a standard technique in many sequence-based tasks, as they make models focus on the most relevant parts of the input when making decisions. We can also utilize attention mechanisms to aggregate the most important features from a node's local neighbors. GAT (Velickovic et al., 2017) extends the framework of GCN by replacing the standard aggregation function with an attention layer that aggregates messages from the most important neighbors. Also, Thekumparampil et al. (2018) propose to remove all intermediate fully-connected layers in the conventional GCN and to replace the propagation layers with attention layers. This allows the model to learn a dynamic and adaptive local summary of neighborhoods, greatly reduces the number of parameters, and also achieves more accurate predictions.
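The following sketch shows one way attention-weighted neighborhood aggregation can be computed for a single attention head. The scoring function (a LeakyReLU applied to a learned linear form of the two projected embeddings, followed by a softmax over each node's neighbors and itself) follows the GAT paper rather than the text above; the function names, toy graph, and random parameters are assumptions, and the final nonlinearity and multi-head averaging are omitted.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_layer(H, A, W, a):
    """Single-head attention aggregation over each node's neighbors and itself."""
    N = H.shape[0]
    HW = H @ W                                    # projected embeddings, shape (N, F)
    F = HW.shape[1]
    src = HW @ a[:F]                              # score contribution of the receiving node i
    dst = HW @ a[F:]                              # score contribution of the neighbor j
    e = leaky_relu(src[:, None] + dst[None, :])   # raw scores e_ij, shape (N, N)
    mask = (A + np.eye(N)) > 0                    # attend only to neighbors and self
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)          # each row has a finite maximum (self-loop)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ HW, alpha

# Toy path graph 0-1-2 with random features and parameters (hypothetical data).
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
a = rng.normal(size=(4,))                         # attention vector of length 2F
H_new, alpha = attention_layer(H, A, W, a)
```

Compared with a fixed normalized adjacency matrix, the coefficients in `alpha` depend on the node features, so different neighbors can contribute with different, learned importance.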

SUBGRAPH EMBEDDING
Besides learning representations for nodes, recent years have also witnessed a growing body of research that tries to learn representations for a set of nodes and edges as a whole. Here, the goal is to represent a subgraph with a low-dimensional vector. Many traditional methods that operate on subgraphs rely on graph kernels (Haussler, 1999), which decompose a network into atomic substructures such as graphlets, subtree patterns, and paths, and treat these substructures as features to obtain an embedding through further transformation. In this section, however, we focus on reviewing methods that seek to automatically learn embeddings of subgraphs using deep models. We refer readers interested in graph kernels to Vishwanathan et al. (2010).
According to the literature, most existing methods are built upon the techniques used for node embedding, as introduced in Section 3. However, in graph representation problems, the label information is associated with particular subgraphs instead of individual nodes or links. In this survey, we divide the approaches to subgraph representation learning into two categories based on how they aggregate node-level embeddings in each subgraph. Each category is discussed in detail below.

Flat Aggregation
Assume $V_S$ denotes the set of nodes in a particular subgraph and $\mathbf{z}_S$ represents the subgraph's embedding. $\mathbf{z}_S$ can be obtained by aggregating the embeddings of all individual nodes in the subgraph:
$$\mathbf{z}_S = \psi_{aggr}\big(\{\mathbf{z}_i \mid v_i \in V_S\}\big), \tag{10}$$
where $\psi_{aggr}$ denotes the aggregation function. Methods based on such flat aggregation usually define a $\psi_{aggr}$ that captures simple correlations among nodes. For example, Niepert et al. (2016) directly concatenate node embeddings and utilize standard convolutional neural networks as the aggregation function to generate the graph representation. Dai et al. (2016) employ a simple element-wise summation to define $\psi_{aggr}$, and learn the graph embedding by summing the embeddings of all individual nodes.
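As a small worked example of flat aggregation, the sketch below pools the rows of a node embedding matrix that belong to a subgraph using an element-wise sum, mean, or max. The function name `flat_aggregate` and the toy embedding matrix are assumptions for illustration; learned aggregators such as a CNN over concatenated embeddings are not covered.

```python
import numpy as np

def flat_aggregate(Z, nodes, how="sum"):
    """Pool the embeddings of the given node indices into one subgraph vector."""
    sub = Z[list(nodes)]
    if how == "sum":
        return sub.sum(axis=0)
    if how == "mean":
        return sub.mean(axis=0)
    if how == "max":
        return sub.max(axis=0)
    raise ValueError(f"unknown aggregation: {how}")

# Toy embedding matrix for 4 nodes in 3 dimensions (hypothetical values).
Z = np.arange(12, dtype=float).reshape(4, 3)
z_sum = flat_aggregate(Z, [0, 2], how="sum")
z_mean = flat_aggregate(Z, [0, 2], how="mean")
z_max = flat_aggregate(Z, [2, 0], how="max")
```

All three operators are invariant to the order in which the nodes are listed, which is a useful property because a subgraph has no canonical node ordering.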
In addition, some methods apply recurrent neural networks (RNNs) to represent graphs. Typical methods first sample a number of graph sequences from the input network, and then apply RNN-based autoencoders to generate an embedding for each graph sequence. The final graph representation is obtained by either averaging (Jin et al., 2018) or concatenating (Taheri et al., 2018) these graph sequence embeddings.

Hierarchical Aggregation
In contrast to flat aggregation, the motivation behind hierarchical aggregation is to preserve the hierarchical structure that might be present in the subgraph by aggregating neighborhood information in a hierarchical way. Bruna et al. (2013) and Defferrard et al. (2016) attempt to utilize such hierarchical structure by combining convolutional neural networks with graph coarsening. The main idea is to stack multiple graph coarsening and convolutional layers. In each layer, they first apply graph clustering algorithms to group nodes, and then merge the node embeddings within each cluster using element-wise max-pooling. After clustering, they generate a new, coarser network by stacking the embeddings of the clusters together, which is again fed into convolutional layers, and the same process repeats. The clusters in each layer can be viewed as subgraphs, and clustering algorithms are used to learn the assignment matrix of subgraphs, so that the hierarchical structure of the network is also propagated through the layers. Although these methods work well in certain applications, they follow a two-stage approach, in which the stages of clustering and embedding may not reinforce each other.
To avoid this limitation, DiffPool (Ying et al., 2018b) proposes an end-to-end model that does not depend on a deterministic clustering subroutine. The layer-wise propagation rule is formulated as:
$$\mathbf{M}^{(k+1)} = \mathbf{C}^{(k)\top} \mathbf{Z}^{(k)}, \qquad \mathbf{A}^{(k+1)} = \mathbf{C}^{(k)\top} \mathbf{A}^{(k)} \mathbf{C}^{(k)}, \tag{11}$$
where $\mathbf{C}^{(k)} \in \mathbb{R}^{N_k \times N_{k+1}}$ is the cluster assignment matrix learned from the previous layer. The goal of the left equation is to generate the $(k+1)$-th coarser network embedding $\mathbf{M}^{(k+1)}$ by aggregating node embeddings according to the cluster assignment $\mathbf{C}^{(k)}$, while the right equation learns a new coarsened adjacency matrix $\mathbf{A}^{(k+1)} \in \mathbb{R}^{N_{k+1} \times N_{k+1}}$ from the previous adjacency matrix $\mathbf{A}^{(k)}$, which stores the similarity between each pair of clusters. Here, instead of applying a deterministic clustering algorithm to learn $\mathbf{C}^{(k)}$, the authors adopt graph neural networks (GNNs) to learn it. Specifically, they use two separate GNNs on the input embedding matrix $\mathbf{M}^{(k)}$ and the coarsened adjacency matrix $\mathbf{A}^{(k)}$ to generate the assignment matrix $\mathbf{C}^{(k)}$ and the embedding matrix $\mathbf{Z}^{(k)}$, respectively. Formally,
$$\mathbf{Z}^{(k)} = \mathrm{GNN}_{embed}\big(\mathbf{A}^{(k)}, \mathbf{M}^{(k)}\big), \qquad \mathbf{C}^{(k)} = \mathrm{softmax}\big(\mathrm{GNN}_{pool}(\mathbf{A}^{(k)}, \mathbf{M}^{(k)})\big). \tag{12}$$
The two steps can reinforce each other to improve performance. DiffPool may suffer from computational issues caused by the soft clustering assignment, which are further addressed in Cangea et al. (2018).
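One coarsening step of this kind can be sketched with dense matrices. In the example below, both GNNs are stand-ins: a single mean-aggregation layer with random, untrained weights is an assumption made to keep the code short, as are the function names and the toy graph of two connected triangles. The point is only to show how the soft assignment matrix pools embeddings and the adjacency matrix.

```python
import numpy as np

def softmax_rows(X):
    X = X - X.max(axis=1, keepdims=True)
    E = np.exp(X)
    return E / E.sum(axis=1, keepdims=True)

def gnn(A, M, W):
    """Stand-in message-passing layer: average over self and neighbors, then project."""
    A_hat = A + np.eye(A.shape[0])
    A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)
    return A_norm @ M @ W

def diffpool_layer(A, M, W_embed, W_pool):
    """Soft-assign N_k nodes to N_{k+1} clusters and coarsen embeddings and adjacency."""
    Z = np.maximum(gnn(A, M, W_embed), 0.0)    # node embeddings, shape (N_k, D)
    C = softmax_rows(gnn(A, M, W_pool))        # soft assignments, shape (N_k, N_{k+1})
    M_next = C.T @ Z                           # pooled cluster embeddings
    A_next = C.T @ A @ C                       # connectivity strength between clusters
    return M_next, A_next, C

# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
rng = np.random.default_rng(0)
M = rng.normal(size=(6, 4))
W_embed, W_pool = rng.normal(size=(4, 4)), rng.normal(size=(4, 2))
M_next, A_next, C = diffpool_layer(A, M, W_embed, W_pool)
```

Because every operation here is differentiable, the assignment and embedding parameters can in principle be trained jointly from a graph-level loss; the dense N_k-by-N_{k+1} assignment matrix is also where the computational cost mentioned above comes from.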

APPLICATIONS
The representations learned from networks can be easily applied to downstream machine learning models for further analysis of social networks. Some common applications include node classification, link prediction, anomaly detection, and clustering.

Node Classification
In social networks, people are often associated with semantic labels that describe certain aspects of them, such as affiliations, interests, or beliefs. However, in real-world scenarios, people are usually partially or sparsely labeled, since labeling is expensive and time-consuming. The goal of node classification is to predict the labels of unlabeled nodes by leveraging their connections with the labeled ones through the network structure. According to Bhagat et al. (2011), existing methods can be classified into two categories, i.e., random walk based methods and feature extraction based methods. The former aim to propagate labels with random walks (Baluja et al., 2008), while the latter extract features from a node's surrounding information and network statistics.
In general, network representation approaches follow the second principle. A number of existing network representation models, such as Yang et al. (2015), Wang et al. (2016), and Liao et al. (2018), focus on extracting node features from the network using representation learning techniques, and then apply machine learning classifiers such as support vector machines, naive Bayes classifiers, and logistic regression for prediction. In contrast to separating the steps of node embedding and node classification, some recent work (Hamilton et al., 2017a; Dai et al., 2016; Monti et al., 2017) designs an end-to-end framework that combines the two tasks, so that the discriminative information inferred from labels can directly benefit the learning of network embeddings.

Link Prediction
Social networks are not necessarily complete, as some links might be missing. For example, the friendship link between two users in a social network can be missing even if they actually know each other in the real world. The goal of link prediction is to infer the existence of new interactions or emerging links between users in the future, based on the observed links and the network evolution mechanism (Lü and Zhou, 2011; Al Hasan and Zaki, 2011; Liben-Nowell and Kleinberg, 2007). In network embedding, an effective model is expected to preserve both the network structure and the inherent dynamics of the network in the low-dimensional space. In general, the majority of previous work focuses on predicting missing links between users in homogeneous network settings (Grover and Leskovec, 2016; Ou et al., 2016; Zhou et al., 2017), and some efforts also attempt to predict missing links in heterogeneous networks (Liu et al., 2017b, 2018b). Although beyond the scope of this survey, applying network embedding to build recommender systems (Ying et al., 2018a) may also be a direction worth exploring.
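Once node embeddings are available, one simple way to use them for link prediction is to score candidate node pairs and rank them. The sketch below uses the sigmoid of the inner product as the score, which is only one of several possible choices and is an assumption here, as are the function name `link_scores` and the hand-made embeddings.

```python
import numpy as np

def link_scores(Z, pairs):
    """Score candidate links by the sigmoid of the inner product of the two embeddings."""
    pairs = np.asarray(pairs)
    logits = np.sum(Z[pairs[:, 0]] * Z[pairs[:, 1]], axis=1)
    return 1.0 / (1.0 + np.exp(-logits))

# Hand-made 2-D embeddings (hypothetical): nodes 0 and 1 are close, node 2 points the other way.
Z = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [-1.0, 0.0]])
candidates = [(0, 1), (0, 2)]
scores = link_scores(Z, candidates)
ranking = [candidates[k] for k in np.argsort(-scores)]   # most likely links first
```

In an evaluation setting, a fraction of observed links is typically held out, and the ranking of held-out links against sampled non-links is measured.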

Anomaly Detection
Another challenging task in social network analysis is anomaly detection. Malicious activities in social networks, such as spamming, fraud, and phishing, can be interpreted as rare or unexpected behaviors that deviate from those of the majority of normal users. While numerous algorithms have been proposed for spotting anomalies and outliers in networks (Savage et al., 2014; Akoglu et al., 2015; Liu et al., 2017a), anomaly detection methods based on network embedding techniques have recently been receiving increasing attention (Hu et al., 2016; Peng et al., 2018; Liang et al., 2018). The discrete and structural information in networks is merged and projected into a continuous latent space, which facilitates the application of various statistical or geometrical algorithms for measuring the degree of isolation or outlierness of network components. In addition, in contrast to detecting malicious activities in a static way, Sricharan and Das (2014) and Yu et al. (2018) also study the problem in dynamic networks.

Node Clustering
In addition to the above applications, node clustering is another important network analysis problem. The target of node clustering is to partition a network into a set of clusters (or subgraphs), so that nodes in the same cluster are more similar to each other than to those in other clusters. In social networks, such clusters commonly appear as communities, such as groups of people who belong to similar affiliations or have similar interests. Most previous work focuses on clustering networks with various metrics of proximity or connection strength between nodes. For example, Shi and Malik (2000) and Ding et al. (2001) seek to maximize the number of connections within clusters while minimizing the connections between clusters. Recently, many efforts have resorted to network representation techniques for node clustering. Some methods treat embedding and clustering as disjoint tasks: they first embed nodes into low-dimensional vectors, and then apply traditional clustering algorithms to produce clusters (Tian et al., 2014; Cao et al., 2015; Wang et al., 2017). Other methods, such as Tang et al. (2016) and Wei et al. (2017), combine clustering and network embedding in a unified objective function and generate cluster-induced node embeddings.
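
The two-stage pipeline described above (embed first, then cluster) can be sketched as follows. The snippet assumes the node embeddings have already been learned and applies plain Lloyd's k-means to them; the toy embeddings, the fixed initial centroids, and the function name are hypothetical choices for illustration only.

```python
import math


def kmeans(points, init_centroids, n_iter=10):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(x) / len(members) for x in zip(*members)]
    return labels


# Hypothetical pre-trained embeddings for four nodes.
nodes = ["a", "b", "c", "d"]
points = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
labels = kmeans(points, init_centroids=[points[0], points[2]])
clusters = dict(zip(nodes, labels))
```

Because the embedding step is unaware of the clustering objective here, the quality of the clusters depends entirely on how well the embeddings separate the communities, which is the motivation for the unified approaches mentioned above.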

CONCLUSION AND FUTURE DIRECTIONS
Recent years have witnessed a surge in leveraging representation learning techniques for network analysis. In this survey, we have provided an overview of the recent efforts on this topic. Specifically, we summarize existing techniques into three subgroups based on the type of the core learning modules: representation look-up tables, autoencoders and graph convolutional networks. Although many techniques have been developed for a wide spectrum of social network analysis problems in the past few years, we believe there still remain many promising directions worthy of further exploration.
Dynamic networks. Social networks are inherently highly dynamic in real-life scenarios. The overall set of nodes, the underlying network structure, and the attribute information might all evolve over time. For example, in real-world social networks such as Facebook, these elements correspond to users, connections and personal profiles. This property causes existing static learning techniques to fail. Although several methods have been proposed to tackle dynamic networks, they often rely on restrictive assumptions, such as assuming that the node set is fixed and dealing only with dynamics caused by edge deletion and addition (Li et al., 2017b). Also, changes in attribute information are rarely considered in existing work. Therefore, how to design effective and efficient network embedding techniques for truly dynamic networks is still an open question.
Hierarchical network structure. Most existing techniques focus on designing advanced encoding or decoding functions that capture pairwise relationships between nodes. Nevertheless, pairwise relations can only provide insights about local neighborhoods and might not reveal global hierarchical network structures, which are crucial for more complex networks (Benson et al., 2016). How to design effective network embedding methods that are capable of preserving the hierarchical structures of networks is a promising direction for future work.
Heterogeneous networks. Existing network embedding methods mainly deal with homogeneous networks. However, many relational systems in real-life scenarios can be abstracted as heterogeneous networks with multiple types of nodes or edges. In this case, it is hard to evaluate semantic proximity between different network elements in the low-dimensional space. While some work has investigated the use of metapaths (Dong et al., 2017; Huang and Mamoulis, 2017) to approximate semantic similarity for heterogeneous network embedding, many tasks on heterogeneous networks have not been fully evaluated. Learning embeddings for heterogeneous networks is still at an early stage, and more comprehensive techniques are needed to fully capture the relations between different types of network elements and to model more complex real systems.
Scalability. Although deep learning based network embedding methods have achieved strong performance owing to their high model capacity, they still suffer from efficiency problems. These problems become more severe when dealing with real-life massive datasets with billions of nodes and edges. Designing deep representation learning frameworks that scale to real network datasets is another driving factor for advancing research in this domain. In addition, just as GPUs are used for traditional deep models built on grid-structured data, developing computational paradigms for large-scale network processing could be an alternative route to improving efficiency (Bronstein et al., 2017).
Interpretability. Despite the superior performance achieved by deep models, one of their fundamental limitations is the lack of interpretability (Liu et al., 2018a). Different dimensions in the embedding space usually have no specific meaning, so it is difficult to comprehend the underlying factors that have been preserved in the latent space. Since the interpretability of machine learning models has recently received more and more attention (Montavon et al., 2018; Du et al., 2018b), it might also be important to explore how to understand the outcome of representation learning, how to develop interpretable network representation learning models, and how to utilize interpretation to improve the representation models. Answering these questions would help in learning more meaningful and task-specific embeddings for various social network analysis problems.

Figure 1. A toy example of node representation learning and subgraph representation learning (best viewed in color). There are three subgraphs in the input network, denoted by different colors. The target of node embedding is to generate one representation for each individual node, while subgraph embedding is to learn one representation for an entire subgraph.

Figure 2. Building blocks of models with embedding look-up tables. There are two key components in these methods: sampling and generating. The primary distinctions between different methods along this line lie in how the two components are defined.

Figure 3. An example of autoencoder-based network representation algorithms. Rows of the proximity matrix $S \in \mathbb{R}^{|V| \times |V|}$ are fed into the autoencoder to learn and generate embeddings $Z \in \mathbb{R}^{|V| \times D}$ at the hidden layer.

Figure 4. An overview of graph convolutional networks. The dashed rectangles denote node attributes. The representation of each individual node (e.g., node C) is aggregated from its immediate neighbors (e.g., nodes A, B, D, and E) and concatenated with the lower-layer representation of the node itself.
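
The aggregation step described in the caption of Figure 4 can be sketched as follows. This simplified sketch averages the neighbor representations and concatenates the result with the node's own lower-layer representation; the mean aggregator is an assumption made for illustration, and the learned weight matrix and nonlinearity that a trained layer would apply afterwards are deliberately omitted. The function name, graph, and feature values are hypothetical.

```python
def aggregate_layer(features, neighbors):
    """One neighborhood-aggregation step without learned parameters.

    For every node, returns its own vector concatenated with the
    element-wise mean of its immediate neighbors' vectors.
    """
    out = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [
                sum(vals) / len(nbrs)
                for vals in zip(*(features[n] for n in nbrs))
            ]
        else:
            # Isolated node: use a zero vector as the neighborhood summary.
            mean = [0.0] * len(own)
        out[node] = list(own) + mean
    return out


# Hypothetical node attributes and a star graph centered on node C.
features = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "C": [1.0, 1.0],
    "D": [2.0, 0.0],
    "E": [0.0, 2.0],
}
neighbors = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A", "B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}
hidden = aggregate_layer(features, neighbors)
```

Stacking several such steps lets information from nodes that are multiple hops away reach node C, since each step widens the receptive field by one hop.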

Qiaoyu Tan et al. Deep Representation Learning for Social Network Analysis
latent space and usually $D \ll |V|$. In this work, we focus on the problem of node representation learning and subgraph representation learning.