The Network Nullspace Property for Compressed Sensing of Big Data over Networks

We adapt the nullspace property of compressed sensing for sparse vectors to semi-supervised learning of labels for network-structured datasets. In particular, we derive a sufficient condition, which we term the network nullspace property, for convex optimization methods to accurately learn labels which form smooth graph signals. The network nullspace property involves both the network topology and the sampling strategy and can be used to guide the design of efficient sampling strategies, i.e., the selection of those data points whose labels provide the most information for the learning task.


I. INTRODUCTION
We introduce a novel recovery condition, termed the network nullspace property (NNSP), which guarantees accurate recovery of clustered ("piece-wise constant") graph signals from knowledge of their values on a small subset of sampled nodes. The NNSP couples the clustering structure of the underlying data graph to the locations of the sampled nodes by interpreting the underlying graph as a flow network.
The presented results apply to an arbitrary partitioning, but are most useful for a partitioning such that nodes in the same cluster are connected by edges of relatively large weights, whereas edges between clusters have low weights. Our analysis reveals that if cluster boundaries are well-connected (in a sense made precise) to the sampled nodes, then accurate recovery of clustered graph signals is possible by solving a convex optimization problem.
Most of the existing work applies spectral graph theory to define a notion of band-limited graph signals, e.g. based on principal subspaces of the graph Laplacian matrix, as well as sufficient conditions for recoverability, i.e., sampling theorems, for those signals [4], [16]. In contrast, our approach does not rely on spectral graph theory, but involves structural (connectivity) properties of the underlying data graph.
The problem setup considered in this work is very similar to those of [18], [21], which provide sufficient conditions such that a variant of the Lasso method accurately recovers smooth graph signals from noisy observations. However, in contrast to this line of work, we assume the graph signal values are only observed on a small subset of nodes.

II. PROBLEM FORMULATION
Many important applications involve massive heterogeneous datasets comprising heterogeneous data chunks, e.g., mixtures of audio, video and text data [5]. Moreover, such datasets typically contain mostly unlabeled data points; only a small fraction of the data points is labeled. An efficient strategy for handling such heterogeneous datasets is to organize them as a network, or data graph, whose nodes represent individual data points.

II-A. Graph Signal Representation of Big Data
In what follows we consider datasets which are represented by a weighted data graph G = (V, E, W) with nodes V = {1, . . . , N}, each node representing an individual data point. Given some application-specific notion of similarity, the edges of the data graph G connect similar data points i, j ∈ V by an edge {i, j} ∈ E. In some applications it is possible to quantify the extent to which data points are similar, e.g., via the distance between sensors in a wireless sensor network [22]. Given two similar data points i, j ∈ V, we quantify the strength of their connection {i, j} ∈ E by a non-negative edge weight W_{i,j} ≥ 0, which we collect in the symmetric weight matrix W ∈ R^{N×N}. In what follows we will silently assume that the data graph G is oriented by declaring for each edge e = {i, j} ∈ E one node as the head e+ and the other node as the tail e−. For the oriented data graph we define the directed neighbourhoods of a node i ∈ V as N+(i) = {j ∈ V : e = {i, j} ∈ E with e+ = i} and N−(i) = {j ∈ V : e = {i, j} ∈ E with e− = i}.
Besides the edge structure E, network-structured datasets typically also carry label information, which induces a graph signal defined over G. We define a graph signal x[·] over the graph G = (V, E, W) as a mapping V → R which associates (labels) every node i ∈ V with the signal value x[i] ∈ R. In a supervised machine learning application, the signal values x[i] might represent class membership in a classification problem or the target (output) value in a regression problem. We denote the space of all graph signals, which is also known as the vertex space (cf. [6]), by R^V.
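As a concrete illustration of this data model, the following sketch stores a small data graph as a symmetric weight matrix together with a graph signal, and fixes an orientation for the edges. The toy graph, its weights and all variable names are our own illustration, not taken from the paper:

```python
import numpy as np

N = 5  # nodes V = {0, ..., 4} (0-based indices instead of the paper's 1-based)

# Symmetric weight matrix W; W[i, j] > 0 iff {i, j} is an edge of the data graph.
W = np.zeros((N, N))
edges = [(0, 1, 4.0), (1, 2, 4.0), (2, 3, 1.0), (3, 4, 4.0)]  # toy chain-like graph
for i, j, w in edges:
    W[i, j] = W[j, i] = w

# A graph signal x in R^V: one real-valued label per node.
x = np.array([1.0, 1.0, 1.0, 5.0, 5.0])

# An arbitrary but fixed orientation: for each edge {i, j} with i < j we declare
# i the tail e- and j the head e+.
oriented_edges = [(i, j) for i in range(N) for j in range(i + 1, N) if W[i, j] > 0]
print(oriented_edges)
```

The orientation is only a bookkeeping device: as in the text, it fixes a head and a tail per edge so that signed flows and signal differences are well-defined.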

II-B. Graph Signal Recovery
We aim at recovering (learning) a graph signal x[·] ∈ R^V defined over the data graph G from observing its values x[i] only at the nodes of a small sampling set M ⊆ V. The recovery of the entire graph signal x[·] from the incomplete information provided by the signal samples {x[i]}_{i∈M} is possible under a smoothness assumption, which also underlies many supervised machine learning methods [3]. This smoothness assumption requires the signal values or labels of data points which are close, with respect to the data graph topology, to be similar. More formally, we expect the underlying graph signal x[·] ∈ R^V to have a relatively small total variation (TV)

‖x[·]‖_TV := Σ_{{i,j}∈E} W_{i,j} |x[j] − x[i]|.

The total variation of the graph signal x[·] obtained over a subset S ⊆ E of edges is denoted

‖x[·]‖_S := Σ_{{i,j}∈S} W_{i,j} |x[j] − x[i]|.

Some well-known examples of smooth graph signals include low-pass signals in digital signal processing, where samples at adjacent time instants are strongly correlated, and natural images, where close-by pixels tend to be similarly coloured. The class of graph signals with a small total variation is sparse in the sense of changing significantly over few edges only. In particular, if we stack the signal differences (across the edges {i, j} ∈ E) into a big vector of size |E|, then this vector is sparse in the ordinary sense of having only few significantly large entries [7].
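In code, the TV semi-norm and its restriction to an edge subset S amount to a single weighted sum. The toy chain graph below is our own illustration:

```python
import numpy as np

def total_variation(W, x, edge_subset=None):
    """TV (semi-)norm of a graph signal x: sum over edges {i, j} of
    W[i, j] * |x[j] - x[i]|. Restricting edge_subset to S gives the
    semi-norm ||x||_S used in the text."""
    N = W.shape[0]
    if edge_subset is None:
        edge_subset = [(i, j) for i in range(N) for j in range(i + 1, N) if W[i, j] > 0]
    return float(sum(W[i, j] * abs(x[j] - x[i]) for i, j in edge_subset))

# Toy chain graph with a single jump: only the crossed edge contributes.
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
x = np.array([1.0, 1.0, 5.0, 5.0])
print(total_variation(W, x))            # 4.0: one unit-weight edge, jump of 4
print(total_variation(W, x, [(0, 1)]))  # 0.0: no variation inside the first cluster
```

The second call illustrates the restricted semi-norm: a clustered signal has zero TV over all non-boundary edges.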
In order to recover a graph signal x[·] with small TV ‖x[·]‖_TV from its signal values {x[i]}_{i∈M}, a natural strategy is the convex optimization problem

x̂[·] ∈ arg min_{x̃[·]∈R^V} ‖x̃[·]‖_TV  s.t.  x̃[i] = x[i] for all i ∈ M.  (1)

There exist highly efficient methods for solving convex optimization problems of the form (1) (cf. [2], [11], [23] and the references therein).
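Problem (1) is readily cast as a linear program by introducing one auxiliary variable per edge for the weighted absolute differences. The following sketch is our own illustration, using SciPy's generic LP solver instead of the specialized methods of [2], [11], [23]; the toy graph, weights and sampling set are not the paper's experiment:

```python
import numpy as np
from scipy.optimize import linprog

def recover_tv_min(W, samples):
    """Solve (1): minimize sum_e W_e * t_e subject to +-(x_i - x_j) <= t_e
    for every edge, with x_i fixed on the sampling set M (dict node -> value)."""
    N = W.shape[0]
    edges = [(i, j) for i in range(N) for j in range(i + 1, N) if W[i, j] > 0]
    m = len(edges)
    # variables z = (x_0 .. x_{N-1}, t_0 .. t_{m-1})
    c = np.concatenate([np.zeros(N), np.array([W[i, j] for i, j in edges])])
    A_ub = np.zeros((2 * m, N + m))
    for k, (i, j) in enumerate(edges):
        A_ub[2 * k, i], A_ub[2 * k, j], A_ub[2 * k, N + k] = 1, -1, -1       # x_i - x_j <= t_e
        A_ub[2 * k + 1, i], A_ub[2 * k + 1, j], A_ub[2 * k + 1, N + k] = -1, 1, -1  # x_j - x_i <= t_e
    A_eq = np.zeros((len(samples), N + m))
    b_eq = np.zeros(len(samples))
    for r, (i, val) in enumerate(sorted(samples.items())):
        A_eq[r, i], b_eq[r] = 1, val
    bounds = [(None, None)] * N + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * m), A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    assert res.success
    return res.x[:N]

# Chain 0-1-2-3 with a cheap middle edge: the TV minimizer places the
# entire jump on the lowest-weight edge.
W = np.zeros((4, 4))
for i, j, w in [(0, 1, 3.0), (1, 2, 1.0), (2, 3, 3.0)]:
    W[i, j] = W[j, i] = w
x_hat = recover_tv_min(W, {0: 0.0, 3: 1.0})
print(np.round(x_hat, 4))
```

On this toy chain the minimizer places the whole jump across the weight-1 edge {1, 2}, the behaviour exploited throughout the paper.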

III. RECOVERY CONDITIONS
The accuracy of any learning method based on solving (1) depends on the deviation between the solutions x̂[·] of the optimization problem (1) and the true underlying graph signal x[·] ∈ R^V. In what follows, we introduce the network nullspace property as a sufficient condition on the sampling set and graph topology such that any solution x̂[·] of (1) accurately resembles an underlying clustered graph signal of the form

x[i] = Σ_{C_l∈F} a_l I_{C_l}[i], with I_{C_l}[i] = 1 for i ∈ C_l and I_{C_l}[i] = 0 otherwise.  (2)

Here, we use a fixed partition F = {C_1, . . . , C_{|F|}} of the entire data graph G into disjoint clusters C_l ⊆ V. While our analysis applies to an arbitrary partition F, our results are most useful for reasonable partitions where nodes within the same cluster are connected by many edges with large weights, while nodes of different clusters are loosely connected by few edges with small weights. Such reasonable partitions can be obtained by one of the recent highly scalable clustering methods (cf. [9], [19]). However, we highlight that knowledge of the partition is only required for the analysis of methods based on solving the recovery problem (1); it is not required for the actual implementation of those methods, as the recovery problem (1) itself does not involve the partition.
We will characterize a partition F by its boundary

∂F := {{i, j} ∈ E : i ∈ C_l, j ∈ C_{l'} with l ≠ l'},

which is the set of edges connecting nodes from different clusters. We highlight that the recovery problem (1) does not require knowledge of the partition F. Rather, the partition F and the corresponding signal model (2) are only used for analyzing the solutions of (1). Consider a clustered graph signal x[·] ∈ R^V of the form (2). We observe its values x[i] at the sampled nodes i ∈ M only. In order to have any chance of recovering the complete signal from the samples {x[i]}_{i∈M} alone, we have to restrict the nullspace of the sampling set, which we define as

K(M) := {x[·] ∈ R^V : x[i] = 0 for all i ∈ M}.  (4)

In order to define the network nullspace property, which characterizes the solutions of the recovery problem (1), we need the notion of a flow with demands [14].
Definition 1. Given demands d[i] ∈ R, for i ∈ V, a flow with demands d[·] is a mapping f : E → R which satisfies the conservation law

Σ_{e∈E: e+=i} f[e] − Σ_{e∈E: e−=i} f[e] = d[i]

at every node i ∈ V.
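The conservation law of a flow with demands is easy to check numerically. The sketch below is our own toy example, using the convention that each oriented edge is a (tail, head) pair and that the net inflow at a node must match its demand:

```python
def net_inflow(edges, f, N):
    """Net flow into each of the N nodes: inflow over edges with head i minus
    outflow over edges with tail i. For a valid flow with demands d[.],
    this vector equals d."""
    d = [0.0] * N
    for (tail, head), fe in zip(edges, f):
        d[head] += fe
        d[tail] -= fe
    return d

# One unit of flow routed along the chain 0 -> 1 -> 2: node 0 acts as a source
# (demand -1 under this sign convention), node 1 only relays (demand 0),
# and node 2 is a sink (demand +1).
edges = [(0, 1), (1, 2)]
f = [1.0, 1.0]
print(net_inflow(edges, f, 3))  # [-1.0, 0.0, 1.0]
```

The sign convention (source negative, sink positive) is our choice; only consistency with the chosen edge orientation matters.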
For a more detailed discussion of the concept of network flows, we refer to [14]. In this paper, we use the flow concept in order to characterize the connectivity properties, or topology, of a data graph G = (V, E, W) by interpreting the edge weights W_{i,j} as capacity constraints that limit the amount of flow along the edge {i, j}. In particular, using network flows with demands allows us to adapt the nullspace property, introduced within the theory of compressed sensing [8], [10] for sparse signals, to the problem of recovering smooth graph signals.

Definition 2. The sampling set M satisfies the network nullspace property w.r.t. the partition F, denoted NNSP-(M, F), if for any signature σ[·] ∈ {−1, +1}^{∂F} there exists a flow f[·] with demands d[i] = 0 for all non-sampled nodes i ∈ V \ M, such that f[e] = 2σ[e]W_e for every boundary edge e ∈ ∂F and |f[e]| ≤ W_e for every edge e ∈ E \ ∂F.

It turns out that if NNSP-(M, F) is satisfied by the sampling set M for a partition F, then the nullspace of the sampling process, i.e., the set of graph signals which vanish on the sampling set, which is precisely the nullspace K(M) (cf. (4)), cannot contain a non-zero clustered graph signal of the form (2).
The formulation of the NNSP involves a search over all signatures, whose number is around 2^{|∂F|}, which might be intractable for large data graphs. However, similar to many results in compressed sensing, we expect that using probabilistic models for the data graph renders the verification of the NNSP tractable [10]. In particular, we expect that probabilistic statements about how likely the NNSP is to be satisfied for random data graphs (e.g., conforming to a stochastic block model) can be obtained easily. We are now ready to state our main result: the network nullspace property implies that the solution of (1) is unique and coincides with the true underlying clustered graph signal of the form (2).
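For tiny graphs the exhaustive verification over signatures can be made concrete. The sketch below is our own illustration and assumes a particular reading of Definition 2 (boundary flow 2σ[e]W_e, capacity W_e on the remaining edges, zero demand at non-sampled nodes); each signature check is phrased as one LP feasibility problem:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def nnsp_holds(edges, weights, boundary, sampled, N):
    """Brute-force NNSP check: for every signature sigma on the boundary edges,
    test via LP feasibility whether a flow exists with f[e] = 2*sigma[e]*W_e on
    boundary edges, |f[e]| <= W_e elsewhere, and zero net flow at every
    non-sampled node. edges are oriented (tail, head) pairs; boundary holds
    edge indices."""
    m = len(edges)
    free = [i for i in range(N) if i not in sampled]   # nodes with zero demand
    A_eq = np.zeros((len(free), m))
    for col, (tail, head) in enumerate(edges):
        if head in free:
            A_eq[free.index(head), col] += 1.0
        if tail in free:
            A_eq[free.index(tail), col] -= 1.0
    b_eq = np.zeros(len(free))
    for sigma in itertools.product([-1.0, 1.0], repeat=len(boundary)):
        bounds = []
        for k in range(m):
            if k in boundary:                          # boundary edge: flow fixed
                val = 2.0 * sigma[boundary.index(k)] * weights[k]
                bounds.append((val, val))
            else:                                      # capacity constraint
                bounds.append((-weights[k], weights[k]))
        if not linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=bounds).success:
            return False
    return True

# Chain 0-1-2-3 with clusters {0, 1} and {2, 3}; edge index 1 is the boundary.
edges = [(0, 1), (1, 2), (2, 3)]
print(nnsp_holds(edges, [1.0, 1.0, 1.0], boundary=[1], sampled={1, 2}, N=4))  # True
print(nnsp_holds(edges, [1.0, 1.0, 1.0], boundary=[1], sampled={0, 3}, N=4))  # False
print(nnsp_holds(edges, [4.0, 1.0, 4.0], boundary=[1], sampled={0, 3}, N=4))  # True
```

The three cases match the paper's intuition: sampling next to the cluster boundary always works, while distant samples only work when the intra-cluster edges have enough capacity to carry the boundary flow.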

Theorem 3. Consider a clustered graph signal x_c[·] ∈ X (cf. (2)) which is observed only at the sampling set M ⊆ V. If NNSP-(M, F) holds, then the solution of (1) is unique and coincides with x_c[·].
Thus, if NNSP-(M, F) holds, we can expect recovery algorithms based on solving (1) to accurately learn clustered graph signals x[·] of the form (2).
The scope of Theorem 3 is somewhat limited as it applies only to graph signals which are precisely of the form (2). We now state a more general result applying to any graph signal x[·] ∈ R^V.

Theorem 4. Consider a graph signal x[·] ∈ R^V which is observed at the sampling set M ⊆ V. If NNSP-(M, F) holds, then any solution x̂[·] of (1) satisfies

‖x̂[·] − x[·]‖_TV ≤ 6 ‖x[·]‖_{E\∂F}.  (5)
Thus, as long as the underlying graph signal x[·] can be well approximated by a clustered signal of the form (2), any solution x̂[·] of (1) is a graph signal which varies significantly only over the boundary edges ∂F. We highlight that the error bound (5) only controls the TV (semi-)norm of the error signal x̂[·] − x[·]. In particular, this bound does not directly allow us to quantify the global mean squared error (1/N) Σ_{i∈V} (x̂[i] − x[i])^2. One important use of Theorems 3 and 4 is that they guide the choice of the sampling set M. In particular, for a suitably chosen partition F and associated signal model (2), one should aim at sampling nodes such that the NNSP is likely to be satisfied. This approach has been studied empirically in [1], [15], verifying accurate recovery by efficient convex optimization methods using sampling sets satisfying the NNSP (cf. Definition 2).

IV. NUMERICAL EXPERIMENTS
We now verify the relevance of the NNSP for the graph signal recovery problem using a synthetic dataset whose underlying data graph is a chain graph G_chain. This data graph contains |V| = 100 nodes which are connected by |E| = 99 undirected edges {i, i + 1}, for i ∈ {1, . . . , 99}, and partitioned into |F| = 10 equal-size clusters F = {C_l}_{l=1,...,10}, each cluster containing 10 consecutive nodes. The edges connecting nodes in the same cluster have weight W_{i,j} = 4, while those connecting different clusters have weight W_{i,j} = 2. For this data graph we generated a clustered graph signal x[i] of the form (2) with alternating coefficients a_l ∈ {1, 5}.
The graph signal x[i] is observed only on the nodes belonging to a sampling set, which is either M_1 or M_2. The sampling set M_1 contains exactly one node from each cluster C_l and thus, as can be verified easily, satisfies the NNSP (cf. Definition 2). While having the same size as M_1, the sampling set M_2 does not contain any node of the clusters C_2 and C_4.
In Figure 1, we illustrate the recovered signals obtained for each of the two sampling sets by solving (1) using the sparse label propagation (SLP) algorithm [11]. The signal recovered from the sampling set M_1, which satisfies the NNSP, closely resembles the true underlying clustered graph signal. In contrast, the sampling set M_2, which does not satisfy the NNSP, results in a recovered signal which significantly deviates from the true signal.
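The experiment described above can be reproduced with a generic LP solver standing in for the SLP algorithm of [11]. Everything in the sketch below follows the setup in the text (chain graph, weights 4 and 2, alternating coefficients, one sampling set per cluster versus two unsampled clusters), while function and variable names are our own; the exact placement of the samples within each cluster is our choice, as the text does not specify it:

```python
import numpy as np
from scipy.optimize import linprog

def recover_tv_min(weights, samples, N):
    """TV minimization (1) on a chain graph with edges {i, i+1}, cast as an LP
    with one auxiliary variable per edge for the absolute signal differences."""
    m = N - 1
    c = np.concatenate([np.zeros(N), np.asarray(weights, dtype=float)])
    A_ub = np.zeros((2 * m, N + m))
    for k in range(m):
        A_ub[2 * k, k], A_ub[2 * k, k + 1], A_ub[2 * k, N + k] = 1, -1, -1
        A_ub[2 * k + 1, k], A_ub[2 * k + 1, k + 1], A_ub[2 * k + 1, N + k] = -1, 1, -1
    A_eq = np.zeros((len(samples), N + m))
    b_eq = np.zeros(len(samples))
    for r, (i, val) in enumerate(sorted(samples.items())):
        A_eq[r, i], b_eq[r] = 1, val
    bounds = [(None, None)] * N + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * m), A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:N]

N = 100
# Intra-cluster edges have weight 4; the 9 boundary edges get weight 2.
weights = [2.0 if (i + 1) % 10 == 0 else 4.0 for i in range(N - 1)]
# Clustered signal of the form (2) with alternating coefficients a_l in {1, 5}.
x_true = np.array([1.0 if (i // 10) % 2 == 0 else 5.0 for i in range(N)])

# M1: one sample per cluster. M2: same size, but clusters C_2 and C_4 unsampled
# (their two samples are moved into cluster C_1).
M1 = {10 * l + 5: float(x_true[10 * l + 5]) for l in range(10)}
M2 = {i: v for i, v in M1.items() if i not in (15, 35)}
M2[2], M2[3] = float(x_true[2]), float(x_true[3])

err1 = np.max(np.abs(recover_tv_min(weights, M1, N) - x_true))
err2 = np.max(np.abs(recover_tv_min(weights, M2, N) - x_true))
print(err1, err2)  # M1 recovers x_true (error ~ 0); M2 misses clusters C_2, C_4
```

As in Figure 1, recovery from M1 is exact, while recovery from M2 fills the two unsampled clusters with the value of their sampled neighbours.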

V. CONCLUSIONS
We considered the problem of recovering clustered graph signals, defined over complex networks, from observing their signal values on a small set of sampled nodes. By extending tools from compressed sensing, we derived a sufficient condition on the graph topology and sampling set, the network nullspace property, such that a convex recovery method is accurate. This condition is based on the connectivity properties of the underlying network. In particular, it requires the existence of certain network flows, with the edge weights of the data graph interpreted as capacities. The network nullspace property involves both the sampling set and the cluster structure of the data graph. Roughly speaking, it requires sampling more densely near the boundaries between different clusters.

VI. PROOFS
The proofs for Theorem 3 and Theorem 4 rely on recognizing the recovery problem (1) as an analysis ℓ1-minimization problem [17]. A sufficient condition for analysis ℓ1-minimization to deliver the correct solution x[·] is given by the analysis nullspace property [13], [17]. In particular, the sampling set M is said to satisfy the stable analysis nullspace property w.r.t. an edge set S ⊆ E if

‖u[·]‖_{E\S} ≥ 2 ‖u[·]‖_S for any u[·] ∈ K(M),  (6)

with the nullspace K(M) of the sampling process (cf. (4)).

Lemma 5. Consider a clustered graph signal x[·] ∈ R^V of the form (2) which is observed at the sampling set M ⊆ V. If (6) holds for S = ∂F, then x[·] is the unique solution of (1).

Proof. Let x̂[·] be any solution of (1) and set u[·] := x̂[·] − x[·]. Since x̂[·] and x[·] agree on the sampling set M, we have u[·] ∈ K(M) (cf. (4)). Note that, since x[i] is constant for all nodes i ∈ C_l in the same cluster,

x[i] = x[j], for any edge {i, j} ∈ E \ ∂F,  (7)

and therefore ‖x[·]‖_{E\∂F} = 0. By the triangle inequality,

‖x̂[·]‖_TV = ‖x[·] + u[·]‖_∂F + ‖x[·] + u[·]‖_{E\∂F} ≥ ‖x[·]‖_∂F − ‖u[·]‖_∂F + ‖u[·]‖_{E\∂F},

and thus, since ‖x̂[·]‖_TV ≤ ‖x[·]‖_TV = ‖x[·]‖_∂F (the signal x[·] being feasible for (1)), we obtain ‖u[·]‖_{E\∂F} ≤ ‖u[·]‖_∂F. Combined with (6) for S = ∂F, this yields 2‖u[·]‖_∂F ≤ ‖u[·]‖_∂F, i.e., ‖u[·]‖_∂F = ‖u[·]‖_{E\∂F} = 0. Hence ‖u[·]‖_TV = 0, so that u[·] is constant over the (connected) data graph, and since u[·] vanishes on the sampling set M, we have u[·] ≡ 0.

The next result extends Lemma 5 to graph signals x[·] ∈ R^V which are not exactly clustered, but which can be well approximated by a clustered signal of the form (2).

Lemma 6. Consider a data graph G and a fixed partition F = {C_1, . . . , C_{|F|}} of its nodes into disjoint clusters C_l ⊆ V. We observe a graph signal x[·] ∈ R^V at the sampling set M ⊆ V. If (6) holds for S = ∂F, any solution x̂[·] of (1) satisfies

‖x̂[·] − x[·]‖_TV ≤ 6 ‖x[·]‖_{E\∂F}.  (8)

Proof. The argument closely follows the proof of [12, Theorem 8]. First note that any solution x̂[·] of (1) obeys

‖x̂[·]‖_TV ≤ ‖x[·]‖_TV,  (9)

since x[·] is trivially feasible for (1). Writing x̂[·] = x[·] + u[·] with u[·] ∈ K(M) (cf. (4)), we have from (9)

‖x[·] + u[·]‖_∂F + ‖x[·] + u[·]‖_{E\∂F} ≤ ‖x[·]‖_∂F + ‖x[·]‖_{E\∂F}.  (10)

Applying the triangle inequality to (10),

‖u[·]‖_{E\∂F} ≤ ‖u[·]‖_∂F + 2 ‖x[·]‖_{E\∂F}.  (11)

Combining (11) with (6) (for the signal u[·] ∈ K(M)) yields ‖u[·]‖_∂F ≤ 2 ‖x[·]‖_{E\∂F}. Using (11) again, ‖u[·]‖_{E\∂F} ≤ 4 ‖x[·]‖_{E\∂F}, and therefore ‖x̂[·] − x[·]‖_TV = ‖u[·]‖_∂F + ‖u[·]‖_{E\∂F} ≤ 6 ‖x[·]‖_{E\∂F}.

Lemma 7. If the sampling set M satisfies NNSP-(M, F) (cf. Definition 2), then the stable nullspace property (6) holds for S = ∂F.

Proof. Let u[·] ∈ K(M) and choose the signature σ[·] ∈ {−1, +1}^{∂F} such that σ[e](u[e+] − u[e−]) = |u[e+] − u[e−]| for every boundary edge e ∈ ∂F. We are allowed to assume this since, according to Definition 2, if there exists a flow with f[e] > 0 for some e ∈ ∂F, there also exists a flow with f[e] < 0 for the same edge e ∈ ∂F. Let f[·] be the flow guaranteed by NNSP-(M, F) for this signature, with demands d[i] =: g[i] at the sampled nodes i ∈ M and d[i] = 0 at all other nodes. Next, we add an extra node s to the data graph G which is connected to all sampled nodes i ∈ M with an edge e_i = {s, i} which is oriented such that e_i+ = s. We assign to each edge e_i = {s, i} the flow f[e_i] = g[i]. It can be verified easily that the flow over the augmented graph has zero demands at all nodes. Thus, we can apply Tellegen's theorem [20]: extending u[·] by u[s] := 0 and summing the conservation law against u[·] gives Σ_e f[e](u[e+] − u[e−]) = 0, where the sum runs over all edges of the augmented graph; the edges e_i incident to s do not contribute, since u[·] vanishes on M ∪ {s}. Hence

2 ‖u[·]‖_∂F = Σ_{e∈∂F} f[e](u[e+] − u[e−]) = − Σ_{e∈E\∂F} f[e](u[e+] − u[e−]) ≤ Σ_{e∈E\∂F} W_e |u[e+] − u[e−]| = ‖u[·]‖_{E\∂F},

i.e., ‖u[·]‖_{E\∂F} ≥ 2 ‖u[·]‖_∂F. We obtain Theorem 3 by combining Lemma 7 with Lemma 5.
In order to verify Theorem 4, we note that, by Lemma 7, the NNSP according to Definition 2 implies the stable nullspace property (6) for S = ∂F. Therefore, we can invoke Lemma 6 to reach (5).