The Projection Method: a Unified Formalism for Community Detection

We present the class of projection methods for community detection, which generalizes many popular community detection methods. In this framework, we represent each clustering (partition) by a vector on a high-dimensional hypersphere. A community detection method is a projection method if it can be described by the following two-step approach: 1) the graph is mapped to a query vector on the hypersphere; and 2) the query vector is projected onto the set of clustering vectors. This projection step is performed by minimizing the distance between the query vector and the clustering vector over the set of clusterings. We prove that Markov stability maximization, modularity maximization, likelihood maximization for planted partition models and correlation clustering all fit this framework. A consequence of this equivalence is that algorithms for each of these methods can be modified to perform the projection step in our framework. In addition, we show that these different methods suffer from the same granularity problem: they have parameters that control the granularity of the resulting clustering, but choosing these parameters to obtain clusterings of the desired granularity is nontrivial. We provide a general heuristic to address this granularity problem, which can be applied to any projection method. Finally, we show how, given a generator of graphs with community structure, we can optimize a projection method for this generator in order to obtain a community detection method that performs well on graphs from this generator.


Introduction
In complex networks, there are often groups of nodes that are better connected internally than to the rest of the network. In network science, these groups are referred to as communities. These communities often have a natural interpretation: they correspond to friend groups in social networks, subject areas in citation networks, or industries in trade networks. Community detection is the task of finding these groups of nodes in a network. This is typically done by partitioning the nodes, so that each node is assigned to exactly one community. There are many different methods for community detection [Fortunato, 2010; Fortunato and Hric, 2016; Rosvall et al., 2019]. Yet, it is not easy to say which method is preferable in a given setting.
Community detection is closely related to the more general machine learning task of data clustering, as we essentially cluster the nodes based on the network topology. In data clustering, the objects to be clustered are typically represented by vectors, and one uses methods like k-means [Jain, 2010] or spectral clustering [Von Luxburg, 2007] to find a spatial clustering of these vectors so that nearby vectors are assigned to the same cluster. Community detection can be considered as an instance of clustering, where the elements to be clustered are network nodes.
In this study, we unify several popular community detection and clustering methods into a single geometric framework. We do so by describing a metric space of clusterings, where we represent each clustering C by a binary vector b(C) indexed over the node pairs, i.e., $b(C) \in \mathbb{R}^{\binom{n}{2}}$, where n is the number of nodes in the network. We say that a community detection method is a projection method if it is equivalent to the following two-step approach: firstly, the graph is mapped to a query vector $q \in \mathbb{R}^{\binom{n}{2}}$. Secondly, this query vector is projected onto the set of clustering vectors. That is, we search for the clustering vector b(C) that minimizes the distance to q.
It turns out that many community detection methods fit this framework. In Gösgens et al. [2023], we proved that modularity maximization is a projection method. In this work, we show that several other popular community detection methods are projection methods as well. In Section 3, we show that Correlation Clustering [Bansal et al., 2004], the maximization of Markov stability [Delvenne et al., 2010; Lambiotte et al., 2014] and likelihood maximization for several generative models [Avrachenkov et al., 2020] are projection methods. We emphasize that in this paper, we establish equivalences between community detection methods in the strictest mathematical sense. As such, our analytical results are much stronger than merely pointing out that methods are similar or related. Specifically, when we say that two methods are equivalent, we mean that their quality functions $f_1$ and $f_2$ define the exact same rankings of clusterings, so that for all clusterings $C_1, C_2$ we have $f_1(C_1) \ge f_1(C_2)$ if and only if $f_2(C_1) \ge f_2(C_2)$.
Some relations between existing community detection methods were already known [Veldt et al., 2018; Newman, 2016]. The novelty of this work is that we unify many community detection methods into a single class of projection methods, and uncover the geometric structure that is baked into each of these methods. Furthermore, we demonstrate the following advantages of this geometric perspective.
Firstly, we show that any community detection method that maximizes or minimizes a weighted sum over intra-cluster pairs of vertices is a projection method. This unifies many well-known methods (Correlation Clustering [Bansal et al., 2004], Markov Stability [Delvenne et al., 2010], modularity maximization [Newman and Girvan, 2004], likelihood maximization [Avrachenkov et al., 2020]), and any other current or future method that can be presented in this form. Importantly, the hyperspherical geometry comes with natural measures for clustering granularity (the latitude) and for the similarity between clusterings (the correlation distance). These measures are related to the quality function (the angular distance) by the hyperspherical law of cosines, as we explain in Section 2.
Secondly, this geometric framework reveals that all community detection methods generalized by the projection method suffer from the same granularity problem. That is, these methods require parameter tuning to produce communities of the desired granularity. In Section 5.2, we use the hyperspherical geometry to derive a general heuristic that addresses this problem. We demonstrate that this heuristic, obtained in our earlier work Gösgens et al. [2023], can be applied to any projection method.
Thirdly, projection methods can be combined by taking linear combinations of their query vectors. In Section 5.3, we demonstrate how we can efficiently find a linear combination that performs well in a given setting.
As a side remark, we note that in network science, the term "clustering" is also used to refer to the abundance of triangles in real-world networks, which is often quantified by the clustering coefficient [Watts and Strogatz, 1998; Newman, 2003]. The presence of communities usually goes hand in hand with an abundance of triangles [Peixoto, 2022]. We emphasize that in the present work, we use the term "clustering" to refer to data clustering, and not to the clustering coefficient. Nevertheless, the global clustering coefficient can be expressed in terms of our hyperspherical geometry [Gösgens et al., 2023].

Outline
The remainder of the paper is organized as follows: in Section 2, we describe the projection method and the hyperspherical geometry of clusterings. In Section 3, we prove that several popular community detection methods are projection methods and discuss the implications of these equivalences. In Section 4, we discuss algorithms that can be used to perform the projection step in the projection method. Finally, Section 5 presents methodology for choosing a suitable projection method in a given setting. In particular, Section 5.2 discusses how to modify a query mapping in order to obtain a projection method that detects communities of the desired granularity, while Section 5.3 demonstrates how we can perform hyperparameter tuning within the projection method. Our implementation of the projection method and the code for the experiments of Section 5 are available on GitHub¹.

Notation
This paper discusses the relations between several different community detection methods, each of which comes with its own notation. We aim to keep notation as consistent as possible, and here we list the most common notation that we will use throughout the paper.
We represent a graph by an n × n adjacency matrix A with node set [n] = {1, . . ., n} and m edges. We denote the degree of node i by $d_i$. We write $\sum_{i<j}$ to denote a sum over all node pairs i, j ∈ [n] with i < j. We denote the number of node pairs by $N = \binom{n}{2}$. For a clustering C, we define Intra(C) as the set of intra-cluster node pairs, i.e., pairs of nodes i < j that are part of the same cluster according to C. Similarly, we define Inter(C) as the set of node pairs that are part of different clusters according to C. We define $m_C = |\mathrm{Intra}(C)|$ as the number of intra-cluster pairs. For two functions f, g, we write f(C) ≡ g(C) and say that optimizing f is equivalent to optimizing g if, for each pair of clusterings $C_1, C_2$, the inequality $f(C_1) \ge f(C_2)$ holds if and only if $g(C_1) \ge g(C_2)$. We denote vectors by bold letters x, y and we denote the inner product between two vectors by ⟨x, y⟩. The Euclidean length of a vector is given by $\|x\| = \sqrt{\langle x, x\rangle}$.

The projection method
In this section, we describe the hyperspherical geometry that the projection method relies on. For more details, we refer to Gösgens et al. [2023]. We consider a graph with n nodes and define a clustering as a partition of these nodes. For a clustering C, we define the clustering vector b(C) as the binary vector indexed by the node pairs, given by
$$b(C)_{ij} = \begin{cases} 1, & \text{if } i \text{ and } j \text{ are in the same cluster},\\ -1, & \text{if } i \text{ and } j \text{ are in different clusters}.\end{cases} \tag{1}$$
Note that the dimension of this vector is $N = \binom{n}{2}$, and that the Euclidean length of each clustering vector is $\sqrt{N}$, so that they are all located on a hypersphere of radius $\sqrt{N}$ around the origin. Because of this, it is natural to consider the geometry induced by the angular distance, given by
$$d_a(x, y) = \arccos\left(\frac{\langle x, y\rangle}{\|x\|\,\|y\|}\right). \tag{2}$$
The vector representation b(C), together with the angular distance, defines a hyperspherical geometry of clusterings.
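As a concrete illustration of these definitions, the following Python sketch builds b(C) from a list of cluster labels, indexing node pairs i < j in lexicographic order, and evaluates the angular distance. It is our own minimal illustration, not the paper's implementation; the function names `clustering_vector` and `angular_distance` are hypothetical.

```python
import itertools
import math


def clustering_vector(labels):
    """b(C): +1 for intra-cluster pairs, -1 for inter-cluster pairs (pairs i < j)."""
    n = len(labels)
    return [1 if labels[i] == labels[j] else -1
            for i, j in itertools.combinations(range(n), 2)]


def angular_distance(x, y):
    """d_a(x, y) = arccos(<x, y> / (||x|| ||y||))."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, dot / norms)))


b1 = clustering_vector([0, 0, 1, 1])  # clusters {1,2}, {3,4}
b2 = clustering_vector([0, 0, 0, 1])  # clusters {1,2,3}, {4}
print(len(b1))                   # 6 = C(4, 2)
print(angular_distance(b1, b2))  # pi/2: these two vectors happen to be orthogonal
```

Every entry of b(C) is ±1, so each clustering vector has length $\sqrt{N}$, which is what places all of them on the same hypersphere.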

Clustering granularity and latitude
The clustering into a single community corresponds to the all-one vector $b(C) = \mathbf{1}$, while the clustering into n singleton communities corresponds to $b(C) = -\mathbf{1}$. These two vectors form opposite poles on the hypersphere. The extent to which a clustering resembles the former or the latter is referred to as its granularity: fine-grained clusterings consist of many small communities, while coarse-grained clusterings consist of few and large communities. We measure clustering granularity by the latitude of the clustering vector. For $x \in \mathbb{R}^N$, the latitude is defined as $\ell(x) = d_a(x, -\mathbf{1})$.
For a clustering vector b(C), this is given by
$$\ell(b(C)) = \arccos\left(1 - \frac{2 m_C}{N}\right),$$
where $m_C$ is the number of intra-cluster pairs of C. Note that the number of intra-cluster pairs is related to the sum of the squared cluster sizes: let $s_1, \dots, s_k$ be the sizes of the clusters of C. Then
$$m_C = \sum_{a=1}^{k} \binom{s_a}{2} = \frac{1}{2}\left(\sum_{a=1}^{k} s_a^2 - n\right).$$
Thus, quantifying clustering granularity by the latitude is equivalent to quantifying it by the sum of squared cluster sizes.
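The latitude of a clustering vector can be computed either from the definition $\ell(b(C)) = d_a(b(C), -\mathbf{1})$ or from the cluster sizes via $\ell(b(C)) = \arccos(1 - 2m_C/N)$ with $m_C = \frac{1}{2}(\sum_a s_a^2 - n)$. The Python sketch below (helper names are hypothetical) computes both routes so they can be compared; the second never materializes the $\binom{n}{2}$-dimensional vector.

```python
import itertools
import math
from collections import Counter


def latitude_from_sizes(labels):
    """l(b(C)) = arccos(1 - 2 m_C / N), with m_C = (sum_a s_a^2 - n) / 2."""
    n = len(labels)
    N = n * (n - 1) // 2
    sizes = Counter(labels).values()
    m_C = (sum(s * s for s in sizes) - n) // 2
    return math.acos(1 - 2 * m_C / N)


def latitude_from_definition(labels):
    """l(b(C)) = d_a(b(C), -1), computed from the clustering vector itself."""
    b = [1 if labels[i] == labels[j] else -1
         for i, j in itertools.combinations(range(len(labels)), 2)]
    N = len(b)
    # <b, -1> / (||b|| * ||-1||) = -sum(b) / N, because ||b|| = ||-1|| = sqrt(N)
    return math.acos(max(-1.0, min(1.0, -sum(b) / N)))


for labels in ([0, 0, 1, 1, 1], [0, 1, 2, 3, 4], [0, 0, 0, 0, 0]):
    print(labels, latitude_from_sizes(labels), latitude_from_definition(labels))
```

Singletons sit at latitude 0 (the pole $-\mathbf{1}$) and the single-cluster clustering at latitude π.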

Parallels and meridians
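Borrowing more terminology from geography, for λ ∈ [0, π], we define the parallel $\mathcal{P}_\lambda$ as the set of vectors with latitude λ. In particular, we refer to $\mathcal{P}_{\pi/2}$ as the equator, which corresponds to the set of vectors perpendicular to $\mathbf{1}$. For a vector x that is not a multiple of $\mathbf{1}$, we define the parallel projection $P_\lambda(x)$ as the projection of x onto $\mathcal{P}_\lambda$ (on the hypersphere of radius $\sqrt{N}$). Writing $\bar{x} = \frac{1}{N}\langle x, \mathbf{1}\rangle$ for the mean entry of x, it is given by
$$P_\lambda(x) = \sqrt{N}\,\sin(\lambda)\,\frac{x - \bar{x}\mathbf{1}}{\|x - \bar{x}\mathbf{1}\|} - \cos(\lambda)\,\mathbf{1}.$$
Similarly, we define the meridian of x as the one-dimensional curve $\{P_\lambda(x) : \lambda \in (0, \pi)\}$.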
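One way to write the parallel projection is $P_\lambda(x) = \sqrt{N}\sin(\lambda)\,(x - \bar{x}\mathbf{1})/\|x - \bar{x}\mathbf{1}\| - \cos(\lambda)\mathbf{1}$, where $\bar{x}$ is the mean entry of x. It keeps the direction of the centered part of x and sets the component along $\mathbf{1}$ so that the latitude equals λ. The Python sketch below (hypothetical function names) implements this form and can be used to check that the result has length $\sqrt{N}$ and latitude λ.

```python
import math


def latitude(x):
    """l(x) = d_a(x, -1)."""
    N = len(x)
    norm = math.sqrt(sum(v * v for v in x))
    return math.acos(max(-1.0, min(1.0, -sum(x) / (norm * math.sqrt(N)))))


def parallel_projection(x, lam):
    """sqrt(N) sin(lam) (x - mean*1) / ||x - mean*1|| - cos(lam) * 1."""
    N = len(x)
    mean = sum(x) / N
    centered = [v - mean for v in x]
    norm_c = math.sqrt(sum(c * c for c in centered))
    if norm_c == 0:
        raise ValueError("x is a multiple of the all-one vector")
    scale = math.sqrt(N) * math.sin(lam) / norm_c
    return [scale * c - math.cos(lam) for c in centered]


x = [3.0, 1.0, 0.0, 0.0, 2.0, 1.0]
p = parallel_projection(x, 0.7)
print(latitude(p))                       # ~0.7
print(math.sqrt(sum(v * v for v in p)))  # ~sqrt(6)
```

Because the centered part is orthogonal to $\mathbf{1}$, the squared length is $N\sin^2\lambda + N\cos^2\lambda = N$.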

Correlation distance and clustering similarity
For three vectors x, y, r, we can measure the angle on the surface of the hypersphere that the line from x to r makes with the line from y to r. This angle ∠(x, r, y) is given by the hyperspherical variant of the law of cosines:
$$\cos d_a(x, y) = \cos d_a(x, r)\cos d_a(y, r) + \sin d_a(x, r)\sin d_a(y, r)\cos\angle(x, r, y). \tag{3}$$
In particular, when we take $r = -\mathbf{1}$, this angle corresponds to the angle between the meridians of x and y. This angle turns out to have an interesting interpretation, as stated in Theorem 1.
Theorem 1 ([Gösgens et al., 2023]). For two vectors x and y that are not multiples of $\mathbf{1}$, the angle that their meridians make is equal to the arccosine of the Pearson correlation between x and y.
Because of Theorem 1, we call this angle the correlation distance between x and y. Note that $d_a(x, -\mathbf{1}) = \ell(x)$, so that the correlation distance is given by
$$d_\rho(x, y) = \arccos\left(\frac{\cos d_a(x, y) - \cos\ell(x)\cos\ell(y)}{\sin\ell(x)\sin\ell(y)}\right), \tag{4}$$
and the Pearson correlation between vectors x and y is thus given by $\cos d_\rho(x, y)$. The correlation coefficient between two clustering vectors b(C), b(T) turns out to be a useful quantity for measuring the similarity between clusterings C and T [Gösgens et al., 2021]. There exist many measures to quantify the similarity between two clusterings, but most of these measures suffer from the defect that they are biased towards either coarse- or fine-grained clusterings [Vinh et al., 2009; Lei et al., 2017]. The Pearson correlation between two clustering vectors does not suffer from this bias, and additionally satisfies many other desirable properties [Gösgens et al., 2021]. Because of that, we will use the Pearson correlation to measure the similarity between clusterings. For clusterings C and T, the Pearson correlation is given by
$$\rho(C, T) = \frac{N m_{CT} - m_C m_T}{\sqrt{m_C (N - m_C)\, m_T (N - m_T)}},$$
where $m_{CT} = |\mathrm{Intra}(C) \cap \mathrm{Intra}(T)|$ is the number of node pairs that are intra-cluster pairs in both C and T. In cases where C and T correspond to the detected and planted (i.e., ground-truth) clusterings, we use ρ(C, T) as a measure of the performance of the community detection.
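Theorem 1 and the pair-counting formula for ρ(C, T) are easy to check numerically. The Python sketch below is our own illustration (all function names are hypothetical). It computes the angle between meridians via the law of cosines at $r = -\mathbf{1}$ for two random vectors and compares its cosine with the Pearson correlation. It then evaluates $\rho(C, T) = (N m_{CT} - m_C m_T)/\sqrt{m_C(N - m_C)m_T(N - m_T)}$ from pair counts and cross-checks it against the Pearson correlation of b(C) and b(T).

```python
import itertools
import math
import random


def angle(x, y):
    """Angular distance d_a(x, y)."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def correlation_distance(x, y):
    """Angle between the meridians of x and y: law of cosines solved at r = -1."""
    minus_one = [-1.0] * len(x)
    d_xy, l_x, l_y = angle(x, y), angle(x, minus_one), angle(y, minus_one)
    c = (math.cos(d_xy) - math.cos(l_x) * math.cos(l_y)) / (math.sin(l_x) * math.sin(l_y))
    return math.acos(max(-1.0, min(1.0, c)))


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def clustering_vector(labels):
    return [1 if labels[i] == labels[j] else -1
            for i, j in itertools.combinations(range(len(labels)), 2)]


def rho(C, T):
    """Pair-counting formula for rho(C, T); undefined if C or T is a trivial clustering."""
    N = m_C = m_T = m_CT = 0
    for i, j in itertools.combinations(range(len(C)), 2):
        same_c, same_t = C[i] == C[j], T[i] == T[j]
        N += 1
        m_C += same_c
        m_T += same_t
        m_CT += same_c and same_t
    return (N * m_CT - m_C * m_T) / math.sqrt(m_C * (N - m_C) * m_T * (N - m_T))


# Theorem 1: the cosine of the meridian angle equals the Pearson correlation.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(10)]
y = [rng.gauss(0.5, 1.0) for _ in range(10)]
print(math.cos(correlation_distance(x, y)), pearson(x, y))

# Pair counting agrees with the Pearson correlation of the clustering vectors.
C = [0, 0, 1, 1, 2, 2]
T = [0, 0, 0, 1, 1, 1]
print(rho(C, T), pearson(clustering_vector(C), clustering_vector(T)))  # both 12 / sqrt(1944)
```

The pair-counting form needs only O(n²) time and O(1) memory, and with a contingency table it can be computed in time linear in n.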

Query mappings
Above, we have defined the hyperspherical geometry of clusterings. This geometry comes with natural measures for clustering granularity (the latitude ℓ) and similarity between clusterings (the correlation ρ). The idea behind the projection method is that we map a graph to a point in this same geometry, and then find the clustering vector that is closest to that point. For a graph with adjacency matrix A, we denote the vector that it is mapped to by $q(A) \in \mathbb{R}^N$, and refer to it as the query vector of A. We refer to q(·) as the query mapping. The name 'query' comes from the fact that in the second step of the projection method, we search for the clustering vector b(C) that minimizes $d_a(q(A), b(C))$.
That is, among the set of clustering vectors, we find the one that is nearest to q(A). In short, we arrive at the following definition of the projection method:
Definition 1. A community detection method is a projection method if it can be described by the following two-step approach: (1) the graph with adjacency matrix A is first mapped to a query vector q(A); and (2) the query vector is projected onto the set of clustering vectors by minimizing $d_a(q(A), b(C))$ over the set of clusterings C.
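For very small graphs, Definition 1 can be applied literally by enumerating all clusterings. The Python sketch below is purely illustrative: the number of clusterings grows as the Bell numbers, so exhaustive search is infeasible beyond roughly a dozen nodes, and Section 4 discusses practical algorithms. It maximizes ⟨q, b(C)⟩, which is equivalent to minimizing $d_a(q, b(C))$ because every clustering vector has length $\sqrt{N}$ and the arccosine is decreasing. The query vector used here, the adjacency vector shifted by the edge density ($q_{ij} = A_{ij} - m/N$), is our assumption for the example, and the function names are hypothetical.

```python
import itertools
import math


def set_partitions(n):
    """Yield every clustering of range(n) as a canonical label list (restricted growth string)."""
    def extend(prefix, num_blocks):
        if len(prefix) == n:
            yield list(prefix)
            return
        for label in range(num_blocks + 1):  # an existing block or one new block
            prefix.append(label)
            yield from extend(prefix, max(num_blocks, label + 1))
            prefix.pop()
    yield from extend([], 0)


def clustering_vector(labels):
    return [1 if labels[i] == labels[j] else -1
            for i, j in itertools.combinations(range(len(labels)), 2)]


def project(q, n):
    """Clustering whose vector maximizes <q, b(C)>, i.e. minimizes d_a(q, b(C))."""
    best, best_score = None, -math.inf
    for labels in set_partitions(n):
        score = sum(qi * bi for qi, bi in zip(q, clustering_vector(labels)))
        if score > best_score:
            best, best_score = labels, score
    return best


# Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
n = 6
edges = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)}
pairs = list(itertools.combinations(range(n), 2))
density = len(edges) / len(pairs)
q = [(1.0 if pair in edges else 0.0) - density for pair in pairs]
print(project(q, n))  # [0, 0, 0, 1, 1, 1]
```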
There exist infinitely many ways to map graphs to query vectors. One of the simplest is to turn the adjacency matrix into a vector via $q(A)_{ij} = \frac{1}{2}(A_{ij} + A_{ji})$, where the average is taken in case A is directed. In general, we define the half-vectorization of a matrix $X \in \mathbb{R}^{n \times n}$ as the vector $v(X) \in \mathbb{R}^N$ with entries $v(X)_{ij} = \frac{1}{2}(X_{ij} + X_{ji})$ for i < j, so that the query vector above is v(A). For undirected A, the vector $v(A^r)$ counts the number of walks of length r between each pair of vertices. In particular, $v(A^2)_{ij}$ corresponds to the number of neighbors that the nodes i and j share.
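A minimal sketch of the half-vectorization in plain Python (the helper names are ours), applied to a 4-cycle, where opposite nodes share two neighbors and adjacent nodes share none:

```python
import itertools


def half_vectorize(X):
    """v(X)_ij = (X_ij + X_ji) / 2 for i < j, in lexicographic pair order."""
    n = len(X)
    return [(X[i][j] + X[j][i]) / 2 for i, j in itertools.combinations(range(n), 2)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


# 4-cycle 0-1-2-3-0
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
print(half_vectorize(A))             # adjacency vector v(A)
print(half_vectorize(matmul(A, A)))  # common neighbors v(A^2) for pairs (0,1), (0,2), ..., (2,3)
```

For a directed graph, the averaging in `half_vectorize` symmetrizes the two directions, matching the definition of v(X).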
There are many more ways in which we can construct a query vector based on A. For example, the entry q(A) ij can also depend on the degrees of i and j or even the length of the shortest path between i and j.

Figure 1: A schematic overview of the equivalences between the community detection and clustering methods that are described in Section 3 (correlation clustering, modularity, Markov stability and PPM-likelihood). The projection method is completely equivalent to correlation clustering, while modularity, PPM-likelihood and Markov stability are subsets. For certain parameter choices, these latter three methods are equivalent.
Finally, because we are minimizing the angular distance, the length of q(A) is not relevant. It may be natural to normalize all vectors to a Euclidean length $\sqrt{N}$ so that they have the same length as clustering vectors, but we will not do so to avoid cluttering the notation.
In summary, the hyperspherical geometry comes with three key measures. Firstly, the angular distance $d_a(q(A), b(C))$ is the quality measure that we minimize in order to detect communities. Secondly, the latitude measures the granularity of a clustering. That is, $\ell(b(T))$ measures the granularity of the planted clustering, while $\ell(b(C))$ measures the granularity of the detected clustering. Thirdly, the correlation distance between the planted and detected communities $d_\rho(b(C), b(T))$ (or its cosine, $\rho(C, T) = \cos d_\rho(b(C), b(T))$) measures the performance of the detection. In Section 5.2, we will additionally see that $d_\rho(q(A), b(T))$ is a useful measure. The angular distance, correlation distance and latitude are related by the hyperspherical law of cosines, given in (4).

Equivalences to other community detection methods
In this section, we will prove that the class of projection methods generalizes several community detection methods.We prove that the class of projection methods is equivalent to correlation clustering, and we prove that the remaining methods are subclasses of the class of projection methods.For each of the related clustering and community detection methods, we will provide the query mapping of the corresponding projection method.The relations between the methods that are discussed here are illustrated in Figure 1.

Correlation clustering
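Correlation clustering [Bansal et al., 2004] is a framework for clustering where pairwise similarity and dissimilarity values $w^+_{ij}$ and $w^-_{ij}$ are given for every pair of objects i, j. The objective is to maximize the similarity within the clusters and the dissimilarity between the clusters. The agreement of a clustering C with weights $w^\pm_{ij}$ is expressed as
$$\mathrm{CorClust}_{\max}(C; w^+, w^-) = \sum_{ij \in \mathrm{Intra}(C)} w^+_{ij} + \sum_{ij \in \mathrm{Inter}(C)} w^-_{ij}. \tag{5}$$
Correlation clustering solves the maximization problem
$$\max_C \ \mathrm{CorClust}_{\max}(C; w^+, w^-). \tag{6}$$
Equivalently, one can express the disagreement of a clustering C with weights $w^\pm_{ij}$ as
$$\mathrm{CorClust}_{\min}(C; w^+, w^-) = \sum_{ij \in \mathrm{Intra}(C)} w^-_{ij} + \sum_{ij \in \mathrm{Inter}(C)} w^+_{ij} = W - \mathrm{CorClust}_{\max}(C; w^+, w^-),$$
where $W = \sum_{i<j} (w^+_{ij} + w^-_{ij})$ does not depend on C. Then (6) can be stated as the minimization problem
$$\min_C \ \mathrm{CorClust}_{\min}(C; w^+, w^-). \tag{7}$$
Somewhat counter-intuitively, the equivalent formulations (6) and (7) lead to different approximation results. Indeed, for the minimization problem (7), an α-approximation guarantee means that, for a given α > 1, the clustering C returned by an optimization algorithm satisfies
$$\mathrm{CorClust}_{\min}(C; w^+, w^-) \le \alpha\, \mathrm{CorClust}_{\min}(C^*; w^+, w^-), \tag{8}$$
where $C^*$ solves (7). On the other hand, for the maximization problem (6), a β-approximation guarantee means that, for a given β < 1, the clustering C returned by an optimization algorithm satisfies
$$\mathrm{CorClust}_{\max}(C; w^+, w^-) \ge \beta\, \mathrm{CorClust}_{\max}(C^*; w^+, w^-). \tag{9}$$
Now, suppose we have found an α-approximation in (8). Then
$$\mathrm{CorClust}_{\max}(C) = W - \mathrm{CorClust}_{\min}(C) \ge W - \alpha\,\mathrm{CorClust}_{\min}(C^*) = \alpha\,\mathrm{CorClust}_{\max}(C^*) - (\alpha - 1) W,$$
which gives some approximation guarantee for the maximization problem, but not of the same multiplicative form as (9).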
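The identity $\mathrm{CorClust}_{\min} = W - \mathrm{CorClust}_{\max}$ and the resulting additive bound $\alpha\,\mathrm{CorClust}_{\max}(C^*) - (\alpha - 1)W$ can be illustrated with a few lines of Python. In the sketch below, the function names and the integer weights are hypothetical, chosen only for illustration. The last line shows that the bound becomes vacuous (negative) when α or W is large relative to the optimum.

```python
import itertools


def same_cluster(labels):
    """One boolean per pair i < j: True for intra-cluster pairs."""
    return [labels[i] == labels[j] for i, j in itertools.combinations(range(len(labels)), 2)]


def cc_max(labels, w_plus, w_minus):
    """Agreement: w+ over intra-cluster pairs plus w- over inter-cluster pairs."""
    return sum(wp if s else wm for s, wp, wm in zip(same_cluster(labels), w_plus, w_minus))


def cc_min(labels, w_plus, w_minus):
    """Disagreement: w- over intra-cluster pairs plus w+ over inter-cluster pairs."""
    return sum(wm if s else wp for s, wp, wm in zip(same_cluster(labels), w_plus, w_minus))


def max_bound_from_min_guarantee(alpha, opt_max, total):
    """Lower bound on CorClust_max(C) implied by an alpha-approximation of CorClust_min."""
    return alpha * opt_max - (alpha - 1) * total


# Pairs of 4 objects in the order (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
w_plus = [3, 0, 1, 2, 0, 4]
w_minus = [0, 2, 1, 0, 3, 1]
W = sum(w_plus) + sum(w_minus)
C = [0, 0, 1, 1]
print(cc_max(C, w_plus, w_minus), cc_min(C, w_plus, w_minus), W)  # 13 4 17
print(max_bound_from_min_guarantee(2, 10, 12))  # 8
print(max_bound_from_min_guarantee(3, 10, 20))  # -10: no useful guarantee
```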
The motivation for correlation clustering originates from the setting where we are given a noisy classifier that, for each pair of objects, predicts whether they should be clustered together or apart [Bansal et al., 2004]. This leads to the simple ±1 version of correlation clustering, where $(w^+_{ij}, w^-_{ij}) \in \{(1, 0), (0, 1)\}$ holds for each pair ij. The best known approximation guarantee for the minimization formulation of the ±1 variant is 2.06, which is achieved by rounding the solution of an LP relaxation [Chawla et al., 2015].
For the general case where the weights are unconstrained, it is known that maximizing agreement is APX-hard [Charikar et al., 2005]. This means that, unless P = NP, there is no polynomial-time approximation scheme: there exists a constant β < 1 for which finding a β-approximation is NP-hard. The same authors do provide a 0.7666-approximation algorithm by rounding the solution of a semidefinite programming relaxation. This does not contradict the APX-hardness result, since APX-hardness only rules out approximation ratios arbitrarily close to 1.
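We now prove that correlation clustering corresponds to a projection method:
Lemma 1. Correlation clustering with similarity and dissimilarity values $w^+ = (w^+_{ij})_{i<j}$, $w^- = (w^-_{ij})_{i<j}$ is equivalent to a projection method with query vector $q^{(CC)}$ given by
$$q^{(CC)} = w^+ - w^-.$$
Proof. The projection method minimizes $d_a(q^{(CC)}, b(C))$. Since all clustering vectors have the same length $\sqrt{N}$ and the arccosine is decreasing, this is equivalent to maximizing $\langle q^{(CC)}, b(C)\rangle$. We prove that this, in turn, is equivalent to maximizing $\mathrm{CorClust}_{\max}(C; w^+, w^-)$ with respect to the given values $(w^+_{ij})_{i<j}$ and $(w^-_{ij})_{i<j}$:
$$\langle q^{(CC)}, b(C)\rangle = \sum_{ij \in \mathrm{Intra}(C)} (w^+_{ij} - w^-_{ij}) - \sum_{ij \in \mathrm{Inter}(C)} (w^+_{ij} - w^-_{ij}) = 2\,\mathrm{CorClust}_{\max}(C; w^+, w^-) - \sum_{i<j} (w^+_{ij} + w^-_{ij}).$$
The last term does not depend on C, so we obtain that indeed maximizing $\langle q^{(CC)}, b(C)\rangle$ is equivalent to maximizing $\mathrm{CorClust}_{\max}(C; w^+, w^-)$, as required.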
It is easy to see that any query vector q can also be turned into a correlation clustering objective by taking $w^+_{ij} = \max\{0, q_{ij}\}$ and $w^-_{ij} = \max\{0, -q_{ij}\}$, since then $w^+ - w^- = q$. This tells us that correlation clustering and projection methods are equivalent, in the sense that any correlation clustering instance can be mapped to a query vector and vice versa. Note, however, that both classes come with different invariances: correlation clustering is invariant to applying the same increasing affine transformation f(x) = a + bx to all values $w^+_{ij}$ and $w^-_{ij}$, where a, b ∈ ℝ and b > 0. Similarly, projection methods are invariant to multiplying the query vector by any positive constant, due to the hyperspherical geometry.
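Lemma 1 and its converse can be checked numerically. In the Python sketch below (our own illustration; names and weights are hypothetical), the query vector is $q^{(CC)} = w^+ - w^-$. The identity $\langle q^{(CC)}, b(C)\rangle = 2\,\mathrm{CorClust}_{\max}(C) - \sum_{i<j}(w^+_{ij} + w^-_{ij})$ is asserted for every labeling of four objects with up to four labels, and the converse mapping $w^\pm = \max\{0, \pm q\}$ recovers q exactly.

```python
import itertools


def clustering_vector(labels):
    return [1 if labels[i] == labels[j] else -1
            for i, j in itertools.combinations(range(len(labels)), 2)]


def cc_max(labels, w_plus, w_minus):
    """Agreement of a clustering with the weights w+ and w-."""
    b = clustering_vector(labels)
    return sum(wp if bij == 1 else wm for bij, wp, wm in zip(b, w_plus, w_minus))


def query_from_weights(w_plus, w_minus):
    """q^(CC) = w+ - w-."""
    return [wp - wm for wp, wm in zip(w_plus, w_minus)]


def weights_from_query(q):
    """Converse mapping: w+ = max(0, q), w- = max(0, -q)."""
    return [max(0, v) for v in q], [max(0, -v) for v in q]


w_plus = [3, 0, 1, 2, 0, 4]
w_minus = [0, 2, 1, 0, 3, 1]
q = query_from_weights(w_plus, w_minus)
W = sum(w_plus) + sum(w_minus)
for labels in itertools.product(range(4), repeat=4):
    inner = sum(qi * bi for qi, bi in zip(q, clustering_vector(labels)))
    assert inner == 2 * cc_max(labels, w_plus, w_minus) - W
print(weights_from_query(q))  # ([3, 0, 0, 2, 0, 3], [0, 2, 0, 0, 3, 0])
```

Since W does not depend on the clustering, both objectives rank all clusterings identically, which is exactly the equivalence claimed in Lemma 1.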
The equivalence between correlation clustering and projection methods gives a new interpretation for correlation clustering in terms of hyperspherical geometry, and allows one to transfer hardness results from correlation clustering to projection methods.

Correlation clustering vs. community detection
While the fields of community detection and correlation clustering are similar, their focus is notably different: correlation clustering studies clustering from an algorithmic viewpoint, where the goal is to design algorithms with provable optimization guarantees with respect to the correlation clustering quality function. In contrast, community detection focuses mainly on the choice of the quality function, with the aim to obtain a meaningful clustering into communities. In this context, 'meaningful' can mean either statistically significant, or similar to some ground-truth clustering. In summary, community detection asks "What quality function should we optimize?" while correlation clustering asks "What algorithm is best for optimizing the correlation clustering quality function?". Veldt et al. [2018] introduced an interesting variant of correlation clustering known as LambdaCC, and showed that it is related to Sparsest Cut, Normalized Cut and Cluster Deletion. They additionally introduced a degree-corrected variant of LambdaCC and proved that it is equivalent to CL-modularity, which we discuss in Section 3.3.

Markov stability
The Markov stability [Delvenne et al., 2010] of a clustering with respect to a network quantifies how likely a random walker is to find itself in the same community at the beginning and end of some time interval. When communities are clearly present in a network, a random walker will tend to stay inside communities for long time periods and travel between communities infrequently. Markov stability can be defined for various types of discrete- or continuous-time random walks [Lambiotte et al., 2014]. Let $P(t)_{ij}$ be the probability that the random walker is at node j at time t if it was at node i at time 0, and denote by $P(t) = (P(t)_{ij}) \in \mathbb{R}^{n \times n}$ the t-step transition matrix of the random walk. For simplicity, in this paper, we only consider discrete time t = 0, 1, ..., thus $P(t) = P^t$, where $P = P(1)$. We assume that P is irreducible and aperiodic, so that the random walk has a unique stationary distribution $s \in \mathbb{R}^n$ (a row vector), which is the unique solution to $s = sP$ whose elements sum up to one. We consider a random walker whose initial location is sampled from s, and compare the probability that its locations at times 0 and t lie in the same community with the probability that two independent samples from s lie in the same community. Markov stability thus measures the covariance between the community label indicators before and after the interval t.
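Formally, let C be a clustering and let the clusters be numbered 1, 2, ..., k, where k is the number of clusters in C. We denote by H(C) the n × k indicator matrix of the clustering C, where $H(C)_{ia} = 1$ if node i belongs to the a-th cluster, and $H(C)_{ia} = 0$ otherwise. Markov stability is defined as
$$r(t; C) = \mathrm{Trace}\left(H(C)^\top \left(\mathrm{diag}(s) P^t - s^\top s\right) H(C)\right). \tag{10}$$
For more details on the definition of Markov stability and its variants, we refer to Delvenne et al. [2010] and Lambiotte and Schaub [2021]. We show that maximizing Markov stability is a projection method with respect to the query vector
$$q^{(MS)}(A, t) = v\left(\mathrm{diag}(s) P^t - s^\top s\right), \tag{11}$$
where $v(X) \in \mathbb{R}^N$ is the half-vectorization of the matrix X, defined by $v(X)_{ij} = \frac{1}{2}(X_{ij} + X_{ji})$ for i < j. We show that for any matrix $X \in \mathbb{R}^{n \times n}$, maximizing $\mathrm{Trace}(H(C)^\top X H(C))$ is a projection method:
Lemma 2. For any matrix $X \in \mathbb{R}^{n \times n}$, maximizing the trace
$$\mathrm{Trace}\left(H(C)^\top X H(C)\right) \tag{12}$$
over the set of clusterings C is a projection method with query vector v(X).
Proof. The trace is written as
$$\mathrm{Trace}\left(H(C)^\top X H(C)\right) = \sum_{a=1}^{k} \left(H(C)^\top X H(C)\right)_{aa} = \sum_{a=1}^{k} \sum_{i,j \in [n]} H(C)_{ia} X_{ij} H(C)_{ja}.$$
We write
$$\sum_{a=1}^{k} \sum_{i,j \in [n]} H(C)_{ia} X_{ij} H(C)_{ja} = \sum_{i,j \in [n]} X_{ij} \sum_{a=1}^{k} H(C)_{ia} H(C)_{ja}. \tag{13}$$
Now, note that $H(C)_{ia} H(C)_{ja} = 1$ whenever i and j are both in community a, and $H(C)_{ia} H(C)_{ja} = 0$ otherwise. Therefore, summing over a, we get
$$\mathrm{Trace}\left(H(C)^\top X H(C)\right) = \sum_{ij \in \mathrm{Intra}(C)} (X_{ij} + X_{ji}) + \sum_{i \in [n]} X_{ii} = 2 \sum_{ij \in \mathrm{Intra}(C)} v(X)_{ij} + \sum_{i \in [n]} X_{ii}.$$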
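Note that the sum $\sum_{i \in [n]} X_{ii}$ does not depend on C, so that omitting it will not affect the optimization. In addition, we can subtract $\sum_{i<j} v(X)_{ij}$, which does not depend on C either, to obtain
$$2 \sum_{ij \in \mathrm{Intra}(C)} v(X)_{ij} - \sum_{i<j} v(X)_{ij} = \sum_{ij \in \mathrm{Intra}(C)} v(X)_{ij} - \sum_{ij \in \mathrm{Inter}(C)} v(X)_{ij} = \langle v(X), b(C)\rangle.$$
To conclude, trace maximization is equivalent to maximizing ⟨v(X), b(C)⟩, which is equivalent to minimizing $d_a(v(X), b(C))$ over the set of clusterings C.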
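The identity at the heart of Lemma 2, $\mathrm{Trace}(H(C)^\top X H(C)) = \langle v(X), b(C)\rangle + \sum_{i<j} v(X)_{ij} + \sum_i X_{ii}$, can be verified directly. The Python sketch below (helper names are ours) builds the indicator matrix H(C) explicitly and uses a random non-symmetric integer matrix X, so all quantities are exact in floating point.

```python
import itertools
import random


def indicator_matrix(labels):
    """H(C): n x k 0/1 matrix with H[i][a] = 1 iff node i is in cluster a (labels 0..k-1)."""
    k = max(labels) + 1
    return [[1 if labels[i] == a else 0 for a in range(k)] for i in range(len(labels))]


def trace_HtXH(X, labels):
    H = indicator_matrix(labels)
    n, k = len(H), len(H[0])
    return sum(H[i][a] * X[i][j] * H[j][a] for a in range(k) for i in range(n) for j in range(n))


def half_vectorize(X):
    return [(X[i][j] + X[j][i]) / 2 for i, j in itertools.combinations(range(len(X)), 2)]


def clustering_vector(labels):
    return [1 if labels[i] == labels[j] else -1
            for i, j in itertools.combinations(range(len(labels)), 2)]


rng = random.Random(1)
n = 5
X = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
v = half_vectorize(X)
constant = sum(v) + sum(X[i][i] for i in range(n))  # independent of the clustering
for labels in itertools.product(range(3), repeat=n):
    inner = sum(vi * bi for vi, bi in zip(v, clustering_vector(labels)))
    assert trace_HtXH(X, labels) == inner + constant
print("Lemma 2 identity holds for all", 3 ** n, "labelings")
```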
It is known [Delvenne et al., 2010] that for a discrete-time Markov chain and t = 1, Markov stability is equivalent to CL-modularity maximization with γ = 1, which we define in Section 3.3. The time parameter t controls the granularity of the detected communities. When t = 0, we get communities of size 1: since $P^0 = I$, we have $(\mathrm{diag}(s)P^0 - s^\top s)_{ij} = -s_i s_j < 0$ for all i ≠ j, so that (10) is maximized when in (13) we have $H_{ia} H_{ja} = 0$ for all a and all i ≠ j. Furthermore, Delvenne et al. [2010] show that in the limit t → ∞, Markov stability in continuous time and with the normalized Laplacian instead of the matrix P divides the network into two communities, corresponding to the positive and the negative coordinates of the Fiedler vector, that is, the eigenvector corresponding to the second smallest eigenvalue of the normalized Laplacian.
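The Markov stability query vector $q^{(MS)}(A, t) = v(\mathrm{diag}(s)P^t - s^\top s)$ can be computed for a small graph. The Python sketch below assumes the discrete-time random walk $P = D^{-1}A$ on an undirected, connected, non-bipartite graph (our choice for illustration), whose stationary distribution is $s_i = d_i/(2m)$. It checks the t = 0 observation above (all entries are negative, so singletons are optimal). It also checks that for t = 1 the entries equal $(A_{ij} - d_i d_j/(2m))/(2m)$, a positive multiple of the CL-modularity query vector with γ = 1. The function name is hypothetical.

```python
import itertools


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def markov_stability_query(A, t):
    """v(diag(s) P^t - s^T s) for the walk P = D^{-1} A with stationary s_i = d_i / 2m."""
    n = len(A)
    d = [sum(row) for row in A]
    two_m = sum(d)
    s = [di / two_m for di in d]
    P = [[A[i][j] / d[i] for j in range(n)] for i in range(n)]
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0 = I
    for _ in range(t):
        Pt = matmul(Pt, P)
    M = [[s[i] * Pt[i][j] - s[i] * s[j] for j in range(n)] for i in range(n)]
    return [(M[i][j] + M[j][i]) / 2 for i, j in itertools.combinations(range(n), 2)]


# Two triangles joined by the edge 2-3 (connected and non-bipartite).
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
q0 = markov_stability_query(A, 0)
q1 = markov_stability_query(A, 1)
print(all(v < 0 for v in q0))  # True: t = 0 favors singletons
```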

Matrix vs. vector representation
Note that the vector and matrix representations of clusterings are related by $b(C) = 2v(H(C)H(C)^\top) - \mathbf{1}$. This relation makes it possible to re-define the hyperspherical geometry from Section 2 entirely in terms of n × n matrices instead of $\binom{n}{2}$-dimensional vectors. However, we refrain from doing so, because we believe that in most cases, the vector representations are easier to work with. To illustrate, the matrix formulation of projection methods amounts to replacing the query vector q with a query matrix $Q \in \mathbb{R}^{n \times n}$. Note that Q has $n^2$ entries i, j ∈ [n], while q has only $\binom{n}{2}$ entries i < j, but these extra entries do not carry any additional information. Indeed, concerning the off-diagonal elements i > j, it does not matter whether we use Q, $Q^\top$ or $\frac{1}{2}(Q + Q^\top)$, since
$$\mathrm{Trace}\left(H(C)^\top Q H(C)\right) = \mathrm{Trace}\left(H(C)^\top Q^\top H(C)\right) = \mathrm{Trace}\left(H(C)^\top \tfrac{1}{2}(Q + Q^\top) H(C)\right)$$
for any $Q \in \mathbb{R}^{n \times n}$. Furthermore, the diagonal entries i = j play no role, because adding any diagonal matrix diag(y), for $y \in \mathbb{R}^n$, to Q does not affect the optimization:
$$\mathrm{Trace}\left(H(C)^\top (Q + \mathrm{diag}(y)) H(C)\right) = \mathrm{Trace}\left(H(C)^\top Q H(C)\right) + \sum_{i \in [n]} y_i.$$
The vector representation has the advantage that it omits these unimportant values, which is why we use query vectors instead of query matrices. Nevertheless, the matrix representation allows for an easier analysis in some settings. For example, in Liu and Barahona [2018], the spectral properties of Markov stability are leveraged to create an optimization algorithm.
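Both facts used in the matrix representation are easy to confirm numerically. In the following Python sketch (helper names and the matrix Q are ours), we check the relation $b(C) = 2v(H(C)H(C)^\top) - \mathbf{1}$. We also check that adding diag(y) to a query matrix Q shifts the trace objective by the clustering-independent constant $\sum_i y_i$.

```python
import itertools


def indicator_matrix(labels):
    k = max(labels) + 1
    return [[1 if labels[i] == a else 0 for a in range(k)] for i in range(len(labels))]


def trace_HtQH(Q, labels):
    H = indicator_matrix(labels)
    n, k = len(H), len(H[0])
    return sum(H[i][a] * Q[i][j] * H[j][a] for a in range(k) for i in range(n) for j in range(n))


def b_from_indicator(labels):
    """2 v(H H^T) - 1, where (H H^T)_ij = 1 iff i and j share a cluster."""
    H = indicator_matrix(labels)
    n, k = len(H), len(H[0])
    same = [[sum(H[i][a] * H[j][a] for a in range(k)) for j in range(n)] for i in range(n)]
    return [2 * same[i][j] - 1 for i, j in itertools.combinations(range(n), 2)]


labels = [0, 1, 0, 2, 1]
print(b_from_indicator(labels))  # +1 exactly for the intra-cluster pairs (0,2) and (1,4)

Q = [[2, -1, 0, 3, 1],
     [4, 0, -2, 1, 0],
     [1, 1, 5, -3, 2],
     [0, 2, 1, 1, -1],
     [3, -4, 0, 2, 0]]
y = [7, -2, 3, 0, 1]
Q_shifted = [[Q[i][j] + (y[i] if i == j else 0) for j in range(5)] for i in range(5)]
print(trace_HtQH(Q_shifted, labels) - trace_HtQH(Q, labels))  # sum(y) = 9
```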

Modularity
Modularity maximization is one of the most widely used community detection methods [Newman and Girvan, 2004]. Modularity measures the excess of edges inside communities, compared to a null model: a random graph model without community structure. This null model is usually either the Erdős-Rényi (ER) model or the Chung-Lu (CL, Chung and Lu [2001]) model. Modularity comes with a resolution parameter γ that controls the granularity of the detected clustering. For the ER and CL null models, modularity is given (up to terms that do not depend on C) by
$$\mathrm{ERM}(C; A, \gamma) = \frac{1}{m} \sum_{ij \in \mathrm{Intra}(C)} \left(A_{ij} - \gamma \frac{m}{N}\right), \qquad \mathrm{CLM}(C; A, \gamma) = \frac{1}{m} \sum_{ij \in \mathrm{Intra}(C)} \left(A_{ij} - \gamma \frac{d_i d_j}{2m}\right),$$
where we recall that $d_i$ is the degree of node i, $m = \frac{1}{2}\sum_{i \in [n]} d_i$ is the number of edges, and $N = \binom{n}{2}$ is the number of node pairs. In Gösgens et al. [2023], we have proven that modularity maximization is a projection method. In addition, the equivalence between modularity maximization and correlation clustering was already proven by Veldt et al. [2018]. Hence, Lemma 1, too, establishes that modularity maximization is a projection method. Finally, Newman [2006] shows that modularity can be written in a trace-maximization form similar to (12), which additionally allows one to use Lemma 2 to prove that modularity maximization is a projection method. Either way, we get the following query vectors:
$$q^{(\mathrm{Mod})}_{\mathrm{ER}}(A, \gamma) = v(A) - \gamma \frac{m}{N} \mathbf{1}, \qquad q^{(\mathrm{Mod})}_{\mathrm{CL}}(A, \gamma) = v(A) - \frac{\gamma}{2m} v(d d^\top),$$
where v(A) is the adjacency vector (the half-vectorization of the adjacency matrix); m is the number of edges; and $v(dd^\top)$ is the half-vectorization of $dd^\top$, with $d = (d_1, \dots, d_n)^\top$ the degree vector, so that $v(dd^\top)_{ij} = d_i d_j$. For a fixed null model, the vectors that are obtained by varying γ have the following geometric interpretation [Gösgens et al., 2023]: because the query vector $q^{(\mathrm{Mod})}_{\mathrm{ER}}$ is a linear combination of v(A) and $\mathbf{1}$ with coefficients depending on γ, it can be shown that $q^{(\mathrm{Mod})}_{\mathrm{ER}}(A, \gamma)$ lies on the meridian of v(A) for every value of γ. The latitude of this modularity vector is related to the resolution parameter by
$$\cot \ell\left(q^{(\mathrm{Mod})}_{\mathrm{ER}}(A, \gamma)\right) = (\gamma - 1)\sqrt{\frac{m}{N - m}}. \tag{14}$$
The vector $q^{(\mathrm{Mod})}_{\mathrm{CL}}(A, \gamma)$, in contrast, does not stay on a single meridian as γ varies.
Figure 2: Illustration of the geodesics formed by varying the resolution parameter of the modularity vector for a fixed null model (ER and CL). The ER-modularity vectors lie on a single meridian, in contrast to the CL-modularity vectors.
Note that by (14), the latitude of the ER-modularity vector is a monotone function of the resolution parameter γ. Thus, choosing the resolution parameter is equivalent to setting the latitude of the query vector $\lambda = \ell(q^{(\mathrm{Mod})}_{\mathrm{ER}})$. The hyperspherical geometry suggests two ways to choose λ. The first approach is to choose λ to minimize the angular distance $d_a(q^{(\mathrm{Mod})}_{\mathrm{ER}}, b(T_{k,s}))$. Note that changing the query latitude λ does not affect the correlation distance $\theta = d_\rho(q^{(\mathrm{Mod})}_{\mathrm{ER}}, b(T_{k,s}))$. This allows us to use the law of cosines (3) with $r = -\mathbf{1}$ to express
$$\cos d_a(q^{(\mathrm{Mod})}_{\mathrm{ER}}, b(T_{k,s})) = \cos\lambda \cos\lambda_T + \sin\lambda \sin\lambda_T \cos\theta,$$
where $\lambda_T = \ell(b(T_{k,s}))$. Maximizing this expression as a function of λ yields $\tan\lambda = \cos\theta \tan\lambda_T$, i.e., $\lambda \sim \cos(\theta)\lambda_T$ as $\lambda_T \to 0$. The second approach for choosing λ is to simply equate the query latitude to the latitude of $T_{k,s}$, so $\lambda = \lambda_T$. Both approaches yield λ → 0 as k → ∞, so that by the triangle inequality, $d_a(q^{(\mathrm{Mod})}_{\mathrm{ER}}, b(T_{k,s})) \le \lambda + \lambda_T$, the distance between $T_{k,s}$ and the modularity vector with latitude λ vanishes as k → ∞. In terms of the resolution parameter γ, both these approaches yield γ = Θ(k). Moreover, it can be shown that for both of these approaches, merging neighboring cliques does not decrease $d_a(q^{(\mathrm{Mod})}_{\mathrm{ER}}, b(T_{k,s}))$. This demonstrates that these two approaches effectively address the granularity problem for the ring of cliques. Let us note that both these approaches heavily rely on the hyperspherical geometry, and that they cannot be applied to the modularity function directly. Indeed, the first approach does not work because maximizing ERM(T; A, γ) as a function of γ yields the trivial solution γ = 0, while the second approach is ill-defined without the notion of latitude.
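The relation between γ and the latitude of the ER-modularity query vector can be checked numerically. The Python sketch below (function names and the example graph are ours) builds the query vectors $v(A) - \gamma\frac{m}{N}\mathbf{1}$ and $v(A) - \frac{\gamma}{2m}v(dd^\top)$ for a small graph. It compares the latitude of the ER vector with the closed form $\cot\ell = (\gamma - 1)\sqrt{m/(N - m)}$, which also shows that the latitude decreases as γ grows.

```python
import itertools
import math


def latitude(x):
    N = len(x)
    norm = math.sqrt(sum(v * v for v in x))
    return math.acos(max(-1.0, min(1.0, -sum(x) / (norm * math.sqrt(N)))))


def modularity_queries(edges, n, gamma):
    """ER- and CL-modularity query vectors, indexed by pairs i < j."""
    pairs = list(itertools.combinations(range(n), 2))
    m, N = len(edges), len(pairs)
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    a = [1.0 if p in edges else 0.0 for p in pairs]
    q_er = [a_ij - gamma * m / N for a_ij in a]
    q_cl = [a_ij - gamma * deg[i] * deg[j] / (2 * m) for a_ij, (i, j) in zip(a, pairs)]
    return q_er, q_cl


n = 6
edges = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)}
m, N = len(edges), n * (n - 1) // 2
for gamma in (0.5, 1.0, 2.0, 4.0):
    q_er, _ = modularity_queries(edges, n, gamma)
    lat = latitude(q_er)
    predicted = (gamma - 1) * math.sqrt(m / (N - m))
    print(gamma, lat, math.cos(lat) / math.sin(lat), predicted)
```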
While these two approaches effectively address the granularity problem of modularity for this ring of cliques network, they do not work well in general.We provide a better and more universal approach in Section 5.2.
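For the first approach, the optimal query latitude has the closed form $\lambda^* = \operatorname{atan2}(\cos\theta\sin\lambda_T, \cos\lambda_T)$, which satisfies $\tan\lambda^* = \cos\theta\tan\lambda_T$. The Python sketch below compares this with a brute-force grid search over $\cos\lambda\cos\lambda_T + \sin\lambda\sin\lambda_T\cos\theta$. The values of $\lambda_T$ and θ are hypothetical, and we assume cos θ ≥ 0 so that λ* lies in [0, π].

```python
import math


def cos_angular_distance(lam, lam_T, theta):
    """Law of cosines with r = -1: cos d_a(q, b(T)) as a function of the query latitude lam."""
    return math.cos(lam) * math.cos(lam_T) + math.sin(lam) * math.sin(lam_T) * math.cos(theta)


def optimal_latitude(lam_T, theta):
    """Maximizer of cos d_a over lam in [0, pi]; satisfies tan(lam) = cos(theta) tan(lam_T)."""
    return math.atan2(math.cos(theta) * math.sin(lam_T), math.cos(lam_T))


lam_T, theta = 0.3, 1.0  # illustrative values, not taken from the paper
grid = [k * math.pi / 100_000 for k in range(100_001)]
best_on_grid = max(grid, key=lambda lam: cos_angular_distance(lam, lam_T, theta))
print(best_on_grid, optimal_latitude(lam_T, theta))
```

Since cos θ ≤ 1, the optimal query latitude never exceeds $\lambda_T$: the query vector is placed at least as fine-grained as the target clustering.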

Likelihood of generalized planted partition models
The Planted Partition Model (PPM) is one of the simplest random graph models that incorporates community structure. In this model, we assume there is some planted clustering into communities (the ground-truth partition) and that nodes of the same community are connected with probability $p_{\mathrm{in}}$, while nodes of different communities are connected with probability $p_{\mathrm{out}} < p_{\mathrm{in}}$. The likelihood of the PPM was derived in Holland et al. [1983]. For an adjacency matrix A, the likelihood that it was generated by a PPM with clustering C is given by
$$\mathrm{Likelihood}(A \mid C) = \prod_{ij \in \mathrm{Intra}(C)} p_{\mathrm{in}}^{A_{ij}} (1 - p_{\mathrm{in}})^{1 - A_{ij}} \prod_{ij \in \mathrm{Inter}(C)} p_{\mathrm{out}}^{A_{ij}} (1 - p_{\mathrm{out}})^{1 - A_{ij}}.$$
In this standard PPM, the adjacency matrix has binary entries. In Avrachenkov et al. [2020], the PPM is generalized to allow for pairwise interactions in any measurable set I. That is, we require $A_{ij} \in I$ for all i < j. For example, by taking I = ℝ, we get a weighted undirected graph, while directed graphs can be modeled by I = ℝ², so that each interaction corresponds to the tuple of weights $w_{ij}$ and $w_{ji}$. Furthermore, this generalization also allows one to model temporal and multilayer networks. Similarly to the binary PPM, it is assumed that the distribution of the interaction between i and j only depends on whether i and j are in the same community. That is, there are likelihood functions $f_{\mathrm{in}}, f_{\mathrm{out}}$ that measure the likelihood of an interaction $A_{ij} \in I$ resulting from an intra- or inter-community interaction. Importantly, we require the interactions to be pairwise independent. This allows us to express the likelihood of the interaction matrix A as the product of these pairwise likelihoods:
$$\mathrm{Likelihood}(A \mid C) = \prod_{ij \in \mathrm{Intra}(C)} f_{\mathrm{in}}(A_{ij}) \prod_{ij \in \mathrm{Inter}(C)} f_{\mathrm{out}}(A_{ij}). \tag{15}$$
After taking the logarithm, it is easy to see that the log-likelihood equals the correlation clustering agreement $\mathrm{CorClust}_{\max}(C; w^+, w^-)$ with $w^+_{ij} = \log f_{\mathrm{in}}(A_{ij})$ and $w^-_{ij} = \log f_{\mathrm{out}}(A_{ij})$, so that, by Lemma 1, likelihood maximization is a projection method with query vector
$$q^{(\mathrm{PPM})}(A)_{ij} = \log \frac{f_{\mathrm{in}}(A_{ij})}{f_{\mathrm{out}}(A_{ij})}.$$
The standard binary PPM is recovered for $f_{\mathrm{in}}(x) = p_{\mathrm{in}}^x (1 - p_{\mathrm{in}})^{1 - x}$ and $f_{\mathrm{out}}(x) = p_{\mathrm{out}}^x (1 - p_{\mathrm{out}})^{1 - x}$ with $x \in \{0, 1\}$.
The granularity bias of likelihood maximization.
It has been observed that likelihood maximization methods for community detection have a bias towards communities of sizes close to $\log n$ [Zhang and Peixoto, 2020; Peixoto, 2021; Gösgens et al., 2023]. This can be understood by linking likelihood maximization to Bayesian inference [Peixoto, 2021]. Bayesian community detection methods [Peixoto, 2019] assume a prior distribution over the set of clusterings and then find the clustering with the highest posterior probability. Bayes' rule reads

$$\mathrm{Posterior}(C \mid A) = \frac{\mathrm{Likelihood}(A \mid C)\,\mathrm{Prior}(C)}{\mathbb{P}(A)}.$$
Note that the denominator is constant with respect to $C$. If we were to assume a uniform prior, i.e., $\mathrm{Prior}(C) \propto 1$, then $\mathrm{Posterior}(C \mid A) \propto \mathrm{Likelihood}(A \mid C)$. This tells us that likelihood maximization is equivalent to Bayesian inference under the assumption of a uniform prior. The uniform distribution over clusterings has been studied in combinatorics for decades [Harper, 1967; Sachkov, 1997]. For example, it is known that, asymptotically as $n \to \infty$, almost all clusters will have sizes close to $\log n$. This explains why likelihood maximization methods have a bias towards clusterings of this granularity.
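The claim about typical cluster sizes under the uniform distribution can be checked empirically. The sketch below samples uniformly random set partitions with Stam's algorithm, a standard exact sampler based on Dobiński's formula (it is not taken from the cited works). It then compares the mean block size with $\log n$. The function name `uniform_random_partition` is illustrative. Since the statement is asymptotic, expect block sizes of order $\log n$ rather than an exact match at $n = 1000$; in practice the mean sits somewhat below $\log n$.

```python
import math

import numpy as np


def uniform_random_partition(n, rng):
    """Return the block sizes of a uniformly random set partition of n items.

    Stam's algorithm: draw K with P(K = k) proportional to k**n / k! (Dobinski's
    formula), throw the n items into K urns uniformly at random, and keep the
    non-empty urns as the blocks.
    """
    ks = np.arange(1, 2 * n + 1)  # truncated at 2n; the neglected tail is negligible
    log_w = n * np.log(ks) - np.array([math.lgamma(k + 1) for k in ks])
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    k = int(rng.choice(ks, p=p))
    labels = rng.integers(0, k, size=n)
    _, sizes = np.unique(labels, return_counts=True)
    return sizes


rng = np.random.default_rng(0)
n = 1000
mean_sizes = [uniform_random_partition(n, rng).mean() for _ in range(20)]
print(f"mean block size: {np.mean(mean_sizes):.2f}, log n: {math.log(n):.2f}")
```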

Which methods are not projection methods?
Not every community detection method fits our hyperspherical framework of community detection. A community detection method is not a projection method if its quality function cannot be monotonically transformed into a sum over intra-cluster pairs. Also, in projection methods, the contribution of a node pair $ij$ to the sum may depend on the input data (e.g., the graph), but not on the clustering $C$. In this section, we give a few examples of methods that do not fit this framework.

Other inferential methods
In Section 3.4, we saw that some likelihood methods fit our hyperspherical framework. However, not all inferential methods are projection methods. For example, suppose we take a PPM where the intra-community density is such that each node has $\lambda$ intra-community neighbors in expectation. Then, if $i$ and $j$ are in a community of size $s$, they should be connected with probability $\lambda/(s-1)$. Hence, the likelihood function $f_{\mathrm{in}}$ in (15) would depend on $s$, which requires the query vector $q^{(\mathrm{PPM})}$ to depend on $C$. Our hyperspherical framework does not allow for this. Similarly, the Bayesian stochastic blockmodel inference from Peixoto [2019] does not fit our hyperspherical framework. There, the contribution of each node pair $ij$ depends on the sizes (and labels) of the communities of $i$ and $j$ in an intricate way that cannot be captured by a query vector.
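The contrast between the binary PPM and the fixed-degree variant can be made concrete in a few lines. The sketch below first checks numerically that the binary PPM log-likelihood equals a clustering-independent term plus the intra-cluster sum of $w^+_{ij} - w^-_{ij}$. It then shows that, with $p_{\mathrm{in}}(s) = \lambda/(s-1)$, the weight of an intra-community edge changes with the community size $s$, so no fixed query vector can represent it. All function names and parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np


def log_f(a, p):
    """Log-likelihood of a binary interaction a under connection probability p."""
    return a * np.log(p) + (1 - a) * np.log(1 - p)


def ppm_log_likelihood(A, labels, p_in, p_out):
    """Sum over pairs i < j of log f_in(A_ij) (intra) or log f_out(A_ij) (inter)."""
    i, j = np.triu_indices(len(A), k=1)
    same = labels[i] == labels[j]
    return float(np.sum(np.where(same, log_f(A[i, j], p_in), log_f(A[i, j], p_out))))


def as_correlation_clustering(A, labels, p_in, p_out):
    """Clustering-independent sum of w-_ij plus the intra-cluster sum of w+_ij - w-_ij."""
    i, j = np.triu_indices(len(A), k=1)
    a = A[i, j]
    w_plus, w_minus = log_f(a, p_in), log_f(a, p_out)
    same = labels[i] == labels[j]
    return float(w_minus.sum() + (w_plus - w_minus)[same].sum())


rng = np.random.default_rng(1)
upper = np.triu(rng.random((12, 12)) < 0.3, k=1)
A = (upper | upper.T).astype(float)
labels = rng.integers(0, 3, size=12)
print(np.isclose(ppm_log_likelihood(A, labels, 0.5, 0.1),
                 as_correlation_clustering(A, labels, 0.5, 0.1)))

# Fixed expected intra-degree lam: p_in(s) = lam / (s - 1), so the weight of an
# intra-cluster edge depends on the size s of the community that contains it.
lam, p_out = 6.0, 0.002
for s in (10, 20, 40):
    print(s, log_f(1.0, lam / (s - 1)) - log_f(1.0, p_out))
```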

k-means clustering
The k-means algorithm is arguably the oldest and most well-studied clustering method [Jain, 2010]. The aim of k-means is to divide given vectors $x_1, \ldots, x_n \in \mathbb{R}^d$ into $k$ clusters. For each cluster, we compute the center as the arithmetic mean of the vectors inside this cluster. The k-means algorithm iteratively computes the centers and reassigns each vector to its nearest center until convergence. In Dhillon et al. [2004] it is shown that k-means is equivalent to

$$\max_{C}\ \operatorname{tr}\left(Z^\top X Z\right), \qquad (16)$$

Here $X \in \mathbb{R}^{n\times n}$ is defined by $X_{ij} = \langle x_i, x_j \rangle$. The matrix $Z \in \mathbb{R}^{n\times k}$ has entries $Z_{ia} = 1/\sqrt{s_a}$ if node $i$ is part of the community with label $a$ and size $s_a$, and $Z_{ia} = 0$ otherwise. Note that this form resembles the trace maximization of Markov stability. A straightforward computation shows that

$$\operatorname{tr}\left(Z^\top X Z\right) = \sum_{a=1}^{k} \frac{1}{s_a} \sum_{i,j \text{ in community } a} X_{ij}.$$

Therefore, the contribution of each community is again normalized by its size, which is not allowed in our hyperspherical framework.
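The equivalence with the trace problem can be verified numerically. The sketch below compares the k-means objective (within-cluster sum of squared distances to the cluster means) with $\sum_i \|x_i\|^2 - \operatorname{tr}(Z^\top X Z)$ for an arbitrary clustering. It also checks the size-normalized sum over communities. The data and function names are illustrative.

```python
import numpy as np


def kmeans_objective(x, labels):
    """Within-cluster sum of squared distances to the cluster means."""
    return sum(np.sum((x[labels == a] - x[labels == a].mean(axis=0)) ** 2)
               for a in np.unique(labels))


def trace_objective(x, labels):
    """tr(Z^T X Z) with X_ij = <x_i, x_j> and Z_ia = 1/sqrt(s_a) if i is in cluster a."""
    X = x @ x.T
    clusters = np.unique(labels)
    Z = np.zeros((len(x), len(clusters)))
    for col, a in enumerate(clusters):
        members = labels == a
        Z[members, col] = 1.0 / np.sqrt(members.sum())
    return np.trace(Z.T @ X @ Z)


def size_normalised_sum(x, labels):
    """sum_a (1 / s_a) * (sum of X_ij over pairs i, j in cluster a)."""
    X = x @ x.T
    return sum(X[np.ix_(labels == a, labels == a)].sum() / np.sum(labels == a)
               for a in np.unique(labels))


rng = np.random.default_rng(0)
x = rng.normal(size=(30, 4))
labels = rng.integers(0, 3, size=30)
# Minimising the k-means objective is the same as maximising tr(Z^T X Z).
print(np.isclose(kmeans_objective(x, labels), np.sum(x ** 2) - trace_objective(x, labels)))
print(np.isclose(trace_objective(x, labels), size_normalised_sum(x, labels)))
```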
Another clustering method closely related to k-means is spectral clustering [Von Luxburg, 2007]. In spectral clustering, we are given an affinity matrix $X \in \mathbb{R}^{n\times n}$ and consider its leading eigenvectors. These leading eigenvectors define coordinates for the $n$ objects, which are then clustered using spatial clustering methods like k-means. Spectral clustering differs from other clustering and community detection methods in that it does not explicitly optimize a quality function. However, when k-means is used for the final clustering step, one could consider it to be optimizing something of the form (16), with $X$ replaced by its low-rank approximation. Therefore, spectral clustering also does not fit our framework of projection methods.
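A minimal sketch makes the spectral pipeline concrete. It assumes a symmetric affinity matrix and uses one simple variant: no Laplacian and no row normalization, while other variants differ in these choices. The sketch embeds the objects with the $k$ leading eigenvectors of $X$. It then runs a plain Lloyd-style k-means with a deterministic farthest-point initialization. Running k-means on the rows of the embedding $Y$ amounts to the trace problem (16) with $X$ replaced by the rank-$k$ matrix $YY^\top$. The function name is illustrative.

```python
import numpy as np


def spectral_clustering(X, k, max_iter=100):
    """Cluster n objects from a symmetric affinity matrix X (minimal sketch)."""
    _, vecs = np.linalg.eigh(X)                 # eigenvalues in ascending order
    Y = vecs[:, -k:]                            # coordinates from the k leading eigenvectors
    # deterministic farthest-point initialisation of the k centres
    centres = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(Y[np.argmax(dist)])
    centres = np.array(centres)
    labels = None
    for _ in range(max_iter):                   # Lloyd iterations of k-means
        dist = np.sum((Y[:, None, :] - centres[None, :, :]) ** 2, axis=2)
        new_labels = np.argmin(dist, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        centres = np.array([Y[labels == a].mean(axis=0) if np.any(labels == a) else centres[a]
                            for a in range(k)])
    return labels


blocks = np.repeat([0, 1], 5)
X = np.where(blocks[:, None] == blocks[None, :], 0.9, 0.1)   # two dense blocks
print(spectral_clustering(X, 2))
```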

Projection algorithms
The previous section shows that many community detection methods fit our definition of projection methods. A consequence of this is that the same optimization algorithms can be used for each of them. However, this optimization is known to be NP-hard, and in some forms even APX-hard.

Exact optimization
The general problem of correlation clustering [Bansal et al., 2004] and the subproblem of maximizing modularity [Brandes et al., 2007; Meeks and Skerman, 2020] are known to be NP-complete. However, modularity maximization is known to be Fixed-Parameter Tractable (FPT) when parametrized by the size of the minimum vertex cover of the graph [Meeks and Skerman, 2020]. On the other hand, it has been shown that modularity maximization [Dinh et al., 2015] and the maximization variant of correlation clustering [Charikar et al., 2005] are APX-hard. This means that, unless P = NP, there is some constant factor within which they cannot be approximated in polynomial time. This tells us that for large graphs, it may be prohibitively expensive to compute the clustering that minimizes $d_a(q, b(C))$ over the set of clusterings $C$, for a general query vector $q$.
Nevertheless, some approaches are able to optimize some of the objectives from Section 3 with surprising efficiency. In particular, the Bayan algorithm [Aref et al., 2022] for modularity maximization is able to find the exact modularity maximum in graphs of up to several thousand nodes within hours. This approach relies on an Integer Linear Programming (ILP) formulation. Similar ILP formulations exist for the general problem of correlation clustering [Bansal et al., 2004]. These can be converted to the following general ILP formulation of the projection method. In the projection step, we maximize

$$\sum_{i<j} q_{ij}\, x_{ij} \quad \text{subject to} \quad x_{ij} \in \{0, 1\}, \qquad x_{ij} + x_{jk} - x_{ik} \le 1 \ \text{ for all distinct } i, j, k,$$

Here $x_{ij} = 1$ indicates that $i$ and $j$ are in the same cluster, and the triangle constraints ensure that these indicators define a clustering.
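For very small instances, this ILP can be handed to an off-the-shelf solver. The sketch below uses SciPy's `scipy.optimize.milp` (SciPy 1.9 or later). It assumes that clustering vectors have entries $+1$ for intra-cluster pairs and $-1$ otherwise. Under that convention all $b(C)$ have the same norm, and minimizing $d_a(q, b(C))$ amounts to maximizing $\sum_{i<j} q_{ij} x_{ij}$. The helper `project_ilp` is hypothetical. The number of triangle constraints grows as $n^3$, so this only illustrates the formulation and is not a substitute for dedicated solvers such as Bayan.

```python
import itertools

import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def project_ilp(Q):
    """Labels of a clustering maximising sum_{i<j} Q_ij x_ij (x_ij = 1 iff same cluster)."""
    n = Q.shape[0]
    pairs = list(itertools.combinations(range(n), 2))
    index = {pair: t for t, pair in enumerate(pairs)}
    c = -np.array([Q[i, j] for i, j in pairs])  # milp minimises c @ x
    rows = []
    for i, j, k in itertools.combinations(range(n), 3):
        ij, jk, ik = index[i, j], index[j, k], index[i, k]
        # triangle constraints: two "same cluster" relations force the third
        for a, b, d in ((ij, jk, ik), (ij, ik, jk), (ik, jk, ij)):
            row = np.zeros(len(pairs))
            row[[a, b]] = 1.0
            row[d] = -1.0
            rows.append(row)
    res = milp(c, integrality=np.ones(len(pairs), dtype=int), bounds=Bounds(0, 1),
               constraints=LinearConstraint(np.array(rows), -np.inf, 1.0))
    if not res.success:
        raise RuntimeError(res.message)
    x = np.round(res.x).astype(int)
    labels = np.arange(n)
    for (i, j), same in zip(pairs, x):  # x is transitive, so merging yields the clusters
        if same:
            labels[labels == labels[j]] = labels[i]
    return labels


# Two triangles joined by one edge; q_ij = A_ij - 1/2 rewards edges and penalises non-edges.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(project_ilp(A - 0.5))
```

On this toy graph the optimum should separate the two triangles.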

Approximate optimization
There are many approximate maximization algorithms that quickly find clusterings with high modularity. The Louvain [Blondel et al., 2008] and Leiden [Traag et al., 2019] algorithms are perhaps the best-known heuristics for modularity maximization. These algorithms iterate over the nodes and exploit the sparsity of the network to find the greedy relabeling of a node. For a node $i$ with degree $d_i$, finding this greedy relabeling has complexity $O(d_i)$. The Louvain algorithm terminates when it reaches a local maximum. Since it is non-trivial to bound the number of iterations needed to reach a local maximum, there are no theoretical guarantees on its running time. However, the running time empirically scales linearly with the number of edges. While these algorithms often attain values close to the global optimum, they rarely find the exact global optimum [Aref et al., 2023].
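The greedy relabeling step can be sketched for a dense query matrix as follows. This is only the local-moving phase of Louvain; the aggregation phase is omitted. It maximizes $\sum_{i<j,\ \text{same cluster}} q_{ij}$, which is equivalent to minimizing $d_a(q, b(C))$ under the $\pm 1$ clustering-vector convention assumed here. Each relabeling costs $O(n)$ here, because it scans a dense row rather than only the $d_i$ neighbors. The function name and tolerance are illustrative.

```python
import numpy as np


def local_moving(Q, labels=None, max_sweeps=100, tol=1e-12):
    """Greedy single-node relabelling (Louvain's local-moving phase, no aggregation).

    Q is a dense symmetric matrix; the objective is sum_{i<j, same cluster} Q_ij.
    """
    Q = np.array(Q, dtype=float)
    np.fill_diagonal(Q, 0.0)
    n = Q.shape[0]
    labels = np.arange(n) if labels is None else np.array(labels)
    for _ in range(max_sweeps):
        moved = False
        for i in range(n):
            # vals[c] = sum of Q_ij over the members j != i of community c; labels
            # that are currently unused get value 0 and act as an empty community
            vals = np.bincount(labels, weights=Q[i], minlength=n)
            best = int(np.argmax(vals))
            if vals[best] > vals[labels[i]] + tol:   # the move raises the objective by this gap
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels


# Two triangles joined by one edge, with q_ij = A_ij - 1/2.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(local_moving(A - 0.5))
```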
The algorithms proposed in the field of correlation clustering come with theoretical approximation guarantees. Due to the equivalence established in Lemma 1, these algorithms can be applied to modularity maximization. Conversely, modularity maximization algorithms like the Louvain algorithm can be applied to the correlation clustering quality function, allowing for comparisons between these algorithms. Interestingly, while there exist no optimization guarantees for the Louvain algorithm, it does seem to outperform correlation clustering algorithms in such comparisons [Veldt et al., 2018]. The Louvain algorithm can be modified to minimize $d_a(q, b(C))$ with similar performance. However, the computational complexity may depend on the particular query vector $q$ [Gösgens et al., 2023]. More precisely, our modification of Louvain assumes a query vector of the form $q = v(S + L)$, where $S \in \mathbb{R}^{n\times n}$ is a sparse matrix and $L \in \mathbb{R}^{n\times n}$ is a low-rank matrix. This way, finding the greedy relabeling for a node $i$ has complexity linear in the number of non-zero entries of $S$ adjacent to $i$. We observe that the running time of this re-implementation of Louvain is proportional to the number of non-zero elements of $S$ [Gösgens et al., 2023].
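Computing the relabeling values of a single node shows why the decomposition $q = v(S + L)$ helps. Write the low-rank part as $L = UV^\top$, an assumed rank-$r$ factorization, and maintain for every community $c$ the sum $W_c = \sum_{j \in c} V_j$. The value of putting node $i$ into $c$ is then a sum over the non-zeros in row $i$ of $S$, plus $U_i \cdot W_c$. The sketch evaluates this only for the communities of $i$ and of its $S$-neighbors, as Louvain-type implementations commonly do. It then compares the result with a dense computation. The helper names are hypothetical, and this is not the implementation of Gösgens et al. [2023].

```python
import numpy as np
import scipy.sparse as sp


def community_sums(labels, V, n_labels):
    """W[c] = sum of the rows V_j over the nodes j with label c."""
    W = np.zeros((n_labels, V.shape[1]))
    np.add.at(W, labels, V)
    return W


def candidate_values(i, labels, S, U, V, W):
    """Values sum_{j in c, j != i} Q_ij, Q = S + U V^T, for i's and its S-neighbours' communities."""
    start, end = S.indptr[i], S.indptr[i + 1]
    cols, data = S.indices[start:end], S.data[start:end]
    keep = cols != i                                  # ignore the diagonal entry S_ii
    cols, data = cols[keep], data[keep]
    cands = np.unique(np.append(labels[cols], labels[i]))
    # sparse part: touches only the stored entries of row i
    vals = np.bincount(np.searchsorted(cands, labels[cols]), weights=data,
                       minlength=len(cands))
    # low-rank part: U_i . W_c costs O(r) per candidate community
    vals = vals + W[cands] @ U[i]
    vals[np.searchsorted(cands, labels[i])] -= U[i] @ V[i]  # remove the j = i term
    return cands, vals


rng = np.random.default_rng(0)
n, r = 40, 2
mask = np.triu(rng.random((n, n)) < 0.1, k=1)
D = np.where(mask, rng.normal(size=(n, n)), 0.0)
S = sp.csr_matrix(D + D.T)                            # symmetric sparse part
V = rng.normal(size=(n, r))
U = -V                                                # L = U V^T = -V V^T (assumed rank-r term)
labels = rng.integers(0, 5, size=n)
W = community_sums(labels, V, n)

Q = S.toarray() + U @ V.T                             # dense reference, for checking only
cands, vals = candidate_values(3, labels, S, U, V, W)
dense = [sum(Q[3, j] for j in range(n) if labels[j] == c and j != 3) for c in cands]
print(np.allclose(vals, dense))
```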
We denote by $L(q)$ the clustering vector that results from minimizing $d_a(q, b(C))$ over the set of clusterings $C$ with the Louvain algorithm.

Do we need the global optimum?
The modularity landscape is known to be glassy [Good et al., 2010], which means that there are many local maxima with values close to the global maximum. It is likely that, for a general query vector $q$, the landscape of $d_a(q, b(C))$ suffers from a similar glassiness. This would explain why its exact minimization is computationally expensive, while its approximate minimization is computationally cheap.
However, the ultimate goal of community detection is not to minimize the distance to some query vector, but to obtain a meaningful clustering of the network nodes. Some settings have a generative model, like the PPM, the popular LFR benchmark [Lancichinetti et al., 2008], or the more recent ABCD benchmark [Kamiński et al., 2021]. In these settings, a meaningful clustering is one that is similar to the planted clustering. Yet there is no guarantee that the planted clustering corresponds to the global (or even a local) optimum. Most prominently, in sparse network models, it is highly unlikely that a locally optimal clustering corresponds to the planted clustering. A simple argument for this is that a sparse network model contains isolated nodes with high probability, and these nodes will not be assigned to their true community in any locally optimal clustering.
Moreover, when the Louvain algorithm is applied to graphs from generators, it has been observed that the obtained modularity often exceeds the modularity of the planted clustering. This tells us that simple greedy optimization algorithms like Louvain already result in a clustering vector $L(q)$ that is nearer to $q$ than the planted clustering vector $b(T)$, i.e., $d_a(q, L(q)) \le d_a(q, b(T))$.
In the experiments for this paper, we applied the Louvain algorithm 4150 times for different combinations of query mappings and networks. In most of these cases, we chose a query vector using a heuristic, explained in Section 5.2, that is designed to ensure $d_a(q, L(q)) \approx d_a(q, b(T))$. However, in all these 4150 applications of the Louvain algorithm, we observed only 253 cases where $d_a(q, L(q)) > d_a(q, b(T))$. Furthermore, most of these might be attributed to numerical errors. There are only 10 instances where $d_a(q, b(C))$ is more than 1% larger than $d_a(q, b(T))$, and no instances where it is more than 2% larger. This tells us that approximate optimization algorithms like Louvain easily find clusterings with higher quality (i.e., lower $d_a(q, b(C))$) than the planted clustering.
The important conclusion from these observations is that better optimization algorithms do not necessarily result in more meaningful clusterings. Instead, it seems more important to choose the query vector so that minimizers of $d_a(q, b(C))$ are close to the ground-truth clustering vector $b(T)$.

Choosing the query vector: heuristics and open problems
In the previous section, we have seen that the choice of the quality function (or equivalently, the query vector) may have a bigger impact on the performance of a community detection method than the choice of the optimization algorithm.
Choosing a quality function is difficult because it is hard to compare two quality functions in a meaningful way. However, when we restrict to the set of projection methods, the hyperspherical geometry provides us with additional tools to compare query vectors. For example, we can compute the latitude of a query vector and the distances between query vectors. This provides some information about the relative positions of the query vectors. In addition, query vectors live in a vector space, so linear combinations of query vectors also correspond to community detection methods. In this section, we discuss several ways to choose a query mapping, which maps graphs to query vectors $q$.

Graph generators
We assume that we are given some generator that produces a tuple $(A, T)$ of an adjacency matrix and a planted clustering (the ground truth). This generator defines a joint distribution on $(A, T)$. In our experiments, we make use of several different generators.

The Planted Partition Model (PPM). The standard (not generalized) PPM from Section 3.4 is the simplest random graph model with community structure. In this model, there is a planted clustering (partition) of the nodes, and nodes of the same community are more likely to connect to each other than nodes of different communities. We discuss three different variants of the PPM. The first one is a random graph model with homogeneity in both the degree and the community-size distribution. We consider $k$ equally sized communities of size $n/k$ (assuming $k$ divides $n$). We assume that each node has (in expectation) $\lambda_{\mathrm{in}}$ neighbors inside and $\lambda_{\mathrm{out}}$ neighbors outside its community, the same for every node. We then set the connection probabilities as

$$p_{\mathrm{in}} = \frac{\lambda_{\mathrm{in}}}{n/k - 1}, \qquad p_{\mathrm{out}} = \frac{\lambda_{\mathrm{out}}}{n - n/k}.$$

This way, each node's degree follows the same distribution, namely a sum of two binomially distributed random variables. This distribution can be approximated by a Poisson distribution with mean $\lambda_{\mathrm{in}} + \lambda_{\mathrm{out}}$.
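A minimal sampler for this first variant follows. It uses a dense NumPy representation, which is fine for $n = 1000$ but not for much larger graphs. With the parameter values used later in this section ($n = 1000$, $k = 50$, $\lambda_{\mathrm{in}} = 8 - 2 = 6$, $\lambda_{\mathrm{out}} = 2$), the formulas give $p_{\mathrm{in}} = 6/19 \approx 0.316$ and $p_{\mathrm{out}} = 2/980 \approx 0.002$. The function name `sample_ppm` is illustrative.

```python
import numpy as np


def sample_ppm(n, k, lam_in, lam_out, rng):
    """PPM with k communities of size n/k and expected intra/inter degrees lam_in, lam_out."""
    if n % k:
        raise ValueError("k must divide n")
    s = n // k
    p_in = lam_in / (s - 1)      # s - 1 potential neighbours inside the community
    p_out = lam_out / (n - s)    # n - s potential neighbours outside it
    labels = np.repeat(np.arange(k), s)
    same = labels[:, None] == labels[None, :]
    upper = np.triu(rng.random((n, n)) < np.where(same, p_in, p_out), k=1)
    A = (upper | upper.T).astype(int)
    return A, labels


rng = np.random.default_rng(42)
A, labels = sample_ppm(1000, 50, 6.0, 2.0, rng)
same = labels[:, None] == labels[None, :]
print("mean intra-community degree:", (A * same).sum(axis=1).mean())
print("mean inter-community degree:", (A * ~same).sum(axis=1).mean())
```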
The Heterogeneously-sized PPM (HPPM). The second variant of the PPM has homogeneous degrees (again, approximately Poisson distributed) but heterogeneous community sizes. We draw $k$ community sizes from a power-law distribution with power-law exponent $\delta$, meaning that the probability of obtaining a size $s$ decays as $s^{-\delta}$. We make sure that each node has on average $\lambda_{\mathrm{in}}$ intra-community neighbors and $\lambda_{\mathrm{out}}$ neighbors outside its community. To do so, we set $p_{\mathrm{in}}(s) = \lambda_{\mathrm{in}}/(s-1)$ for nodes in communities of size $s$, and

$$p_{\mathrm{out}} = \frac{n\,\lambda_{\mathrm{out}}}{2\left(\binom{n}{2} - m_T\right)},$$

Here $m_T$ is the number of intra-community pairs in the planted clustering $T$, so that the expected number of inter-community edges equals $n\lambda_{\mathrm{out}}/2$.
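The HPPM probabilities can be checked with a few lines of arithmetic. The sketch takes hypothetical, fixed community sizes, all at least $\lambda_{\mathrm{in}} + 1$ so that $p_{\mathrm{in}}(s) \le 1$. How the sizes are drawn from the power law is not modeled here. The sketch verifies that the expected mean intra- and inter-community degrees equal $\lambda_{\mathrm{in}}$ and $\lambda_{\mathrm{out}}$. The function name is illustrative.

```python
import numpy as np


def hppm_probabilities(sizes, lam_in, lam_out):
    """Return p_in(s) for each community and the common p_out of the HPPM."""
    sizes = np.asarray(sizes, dtype=float)
    n = sizes.sum()
    m_T = np.sum(sizes * (sizes - 1) / 2)       # intra-community pairs of T
    inter_pairs = n * (n - 1) / 2 - m_T
    p_in = lam_in / (sizes - 1)
    p_out = n * lam_out / (2 * inter_pairs)
    return p_in, p_out


sizes = np.array([10, 20, 40, 80, 150])          # hypothetical community sizes (n = 300)
p_in, p_out = hppm_probabilities(sizes, lam_in=6.0, lam_out=2.0)
n = sizes.sum()
m_T = np.sum(sizes * (sizes - 1) / 2)
# Expected mean degree = 2 * (expected number of edges) / n.
mean_intra = 2 * np.sum(p_in * sizes * (sizes - 1) / 2) / n
mean_inter = 2 * p_out * (n * (n - 1) / 2 - m_T) / n
print(mean_intra, mean_inter)
```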
The Degree-Corrected PPM (DCPPM). To obtain a graph generator with degree heterogeneity and homogeneous community sizes, we assign a weight $\theta_i > 0$ to each node. We then use the PPM parametrization proposed in Prokhorenkova and Tikhonov [2019]. We consider $k$ equally sized communities of size $n/k$ (again, assuming $k$ divides $n$). We denote the sum of weights inside the $a$-th community by $\Theta_a$ and the total weight by $\Theta = \sum_{a=1}^{k} \Theta_a$. Nodes $i$ and $j$ that are both in the $a$-th community are connected with probability $p^{\mathrm{in}}_{ij}$, and nodes from different communities are connected with probability $p^{\mathrm{out}}_{ij}$. With these parameters, a node has on average approximately $\lambda_{\mathrm{in}}$ neighbors inside its community and $\lambda_{\mathrm{out}}$ neighbors outside its community. In addition, the expected degree of a node $i$ is approximately equal to its weight $\theta_i$. To obtain a degree distribution with power-law exponent $\tau$, we draw the weights from a distribution with the same power-law exponent.
The Artificial Benchmark for Community Detection (ABCD). ABCD is a graph generator that incorporates heterogeneity in both the degree and the community-size distributions in order to generate graphs that resemble real-world networks [Kamiński et al., 2021]. This is done by generating a sequence of community sizes and degrees with power-law exponents $\delta$ and $\tau$, respectively. Then, it performs a matching process to assign degrees to nodes inside communities. The generator has a parameter $\xi$ that controls the fraction of edges that are inter-community edges.
Parameter choices in our graph generators. We set the parameters of the graph generators as follows. We consider graphs with $n = 1000$ nodes and mean degree $\lambda_{\mathrm{in}} + \lambda_{\mathrm{out}} = 8$. We choose the parameters of these generators so that each node has (in expectation) $\lambda_{\mathrm{out}} = 2$ neighbors outside its community. For DCPPM and ABCD, we set the power-law exponent of the degree distribution to $\tau = 2.5$. For PPM and DCPPM, the planted partitions consist of $k = 50$ communities of size $s = 20$ each. For ABCD, we set $\xi = 1/4$.

A heuristic for controlling the granularity of the detected clustering
Modularity and Markov stability both have a parameter that controls the granularity of the detected clusterings. Modularity comes with a resolution parameter $\gamma$, and increasing $\gamma$ typically results in detecting smaller communities. However, it is unclear how this resolution parameter should be chosen in order to detect clusterings of the desired granularity. By 'desired', we mean that the granularity of the detected clustering is similar to that of the planted clustering when the graph is drawn from a graph generator. For ER-modularity, there is a particular value $\gamma(p_{\mathrm{in}}, p_{\mathrm{out}})$ for which maximizing ER-modularity is equivalent to maximizing the likelihood of a PPM with parameters $p_{\mathrm{in}}, p_{\mathrm{out}}$. However, as mentioned in Section 3.4, maximizing this likelihood is known to be biased towards communities of logarithmic size. Markov stability comes with a time parameter $t$ that controls the granularity of the detected clustering: increasing $t$ results in detecting larger communities. Again, it is unclear how this time should be chosen in order to detect communities of the desired granularity.
Within the framework of projection methods, a natural measure of the granularity of a clustering $C$ is the latitude $\ell(b(C))$ of the corresponding clustering vector. Hence, when the graph is drawn from a generator with a planted clustering $T$, a clustering with 'desired' granularity is a clustering $C$ with $\ell(b(C)) \approx \ell(b(T))$. In turn, the desired $\ell(b(C))$ can be obtained by choosing the right latitude for the query vector. How to make this choice is the topic of the remainder of Section 5.2.
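A small sketch of this granularity measure follows. It assumes that a clustering vector has one entry per pair $i < j$, equal to $+1$ for intra-cluster pairs and $-1$ otherwise, and that the latitude is the angle to $-\mathbf{1}$. Under this convention, $\cos \ell(b(C)) = 1 - 2m_C/N$, where $m_C$ is the number of intra-cluster pairs and $N = \binom{n}{2}$. The latitude therefore grows monotonically from $0$ (all singletons) to $\pi$ (a single cluster). The function names are illustrative.

```python
import math

import numpy as np


def clustering_vector(labels):
    """b(C): +1 for pairs i < j in the same cluster, -1 otherwise (assumed convention)."""
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)
    return np.where(labels[i] == labels[j], 1.0, -1.0)


def latitude(v):
    """Angle between v and -1."""
    cos = -v.sum() / (np.linalg.norm(v) * math.sqrt(v.size))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


n = 12
clusterings = {
    "singletons": np.arange(n),
    "6 pairs": np.arange(n) // 2,
    "3 groups of 4": np.arange(n) // 4,
    "2 groups of 6": np.arange(n) // 6,
    "one cluster": np.zeros(n, dtype=int),
}
for name, labels in clusterings.items():
    print(f"{name:>14}: latitude {latitude(clustering_vector(labels)):.3f}")
```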
The simplest way to change a query vector in order to detect clusterings of coarser granularity is to add a multiple of $\mathbf{1}$, i.e., to take a new query vector $q' = q + c \cdot \mathbf{1}$ for some $c > 0$. The vectors $q'$ and $q$ lie on the same meridian, i.e., $d_\rho(q, q') = 0$, while $q'$ is further away from $-\mathbf{1}$, so that $\ell(q') > \ell(q)$. Hence, adding $c \cdot \mathbf{1}$ is equivalent to projecting the vector $q$ to a different latitude, i.e., $q' = P_\lambda(q)$ for some $\lambda \in (\ell(q), \pi]$. Similarly, the simplest way to change a query vector in order to detect clusterings of finer granularity is to subtract a multiple of $\mathbf{1}$, i.e., $q' = q - c \cdot \mathbf{1}$, which is equivalent to $q' = P_\lambda(q)$ for $\lambda \in [0, \ell(q))$. Hence, the question becomes: given a query vector $q$, how should $\lambda$ be chosen such that $q' = P_\lambda(q)$ results in a clustering with similar granularity as $b(T)$, i.e., such that $\ell(L(q')) \approx \ell(b(T))$?
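The sketch below computes the shift $c$ for which $q + c\cdot\mathbf{1}$ lands on a prescribed latitude $\lambda \in (0, \pi)$. It assumes that latitude is measured from $-\mathbf{1}$ and that meridian distance is given by the Pearson correlation. Write $q = \alpha\,\mathbf{1}/\sqrt{N} + q_\perp$ with $q_\perp \perp \mathbf{1}$. Then $\cos\ell(q) = -\alpha/\|q\|$, so the target latitude is reached for $\alpha' = -\|q_\perp\|\cot\lambda$. The helper `shift_to_latitude` is hypothetical. It returns $q + c\cdot\mathbf{1}$, which matches $P_\lambda(q)$ up to a positive scaling.

```python
import math

import numpy as np


def latitude(q):
    """Angle between q and -1."""
    cos = -q.sum() / (np.linalg.norm(q) * math.sqrt(q.size))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def shift_to_latitude(q, lam):
    """Return (q + c*1, c) with latitude lam in (0, pi); q must not be a multiple of 1."""
    N = q.size
    alpha = q.sum() / math.sqrt(N)          # coordinate of q along 1/sqrt(N)
    r = np.linalg.norm(q - q.mean())        # norm of the component orthogonal to 1
    target_alpha = -r / math.tan(lam)       # solves cos(lam) = -alpha / sqrt(alpha^2 + r^2)
    c = (target_alpha - alpha) / math.sqrt(N)
    return q + c, c


rng = np.random.default_rng(0)
q = rng.normal(size=45)                     # a query vector for n = 10 nodes (45 pairs)
coarser, c_up = shift_to_latitude(q, latitude(q) + 0.3)
finer, c_down = shift_to_latitude(q, latitude(q) - 0.3)
print(c_up > 0, c_down < 0)
print(latitude(coarser) - latitude(q), np.corrcoef(q, coarser)[0, 1])  # same meridian
```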
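We briefly illustrate how solving $d_a(q^*, b(T)) = \theta$ leads to (17). Write $\lambda_T = \ell(b(T))$ and $\lambda^* = \ell(q^*)$. We use (4) to express $\cos\theta$ in terms of $\lambda_T$, $\lambda^*$ and $d_a(q^*, b(T))$ as

$$\cos\theta = \frac{\cos d_a(q^*, b(T)) - \cos\lambda^*\cos\lambda_T}{\sin\lambda^*\sin\lambda_T}.$$

Squaring both sides and making the substitutions $\cos d_a(q^*, b(T)) = \cos\theta$ and $\sin^2\lambda^* = 1 - \cos^2\lambda^*$ yields

$$\cos^2\theta\,\sin^2\lambda_T\left(1 - \cos^2\lambda^*\right) = \left(\cos\theta - \cos\lambda^*\cos\lambda_T\right)^2.$$

This can be rewritten as the following quadratic equation in $\cos\lambda^*$:

$$\left(\cos^2\lambda_T + \cos^2\theta\,\sin^2\lambda_T\right)\cos^2\lambda^* - 2\cos\theta\cos\lambda_T\cos\lambda^* + \cos^2\theta\cos^2\lambda_T = 0.$$

The leading coefficient factors as $\cos^2\lambda_T + \cos^2\theta\sin^2\lambda_T = (1 - \sin\theta\sin\lambda_T)(1 + \sin\theta\sin\lambda_T)$. Using this, the equation has solutions

$$\cos\lambda^* = \frac{\cos\theta\cos\lambda_T}{1 \pm \sin\theta\sin\lambda_T}.$$

This gives two possible solutions for $\lambda^*$, one of which corresponds to (17). We refer to Gösgens et al. [2023] for the remaining details of the derivation and the experimental validation of this heuristic.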
Note that $\cos\theta$ is the Pearson correlation coefficient between $q$ and $b(T)$, and can be considered a measure of how much information $q$ carries about the clustering $T$. As a special case, $\cos\theta = 1$ implies that $q$ and $b(T)$ lie on the same meridian. In that case $\lambda^*(\lambda_T, 0) = \lambda_T$, so that $q^* = b(T)$. At the other extreme, $q$ is uncorrelated with $b(T)$ (i.e., $\theta = \pi/2$). Then $\lambda^*(\lambda_T, \pi/2) = \pi/2$, so that the resulting query vector lies on the equator (just like the modularity vector for $\gamma = 1$). For $\theta \in (0, \pi/2)$, the heuristic latitude $\lambda^*(\lambda_T, \theta)$ lies between $\lambda_T$ and $\pi/2$.
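The two roots of the quadratic and the special cases above can be checked numerically. The sketch assumes that (4) is the spherical law of cosines, $\cos d_a = \cos\lambda_1\cos\lambda_2 + \sin\lambda_1\sin\lambda_2\cos d_\rho$. As heuristic latitude it takes the root $\cos\lambda^* = \cos\theta\cos\lambda_T/(1 + \sin\theta\sin\lambda_T)$, which is the one lying between $\lambda_T$ and $\pi/2$. The other root can be extraneous, because squaring introduces spurious solutions. The function names are illustrative.

```python
import math


def _acos(x):
    """arccos, clipped against rounding errors."""
    return math.acos(max(-1.0, min(1.0, x)))


def heuristic_latitude(lam_T, theta):
    """Root with cos(lambda*) = cos(theta) cos(lam_T) / (1 + sin(theta) sin(lam_T))."""
    return _acos(math.cos(theta) * math.cos(lam_T) / (1 + math.sin(theta) * math.sin(lam_T)))


def other_root(lam_T, theta):
    """Root with cos(lambda*) = cos(theta) cos(lam_T) / (1 - sin(theta) sin(lam_T))."""
    return _acos(math.cos(theta) * math.cos(lam_T) / (1 - math.sin(theta) * math.sin(lam_T)))


def angular_distance(lam1, lam2, meridian_angle):
    """Spherical law of cosines for latitudes lam1, lam2 and meridian distance meridian_angle."""
    return _acos(math.cos(lam1) * math.cos(lam2)
                 + math.sin(lam1) * math.sin(lam2) * math.cos(meridian_angle))


for lam_T, theta in [(0.3, 0.5), (1.2, 0.3), (2.5, 0.5)]:
    lam_star = heuristic_latitude(lam_T, theta)
    # lam_star lies between lam_T and pi/2, and d_a(q*, b(T)) equals theta
    print(lam_T, theta, round(lam_star, 4), round(angular_distance(lam_star, lam_T, theta), 4))

# For lam_T = 0.3 and theta = 0.5, the other root does not satisfy d_a(q*, b(T)) = theta.
print(round(angular_distance(other_root(0.3, 0.5), 0.3, 0.5), 4))
```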
To compute the heuristic latitude choice in (17), we need estimates of $\lambda_T$ and $\theta$, which requires some knowledge of the planted clustering $T$. It might seem that requiring this knowledge of the planted clustering defeats the purpose of community detection. However, requiring partial knowledge of the planted clustering is not uncommon in other community detection methods, such as likelihood-based methods [Newman, 2016; Prokhorenkova and Tikhonov, 2019].
Moreover, when we have access to the graph generator, we can use it to estimate the means of $\lambda_T$ and $\theta$, and use these estimates in (17).
In Gösgens et al. [2023], we have shown that this granularity heuristic works well for several query vectors $q$, including modularity vectors. In this section, we demonstrate that this heuristic also works well for Markov stability vectors. For $t \in \{1, \ldots, 5\}$, we consider the projection method with query mapping $q^{(\mathrm{MS})}_t$ from (11), which is equivalent to maximizing the Markov stability for a discrete-time random walk with time $t$. We compare this method to the projection method with query mapping $q^*(q^{(\mathrm{MS})}_t)$, which corresponds to applying our granularity heuristic to the Markov stability vector.
To quantify the quality of the approximation $\ell(b(C)) \approx \ell(b(T))$, we define the relative granularity error as $\ell(b(C))/\ell(b(T)) - 1$, which we want to be close to 0. Positive values indicate that the detected clustering is more coarse-grained than the planted clustering, while negative values indicate that the detected clustering is too fine-grained. We measure the similarity between the detected and planted communities by the correlation coefficient $\rho(C, T) = \cos d_\rho(b(C), b(T))$. Values close to 1 indicate that the clusterings are highly similar. Values close to zero indicate that $C$ is not more similar to $T$ than a random relabeling of $C$.
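Both evaluation measures can be computed directly from clustering vectors. This sketch again assumes $\pm 1$ clustering vectors over node pairs, with latitude measured from $-\mathbf{1}$. The example clusterings are made up for illustration, and the function names are not from the paper.

```python
import math

import numpy as np


def clustering_vector(labels):
    """b(C): +1 for pairs i < j in the same cluster, -1 otherwise (assumed convention)."""
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)
    return np.where(labels[i] == labels[j], 1.0, -1.0)


def latitude(v):
    """Angle between v and -1."""
    cos = -v.sum() / (np.linalg.norm(v) * math.sqrt(v.size))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def relative_granularity_error(detected, planted):
    """l(b(C)) / l(b(T)) - 1: positive if C is coarser than T, negative if finer."""
    return latitude(clustering_vector(detected)) / latitude(clustering_vector(planted)) - 1


def rho(detected, planted):
    """Pearson correlation between b(C) and b(T), i.e. cos d_rho(b(C), b(T))."""
    return float(np.corrcoef(clustering_vector(detected), clustering_vector(planted))[0, 1])


planted = np.repeat([0, 1, 2], 5)          # three communities of five nodes
coarser = np.repeat([0, 0, 2], 5)          # the first two communities merged
finer = np.repeat([0, 1, 2, 3, 4], 3)      # five communities of three nodes
for name, detected in [("planted", planted), ("coarser", coarser), ("finer", finer)]:
    print(name, round(relative_granularity_error(detected, planted), 3),
          round(rho(detected, planted), 3))
```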
For the PPM, HPPM, DCPPM and ABCD graph generators, the results are shown in Figures 3, 4, 5 and 6, respectively. For each generator, we generate 50 graphs and show boxplots of the outcomes for each of the query mappings.
Effect of the heuristic. In Figures 3a, 4a, 5a and 6a, we see that the granularity heuristic indeed leads to detecting clusterings with granularity closer to that of the planted clustering. In almost all cases, the median relative granularity error after applying the heuristic is closer to zero than before applying the heuristic. The only exception is $t = 1$ for the HPPM generator. Overall, the granularity heuristic results in clusterings that are slightly more fine-grained than the planted clustering.
In addition, Figures 3b, 4b, 5b and 6b show that the granularity heuristic typically leads to an increased similarity to the planted clustering. The only two exceptions are HPPM and ABCD for $t = 1$, where the granularity heuristic results in slightly lower performance. For the PPM generator, Figure 3b shows that for each value of $t$ the detection is near-perfect (all similarities are higher than $\rho = 0.97$). For the HPPM and DCPPM generators, Figures 4b and 5b show that the heterogeneity in the community sizes and degrees results in slightly lower performance on these generators.
Markov stability time and granularity. Figures 3a, 4a, 5a and 6a show that, for the query vector $q^{(\mathrm{MS})}_t$ (without applying the heuristic), larger values of $t$ indeed lead to detecting more coarse-grained clusterings. However, in Figure 3a, $t = 1$ already results in clusterings that are more coarse-grained than the planted clustering. We see similar outcomes in Figures 4a, 5a and (to a lesser extent) Figure 6a. Since we are using a discrete-time Markov chain, we cannot consider times $t \in (0, 1)$; for a continuous-time Markov chain, this is possible.
Sparsity and computation time. Note that the running time of the Louvain algorithm for Markov stability vectors depends on the sparsity of the transition matrix $P(t)$. To illustrate: for $t = 1$, the transition matrix has the same number of positive entries as the adjacency matrix, and (our implementation of) the Louvain algorithm runs in around 5 seconds.

… set and a validation set. The training set is used to find the best hyperparameter combination, while the validation set is used to get an unbiased estimate of the performance of the obtained hyperparameter values.
To demonstrate this method, we show how we can optimize a projection method for the ABCD graph generator from the previous section. To allow for comparison with the previous section, we use the same 50 ABCD graphs as the validation set. For the training set, we generate 15 new ABCD graphs from the same generator.
We consider query vectors that are linear combinations of four vectors: the constant vector $\mathbf{1}$ (to control granularity), the adjacency vector $v(A)$, the degree-product vector $d(A)$ and the Jaccard vector $j(A)$. The latter is defined as follows. Let $N(i)$ denote the neighborhood of $i$, where we follow the convention that $i \in N(i)$ for all $i \in [n]$. Then $j(A)_{ij}$ is the Jaccard similarity between the neighborhoods of $i$ and $j$:

$$j(A)_{ij} = \frac{|N(i) \cap N(j)|}{|N(i) \cup N(j)|}.$$

That is, we consider query vectors of the form $q = c_1 \cdot \mathbf{1} + c_A \cdot v(A) + c_d \cdot d(A) + c_j \cdot j(A)$.
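A sketch of these building blocks on an adjacency matrix $A$ follows. The Jaccard vector follows the definition above with closed neighborhoods. The definition of $d(A)$ is not repeated in this section, so its normalization here is an assumption made for illustration: $d_i d_j/(2m)$, the null-model term of standard modularity. All function names are hypothetical.

```python
import numpy as np


def pair_vector(X):
    """v(X): the entries X_ij for i < j."""
    i, j = np.triu_indices(len(X), k=1)
    return X[i, j]


def jaccard_vector(A):
    """Jaccard similarity of the closed neighbourhoods (i in N(i)) for every pair i < j."""
    M = ((A + np.eye(len(A))) > 0).astype(float)   # row i indicates N(i)
    common = M @ M.T                                # size of N(i) intersected with N(j)
    size = M.sum(axis=1)
    union = size[:, None] + size[None, :] - common  # size of the union, never zero
    return pair_vector(common / union)


def degree_product_vector(A):
    """Assumed form of d(A): d_i d_j / (2m)."""
    deg = A.sum(axis=1)
    return pair_vector(np.outer(deg, deg) / deg.sum())


def query_vector(A, c_1, c_A, c_d, c_j):
    """q = c_1 * 1 + c_A * v(A) + c_d * d(A) + c_j * j(A)."""
    v = pair_vector(A)
    return (c_1 * np.ones_like(v) + c_A * v
            + c_d * degree_product_vector(A) + c_j * jaccard_vector(A))


# A triangle {0, 1, 2} with a pendant node 3 attached to node 2.
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(jaccard_vector(A))   # pairs in the order (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
print(query_vector(A, 0.0, 1.0, -2.5, 0.5))
```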
In the hyperspherical geometry, the length of the query vector does not affect the detected clustering, so we can reduce the number of hyperparameters by one. Assuming that the best combination has $c_A > 0$, we set $c_A = 1$, thereby making the grid search more efficient. Moreover, for most values of the coefficient $c_1$, the detection method will result in clusterings of the wrong granularity. Thus, instead of fitting $c_1$, we use the granularity heuristic from Section 5.2. We are then left with the following parametrization of query vectors:

$$q(A; c_j, c_d) = q^*\left(v(A) + c_j \cdot j(A) + c_d \cdot d(A)\right).$$
This leaves two hyperparameters to be tuned by the grid search. Note that the vector $d(A)$ is a correction term for the degrees. Because of that, the best performance is likely to be found for $c_d \le 0$. We choose the interval $c_d \in [-6, 0]$, which we discretize in steps of $1/2$. For the parameter $c_j$, we discretize the interval $[0, 1]$ into steps of size $1/10$. The results are shown in Figure 7. We see that there is a large region where the method performs well. In particular, the best-performing coefficients are $c_d = -5/2$ and $c_j = 1/2$, with a median performance of $\rho = 0.993$ on the training set. Applying this query mapping to the validation set yields a median performance of $\rho = 0.973$. This is slightly lower than the performance on the training set, as expected due to selection bias.

Let us compare this to the performance of the Markov stability query mappings from Section 5.2 on these graphs. In Figure 6b, we see that Markov stability with time $t = 1$ without applying the heuristic achieves the best median performance on this set of networks. Note that this is equivalent to CL-modularity maximization with $\gamma = 1$. This query mapping achieved a median performance of $\rho = 0.912$, which is good, but significantly lower than the performance of our optimized query mapping.

In this demonstration, we have kept the setup relatively simple by taking combinations of only 4 vectors, reduced to two coefficients. We did this for simplicity, and so that we can visualize the performance on the training set as a two-dimensional heatmap. This already led to strong performance. It is likely that performance can be improved even further by including a larger number of query vectors, and by using a grid search instead of the granularity heuristic to determine the coefficient $c_1$. The obvious downside is that optimizing a larger number of parameters is computationally more demanding, and one must increase the size of the training set to avoid overfitting.
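The grid search itself is simple to organize; the expensive part is the scoring function. In the sketch below, `score(c_d, c_j)` is a hypothetical callable. It should return the median of $\rho(C, T)$ over the training graphs for the query mapping $q(A; c_j, c_d)$. A toy score with an arbitrary peak stands in for it so that the snippet runs on its own. The toy score is not the paper's data.

```python
import numpy as np


def grid_search(score, c_d_grid, c_j_grid):
    """Evaluate score on every (c_d, c_j) pair; return the best pair and the score table."""
    table = np.array([[score(c_d, c_j) for c_j in c_j_grid] for c_d in c_d_grid])
    a, b = np.unravel_index(np.argmax(table), table.shape)
    return (float(c_d_grid[a]), float(c_j_grid[b])), table


c_d_grid = np.linspace(-6.0, 0.0, 13)   # steps of 1/2
c_j_grid = np.linspace(0.0, 1.0, 11)    # steps of 1/10


def toy_score(c_d, c_j):
    """Stand-in for the median training performance (NOT real results)."""
    return 1.0 - 0.01 * (c_d + 1.0) ** 2 - 0.1 * (c_j - 0.3) ** 2


best, table = grid_search(toy_score, c_d_grid, c_j_grid)
print(best, table.shape)   # the table holds what a heatmap like Figure 7 displays
```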
To conclude, we have shown that the class of projection methods unifies many popular community detection methods and is expressive enough to fit realistic benchmark generators like ABCD. This work opens the way to many follow-up research questions. On the one hand, there are many algorithmic questions. What projection algorithms work well for what query vectors? Are there sets of query vectors for which the minimization of $d_a(q, b(C))$ is not NP-hard? On the other hand, there are also several methodological questions. How do the best-performing coefficients of the linear combination depend on the network properties? For example, how does the best-performing coefficient $c_d$ depend on the mean and variance of the degree distribution?
(a) Granularity errors of Markov stability on PPM networks. (b) Performance of Markov stability on PPM networks.

Figure 3: For 50 graphs drawn from the Planted Partition Model (PPM), we evaluate Markov stability with time $t \in [5]$, and compare the clusterings that are obtained with and without applying the granularity heuristic. $q^*(\cdot)$ denotes the heuristic. A positive granularity error indicates that the detected clustering is coarser than the planted clustering. $\rho(C, T) = \cos d_\rho(b(C), b(T))$ is the Pearson correlation between the clustering vectors, which we wish to make close to 1.

(a) Granularity errors of Markov stability on HPPM networks. (b) Performance of Markov stability on HPPM networks.

Figure 4: For 50 graphs drawn from a Heterogeneously-sized Planted Partition Model (HPPM), we evaluate Markov stability with time $t \in [5]$, and compare the clusterings that are obtained with and without applying the granularity heuristic. $q^*(\cdot)$ denotes the heuristic. A positive granularity error indicates that the detected clustering is coarser than the planted clustering. $\rho(C, T) = \cos d_\rho(b(C), b(T))$ is the Pearson correlation between the clustering vectors, which we wish to make close to 1.

(a) Granularity errors of Markov stability on ABCD networks. (b) Performance of Markov stability on ABCD networks.

Figure 6: For 50 graphs drawn from the Artificial Benchmark for Community Detection (ABCD), we evaluate Markov stability with time $t \in [5]$, and compare the clusterings that are obtained with and without applying the granularity heuristic. $q^*(\cdot)$ denotes the heuristic. A positive granularity error indicates that the detected clustering is coarser than the planted clustering. $\rho(C, T) = \cos d_\rho(b(C), b(T))$ is the Pearson correlation between the clustering vectors, which we wish to make close to 1.

Figure 7: A heatmap of the median performance (similarity between the detected and planted clusterings, as measured by $\rho(C, T)$) for different linear combinations of query mappings. The medians are computed over 10 samples of ABCD graphs, with the same parameters as in the experiments of Section 5.2. The best performance is marked with a white triangle.