<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdata.2019.00007</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>You Can&#x00027;t See Me: Anonymizing Graphs Using the Szemer&#x000E9;di Regularity Lemma</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Foffano</surname> <given-names>Daniele</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Rossi</surname> <given-names>Luca</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/709426/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Torsello</surname> <given-names>Andrea</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/725825/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Dipartimento di Scienze Ambientali, Informatica e Statistica, Universit&#x000E0; Ca&#x00027; Foscari Venezia</institution>, <addr-line>Venezia</addr-line>, <country>Italy</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Computer Science and Engineering, Southern University of Science and Technology</institution>, <addr-line>Shenzhen</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Roberto Interdonato, T&#x000E9;l&#x000E9;d&#x000E9;tection et Information Spatiale (TETIS), France</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Ruggero Gaetano Pensa, University of Turin, Italy; Matteo Zignani, University of Milan, Italy</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Luca Rossi <email>rossil&#x00040;sustech.edu.cn</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Data Mining and Management, a section of the journal Frontiers in Big Data</p></fn></author-notes>
<pub-date pub-type="epub">
<day>31</day>
<month>05</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>2</volume>
<elocation-id>7</elocation-id>
<history>
<date date-type="received">
<day>25</day>
<month>03</month>
<year>2019</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>05</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2019 Foffano, Rossi and Torsello.</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Foffano, Rossi and Torsello</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>Complex networks gathered from our online interactions provide a rich source of information that can be used to try to model and predict our behavior. While this has very tangible benefits that we have all grown accustomed to, there is a concrete privacy risk in sharing potentially sensitive data about ourselves and the people we interact with, especially when this data is publicly available online and unprotected from malicious attacks. <italic>k</italic>-anonymity is a technique aimed at reducing this risk by obfuscating the topological information of a graph that can be used to infer the nodes&#x00027; identity. In this paper we propose a novel algorithm to enforce <italic>k</italic>-anonymity based on a well-known result in extremal graph theory, the Szemer&#x000E9;di regularity lemma. Given a graph, we start by computing a regular partition of its nodes. The Szemer&#x000E9;di regularity lemma ensures that such a partition exists and that the edges between the sets of nodes behave almost randomly. With this partition, we anonymize the graph by randomizing the edges within each set, obtaining a graph that is structurally similar to the original one yet the nodes within each set are structurally indistinguishable. We test the proposed approach on real-world networks extracted from Facebook. Our experimental results show that the proposed approach is able to anonymize a graph while retaining most of its structural information.</p></abstract>
<kwd-group>
<kwd>privacy</kwd>
<kwd>anonymity</kwd>
<kwd>social networks</kwd>
<kwd>graph</kwd>
<kwd>regularity lemma</kwd>
</kwd-group>
<counts>
<fig-count count="2"/>
<table-count count="2"/>
<equation-count count="0"/>
<ref-count count="28"/>
<page-count count="6"/>
<word-count count="4679"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>The beginning of the twenty-first century has been characterized by the rise of online social media and data-hungry artificial intelligence (AI). In this context, sophisticated machine learning algorithms feed off massive amounts of data produced by our digital personas to perfect the way they model and predict our behavior, both online and offline. However, the comforts of an increasingly AI-assisted life are overshadowed by the threat it poses to our privacy and freedom (Fung et al., <xref ref-type="bibr" rid="B9">2010</xref>; Rossi and Musolesi, <xref ref-type="bibr" rid="B21">2014</xref>; Rossi et al., <xref ref-type="bibr" rid="B23">2015b</xref>; Qian et al., <xref ref-type="bibr" rid="B20">2016</xref>). At the same time, the digital traces we produce, particularly interactions between users in an online social network, are often abstracted using a graph representation and made available in the form of public datasets, as they offer a unique opportunity for researchers to study real-world complex networks of interactions (Kwak et al., <xref ref-type="bibr" rid="B13">2010</xref>; Chorley et al., <xref ref-type="bibr" rid="B4">2016</xref>).</p>
<p>A common practice to protect the identity of the users whose interactions are captured by the graph is to strip the nodes of sensitive information (e.g., the users&#x00027; names) and to label each node with a randomly generated identifier. However, it has been shown that this does not guarantee that the users&#x00027; privacy is preserved (Backstrom et al., <xref ref-type="bibr" rid="B1">2007</xref>). Indeed, it is possible to disclose the identity of an individual participating in the network with minimal external background information. One common example is that of a user whose number of connections in the network is known (e.g., the number of friends on Facebook) and happens to be unique to that individual. In other words, this piece of information alone would be sufficient to identify that user among the rest of the nodes. Most importantly, once the identity is revealed, other potentially sensitive pieces of information can be inferred. For instance, the individual may turn out to belong to a group of nodes labeled with a certain sensitive attribute, e.g., a health condition.</p>
<p>For these reasons, the problem of anonymizing graph data is attracting increasing attention (Hay et al., <xref ref-type="bibr" rid="B11">2008</xref>; Liu and Terzi, <xref ref-type="bibr" rid="B16">2008</xref>; Rossi et al., <xref ref-type="bibr" rid="B22">2015a</xref>; Qian et al., <xref ref-type="bibr" rid="B20">2016</xref>). A common anonymity model is <italic>k</italic>-anonymity, which aims to ensure that each node in a network is structurally indistinguishable from at least <italic>k</italic> other nodes. Different works have focused on different definitions of &#x0201C;structurally indistinguishable.&#x0201D; Liu and Terzi (<xref ref-type="bibr" rid="B16">2008</xref>) considered the case of <italic>k</italic>-degree anonymous graphs, where <italic>k</italic>-degree anonymity guarantees that each node of the graph has the same degree as at least <italic>k</italic> other nodes. Subsequent works attempted to reduce the total running time of the method of Liu and Terzi (<xref ref-type="bibr" rid="B16">2008</xref>) to make it feasible to scale up to large networks (Hay et al., <xref ref-type="bibr" rid="B11">2008</xref>). Rossi et al. (<xref ref-type="bibr" rid="B22">2015a</xref>), on the other hand, extended the concept of <italic>k</italic>-degree anonymity to multi-layer and time-varying graphs. Other researchers considered different structural distinguishability criteria where the attacker has increasing levels of information available to deanonymize the nodes (Hay et al., <xref ref-type="bibr" rid="B11">2008</xref>; Cheng et al., <xref ref-type="bibr" rid="B3">2010</xref>; Zhou and Pei, <xref ref-type="bibr" rid="B28">2011</xref>). However, the main issue with these approaches lies in the need to add increasing amounts of noise as increasingly complex structural information needs to be obfuscated. More recently, Rousseau et al.
(<xref ref-type="bibr" rid="B24">2018</xref>) considered the problem of anonymizing a graph maximizing the amount of preserved community information. Finally, Qian et al. (<xref ref-type="bibr" rid="B20">2016</xref>) and Ma et al. (<xref ref-type="bibr" rid="B17">2018</xref>) looked at the complementary problem of deanonymizing a graph in the case where the attacker has access to richer features as well as structural information.</p>
<p>While most of the previous <italic>k</italic>-anonymity approaches assume that the attacker has access only to a certain level of structural information (from the degree of a node, to its immediate neighborhood or even the whole graph), in this paper we propose a method that creates <italic>k</italic>-anonymous groups of nodes where no amount of structural information can help to break the anonymity guarantee. Our approach is based on the Szemer&#x000E9;di regularity lemma (Diestel, <xref ref-type="bibr" rid="B5">2012</xref>), a well-known result of extremal graph theory. The Szemer&#x000E9;di regularity lemma has been successfully applied to several problems, from graph theory (Koml&#x000F3;s and Simonovits, <xref ref-type="bibr" rid="B12">1996</xref>) to computer vision and pattern recognition (Sperotto and Pelillo, <xref ref-type="bibr" rid="B26">2007</xref>; Pelillo et al., <xref ref-type="bibr" rid="B19">2017</xref>). The lemma roughly states that every sufficiently large and dense graph<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> can be approximated by the union of random-like bipartite graphs called regular pairs. Our observation is that the groups of graph nodes that form these regular pairs can be anonymized by rewiring the intra-group edges according to an Erd&#x000F6;s-R&#x000E9;nyi process (Erd&#x00151;s, <xref ref-type="bibr" rid="B6">1960</xref>). Thanks to the theoretical guarantees of the Szemer&#x000E9;di regularity lemma, this has minimal effect on the overall graph structure and, together with the random-like behavior of the inter-group connections, ensures that each group is anonymous.</p>
<p>The remainder of the paper is organized as follows. We start by reviewing the key graph theoretical concepts underpinning our work in section 2. In section 3 we propose our anonymization method based on the Szemer&#x000E9;di regularity lemma and in section 4 we evaluate it on three different networks abstracted from Facebook. Finally, section 5 concludes the paper.</p>
</sec>
<sec id="s2">
<title>2. Szemer&#x000E9;di Regularity Lemma</title>
<p>Let <italic>G</italic> &#x0003D; (<italic>V, E</italic>) be an undirected graph with no self-loops, where <italic>V</italic> is the set of nodes and <italic>E</italic> is the set of edges. If <italic>X</italic> and <italic>Y</italic> are disjoint subsets of <italic>V</italic>, the <italic>edge density</italic> of this pair (<italic>X, Y</italic>) is defined as <inline-formula><mml:math id="M1"><mml:mi>d</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo>|</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>X</mml:mi><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mi>Y</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mfrac></mml:math></inline-formula>, where <italic>E</italic>(<italic>X, Y</italic>) is the set of edges connecting nodes in <italic>X</italic> to nodes in <italic>Y</italic>. The edge density satisfies 0 &#x02264; <italic>d</italic>(<italic>X, Y</italic>) &#x02264; 1.</p>
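<p>The edge density defined above can be illustrated with a minimal Python sketch. The function name <monospace>edge_density</monospace>, the representation of the graph as a list of undirected edges without duplicates, and the toy data are assumptions made for illustration only.</p>
<preformat><![CDATA[
```python
def edge_density(edges, X, Y):
    """Return d(X, Y) = |E(X, Y)| / (|X| |Y|) for disjoint node sets X and Y."""
    X, Y = set(X), set(Y)
    if not X or not Y:
        raise ValueError("X and Y must be non-empty")
    if X & Y:
        raise ValueError("X and Y must be disjoint")
    # Count the undirected edges with one endpoint in X and the other in Y.
    crossing = sum(1 for u, v in edges
                   if (u in X and v in Y) or (u in Y and v in X))
    return crossing / (len(X) * len(Y))


# Toy graph (illustrative data, not taken from the experiments).
edges = [(0, 2), (0, 3), (1, 2)]
print(edge_density(edges, {0, 1}, {2, 3}))  # 3 of the 4 possible pairs -> 0.75
```
]]></preformat>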
<p>Given a positive real &#x003B5; &#x0003E; 0, a pair of node sets <italic>X</italic> and <italic>Y</italic> is called &#x003B5;<italic>-regular</italic> if for all subsets <italic>A</italic>&#x02286;<italic>X</italic> and <italic>B</italic>&#x02286;<italic>Y</italic> satisfying |<italic>A</italic>| &#x02265; &#x003B5;|<italic>X</italic>| and |<italic>B</italic>| &#x02265; &#x003B5;|<italic>Y</italic>| we have |<italic>d</italic>(<italic>X, Y</italic>)&#x02212;<italic>d</italic>(<italic>A, B</italic>)| &#x02264; &#x003B5;. Stated otherwise, the distribution of the edges between an &#x003B5;-regular pair is almost uniform, i.e., the graph over <italic>X</italic>&#x0222A;<italic>Y</italic> behaves like a random bipartite graph.</p>
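<p>Checking &#x003B5;-regularity exactly requires examining all sufficiently large subsets, which is infeasible by brute force. The sketch below is therefore only a randomized search for a violating pair of subsets: it can show that a pair is not &#x003B5;-regular, but a negative answer proves nothing. The function names, the adjacency-dictionary representation, and the <monospace>trials</monospace> and <monospace>seed</monospace> parameters are hypothetical.</p>
<preformat><![CDATA[
```python
import math
import random


def density(adj, A, B):
    """Edge density between disjoint sets A and B; adj maps node -> set of neighbours."""
    return sum(len(adj[u] & B) for u in A) / (len(A) * len(B))


def find_regularity_violation(adj, X, Y, eps, trials=1000, seed=0):
    """Randomly search for subsets A of X and B of Y, with |A| >= eps|X| and
    |B| >= eps|Y|, such that |d(X, Y) - d(A, B)| > eps.

    Returns a violating pair (A, B), or None if none was found.
    None does NOT certify that (X, Y) is eps-regular.
    """
    rng = random.Random(seed)
    X, Y = sorted(X), sorted(Y)
    d_xy = density(adj, set(X), set(Y))
    min_a = max(1, math.ceil(eps * len(X)))
    min_b = max(1, math.ceil(eps * len(Y)))
    for _ in range(trials):
        A = set(rng.sample(X, rng.randint(min_a, len(X))))
        B = set(rng.sample(Y, rng.randint(min_b, len(Y))))
        if abs(d_xy - density(adj, A, B)) > eps:
            return A, B
    return None


# Toy check: a perfect matching between two groups of four nodes is far from random-like.
X, Y = {0, 1, 2, 3}, {4, 5, 6, 7}
adj = {u: set() for u in X | Y}
for u, v in zip(sorted(X), sorted(Y)):
    adj[u].add(v)
    adj[v].add(u)
witness = find_regularity_violation(adj, X, Y, eps=0.2)
```
]]></preformat>
<p>By contrast, in a complete bipartite graph every pair of subsets has density 1, so the search returns <monospace>None</monospace>.</p>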
<p>Let the node set <italic>V</italic> be divided into a partition <inline-formula><mml:math id="M2"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow></mml:math></inline-formula> of <italic>l</italic> sets <italic>V</italic><sub>1</sub>, &#x022EF;&#x02009;, <italic>V</italic><sub><italic>l</italic></sub>. <inline-formula><mml:math id="M3"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow></mml:math></inline-formula> is an &#x003B5;<italic>-regular partition</italic> if: (1) ||<italic>V</italic><sub><italic>i</italic></sub>| &#x02212; |<italic>V</italic><sub><italic>j</italic></sub>|| &#x02264; 1, for 1 &#x02264; <italic>i</italic> &#x0003C; <italic>j</italic> &#x02264; <italic>l</italic> and (2) all except at most &#x003B5;<italic>l</italic><sup>2</sup> pairs (<italic>V</italic><sub><italic>i</italic></sub>, <italic>V</italic><sub><italic>j</italic></sub>) (1 &#x02264; <italic>i</italic> &#x0003C; <italic>j</italic> &#x02264; <italic>l</italic>) are &#x003B5;-regular. With these definitions in hand, we can finally state the following.</p>
<p><bold>Lemma 2.1</bold> (Szemer&#x000E9;di regularity lemma). <italic>For every positive real</italic> &#x003B5; &#x0003E; 0 <italic>and every positive integer</italic> <italic>m</italic><italic>, there exist positive integers</italic> <italic>N</italic> &#x0003D; <italic>N</italic>(&#x003B5;, <italic>m</italic>) <italic>and</italic> <italic>M</italic> &#x0003D; <italic>M</italic>(&#x003B5;, <italic>m</italic>) <italic>such that, if</italic> <italic>G</italic> &#x0003D; (<italic>V, E</italic>) <italic>is a graph with</italic> |<italic>V</italic>| &#x02265; <italic>N</italic> <italic>nodes, there is an</italic> &#x003B5;<italic>-regular partition of</italic> <italic>V</italic> <italic>into</italic> <italic>l</italic> <italic>groups with sizes that differ at most by 1, where</italic> <italic>m</italic> &#x02264; <italic>l</italic> &#x02264; <italic>M</italic>.</p>
<p>In other words, the Szemer&#x000E9;di regularity lemma states that a graph can be seen as a collection of groups of nodes such that the edges between these groups are almost uniformly distributed. More generally, as stated by Koml&#x000F3;s and Simonovits (<xref ref-type="bibr" rid="B12">1996</xref>), the regularity lemma states that every graph can be approximated by generalized random graphs. Note that the lemma also states that there may be a number of &#x003B5;-irregular pairs that do not behave like random bipartite graphs. However, for a sufficiently small &#x003B5;, the number of such pairs will be low (i.e., smaller than &#x003B5;<italic>l</italic><sup>2</sup>).</p>
<p>Given a graph <italic>G</italic> and an &#x003B5;-regular partition of its nodes, a reduced graph can be constructed by replacing each pair of &#x003B5;-regular groups with two nodes connected by an edge. As shown by the Key lemma (Koml&#x000F3;s and Simonovits, <xref ref-type="bibr" rid="B12">1996</xref>), the reduced graph inherits many of the fundamental structural properties of the original graph, to the point that the graph obtained by simply replacing each pair of connected nodes of the reduced graph with a complete bipartite graph over 2<italic>t</italic> nodes yields a new graph that can be used as a surrogate of the original one, where <italic>t</italic> &#x02265; 1 is an integer.</p>
<p>Recall that the aim of this paper is to anonymize a graph <italic>G</italic> &#x0003D; (<italic>V, E</italic>) by grouping <italic>V</italic> into sets of <italic>k</italic>-anonymous nodes. The Szemer&#x000E9;di regularity lemma states that the node set of every sufficiently large graph can be rearranged to reveal a random-like structure, where pairs of groups of <italic>k</italic> nodes are connected in an almost uniform (in other words, random) way. That is, for the purpose of graph deanonymization, the edge information between the groups of nodes is unusable. Unfortunately, the intra-group connections can still be exploited to deanonymize the nodes. However, the Szemer&#x000E9;di regularity lemma and the fact that the reduced graph (where the intra-group connections are lost) preserves the fundamental structural properties of the original graph imply that these intra-group connections are small in number and structurally negligible.</p>
</sec>
<sec id="s3">
<title>3. Anonymization Framework</title>
<p>In the previous section we introduced the Szemer&#x000E9;di regularity lemma and we showed how this can be seen as a first step toward obtaining a <italic>k</italic>-anonymous graph. To achieve full <italic>k</italic>-anonymity, however, we need to obfuscate the structural information contained in the intra-group connections of the &#x003B5;-regular partition. Our solution involves rewiring these connections using the Erd&#x000F6;s-R&#x000E9;nyi model (Erd&#x00151;s, <xref ref-type="bibr" rid="B6">1960</xref>), effectively replacing each subgraph (i.e., each group of the &#x003B5;-regular partition) with an Erd&#x000F6;s-R&#x000E9;nyi graph over the same set of nodes. Crucially, for each subgraph, we set the parameter <italic>p</italic>, which governs the probability of adding/deleting an edge, equal to the density of the original subgraph. More specifically, our approach follows three steps: (1) we first find a regular partition using the regularity lemma; (2) then, we randomize the groups&#x00027; intra-connections; and (3) finally, we randomize the edges connecting irregular pairs.</p>
<p>In the <bold>first step</bold> we apply the algorithm implemented by Fiorucci et al. (<xref ref-type="bibr" rid="B7">2019</xref>)<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>. This extends the previous algorithm of Fiorucci et al. (<xref ref-type="bibr" rid="B8">2017</xref>) by proposing a novel heuristic procedure where the node set is first partitioned into two groups of nodes and then these are recursively split into smaller groups until a desired cardinality is reached and certain conditions that measure the quality of the &#x003B5;-regularity of the partition are satisfied (Pelillo et al., <xref ref-type="bibr" rid="B19">2017</xref>). In particular, Fiorucci et al. propose two different heuristics to split the groups, one called <italic>degree based</italic>, which groups together nodes with similar degrees (Fiorucci et al., <xref ref-type="bibr" rid="B8">2017</xref>), and a second one called <italic>indeg guided</italic>, which splits a sparse (dense) partition into two sparse (dense) partitions. Note that with this method the number of groups in the resulting &#x003B5;-regular partition is always a power of 2.</p>
<p>The <bold>second step</bold> involves randomly rewiring the connections within each group of vertices. To this end, we add or delete an edge with a probability <italic>p</italic> equal to the density of the subgraph <italic>H</italic> spanned by the group of nodes we are trying to anonymize. Note that we only change the internal connections of <italic>H</italic>, so we are not altering the &#x003B5;-regularity relations. The resulting subgraph <italic>H</italic>&#x02032; will have, in expectation, the same density as <italic>H</italic>; however, its structural information will not be of any use when trying to deanonymize its nodes.</p>
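<p>A minimal sketch of the second step follows. It assumes one particular reading of the rewiring rule, namely that every pair of nodes inside the group is independently made an edge with probability <italic>p</italic>, i.e., the induced subgraph is replaced by an Erd&#x000F6;s-R&#x000E9;nyi graph. The function name <monospace>rewire_group</monospace> and the edge-set representation are hypothetical and are not taken from the implementation used in the experiments.</p>
<preformat><![CDATA[
```python
import itertools
import random


def rewire_group(edges, group, rng):
    """Replace the subgraph induced by `group` with an Erdos-Renyi graph whose
    edge probability p equals the density of the original induced subgraph.
    Edges with at most one endpoint in `group` are left untouched."""
    group = set(group)
    edges = {frozenset(e) for e in edges}
    pairs = [frozenset(pair) for pair in itertools.combinations(sorted(group), 2)]
    if not pairs:
        return edges
    inside = {e for e in edges if e <= group}   # current intra-group edges
    p = len(inside) / len(pairs)                # density of the induced subgraph
    # Assumed interpretation: each intra-group pair becomes an edge with probability p.
    resampled = {pair for pair in pairs if rng.random() < p}
    return (edges - inside) | resampled


rng = random.Random(42)
edges = [(0, 1), (1, 2), (3, 4), (2, 3)]
anonymized = rewire_group(edges, {0, 1, 2}, rng)
# The inter-group edge (2, 3) and the edge (3, 4) outside the group are preserved.
```
]]></preformat>
<p>Because <italic>p</italic> is taken from the original subgraph, the expected number of intra-group edges is unchanged, while their placement carries no information about the original wiring.</p>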
<p>Recall that each &#x003B5;-regular partition allows up to &#x003B5;<italic>l</italic><sup>2</sup> irregular pairs, where <italic>l</italic> is the number of sets of the &#x003B5;-regular partition. So far we ensured that the connections within and between &#x003B5;-regular pairs are anonymous; however, we have not yet dealt with irregular pairs. The <bold>third step</bold> addresses this and requires rewiring the connections between groups forming an &#x003B5;-irregular pair. Let (<italic>V</italic><sub><italic>i</italic></sub>, <italic>V</italic><sub><italic>j</italic></sub>) be one such pair, with total number of nodes <italic>n</italic>. Consider the bipartite subgraph <italic>H</italic> &#x0003D; (<italic>V</italic><sub><italic>i</italic></sub> &#x0222A; <italic>V</italic><sub><italic>j</italic></sub>, <italic>E</italic><sub><italic>ij</italic></sub>) where we only consider the set of edges <italic>E</italic><sub><italic>ij</italic></sub> connecting nodes in <italic>V</italic><sub><italic>i</italic></sub> with nodes in <italic>V</italic><sub><italic>j</italic></sub>. In order to render the structural information contained in these edges unusable for deanonymization purposes, we randomly rewire each pair of nodes (<italic>u, v</italic>), with <italic>u</italic>&#x02208;<italic>V</italic><sub><italic>i</italic></sub> and <italic>v</italic>&#x02208;<italic>V</italic><sub><italic>j</italic></sub>, by adding/deleting an edge to <italic>E</italic><sub><italic>ij</italic></sub> with probability <italic>p</italic> equal to the edge density of the pair, |<italic>E</italic><sub><italic>ij</italic></sub>|/(|<italic>V</italic><sub><italic>i</italic></sub>| &#x000D7; |<italic>V</italic><sub><italic>j</italic></sub>|).</p>
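<p>The third step can be sketched in the same spirit for a single irregular pair. As an assumption, each cross pair (<italic>u, v</italic>) is independently made an edge with probability equal to the edge density of the pair; the function name <monospace>rewire_irregular_pair</monospace> is hypothetical, and the two groups are assumed to be disjoint and non-empty.</p>
<preformat><![CDATA[
```python
import random


def rewire_irregular_pair(edges, Vi, Vj, rng):
    """Resample the edges between the disjoint groups Vi and Vj as a random
    bipartite graph with edge probability p = |E_ij| / (|Vi| * |Vj|).
    Edges inside Vi, inside Vj, or touching other groups are left untouched."""
    Vi, Vj = set(Vi), set(Vj)
    edges = {frozenset(e) for e in edges}
    # E_ij: edges with exactly one endpoint in each of the two groups.
    crossing = {e for e in edges if len(e & Vi) == 1 and len(e & Vj) == 1}
    p = len(crossing) / (len(Vi) * len(Vj))
    resampled = {frozenset((u, v))
                 for u in sorted(Vi) for v in sorted(Vj)
                 if rng.random() < p}
    return (edges - crossing) | resampled


rng = random.Random(7)
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
rewired = rewire_irregular_pair(edges, {0, 1}, {2, 3}, rng)
# The intra-group edges (0, 1) and (2, 3) are never modified by this step.
```
]]></preformat>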
<p>In this framework &#x003B5; can be interpreted as a measure of the error made by the Szemer&#x000E9;di regularity lemma approximation, i.e., the smaller &#x003B5;, the better the anonymized graph approximates the original graph. In fact, the more edges we need to rewire, the less structural information is preserved. The Szemer&#x000E9;di regularity lemma allows us to safely rewire intra-group connections, knowing that these are small in number and structurally negligible. So the key to preserving the structural information of the original graph is to minimize the number of &#x003B5;-irregular pairs. This becomes particularly relevant when anonymizing real-world complex networks, which often display a scale-free structure (Barab&#x000E1;si and Albert, <xref ref-type="bibr" rid="B2">1999</xref>). In these networks a small number of nodes (i.e., hubs) have a very large degree. If an irregular pair contains a hub we will end up rewiring a large number of edges, potentially compromising the structural information for the sake of anonymity. Therefore, minimizing the number of &#x003B5;-irregular pairs is of fundamental importance. Also, recall that the method of Fiorucci et al. is based on heuristics, and in general different runs of their algorithm can result in different &#x003B5;-regular partitions. For this reason, we repeat the computation of the &#x003B5;-regular partition <monospace>max_iter</monospace> times and we choose the partition with the minimum &#x003B5; and number of &#x003B5;-irregular pairs. Note that each iteration of the algorithm of Fiorucci et al. has computational cost <italic>O</italic>(<italic>n</italic><sup>2.376</sup>), and this cost dominates the overall anonymization complexity.</p>
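<p>The repeated search over <monospace>max_iter</monospace> runs can be sketched as a simple selection loop. The callable <monospace>find_partition</monospace> is a hypothetical stand-in for one run of the partitioning heuristic, and the lexicographic ordering (smallest &#x003B5; first, then fewest irregular pairs) is an assumption about how the two criteria are combined.</p>
<preformat><![CDATA[
```python
def best_partition(find_partition, max_iter=100):
    """Run a (hypothetical) partitioning heuristic max_iter times and keep the
    result with the smallest (eps, number of irregular pairs), compared
    lexicographically. `find_partition(run)` must return either None (no
    partition found) or a tuple (partition, eps, n_irregular)."""
    best = None
    for run in range(max_iter):
        candidate = find_partition(run)
        if candidate is None:
            continue
        _, eps, n_irregular = candidate
        if best is None or (eps, n_irregular) < (best[1], best[2]):
            best = candidate
    return best


# Stub standing in for the real heuristic (illustrative values only).
runs = [(["P0"], 0.2, 5), None, (["P2"], 0.1, 7), (["P3"], 0.1, 3)]
chosen = best_partition(lambda run: runs[run], max_iter=len(runs))
print(chosen)  # (['P3'], 0.1, 3)
```
]]></preformat>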
</sec>
<sec id="s4">
<title>4. Experimental Results</title>
<p>We test the proposed method on three real-world networks abstracted from Facebook. Note that all the graphs are sparse, as shown in <xref ref-type="table" rid="T1">Table 1</xref>. <italic>Facebook Combined</italic> represents circles (or friend lists) from Facebook and was first introduced by Leskovec and Mcauley (<xref ref-type="bibr" rid="B14">2012</xref>). The two remaining networks, <italic>Tv Shows</italic> and <italic>Politicians</italic>, describe blue verified pages of different kinds, where edges represent mutual likes among them (Rozemberczki et al., <xref ref-type="bibr" rid="B25">2018</xref>).</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Summary of the main structural characteristics of the original graphs.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Dataset</bold></th>
<th valign="top" align="center"><bold>Nodes</bold></th>
<th valign="top" align="center"><bold>Density</bold></th>
<th valign="top" align="center"><bold>Edges</bold></th>
<th valign="top" align="center"><bold>Avg. clustering coefficient</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Facebook Combined</td>
<td valign="top" align="center">4,039</td>
<td valign="top" align="center">0.011</td>
<td valign="top" align="center">88,234</td>
<td valign="top" align="center">0.606</td>
</tr>
<tr>
<td valign="top" align="left">Politicians</td>
<td valign="top" align="center">5,908</td>
<td valign="top" align="center">0.002</td>
<td valign="top" align="center">41,729</td>
<td valign="top" align="center">0.385</td>
</tr>
<tr>
<td valign="top" align="left">Tv Shows</td>
<td valign="top" align="center">3,892</td>
<td valign="top" align="center">0.002</td>
<td valign="top" align="center">17,262</td>
<td valign="top" align="center">0.374</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>With these graphs in hand, we compute their anonymized versions and we measure the amount of structural information lost with respect to the original graphs. In particular, we track the changes in number of edges, degree distribution, average clustering coefficient (Watts and Strogatz, <xref ref-type="bibr" rid="B27">1998</xref>), and PageRank vector (Page et al., <xref ref-type="bibr" rid="B18">1999</xref>). We compute these changes for different levels of <italic>k</italic>-anonymity, which in turn correspond to different choices of the partition cardinality <italic>l</italic>. Recall that <italic>k</italic> and <italic>l</italic> are related: in a graph with <italic>n</italic> nodes, an &#x003B5;-regular partition groups the vertices into <italic>l</italic> sets of cardinality <inline-formula><mml:math id="M4"><mml:mi>k</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02248;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula>.</p>
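<p>To make the relation between <italic>k</italic> and <italic>l</italic> concrete, the short sketch below computes the group sizes implied by the balance condition of an &#x003B5;-regular partition (sizes differing by at most 1) for the 4,039 nodes of <italic>Facebook Combined</italic>. The helper name <monospace>group_sizes</monospace> is hypothetical, and the sizes produced by the heuristic partitioning algorithm in practice may differ slightly.</p>
<preformat><![CDATA[
```python
def group_sizes(n, l):
    """Sizes of l groups covering n nodes when sizes differ by at most 1."""
    q, r = divmod(n, l)
    return [q + 1] * r + [q] * (l - r)


n = 4039  # number of nodes of Facebook Combined (Table 1)
for l in (4, 8, 16, 32, 64):
    sizes = group_sizes(n, l)
    print(l, min(sizes), max(sizes))
# Smallest group size: 1009, 504, 252, 126, 63 for l = 4, 8, 16, 32, 64.
```
]]></preformat>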
<p>Note that larger values of <italic>l</italic> also imply larger values of &#x003B5;<italic>l</italic><sup>2</sup>, the maximum number of &#x003B5;-irregular pairs we can find in the network. Irregular pairs force us to randomly rewire connections that, unlike the intra-group connections, are not guaranteed to be structurally negligible by the Szemer&#x000E9;di regularity lemma, so in general for large values of <italic>l</italic> more effort has to go into finding an &#x003B5;-regular partition with minimum value of &#x003B5; (in these experiments we vary &#x003B5; from 0.01 to 0.2, with steps of 0.025). This is also the reason why we were only able to compute the &#x003B5;-regular partitions for a small range of values of <italic>l</italic>. In fact, for some combinations of dataset and <italic>l</italic>, the algorithm of Fiorucci et al. was unable to find an optimal partition within <monospace>max_iter</monospace> &#x0003D; 100 iterations. In our experiments, the runtime to compute an &#x003B5;-regular partition varies between approximately 10 and 80 s, on a machine with an 8-core 3.6 GHz CPU and 16 GB of RAM.</p>
<p>We start by comparing the degree distributions of the original graphs and the anonymized ones, using both the <italic>degree based</italic> and the <italic>indeg guided</italic> heuristics. <xref ref-type="fig" rid="F1">Figure 1</xref> shows the log-log plots of the results. Note that larger values of <italic>l</italic> tend to correspond to more accurate approximations of the original degree distribution. This is confirmed by looking at the Jensen-Shannon (JS) divergence (Lin, <xref ref-type="bibr" rid="B15">1991</xref>) between the degree distributions, which for the <italic>degree based</italic> heuristic and the <italic>Politicians</italic> dataset goes from 0.062 (with <italic>l</italic> &#x0003D; 4) to 0.011 (with <italic>l</italic> &#x0003D; 32)<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>. Interestingly, the <italic>indeg guided</italic> heuristic seems to yield the best approximations. This could be because the <italic>degree based</italic> heuristic struggles to create groups of nodes with similar degree when there are hubs among them. Indeed, for the <italic>indeg guided</italic> heuristic the JS divergence goes from 0.066 (with <italic>l</italic> &#x0003D; 4) to 0.016 (with <italic>l</italic> &#x0003D; 8), whereas for <italic>l</italic> &#x0003D; 8 the <italic>degree based</italic> heuristic achieves a JS divergence of 0.034<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref>. In the remainder of the experiments we focus only on the <italic>indeg guided</italic> heuristic.</p>
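<p>For reference, the JS divergence between two empirical degree distributions can be computed as in the following sketch. The function names are hypothetical, and the base-2 logarithm (which bounds the divergence between 0 and 1) is an assumption, since the base used in the experiments is not stated.</p>
<preformat><![CDATA[
```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical distribution of a list of node degrees: degree -> probability."""
    counts = Counter(degrees)
    total = len(degrees)
    return {d: c / total for d, c in counts.items()}


def js_divergence(p, q, base=2):
    """Jensen-Shannon divergence between two discrete distributions given as dicts."""
    support = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}

    def kl(a):
        # Kullback-Leibler divergence of `a` from the mixture m.
        return sum(a[x] * math.log(a[x] / m[x], base) for x in a if a[x] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Toy degree sequences (illustrative only).
original = degree_distribution([1, 1, 2, 3, 3, 3])
anonymized = degree_distribution([1, 2, 2, 3, 3, 3])
print(js_divergence(original, anonymized))
```
]]></preformat>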
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Degree distribution of the graphs with <italic>degree based</italic> <bold>(A&#x02013;C)</bold> and <italic>indeg guided</italic> <bold>(D&#x02013;F)</bold> heuristics.</p></caption>
<graphic xlink:href="fdata-02-00007-g0001.tif"/>
</fig>
<p><xref ref-type="table" rid="T2">Table 2</xref> shows the variation in the number of edges and average clustering coefficient with respect to the original graph. More precisely, we report <inline-formula><mml:math id="M8"><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>G</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mo>/</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>G</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, where <italic>s</italic><sub><italic>G</italic></sub> and <inline-formula><mml:math id="M9"><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow></mml:msub></mml:math></inline-formula> are statistics computed on the original and anonymized graphs, respectively (averaged over 10 anonymizations). We first note that the number of edges of the graphs changes only very slightly. Indeed, when we alter the structure of a group of vertices we do it by adding/deleting edges with a probability equal to the original edge density of the group. This in turn has the effect of keeping the number of edges approximately the same, regardless of the size <italic>k</italic> of the anonymity sets.</p>
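<p>The reported quantity can be sketched as follows, under the assumption that the relative variation is computed for each anonymized graph and then averaged over the runs. The function name <monospace>mean_relative_variation</monospace> and the numbers are illustrative only and are not experimental values.</p>
<preformat><![CDATA[
```python
from statistics import mean


def mean_relative_variation(s_original, s_anonymized_runs):
    """Average of |s_G - s_G~| / s_G over several anonymized graphs."""
    if s_original == 0:
        raise ValueError("the original statistic must be non-zero")
    return mean(abs(s_original - s) / s_original for s in s_anonymized_runs)


# Hypothetical edge counts: one original graph and three anonymized versions.
variation = mean_relative_variation(100, [98, 101, 103])
print(round(variation, 4))
```
]]></preformat>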
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Average variation in the number of edges (average clustering coefficient) between the original graph <italic>G</italic> and the anonymized graph <inline-formula><mml:math id="M5"><mml:mover accent="true"><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula>, calculated as <inline-formula><mml:math id="M6"><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>G</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mo>/</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>G</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, where <italic>s</italic><sub><italic>G</italic></sub> and <inline-formula><mml:math id="M7"><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow></mml:msub></mml:math></inline-formula> are the statistics considered.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Dataset</bold></th>
<th valign="top" align="center"><bold><italic>l</italic> &#x0003D; 4</bold></th>
<th valign="top" align="center"><bold><italic>l</italic> &#x0003D; 8</bold></th>
<th valign="top" align="center"><bold><italic>l</italic> &#x0003D; 16</bold></th>
<th valign="top" align="center"><bold><italic>l</italic> &#x0003D; 32</bold></th>
<th valign="top" align="center"><bold><italic>l</italic> &#x0003D; 64</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Facebook Combined</td>
<td valign="top" align="center">0.0012<break/> (0.7162)</td>
<td valign="top" align="center">0.0012<break/> (0.6310)</td>
<td valign="top" align="center">0.0010<break/> (0.5696)</td>
<td valign="top" align="center">0.0010<break/> (0.5302)</td>
<td valign="top" align="center">0.0010<break/> (0.4822)</td>
</tr>
<tr>
<td valign="top" align="left">Politicians</td>
<td valign="top" align="center">0.0021<break/> (0.6983)</td>
<td valign="top" align="center">0.0020<break/> (0.6415)</td>
<td valign="top" align="center">0.0015<break/> (0.5261)</td>
<td valign="top" align="center">0.09<break/> (0.2395)</td>
<td valign="top" align="center">n.a.</td>
</tr>
<tr>
<td valign="top" align="left">Tv shows</td>
<td valign="top" align="center">0.0034<break/> (0.6553)</td>
<td valign="top" align="center">0.0036<break/> (0.5064)</td>
<td valign="top" align="center">0.0013<break/> (0.3158)</td>
<td valign="top" align="center">n.a.</td>
<td valign="top" align="center">n.a.</td>
</tr>
</tbody>
</table>
</table-wrap>
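<p>The reason why the number of edges is approximately preserved can be illustrated with a small sketch. It assumes a simplified version of the rewiring, in which every pair of vertices inside a group is independently connected with probability equal to the original edge density of that group. The helper names <monospace>rewire_group</monospace> and <monospace>relative_variation</monospace> are hypothetical and are not taken from the implementation used in the experiments.</p>
<preformat><![CDATA[
```python
import random
from itertools import combinations


def rewire_group(edges, group, rng):
    """Resample the edges inside `group` as an Erdos-Renyi graph of equal density.

    Edges with at least one endpoint outside the group are left untouched.
    Returns a set of frozenset({u, v}) edges.
    """
    edge_set = {frozenset(e) for e in edges}
    pairs = [frozenset(p) for p in combinations(sorted(set(group)), 2)]
    if not pairs:
        return edge_set
    inside = [p for p in pairs if p in edge_set]
    density = len(inside) / len(pairs)
    outside = edge_set - set(inside)
    # Each pair is kept with probability `density`, so the expected number of
    # edges inside the group equals the original number len(inside).
    resampled = {p for p in pairs if rng.random() < density}
    return outside | resampled


def relative_variation(s_original, s_anonymized):
    """Relative variation |s_G - s_G~| / s_G of a graph statistic."""
    return abs(s_original - s_anonymized) / s_original


# Toy example: a 6-cycle forms the group, plus one edge leaving the group.
rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (5, 6)]
runs = [len(rewire_group(edges, range(6), rng)) for _ in range(10)]
mean_edges = sum(runs) / len(runs)
print(relative_variation(len(edges), mean_edges))
```
]]></preformat>
<p>A single rewiring can add or remove a few edges, but the expected edge count inside the group is unchanged, which is consistent with the small variations reported in the table.</p>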
<p>We then check the effect of the anonymization on the average clustering coefficient of the graph. <xref ref-type="table" rid="T2">Table 2</xref> shows that this statistic changes significantly. Recall that the average clustering coefficient depends on the number of triangles in a network (Watts and Strogatz, <xref ref-type="bibr" rid="B27">1998</xref>); however, the Erd&#x00151;s-R&#x000E9;nyi rewiring used to anonymize the vertex groups and the &#x003B5;-irregular pairs is likely to break these triangles. While the Szemer&#x000E9;di regularity lemma ensures that the vertex groups are sufficiently sparse that we can ignore their inner structure, this clearly does not hold for &#x003B5;-irregular pairs, which we also need to anonymize. This is particularly an issue when hubs fall within such an irregular pair. Note, however, that increasing <italic>l</italic> (i.e., reducing the size <italic>k</italic> of the anonymity sets) allows us to preserve the average clustering coefficient better. In general, a low value of <italic>l</italic> implies larger anonymity groups, but it also forces the heuristic procedure used to approximate the &#x003B5;-regular partition to bring more edges (and triangles) inside the groups, which are then affected by the Erd&#x00151;s-R&#x000E9;nyi rewiring. In other words, stronger anonymity demands more structural modifications. In practice it is common to look for smaller <italic>k</italic>-anonymity groups (i.e., larger <italic>l</italic>), and for these values we are better able to preserve the average clustering coefficient.</p>
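<p>The sensitivity of the average clustering coefficient to broken triangles can be seen in a minimal sketch. The function name <monospace>average_clustering</monospace> and the convention that nodes with fewer than two neighbors contribute zero are assumptions made for this illustration, not details taken from the experiments.</p>
<preformat><![CDATA[
```python
def average_clustering(edges, num_nodes):
    """Average local clustering coefficient of an undirected simple graph.

    Nodes are assumed to be labelled 0..num_nodes-1. Nodes with fewer than
    two neighbors contribute 0 to the average (an assumed convention).
    """
    neighbors = [set() for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    total = 0.0
    for u in range(num_nodes):
        k = len(neighbors[u])
        if k < 2:
            continue
        # Count edges between pairs of neighbors, i.e., triangles through u.
        links = sum(
            1
            for a in neighbors[u]
            for b in neighbors[u]
            if a < b and b in neighbors[a]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / num_nodes


# Toy example: a triangle with a pendant node, before and after one edge of
# the triangle is removed.
before = average_clustering([(0, 1), (1, 2), (0, 2), (2, 3)], num_nodes=4)
after = average_clustering([(1, 2), (0, 2), (2, 3)], num_nodes=4)
print(before, after)
```
]]></preformat>
<p>In this toy graph, deleting a single edge destroys the only triangle and the coefficient drops to zero, whereas the number of edges changes by only one. This mirrors, on a very small scale, the gap between the two statistics in the table.</p>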
<p>Finally, <xref ref-type="fig" rid="F2">Figure 2</xref> shows the cosine similarity and Spearman&#x00027;s rank correlation between the PageRank vectors (Page et al., <xref ref-type="bibr" rid="B18">1999</xref>) of the original and anonymized graphs. The results confirm that the proposed anonymization procedure preserves the centrality information of the nodes well, once again with the quality of the approximation generally improving as we reduce the size of the anonymity groups.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Cosine similarity and Spearman&#x00027;s rank correlation of the PageRank vectors (<italic>indeg guided</italic> heuristic). <bold>(A)</bold> Tv shows, <bold>(B)</bold> Politicians, and <bold>(C)</bold> Facebook Combined.</p></caption>
<graphic xlink:href="fdata-02-00007-g0002.tif"/>
</fig>
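<p>The two measures used to compare centrality vectors can be sketched as follows. The sketch assumes that the two score vectors (for instance, PageRank vectors computed elsewhere) are already aligned node by node. The function names are hypothetical, and ties are handled with average ranks, which is one common convention and not necessarily the one used in the experiments.</p>
<preformat><![CDATA[
```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero score vectors of equal length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks in ascending order, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_correlation(x, y):
    """Pearson correlation of the rank vectors (undefined if a vector is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy example with made-up scores for five nodes (not data from the paper).
original = [0.40, 0.25, 0.15, 0.12, 0.08]
anonymized = [0.35, 0.28, 0.12, 0.15, 0.10]
print(cosine_similarity(original, anonymized))
print(spearman_correlation(original, anonymized))
```
]]></preformat>
<p>The two measures are complementary: the cosine similarity is sensitive to the magnitude of the scores, whereas Spearman&#x00027;s rank correlation only looks at the ordering of the nodes.</p>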
</sec>
<sec sec-type="conclusions" id="s5">
<title>5. Conclusion</title>
<p>We considered the problem of protecting the identity of the nodes of a network from an attacker with background structural knowledge. We proposed to use the Szemer&#x000E9;di regularity lemma to compute an &#x003B5;-regular partition of the original graph, which is then anonymized by applying Erd&#x00151;s-R&#x000E9;nyi rewiring at selected locations, namely the vertex groups and the &#x003B5;-irregular pairs. This creates a <italic>k</italic>-anonymous graph where the loss of structural information is minimized. We validated our method on three real-world networks extracted from Facebook. Future work should perform a more extensive evaluation of the proposed method on larger graphs, with a wider range of parameter values, and compare our method with alternative anonymization approaches.</p>
</sec>
<sec id="s6">
<title>Data Availability</title>
<p>Publicly available datasets were analyzed in this study. These data can be found here: <ext-link ext-link-type="uri" xlink:href="https://snap.stanford.edu/data/index.html">https://snap.stanford.edu/data/index.html</ext-link>.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>AT: conceptualization. LR and AT: methodology. DF: software. DF, LR, and AT: investigation, writing&#x02013;review, and editing. LR: writing&#x02013;original draft preparation.</p>
<sec>
<title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Backstrom</surname> <given-names>L.</given-names></name> <name><surname>Dwork</surname> <given-names>C.</given-names></name> <name><surname>Kleinberg</surname> <given-names>J.</given-names></name></person-group> (<year>2007</year>). <article-title>Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography,</article-title> in <source>Proceedings of the 16th International Conference on World Wide Web (WWW &#x00027;07)</source> (<publisher-loc>Banff, AB</publisher-loc>), <fpage>181</fpage>&#x02013;<lpage>190</lpage>. <pub-id pub-id-type="doi">10.1145/1242572.1242598</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barab&#x000E1;si</surname> <given-names>A.-L.</given-names></name> <name><surname>Albert</surname> <given-names>R.</given-names></name></person-group> (<year>1999</year>). <article-title>Emergence of scaling in random networks</article-title>. <source>Science</source> <volume>286</volume>, <fpage>509</fpage>&#x02013;<lpage>512</lpage>. <pub-id pub-id-type="doi">10.1126/science.286.5439.509</pub-id><pub-id pub-id-type="pmid">10521342</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>J.</given-names></name> <name><surname>Fu</surname> <given-names>A. W.-C.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>K-isomorphism: privacy preserving network publication against structural attacks,</article-title> in <source>Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD &#x00027;10)</source> (<publisher-loc>Indianapolis, IN</publisher-loc>), <fpage>459</fpage>&#x02013;<lpage>470</lpage>. <pub-id pub-id-type="doi">10.1145/1807167.1807218</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chorley</surname> <given-names>M. J.</given-names></name> <name><surname>Rossi</surname> <given-names>L.</given-names></name> <name><surname>Tyson</surname> <given-names>G.</given-names></name> <name><surname>Williams</surname> <given-names>M. J.</given-names></name></person-group> (<year>2016</year>). <article-title>Pub crawling at scale: tapping untappd to explore social drinking,</article-title> in <source>Tenth International AAAI Conference on Web and Social Media</source> (<publisher-loc>Cologne</publisher-loc>). Available at: <ext-link ext-link-type="uri" xlink:href="https://www.aaai.org/ocs/index.php/ICWSM/ICWSM16/paper/view/13048">https://www.aaai.org/ocs/index.php/ICWSM/ICWSM16/paper/view/13048</ext-link> (accessed May 20, 2019).</citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Diestel</surname> <given-names>R.</given-names></name></person-group> (<year>2012</year>). <source>Graph Theory</source>. Graduate Texts in Mathematics, <volume>Vol. 173</volume>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.springer.com/gp/book/9783662536216">https://www.springer.com/gp/book/9783662536216</ext-link></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Erd&#x00151;s</surname> <given-names>P.</given-names></name></person-group> (<year>1960</year>). <article-title>Graphs with prescribed degrees of vertices (Hungarian)</article-title>. <source>Mat. Lapok</source> <volume>11</volume>, <fpage>264</fpage>&#x02013;<lpage>274</lpage>.</citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fiorucci</surname> <given-names>M.</given-names></name> <name><surname>Pelosin</surname> <given-names>F.</given-names></name> <name><surname>Pelillo</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>Separating structure from noise in large graphs using the regularity lemma</article-title>. <source>CoRR</source> abs/1905.06917.</citation></ref>
<ref id="B8">
<citation citation-type="thesis"><person-group person-group-type="author"><name><surname>Fiorucci</surname> <given-names>M.</given-names></name> <name><surname>Torcinovich</surname> <given-names>A.</given-names></name> <name><surname>Curado</surname> <given-names>M.</given-names></name> <name><surname>Escolano</surname> <given-names>F.</given-names></name> <name><surname>Pelillo</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>On the interplay between strong regularity and graph densification,</article-title> in <source>11th IAPR-TC-15 International Workshop, GbRPR 2017</source> (<publisher-loc>Anacapri</publisher-loc>), <fpage>165</fpage>&#x02013;<lpage>174</lpage>.</citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fung</surname> <given-names>B.</given-names></name> <name><surname>Wang</surname> <given-names>K.</given-names></name> <name><surname>Chen</surname> <given-names>R.</given-names></name> <name><surname>Yu</surname> <given-names>P. S.</given-names></name></person-group> (<year>2010</year>). <article-title>Privacy-preserving data publishing: a survey of recent developments</article-title>. <source>ACM Comput. Surveys</source> <volume>42</volume>:<fpage>14</fpage>. <pub-id pub-id-type="doi">10.1145/1749603.1749605</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gerke</surname> <given-names>S.</given-names></name> <name><surname>Steger</surname> <given-names>A.</given-names></name></person-group> (<year>2005</year>). <article-title>The sparse regularity lemma and its applications</article-title>. <source>Surveys Combin.</source> <volume>327</volume>, <fpage>227</fpage>&#x02013;<lpage>258</lpage>. <pub-id pub-id-type="doi">10.1017/CBO9780511734885.010</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hay</surname> <given-names>M.</given-names></name> <name><surname>Miklau</surname> <given-names>G.</given-names></name> <name><surname>Jensen</surname> <given-names>D.</given-names></name> <name><surname>Towsley</surname> <given-names>D.</given-names></name> <name><surname>Weis</surname> <given-names>P.</given-names></name></person-group> (<year>2008</year>). <article-title>Resisting structural re-identification in anonymized social networks</article-title>. <source>Proc. VLDB Endow.</source> <volume>1</volume>, <fpage>102</fpage>&#x02013;<lpage>114</lpage>. <pub-id pub-id-type="doi">10.14778/1453856.1453873</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koml&#x000F3;s</surname> <given-names>J.</given-names></name> <name><surname>Simonovits</surname> <given-names>M.</given-names></name></person-group> (<year>1996</year>). <article-title>Szemer&#x000E9;di&#x00027;s regularity lemma and its applications in graph theory</article-title>. <source>Combinatorics</source> <volume>2</volume>, <fpage>295</fpage>&#x02013;<lpage>352</lpage>.</citation></ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kwak</surname> <given-names>H.</given-names></name> <name><surname>Lee</surname> <given-names>C.</given-names></name> <name><surname>Park</surname> <given-names>H.</given-names></name> <name><surname>Moon</surname> <given-names>S.</given-names></name></person-group> (<year>2010</year>). <article-title>What is Twitter, a social network or a news media?,</article-title> in <source>Proceedings of the 19th International Conference on World Wide Web (WWW &#x00027;10)</source> (<publisher-loc>Raleigh, NC</publisher-loc>), <fpage>591</fpage>&#x02013;<lpage>600</lpage>. <pub-id pub-id-type="doi">10.1145/1772690.1772751</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leskovec</surname> <given-names>J.</given-names></name> <name><surname>Mcauley</surname> <given-names>J. J.</given-names></name></person-group> (<year>2012</year>). <article-title>Learning to discover social circles in ego networks,</article-title> in <source>Advances in Neural Information Processing Systems</source>, <fpage>539</fpage>&#x02013;<lpage>547</lpage>.</citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>J.</given-names></name></person-group> (<year>1991</year>). <article-title>Divergence measures based on the Shannon entropy</article-title>. <source>IEEE Trans. Inform. Theor.</source> <volume>37</volume>, <fpage>145</fpage>&#x02013;<lpage>151</lpage>. <pub-id pub-id-type="doi">10.1109/18.61115</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>K.</given-names></name> <name><surname>Terzi</surname> <given-names>E.</given-names></name></person-group> (<year>2008</year>). <article-title>Towards identity anonymization on graphs,</article-title> in <source>Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD &#x00027;08)</source> (<publisher-loc>Vancouver, BC</publisher-loc>), <fpage>93</fpage>&#x02013;<lpage>106</lpage>.</citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Qiao</surname> <given-names>Y.</given-names></name> <name><surname>Hu</surname> <given-names>G.</given-names></name> <name><surname>Huang</surname> <given-names>Y.</given-names></name> <name><surname>Sangaiah</surname> <given-names>A. K.</given-names></name> <name><surname>Zhang</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>De-anonymizing social networks with random forest classifier</article-title>. <source>IEEE Access</source> <volume>6</volume>, <fpage>10139</fpage>&#x02013;<lpage>10150</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2017.2756904</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Page</surname> <given-names>L.</given-names></name> <name><surname>Brin</surname> <given-names>S.</given-names></name> <name><surname>Motwani</surname> <given-names>R.</given-names></name> <name><surname>Winograd</surname> <given-names>T.</given-names></name></person-group> (<year>1999</year>). <source>The PageRank Citation Ranking: Bringing Order to the Web</source>. Technical Report 1999-66, Stanford InfoLab.</citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pelillo</surname> <given-names>M.</given-names></name> <name><surname>Elezi</surname> <given-names>I.</given-names></name> <name><surname>Fiorucci</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Revealing structure in large graphs: Szemer&#x000E9;di&#x00027;s regularity lemma and its use in pattern recognition</article-title>. <source>Pattern Recogn. Lett.</source> <volume>87</volume>, <fpage>4</fpage>&#x02013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1016/j.patrec.2016.09.007</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Qian</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>X.-Y.</given-names></name> <name><surname>Zhang</surname> <given-names>C.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name></person-group> (<year>2016</year>). <article-title>De-anonymizing social networks and inferring private attributes using knowledge graphs,</article-title> in <source>IEEE INFOCOM 2016&#x02013;The 35th Annual IEEE International Conference on Computer Communications</source> (<publisher-loc>San Francisco, CA</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>9</lpage>.</citation></ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rossi</surname> <given-names>L.</given-names></name> <name><surname>Musolesi</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). <article-title>It&#x00027;s the way you check-in: identifying users in location-based social networks,</article-title> in <source>Proceedings of the Second ACM Conference on Online Social Networks (COSN &#x00027;14)</source> (<publisher-loc>Dublin</publisher-loc>), <fpage>215</fpage>&#x02013;<lpage>226</lpage>. <pub-id pub-id-type="doi">10.1145/2660460.2660485</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rossi</surname> <given-names>L.</given-names></name> <name><surname>Musolesi</surname> <given-names>M.</given-names></name> <name><surname>Torsello</surname> <given-names>A.</given-names></name></person-group> (<year>2015a</year>). <article-title>On the k-anonymization of time-varying and multi-layer social graphs,</article-title> in <source>Ninth International AAAI Conference on Web and Social Media</source> (<publisher-loc>Oxford</publisher-loc>).</citation></ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rossi</surname> <given-names>L.</given-names></name> <name><surname>Williams</surname> <given-names>M.</given-names></name> <name><surname>Stich</surname> <given-names>C.</given-names></name> <name><surname>Musolesi</surname> <given-names>M.</given-names></name></person-group> (<year>2015b</year>). <article-title>Privacy and the city: user identification and location semantics in location-based social networks,</article-title> in <source>Ninth International AAAI Conference on Web and Social Media</source> (<publisher-loc>Oxford</publisher-loc>).</citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rousseau</surname> <given-names>F.</given-names></name> <name><surname>Casas-Roma</surname> <given-names>J.</given-names></name> <name><surname>Vazirgiannis</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Community-preserving anonymization of graphs</article-title>. <source>Knowl. Inform. Syst.</source> <volume>54</volume>, <fpage>315</fpage>&#x02013;<lpage>343</lpage>. <pub-id pub-id-type="doi">10.1007/s10115-017-1064-y</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rozemberczki</surname> <given-names>B.</given-names></name> <name><surname>Davies</surname> <given-names>R.</given-names></name> <name><surname>Sarkar</surname> <given-names>R.</given-names></name> <name><surname>Sutton</surname> <given-names>C.</given-names></name></person-group> (<year>2018</year>). <article-title>GEMSEC: graph embedding with self clustering</article-title>. <source>arXiv preprint</source> arXiv:1802.03997.</citation></ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sperotto</surname> <given-names>A.</given-names></name> <name><surname>Pelillo</surname> <given-names>M.</given-names></name></person-group> (<year>2007</year>). <article-title>Szemer&#x000E9;di&#x00027;s regularity lemma and its applications to pairwise clustering and segmentation,</article-title> in <source>Proceedings of the 6th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR&#x00027;07)</source> (<publisher-loc>Ezhou</publisher-loc>), <fpage>13</fpage>&#x02013;<lpage>27</lpage>.</citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Watts</surname> <given-names>D. J.</given-names></name> <name><surname>Strogatz</surname> <given-names>S. H.</given-names></name></person-group> (<year>1998</year>). <article-title>Collective dynamics of small-world networks</article-title>. <source>Nature</source> <volume>393</volume>, <fpage>440</fpage>. <pub-id pub-id-type="doi">10.1038/30918</pub-id><pub-id pub-id-type="pmid">9623998</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>B.</given-names></name> <name><surname>Pei</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks</article-title>. <source>Knowl. Inform. Syst.</source> <volume>28</volume>, <fpage>47</fpage>&#x02013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.1007/s10115-010-0311-2</pub-id></citation>
</ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>Note that the lemma has been extended to sparse graphs as well (Gerke and Steger, <xref ref-type="bibr" rid="B10">2005</xref>).</p></fn>
<fn id="fn0002"><p><sup>2</sup>Code available at: <ext-link ext-link-type="uri" xlink:href="https://github.com/MarcoFiorucci/graph-summarization-using-regular-partitions">https://github.com/MarcoFiorucci/graph-summarization-using-regular-partitions</ext-link>.</p></fn>
<fn id="fn0003"><p><sup>3</sup>The JS divergence takes a value between 0 and 1, with 0 indicating identical distributions. Results on other datasets are omitted due to space constraints.</p></fn>
<fn id="fn0004"><p><sup>4</sup>Note, however, that the value of the JS divergence is biased by the fact that most of the probability mass is on low-degree nodes.</p></fn>
</fn-group>
</back>
</article>