<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Phys.</journal-id>
<journal-title>Frontiers in Physics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Phys.</abbrev-journal-title>
<issn pub-type="epub">2296-424X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1251319</article-id>
<article-id pub-id-type="doi">10.3389/fphy.2023.1251319</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Physics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Dissimilarity-based hypothesis testing for community detection in heterogeneous networks</article-title>
<alt-title alt-title-type="left-running-head">Xu et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fphy.2023.1251319">10.3389/fphy.2023.1251319</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Xu</surname>
<given-names>Xin-Jian</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1066726/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chen</surname>
<given-names>Cheng</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mendes</surname>
<given-names>J. F. F.</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/80682/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Department of Mathematics</institution>, <institution>Shanghai University</institution>, <addr-line>Shanghai</addr-line>, <country>China</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Department of Physics</institution>, <institution>I3N</institution>, <institution>University of Aveiro</institution>, <addr-line>Aveiro</addr-line>, <country>Portugal</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1736046/overview">Duxin Chen</ext-link>, Southeast University, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/434467/overview">Yilun Shang</ext-link>, Northumbria University, United Kingdom</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/92980/overview">Salvatore Micciche&#x27;</ext-link>, University of Palermo, Italy</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Xin-Jian Xu, <email>xinjxu@shu.edu.cn</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>10</day>
<month>11</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>11</volume>
<elocation-id>1251319</elocation-id>
<history>
<date date-type="received">
<day>01</day>
<month>07</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>10</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2023 Xu, Chen and Mendes.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Xu, Chen and Mendes</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Identifying communities in networks is a crucial and challenging problem with practical implications across many scientific fields. Existing methods often overlook the heterogeneous distribution of vertex degrees or require prior knowledge of the number of communities. To overcome these limitations, we propose an efficient hypothesis test for community detection based on quantifying dissimilarities between graphs. Our approach examines the dissimilarity between a given random graph and a null hypothesis that assumes a degree-corrected Erd&#x151;s&#x2013;R&#xe9;nyi graph. To quantify this dissimilarity, we introduce a measure that accounts for the distributions of vertex distances, clustering coefficients, and alpha-centrality, and this measure forms the basis of our hypothesis test. To uncover the number of communities and their corresponding structures simultaneously, we develop a two-stage bipartitioning algorithm that integrates with our hypothesis test and enables the exploration of community organization within the network. Experiments on both synthetic and real networks demonstrate that our method outperforms state-of-the-art approaches to community detection.</p>
</abstract>
<kwd-group>
<kwd>community detection</kwd>
<kwd>stochastic block model</kwd>
<kwd>hypothesis test</kwd>
<kwd>graph dissimilarity</kwd>
<kwd>divergence</kwd>
</kwd-group>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Interdisciplinary Physics</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>The theory of complex networks has emerged as a powerful tool for studying complex systems. Networks represent interactions between the units of a system, with vertices denoting the units and edges capturing their interactions [<xref ref-type="bibr" rid="B1">1</xref>]. The increasing availability of real-world data has enabled such studies across many fields. A crucial aspect of these studies is the identification of community structure, in which individuals or entities are organized into distinct groups. This task, commonly referred to as community detection [<xref ref-type="bibr" rid="B2">2</xref>], is closely related to graph clustering. Although numerous algorithms have been proposed for community detection, including clustering algorithms [<xref ref-type="bibr" rid="B3">3</xref>, <xref ref-type="bibr" rid="B4">4</xref>], modularity-based algorithms [<xref ref-type="bibr" rid="B5">5</xref>, <xref ref-type="bibr" rid="B6">6</xref>], and dynamic algorithms [<xref ref-type="bibr" rid="B7">7</xref>, <xref ref-type="bibr" rid="B8">8</xref>], no single algorithm performs well across all types of networks [<xref ref-type="bibr" rid="B9">9</xref>, <xref ref-type="bibr" rid="B10">10</xref>]. Consequently, there remains a persistent demand for a general and efficient method for community detection.</p>
<p>From a probabilistic perspective, vertices belonging to the same community are more likely to be connected than vertices in different communities. The stochastic block model (SBM) [<xref ref-type="bibr" rid="B11">11</xref>], which captures this property, has therefore been widely employed for community detection. The SBM offers a theoretical framework for studying detection thresholds and developing corresponding algorithms. Decelle et al. [<xref ref-type="bibr" rid="B12">12</xref>] identified a phase transition for community detection at the Kesten&#x2013;Stigum threshold, which motivated investigations of the transition thresholds under different recovery conditions [<xref ref-type="bibr" rid="B13">13</xref>, <xref ref-type="bibr" rid="B14">14</xref>]. Numerous algorithms have also been proposed for the SBM, often tailored to specific research questions or to the characteristics of the system under study. These include spectral methods [<xref ref-type="bibr" rid="B15">15</xref>, <xref ref-type="bibr" rid="B16">16</xref>], semi-definite programming methods [<xref ref-type="bibr" rid="B17">17</xref>], profile-likelihood maximization [<xref ref-type="bibr" rid="B18">18</xref>], and pseudo-likelihood maximization [<xref ref-type="bibr" rid="B19">19</xref>]. Peixoto [<xref ref-type="bibr" rid="B20">20</xref>, <xref ref-type="bibr" rid="B21">21</xref>] approached the SBM from a microcanonical perspective, focusing on the numbers of edges between blocks rather than on their connection probabilities, and this alternative viewpoint offered valuable insights into the model.</p>
<p>The standard SBM assumes that vertices within the same community are stochastically equivalent and have the same expected degree, which is at odds with real-world networks, where prominent &#x201c;hubs&#x201d; are widespread. To address this limitation, Karrer and Newman [<xref ref-type="bibr" rid="B22">22</xref>] introduced the degree-corrected SBM (DCSBM), which incorporates vertex-specific &#x201c;degree parameters&#x201d; that multiply the edge probability between vertices <italic>i</italic> and <italic>j</italic>. Many subsequent studies have used the DCSBM for community detection. Zhao et al. [<xref ref-type="bibr" rid="B23">23</xref>] established a general theory for assessing the consistency of community detection under the DCSBM and compared various community detection criteria for both the SBM and the DCSBM. Chen et al. [<xref ref-type="bibr" rid="B24">24</xref>] proposed a method based on a convex programming relaxation of modularity maximization and developed a weighted <italic>&#x2113;</italic>
<sub>1</sub>-norm <italic>k</italic>-medoids algorithm within the DCSBM framework. Gao et al. [<xref ref-type="bibr" rid="B25">25</xref>] derived the asymptotic minimax risk of the misclassification proportion, which depends on the degree parameters, the community sizes, and the connection parameters. Note, however, that all of these algorithms require the number of communities to be known in advance.</p>
<p>In practice, the only information available is the set of vertices and the set of edges, that is, which vertices are connected to each other and which are not. Determining the appropriate number of communities is therefore a challenging task. To the best of our knowledge, existing approaches have focused primarily on the SBM framework. One direction first detects the optimal community structure for each candidate number of communities and then selects among these candidates with criteria that penalize the number of model parameters, such as minimum description length [<xref ref-type="bibr" rid="B26">26</xref>], the Akaike information criterion [<xref ref-type="bibr" rid="B27">27</xref>], or the Bayesian information criterion [<xref ref-type="bibr" rid="B28">28</xref>]. Another direction develops hypothesis tests for the number of communities, based, for example, on asymptotic consistency [<xref ref-type="bibr" rid="B29">29</xref>] or on the principal eigenvalue of a normalized adjacency matrix [<xref ref-type="bibr" rid="B30">30</xref>]. However, both approaches have limitations: they either require considerable computation time for large networks or may underestimate or overestimate the number of communities.</p>
<p>The goal of this paper is to uncover the number of communities and the corresponding structure in heterogeneous networks simultaneously and efficiently. To this end, we propose a novel hypothesis test based on graph dissimilarity, which incorporates three distribution functions: those of the vertex distance, the clustering coefficient, and alpha-centrality. The null hypothesis is that the original network is a one-block DCSBM, i.e., a degree-corrected Erd&#x151;s&#x2013;R&#xe9;nyi graph (DCERG), from which the connection parameter and the degree parameters can be estimated. We then compute the dissimilarity between the original network and the fitted (posterior) DCERG, and use kernel density estimation (KDE) to estimate the distribution of dissimilarities among DCERGs generated with the same parameters. If the hypothesis is rejected, we split the network with a bipartitioning algorithm and repeat the test until every subgraph accepts the hypothesis.</p>
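<p>To make the decision step concrete, the following Python sketch estimates an upper-tail <italic>p</italic>-value from a Gaussian kernel density estimate of null dissimilarities, that is, dissimilarities among DCERGs generated with the same fitted parameters. It is a minimal illustration under stated assumptions, not the authors&#x2019; implementation: the Gaussian kernel, Silverman&#x2019;s rule-of-thumb bandwidth, the one-sided form of the test, and the significance level of 0.05 are our assumptions, and all function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth h = 1.06 * sigma * n^(-1/5) (assumed choice)."""
    n = len(samples)
    sigma = statistics.stdev(samples)  # requires at least two samples
    return 1.06 * sigma * n ** (-1 / 5)


def kde_upper_tail(samples, x, bandwidth=None):
    """P(D >= x) under a Gaussian KDE fitted to the null dissimilarities."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(samples)
    tail = 0.0
    for s in samples:
        # Upper-tail mass of the kernel N(s, h^2): 1 - Phi((x - s) / h).
        tail += 0.5 * math.erfc((x - s) / (h * math.sqrt(2.0)))
    return tail / len(samples)


def reject_null(observed, null_samples, alpha=0.05):
    """Return (reject, p_value) for a one-sided test of the DCERG null."""
    p_value = kde_upper_tail(null_samples, observed)
    return p_value < alpha, p_value


# Hypothetical null dissimilarities between pairs of fitted-DCERG samples.
null = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.12, 0.11]
print(reject_null(0.30, null))
print(reject_null(0.11, null))
```
]]></preformat>
<p>With null dissimilarities clustered around 0.11, an observed value of 0.30 lies far in the upper tail and the null hypothesis is rejected, whereas an observed value of 0.11 sits at the centre of this symmetric sample and yields a <italic>p</italic>-value of 0.5.</p>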
</sec>
<sec id="s2">
<title>2 Hypothesis test</title>
<p>The standard SBM originates in the machine learning and statistics literature. In theoretical computer science it is commonly referred to as the planted partition model [<xref ref-type="bibr" rid="B31">31</xref>], and in mathematics it is known as an inhomogeneous random graph model [<xref ref-type="bibr" rid="B32">32</xref>]. This probabilistic generative model for random graphs with community structure combines the rigidity of a block model with a stochastic component, and it serves as a benchmark for the challenging task of recovering community structure from network data.</p>
<p>To introduce the SBM, consider an unweighted, undirected graph <italic>G</italic> with <italic>N</italic> labeled vertices organized into <italic>K</italic> blocks. The connections among these <italic>N</italic> vertices are represented by the adjacency matrix <bold>
<italic>A</italic>
</bold>, where <italic>a</italic>
<sub>
<italic>ij</italic>
</sub> &#x3d; 1 if there is an edge between nodes <italic>i</italic> and <italic>j</italic>, and 0 otherwise. Self-connections are not allowed, i.e., <italic>a</italic>
<sub>
<italic>ii</italic>
</sub> &#x3d; 0. For each node <italic>i</italic>, we assign a label <italic>b</italic>
<sub>
<italic>i</italic>
</sub> to represent its community membership. Each vertex <italic>i</italic> &#x2208; [<italic>N</italic>] is assigned to block <italic>j</italic> with prior probability <italic>p</italic>
<sub>
<italic>j</italic>
</sub>, <italic>j</italic> &#x2208; [<italic>K</italic>], where these probabilities satisfy the normalization <inline-formula id="inf1">
<mml:math id="m1">
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula>. Additionally, we introduce <bold>
<italic>W</italic>
</bold> as a <italic>K</italic> &#xd7; <italic>K</italic> matrix, where each element <italic>w</italic>
<sub>
<italic>st</italic>
</sub> represents the probability that a vertex in block <italic>s</italic> is connected to a vertex in block <italic>t</italic>. With these definitions, we can express the conditional expectation of the adjacency matrix <bold>
<italic>A</italic>
</bold> given the block assignments <bold>
<italic>b</italic>
</bold> as follows:<disp-formula id="e1">
<mml:math id="m2">
<mml:mtext>E</mml:mtext>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi mathvariant="bold-italic">b</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>.</mml:mo>
</mml:math>
<label>(1)</label>
</disp-formula>
</p>
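<p>Eq. <xref ref-type="disp-formula" rid="e1">1</xref> specifies a Bernoulli model for each vertex pair. The Python sketch below, which uses only the standard library, draws block labels from the prior <italic>p</italic> and then adds each edge independently with probability <italic>w</italic><sub><italic>b</italic><sub><italic>i</italic></sub>,<italic>b</italic><sub><italic>j</italic></sub></sub>. It assumes that <bold><italic>W</italic></bold> is symmetric with entries in [0, 1] and represents <bold><italic>A</italic></bold> as a list of lists; the function name is hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def sample_sbm(n, p, W, seed=None):
    """Draw (labels, A) from a Bernoulli SBM with block priors p and block matrix W."""
    rng = random.Random(seed)
    K = len(p)
    # Assign each vertex to block j with prior probability p[j].
    labels = rng.choices(range(K), weights=p, k=n)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # undirected graph, no self-loops (a_ii = 0)
            if rng.random() < W[labels[i]][labels[j]]:
                A[i][j] = A[j][i] = 1
    return labels, A


labels, A = sample_sbm(10, [0.5, 0.5], [[0.8, 0.1], [0.1, 0.8]], seed=42)
print(labels)
```
]]></preformat>
<p>With <bold><italic>W</italic></bold> equal to the identity matrix, the sampler returns disjoint cliques, one per non-empty block, which makes the role of the block matrix easy to check.</p>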
<p>When all labels are identical, the model reduces to the classic Erd&#x151;s&#x2013;R&#xe9;nyi graph (ERG) [<xref ref-type="bibr" rid="B33">33</xref>], in which there is no community structure to reconstruct. For real-world networks, the model is fitted by maximizing the corresponding likelihood with respect to the vertex labels <bold>
<italic>b</italic>
</bold>. The primary objective of the community detection problem is to accurately reconstruct these labels.</p>
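<p>For the Bernoulli model of Eq. <xref ref-type="disp-formula" rid="e1">1</xref>, the log-likelihood of a labeling <bold><italic>b</italic></bold> is the sum, over all unordered vertex pairs, of <italic>a</italic><sub><italic>ij</italic></sub> log <italic>w</italic> &#x2b; (1 &#x2212; <italic>a</italic><sub><italic>ij</italic></sub>) log(1 &#x2212; <italic>w</italic>), where <italic>w</italic> &#x3d; <italic>w</italic><sub><italic>b</italic><sub><italic>i</italic></sub>,<italic>b</italic><sub><italic>j</italic></sub></sub>. The Python sketch below evaluates this objective for a given labeling. It assumes 0 &#x3c; <italic>w</italic><sub><italic>st</italic></sub> &#x3c; 1, uses a hypothetical function name, and illustrates only the objective, not a search over labelings.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def sbm_log_likelihood(A, labels, W):
    """Bernoulli SBM log-likelihood of adjacency A given labels b and block matrix W."""
    n = len(A)
    ll = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            w = W[labels[i]][labels[j]]
            ll += math.log(w) if A[i][j] else math.log(1.0 - w)
    return ll


# Two disjoint triangles on vertices {0, 1, 2} and {3, 4, 5}.
A = [[0] * 6 for _ in range(6)]
for tri in ((0, 1, 2), (3, 4, 5)):
    for u in tri:
        for v in tri:
            if u != v:
                A[u][v] = 1

W = [[0.9, 0.1], [0.1, 0.9]]
print(sbm_log_likelihood(A, [0, 0, 0, 1, 1, 1], W))  # labels match the triangles
print(sbm_log_likelihood(A, [0, 1, 0, 1, 0, 1], W))  # mixed labels
```
]]></preformat>
<p>The labeling that matches the triangles scores higher. If all entries of <bold><italic>W</italic></bold> are equal, as in the ERG case, every labeling yields the same value, which is why communities cannot be recovered there.</p>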
<p>In the standard SBM, the connection probability between two vertices depends only on their blocks, so all vertices within a block are statistically equivalent. Under this assumption, &#x201c;hubs&#x201d; are unlikely to emerge, and when the model is fitted to networks with heterogeneous degrees, maximizing its log-likelihood tends to partition the graph into two groups: one of high-degree vertices and one of low-degree vertices. To overcome this limitation, Karrer and Newman [<xref ref-type="bibr" rid="B22">22</xref>] introduced the DCSBM, which replaces Eq. <xref ref-type="disp-formula" rid="e1">1</xref> with the following equation:<disp-formula id="e2">
<mml:math id="m3">
<mml:mtext>E</mml:mtext>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold-italic">b</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:math>
<label>(2)</label>
</disp-formula>where <italic>&#x3b8;</italic>
<sub>
<italic>i</italic>
</sub> is a degree parameter. In contrast to the SBM, the DCSBM rescales the edge probability between vertices <italic>i</italic> and <italic>j</italic> by the product <italic>&#x3b8;</italic>
<sub>
<italic>i</italic>
</sub>
<italic>&#x3b8;</italic>
<sub>
<italic>j</italic>
</sub>. Notably, the DCSBM simplifies to the standard SBM when <italic>&#x3b8;</italic>
<sub>
<italic>i</italic>
</sub> &#x3d; 1 for every vertex <italic>i</italic> &#x2208; [<italic>N</italic>] or, more generally, when the degree parameters are constant within each block. The value of <italic>&#x3b8;</italic>
<sub>
<italic>i</italic>
</sub> controls the expected degree of vertex <italic>i</italic> and thereby accommodates arbitrary degree variation within blocks. Because rescaling the degree parameters of a block, together with the corresponding entries of <bold><italic>W</italic></bold>, leaves Eq. <xref ref-type="disp-formula" rid="e2">2</xref> unchanged, the parameters <italic>&#x3b8;</italic>
<sub>
<italic>i</italic>
</sub> must be normalized. In this paper, we impose the constraint that <inline-formula id="inf2">
<mml:math id="m4">
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
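</inline-formula> holds for every block <italic>s</italic> &#x2208; [<italic>K</italic>], where the sum runs over all vertices <italic>i</italic> &#x2208; [<italic>N</italic>].</p>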
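<p>The normalization and Eq. <xref ref-type="disp-formula" rid="e2">2</xref> can be illustrated with a short Python sketch that uses only the standard library. It rescales arbitrary positive degree parameters so that they sum to one within each block and then evaluates the expected adjacency matrix, with the diagonal set to zero because self-connections are excluded. The function names are hypothetical, and the sketch computes expectations only; it does not address how edges would be sampled when <italic>&#x3b8;</italic><sub><italic>i</italic></sub><italic>&#x3b8;</italic><sub><italic>j</italic></sub><italic>w</italic><sub><italic>b</italic><sub><italic>i</italic></sub>,<italic>b</italic><sub><italic>j</italic></sub></sub> exceeds 1.</p>
<preformat preformat-type="code"><![CDATA[
```python
def normalize_degree_parameters(theta, labels):
    """Rescale theta so that sum_i theta_i * delta(b_i, s) = 1 for every block s."""
    totals = {}
    for t, b in zip(theta, labels):
        totals[b] = totals.get(b, 0.0) + t
    return [t / totals[b] for t, b in zip(theta, labels)]


def dcsbm_expected_adjacency(theta, labels, W):
    """E(a_ij | theta, b) = theta_i * theta_j * w_{b_i, b_j} (Eq. 2), with a_ii = 0."""
    n = len(theta)
    return [
        [0.0 if i == j else theta[i] * theta[j] * W[labels[i]][labels[j]]
         for j in range(n)]
        for i in range(n)
    ]


labels = [0, 0, 1, 1]
theta = normalize_degree_parameters([1.0, 2.0, 3.0, 4.0], labels)
E = dcsbm_expected_adjacency(theta, labels, [[8.0, 1.0], [1.0, 8.0]])
```
]]></preformat>
<p>With equal raw degree parameters, every normalized <italic>&#x3b8;</italic><sub><italic>i</italic></sub> equals 1/<italic>n</italic><sub><italic>s</italic></sub> for a block of size <italic>n</italic><sub><italic>s</italic></sub>, so the expected adjacency matrix is that of an SBM with rescaled entries <italic>w</italic><sub><italic>st</italic></sub>/(<italic>n</italic><sub><italic>s</italic></sub><italic>n</italic><sub><italic>t</italic></sub>).</p>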
<p>A challenge shared by the SBM and the DCSBM is that the precise number of blocks in the network must be known in advance. Hypothesis testing offers a way to mitigate this requirement. Deciding whether a DCSBM consists of <italic>K</italic> or <italic>K</italic> &#x2b; 1 blocks can be reduced, through successive bipartitioning, to a sequence of decisions between one block and two. This reasoning leads us to formulate the null hypothesis that the network follows a one-block DCSBM, i.e., the DCERG. The expected adjacency matrix of the DCERG is given by<disp-formula id="e3">
<mml:math id="m5">
<mml:mtext>E</mml:mtext>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold-italic">A</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold-italic">D</mml:mi>
<mml:mi mathvariant="bold-italic">Z</mml:mi>
<mml:mi mathvariant="bold-italic">D</mml:mi>
<mml:mo>,</mml:mo>
</mml:math>
<label>(3)</label>
</disp-formula>with <bold>
<italic>D</italic>
</bold> &#x3d; diag(<italic>&#x3b8;</italic>
<sub>1</sub>, <italic>&#x3b8;</italic>
<sub>2</sub>, &#x2026; , <italic>&#x3b8;</italic>
<sub>
<italic>N</italic>
</sub>) and <bold>
<italic>Z</italic>
</bold> &#x3d; <italic>Nw</italic>
<bold>
<italic>ee</italic>
</bold>
<sup>T</sup> &#x2212; <italic>w</italic>
<bold>
<italic>I</italic>
</bold>, where <bold>
<italic>e</italic>
</bold> is a vector with <inline-formula id="inf3">
<mml:math id="m6">
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msqrt>
</mml:math>
</inline-formula> for <italic>i</italic> &#x2208; [<italic>N</italic>] and <bold>
<italic>I</italic>
</bold> is the identity matrix. Assuming that the graph is generated by a DCERG, we need to estimate <italic>&#x3b8;</italic> and <italic>w</italic>. The former is estimated by<disp-formula id="e4">
<mml:math id="m7">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:math>
<label>(4)</label>
</disp-formula>with <inline-formula id="inf4">
<mml:math id="m8">
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> being the degree of vertex <italic>i</italic>, while the latter is estimated by<disp-formula id="e5">
<mml:math id="m9">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(5)</label>
</disp-formula>with <inline-formula id="inf5">
<mml:math id="m10">
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>. The problem thus reduces to distinguishing DCSBM(<italic>N</italic>, <italic>p</italic>, <italic>W</italic>, <italic>&#x3b8;</italic>) from DCERG<inline-formula id="inf6">
<mml:math id="m11">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>. If the two are significantly dissimilar, we reject the null hypothesis and bipartition the graph. This process is repeated until every subgraph conforms to a DCERG, so that the number of communities is determined together with the community structure.</p>
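<p>The estimators in Eqs. <xref ref-type="disp-formula" rid="e4">4</xref> and <xref ref-type="disp-formula" rid="e5">5</xref> and the expected adjacency matrix of Eq. <xref ref-type="disp-formula" rid="e3">3</xref> translate directly into code. The Python sketch below computes the estimates from an adjacency matrix given as a list of lists, builds E(<bold><italic>A</italic></bold>), and draws a graph from the fitted DCERG, which is one way null graphs for the test could be generated. It assumes the graph has at least one edge and skips pairs involving isolated vertices in Eq. <xref ref-type="disp-formula" rid="e5">5</xref>, since <italic>a</italic><sub><italic>ij</italic></sub> &#x3d; 0 for them. Because Eq. <xref ref-type="disp-formula" rid="e3">3</xref> specifies expectations only, the sampler clips probabilities at 1; this is our assumption, not part of the source. All function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def estimate_dcerg(A):
    """Estimate theta (Eq. 4) and w (Eq. 5) under the one-block DCERG null."""
    n = len(A)
    degrees = [sum(row) for row in A]
    total = sum(degrees)  # assumes at least one edge
    theta = [k / total for k in degrees]
    # e_ij = a_ij / (theta_i theta_j), summed over ordered pairs; zero entries skipped.
    s = sum(
        A[i][j] / (theta[i] * theta[j])
        for i in range(n)
        for j in range(n)
        if A[i][j]
    )
    w = s / (n * (n - 1))
    return theta, w


def dcerg_expected_adjacency(theta, w):
    """E(A) = D Z D with Z = w (J - I): E(a_ij) = theta_i theta_j w for i != j (Eq. 3)."""
    n = len(theta)
    return [[0.0 if i == j else theta[i] * theta[j] * w for j in range(n)]
            for i in range(n)]


def sample_dcerg(theta, w, seed=None):
    """Draw an undirected graph from the fitted DCERG; probabilities clipped at 1 (assumption)."""
    rng = random.Random(seed)
    n = len(theta)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(1.0, theta[i] * theta[j] * w):
                A[i][j] = A[j][i] = 1
    return A


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
theta_hat, w_hat = estimate_dcerg(path)
null_graph = sample_dcerg(theta_hat, w_hat, seed=0)
```
]]></preformat>
<p>As a check, a complete graph on <italic>N</italic> vertices is reproduced exactly: every <inline-formula><mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x3b8;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x302;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> equals 1/<italic>N</italic>, <inline-formula><mml:math><mml:mover accent="true"><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mo stretchy="false">&#x302;</mml:mo></mml:mover></mml:math></inline-formula> &#x3d; <italic>N</italic><sup>2</sup>, and E(<italic>a</italic><sub><italic>ij</italic></sub>) &#x3d; 1 for <italic>i</italic> &#x2260; <italic>j</italic>.</p>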
<p>Measuring the structural dissimilarity of large graphs is a formidable challenge because many analysis techniques have prohibitive computational complexity [<xref ref-type="bibr" rid="B34">34</xref>, <xref ref-type="bibr" rid="B35">35</xref>]. Although the literature on this subject is extensive, most studies have focused on simple graphs and overlooked factors such as degree heterogeneity and community structure. To address this limitation, Xu et al. [<xref ref-type="bibr" rid="B36">36</xref>] introduced a precise and efficient method for quantifying the dissimilarity between two graphs <italic>G</italic> and <italic>G</italic>&#x2032;, based on probability distribution functions:<disp-formula id="e6">
<mml:math id="m12">
<mml:mtable class="align" columnalign="left">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mi>D</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msqrt>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="script">J</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>log</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msqrt>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msqrt>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="script">J</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>log</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msqrt>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mspace width="1em"/>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msqrt>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="script">J</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>log</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msqrt>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(6)</label>
</disp-formula>where <italic>&#x3b3;</italic>
<sub>1</sub>, <italic>&#x3b3;</italic>
<sub>2</sub>, and <italic>&#x3b3;</italic>
<sub>3</sub> are positive constants satisfying <italic>&#x3b3;</italic>
<sub>1</sub> &#x2b; <italic>&#x3b3;</italic>
<sub>2</sub> &#x2b; <italic>&#x3b3;</italic>
<sub>3</sub> &#x3d; 1. These weights set the relative contributions of global features (first term), local features (second term), and heterogeneity (third term) to the dissimilarity measure. <inline-formula id="inf7">
<mml:math id="m13">
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
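</inline-formula> denotes the average distance distribution, whose <italic>k</italic>th entry is the fraction of ordered vertex pairs at distance <italic>k</italic>, and <italic>n</italic>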
<sub>
<italic>ik</italic>
</sub> is the number of vertices at distance <italic>k</italic> from vertex <italic>i</italic>. <inline-formula id="inf8">
<mml:math id="m14">
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c0;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c0;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> represents the average clustering coefficient distribution, and <italic>&#x3c0;</italic>
<sub>
<italic>c</italic>
</sub> is the vector of the clustering coefficients of all vertices, sorted in increasing order. <inline-formula id="inf9">
<mml:math id="m15">
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c0;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c0;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> corresponds to the average centrality distribution, and <italic>&#x3c0;</italic>
<sub>
<italic>&#x3b1;</italic>
</sub> is the vector of the <italic>&#x3b1;</italic>-centralities of all vertices, sorted in increasing order. <inline-formula id="inf10">
<mml:math id="m16">
<mml:mi mathvariant="script">J</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:msub>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi>ln</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:msub>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi>ln</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is the Jensen&#x2013;Shannon divergence. Defined in this way, <italic>D</italic> combines global, local, and heterogeneity-related differences between the two graphs. Moreover, it is straightforward to verify that <inline-formula id="inf11">
<mml:math id="m17">
<mml:mi>D</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mfenced open="[" close=")">
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>. Because the graphs compared in Eq. <xref ref-type="disp-formula" rid="e6">6</xref> are random realizations of the null model, <italic>D</italic> is a random variable. To estimate its probability density, we use the KDE technique [<xref ref-type="bibr" rid="B37">37</xref>]. Given samples <italic>D</italic>
<sub>1</sub>, <italic>D</italic>
<sub>2</sub>, &#x2026; , <italic>D</italic>
<sub>
<italic>n</italic>
</sub>, the KDE estimate of the density is<disp-formula id="e7">
<mml:math id="m18">
<mml:mi>P</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:munderover>
</mml:mstyle>
<mml:mi>&#x3ba;</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:math>
<label>(7)</label>
</disp-formula>where <inline-formula id="inf12">
<mml:math id="m19">
<mml:mi>&#x3ba;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> and <italic>&#x3c3;</italic> is the bandwidth, which controls the smoothness of the estimate; in the present work, we set <italic>&#x3c3;</italic> &#x3d; 0.34. The estimated density is then used to compute the <italic>p</italic>-value, on the basis of which the null hypothesis is either rejected or retained.</p>
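<p>Two of the ingredients of Eq. 6 can be illustrated with a minimal Python sketch. It assumes an unweighted, undirected, connected graph stored as adjacency lists, and computes the average distance distribution <italic>Q</italic><sub><italic>l</italic></sub>(<italic>G</italic>) by breadth-first search together with the normalized Jensen&#x2013;Shannon term &#x221a;(<italic>J</italic>/log 2). All function names are hypothetical. Padding the shorter of two distributions with zeros is our own assumption.</p>

```python
import math
from collections import deque


def distance_distribution(adj):
    """Return [q_l(1), q_l(2), ...] with q_l(k) = sum_i n_ik / (N (N - 1)).

    adj is a list of neighbour lists of an undirected, connected graph.
    """
    n = len(adj)
    counts = {}
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist.values():
            if d > 0:
                counts[d] = counts.get(d, 0) + 1
    k_max = max(counts, default=0)
    return [counts.get(k, 0) / (n * (n - 1)) for k in range(1, k_max + 1)]


def pad(q1, q2):
    """Zero-pad two distributions to a common length (assumption)."""
    length = max(len(q1), len(q2))
    return (list(q1) + [0.0] * (length - len(q1)),
            list(q2) + [0.0] * (length - len(q2)))


def jensen_shannon(q1, q2):
    """J(q1, q2) with natural logarithms; zero-probability terms contribute 0."""
    q1, q2 = pad(q1, q2)
    total = 0.0
    for a, b in zip(q1, q2):
        if a > 0:
            total += 0.5 * a * math.log(2 * a / (a + b))
        if b > 0:
            total += 0.5 * b * math.log(2 * b / (a + b))
    return total


def js_term(q1, q2):
    """sqrt(J / log 2), which lies in [0, 1] for probability vectors."""
    return math.sqrt(max(jensen_shannon(q1, q2), 0.0) / math.log(2))
```

<p>For instance, a triangle gives the distance distribution [1.0] and a three-vertex path gives [2/3, 1/3]. Two identical distributions give a Jensen&#x2013;Shannon term of 0, and two distributions with disjoint support give 1.</p>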
<p>Building on this rationale, we propose a two-stage hypothesis-testing algorithm (see <xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref>). In the first stage, a hypothesis test determines whether the network is a single-community network, i.e., a DCERG. The procedure is as follows: i) Assuming that the target network <italic>G</italic> follows a DCERG, we estimate its degree parameters <italic>&#x3b8;</italic> and edge parameter <italic>w</italic>. ii) Using the estimates <inline-formula id="inf13">
<mml:math id="m20">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo>&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf14">
<mml:math id="m21">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>, we generate <italic>n</italic> DCERGs denoted as <italic>G</italic>
<sub>1</sub>, <italic>G</italic>
<sub>2</sub>, &#x2026; , <italic>G</italic>
<sub>
<italic>n</italic>
</sub>. Subsequently, we compute the dissimilarities <italic>D</italic>(<italic>G</italic>, <italic>G</italic>
<sub>
<italic>i</italic>
</sub>) and <italic>D</italic>(<italic>G</italic>
<sub>
<italic>i</italic>
</sub>, <italic>G</italic>
<sub>
<italic>j</italic>
</sub>) for all distinct <italic>i</italic>, <italic>j</italic> &#x2208; {1, &#x2026;, <italic>n</italic>}. iii) Using the KDE, we estimate the distribution <italic>P</italic> of dissimilarities between independently generated single-community networks. iv) We use the average dissimilarity between the target network <italic>G</italic> and the generated DCERGs, denoted as <inline-formula id="inf15">
<mml:math id="m22">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>, as the test statistic, and take the distribution <italic>P</italic> as its null distribution to decide whether <italic>G</italic> is a single-community network.</p>
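<p>Steps i) and ii) of Stage I can be sketched in Python as follows. The estimators mirror line 2 of Algorithm 1. The generator is our own assumption. It draws a Poisson number of edges with mean <italic>&#x175;</italic><italic>N</italic>(<italic>N</italic> &#x2212; 1)/2, which equals the edge count of <italic>G</italic>, and attaches each edge end to vertex <italic>i</italic> with probability &#x3b8;&#x302;<sub><italic>i</italic></sub>. Self-loops are dropped and multi-edges are merged. The function names are hypothetical.</p>

```python
import random


def estimate_dcerg(adj):
    """theta_i = k_i / sum_j k_j and w = sum_ij A_ij / (N (N - 1)) (Algorithm 1, line 2)."""
    n = len(adj)
    degrees = [len(neighbours) for neighbours in adj]
    total = sum(degrees)          # equals sum_ij A_ij = 2 * (number of edges)
    theta = [k / total for k in degrees]
    w = total / (n * (n - 1))
    return theta, w


def poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals up to time lam."""
    count, t = 0, rng.expovariate(1.0)
    while lam >= t:
        count += 1
        t += rng.expovariate(1.0)
    return count


def generate_dcerg(n, w, theta, rng):
    """Hypothetical DCERG generator: Poisson edge count, theta-weighted endpoints."""
    m = poisson(w * n * (n - 1) / 2, rng)   # mean edge count matches that of G
    ends = rng.choices(range(n), weights=theta, k=2 * m)
    neighbours = [set() for _ in range(n)]
    for u, v in zip(ends[0::2], ends[1::2]):
        if u != v:                          # drop self-loops; sets merge multi-edges
            neighbours[u].add(v)
            neighbours[v].add(u)
    return [sorted(s) for s in neighbours]
```

<p>Calling <monospace>generate_dcerg(N, w_hat, theta_hat, random.Random(seed))</monospace> 50 times yields reference graphs playing the role of <italic>G</italic><sub>1</sub>, &#x2026;, <italic>G</italic><sub>50</sub> in Algorithm 1. The generated graphs may be disconnected, so distance-based features then need a convention for unreachable pairs.</p>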
<p>
<statement content-type="step" id="Algorithm_1">
<label>Algorithm 1</label>
<p>Hypothesis test algorithm.<list list-type="simple">
<list-item>
<p>1:&#x2003; <bold>
<italic>A</italic>
</bold> &#x2190; adjacency matrix of <italic>G</italic>
</p>
</list-item>
<list-item>
<p>2:&#x2003; <inline-formula id="inf16">
<mml:math id="m23">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2190;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>, <inline-formula id="inf17">
<mml:math id="m24">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2190;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>3:&#x2003; For <italic>i</italic> &#x3d; 1, 2, &#x2026; , 50</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;<inline-formula id="inf18">
<mml:math id="m25">
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2190;</mml:mo>
<mml:mtext>DCERG</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;<inline-formula id="inf19">
<mml:math id="m26">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mn>50</mml:mn>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>4:&#x2003; For all <italic>i</italic> &#x2260; <italic>j</italic>
</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;<italic>D</italic>
<sub>
<italic>ij</italic>
</sub> &#x2190; <italic>D</italic>(<italic>G</italic>
<sub>
<italic>i</italic>
</sub>, <italic>G</italic>
<sub>
<italic>j</italic>
</sub>)</p>
</list-item>
<list-item>
<p>5:&#x2003; <inline-formula id="inf20">
<mml:math id="m27">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext>DCERG</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2190;</mml:mo>
<mml:mtext>KDE</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>6:&#x2003; pval <inline-formula id="inf21">
<mml:math id="m28">
<mml:mo>&#x2190;</mml:mo>
<mml:mspace width="0.3333em"/>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3e;</mml:mo>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>7:&#x2003; If pval <inline-formula id="inf22">
<mml:math id="m29">
<mml:mo>&#x3c;</mml:mo>
</mml:math>
</inline-formula> significance level <italic>&#x3b1;</italic>
</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;i) For each edge <italic>e</italic>
<sub>
<italic>ij</italic>
</sub>
</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;compute the edge betweenness <italic>B</italic>
<sub>
<italic>ij</italic>
</sub>
</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;compute the edge clustering coefficient <italic>C</italic>
<sub>
<italic>ij</italic>
</sub>
</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;<italic>L</italic>
<sub>
<italic>ij</italic>
</sub> &#x2190; <italic>&#x3b2;</italic>
<sub>1</sub>
<italic>B</italic>
<sub>
<italic>ij</italic>
</sub> &#x2212; <italic>&#x3b2;</italic>
<sub>2</sub>
<italic>C</italic>
<sub>
<italic>ij</italic>
</sub>
</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;remove edge <italic>e</italic>
<sub>
<italic>ij</italic>
</sub> with <italic>L</italic> &#x3d; max(<italic>L</italic>
<sub>
<italic>ij</italic>
</sub>)</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;ii) If the graph is connected</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;go back to i)</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;Else</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;Output <italic>G</italic>&#x2032;, <italic>G</italic>&#x2033;</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;End if</p>
</list-item>
<list-item>
<p>Else</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;Output <italic>G</italic>
</p>
</list-item>
<list-item>
<p>&#x2003;&#x2003;&#x2003;&#x2003;&#x2003;End if</p>
</list-item>
</list>
</p>
</statement>
</p>
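<p>Lines 5&#x2013;7 of Algorithm 1 can be sketched in Python with a Gaussian KDE. Two assumptions are ours. First, the kernel of Eq. 7 is normalized by 1/(&#x3c3;&#x221a;(2&#x3c0;)) so that the estimate integrates to one. Second, the <italic>p</italic>-value is read as the upper-tail probability that a dissimilarity drawn from the estimated null distribution is at least as large as <italic>D&#x304;</italic>. The function names are hypothetical.</p>

```python
import math


def kde_pdf(x, samples, sigma=0.34):
    """Gaussian KDE of Eq. 7, with the kernel normalized to integrate to one."""
    scale = 1.0 / (len(samples) * sigma * math.sqrt(2.0 * math.pi))
    return scale * sum(math.exp(-((x - d) ** 2) / (2.0 * sigma ** 2)) for d in samples)


def kde_upper_tail(x, samples, sigma=0.34):
    """P(D >= x) under the KDE: the mean of the Gaussian survival functions."""
    return sum(0.5 * math.erfc((x - d) / (sigma * math.sqrt(2.0))) for d in samples) / len(samples)


def stage_one_test(d_bar, null_samples, alpha=0.05, sigma=0.34):
    """Return (p-value, reject); reject means G is not a single-community network."""
    pval = kde_upper_tail(d_bar, null_samples, sigma)
    return pval, alpha > pval
```

<p>Here <monospace>null_samples</monospace> holds the pairwise dissimilarities <italic>D</italic><sub><italic>ij</italic></sub> among the generated DCERGs, and <monospace>d_bar</monospace> holds the average dissimilarity between <italic>G</italic> and those graphs. If the test rejects, the network is passed to the partitioning stage.</p>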
<p>If the null hypothesis is rejected, the algorithm proceeds to the second stage, in which the target network is split into two networks. In this stage, we modify the Newman&#x2013;Girvan algorithm [<xref ref-type="bibr" rid="B18">18</xref>] so that edge removal accounts both for the importance of an edge to network connectivity and for its local clustering. Combining global and local information in this way improves the accuracy of community detection. The procedure is as follows: i) compute the edge betweenness <italic>B</italic>
<sub>
<italic>ij</italic>
</sub> and the edge clustering coefficient <italic>C</italic>
<sub>
<italic>ij</italic>
</sub> for every edge of the target network <italic>G</italic>; ii) define <italic>L</italic>
<sub>
<italic>ij</italic>
</sub> &#x3d; <italic>&#x3b2;</italic>
<sub>1</sub>
<italic>B</italic>
<sub>
<italic>ij</italic>
</sub> &#x2212; <italic>&#x3b2;</italic>
<sub>2</sub>
<italic>C</italic>
<sub>
<italic>ij</italic>
</sub>, where <italic>&#x3b2;</italic>
<sub>1</sub> and <italic>&#x3b2;</italic>
<sub>2</sub> set the balance between connectivity and clustering, and remove the edge <italic>e</italic>
<sub>
<italic>ij</italic>
</sub> with the largest <italic>L</italic>
<sub>
<italic>ij</italic>
</sub>; and iii) return to step i), repeating until the network becomes disconnected. This bisection splits the original network <italic>G</italic> into two networks, denoted as <italic>G</italic>&#x2032; and <italic>G</italic>&#x2033;. Neither of them is guaranteed to satisfy the single-community assumption, so the test (Stage I) and the partition (Stage II) are applied recursively until every resulting network is identified as a single-community network.</p>
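<p>The Stage II bisection can be sketched in Python as below. The sketch assumes an unweighted, undirected graph stored as a dictionary of neighbour sets. Three choices are ours, not the authors&#x2019;:</p>
<list list-type="bullet">
<list-item>
<p>The edge clustering coefficient follows the common definition <italic>C</italic><sub><italic>ij</italic></sub> &#x3d; (<italic>z</italic><sub><italic>ij</italic></sub> &#x2b; 1)/min(<italic>k</italic><sub><italic>i</italic></sub> &#x2212; 1, <italic>k</italic><sub><italic>j</italic></sub> &#x2212; 1), where <italic>z</italic><sub><italic>ij</italic></sub> is the number of triangles containing the edge. It is set to infinity when the denominator vanishes.</p>
</list-item>
<list-item>
<p>The edge betweenness is the unnormalized count from Brandes&#x2019; algorithm.</p>
</list-item>
<list-item>
<p>The default values of <italic>&#x3b2;</italic><sub>1</sub> and <italic>&#x3b2;</italic><sub>2</sub> are placeholders.</p>
</list-item>
</list>

```python
import math
from collections import deque


def components(adj):
    """Connected components of a graph given as {node: set(neighbours)}."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in part:
                    part.add(v)
                    queue.append(v)
        seen |= part
        parts.append(part)
    return parts


def edge_betweenness(adj):
    """Unnormalized edge betweenness via Brandes' algorithm (unweighted graph)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                      # BFS counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] == -1:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while order:                      # accumulate dependencies backwards
            w = order.pop()
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    return {e: b / 2.0 for e, b in bet.items()}   # each pair was counted twice


def edge_clustering(adj, u, v):
    """(z_uv + 1) / min(k_u - 1, k_v - 1); infinite if an endpoint has degree 1."""
    z = len(adj[u].intersection(adj[v]))
    denom = min(len(adj[u]), len(adj[v])) - 1
    return (z + 1) / denom if denom > 0 else math.inf


def bisect(adj, beta1=1.0, beta2=1.0):
    """Remove the edge maximizing L = beta1*B - beta2*C until the graph splits."""
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    while len(components(adj)) == 1:
        bet = edge_betweenness(adj)               # recomputed after every removal
        if not bet:
            break
        def score(e):
            u, v = tuple(e)
            c = edge_clustering(adj, u, v)
            return beta1 * bet[e] - (beta2 * c if beta2 else 0.0)
        u, v = tuple(max(bet, key=score))
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```

<p>Consider two triangles joined by a single bridge. The bridge has the largest betweenness and zero triangles, so it is removed first, and <monospace>bisect</monospace> returns the two triangles. Because <italic>B</italic><sub><italic>ij</italic></sub> and <italic>C</italic><sub><italic>ij</italic></sub> live on very different scales, the relative size of <italic>&#x3b2;</italic><sub>1</sub> and <italic>&#x3b2;</italic><sub>2</sub> determines how much the clustering term matters in practice.</p>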
<p>Our algorithm runs in polynomial time. Generating an <italic>N</italic>-node DCERG with <italic>M</italic> edges costs <italic>O</italic>(<italic>N</italic> &#x2b; <italic>M</italic>), i.e., <italic>O</italic>(<italic>N</italic>) for sparse graphs. The dissimilarity relies on shortest-path distances. A single-source computation with Dijkstra&#x2019;s algorithm and Fibonacci heaps runs in <italic>O</italic>(<italic>M</italic> &#x2b; <italic>N</italic> log <italic>N</italic>), so the distances from all <italic>N</italic> sources cost <italic>N</italic> times as much. In the present work, we compute <inline-formula id="inf23">
<mml:math id="m30">
<mml:msubsup>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>50</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1225</mml:mn>
</mml:math>
</inline-formula> pairwise dissimilarities among the generated graphs. More generally, for <italic>m</italic> generated graphs, the <inline-formula id="inf24">
<mml:math id="m31">
<mml:msubsup>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> pairwise dissimilarities require <italic>O</italic>(<italic>m</italic>
<sup>2</sup>) dissimilarity evaluations, which also sets the number of samples entering the KDE. In the partitioning stage, computing the edge betweenness costs <italic>O</italic>(<italic>NM</italic>) per edge removal, while computing the edge clustering coefficients costs <italic>O</italic>(<italic>N</italic> &#x2b; <italic>M</italic>). Finally, for a network with <italic>K</italic> communities, the bisection is applied <italic>K</italic> &#x2212; 1 times.</p>
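<p>The pair count and a bound on the cost of one bisection follow directly from the complexities stated above. The bound assumes the worst case, in which all <italic>M</italic> edges must be removed before the network splits.</p>

```latex
\begin{gathered}
\binom{m}{2} = \frac{m(m-1)}{2} = O(m^{2}), \qquad \binom{50}{2} = \frac{50 \times 49}{2} = 1225, \\
T_{\text{bisection}} \le M \times \left[ O(NM) + O(N+M) \right] = O(NM^{2}).
\end{gathered}
```

<p>This matches the worst-case <italic>O</italic>(<italic>M</italic><sup>2</sup><italic>N</italic>) scaling of the original Newman&#x2013;Girvan procedure. The clustering term adds only a lower-order cost per removal.</p>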
</sec>
<sec id="s3">
<title>3 Application to block models</title>
<p>To assess the effectiveness of our algorithm, we first test it on the balanced DCSBM, in which all blocks have the same size. Specifically, we fix <italic>N</italic> &#x3d; 1000, <italic>K</italic> &#x3d; 2, and <italic>w</italic>
<sub>11</sub> &#x3d; <italic>w</italic>
<sub>22</sub> &#x3d; 0.2. The degree parameters, denoted as <italic>&#x3b8;</italic>
<sub>
<italic>i</italic>
</sub>, are drawn from a shifted half-normal distribution, <inline-formula id="inf25">
<mml:math id="m32">
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>&#x223c;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mtext>Normal</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>0,0.25</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>&#x3c0;</mml:mi>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>, which is right-skewed. Here Normal(0, 0.25) denotes a normal distribution with variance 0.25, so the mean of its absolute value is 1/&#x221a;(2&#x3c0;) and the shift gives E(<italic>&#x3b8;</italic>) &#x3d; 1 without loss of generality. We also explored other distributions, but those results are not presented here. The graph is generated by a straightforward implementation of the block model: (i) draw a Poisson-distributed number of edges between blocks 1 and 2 with parameter <italic>w</italic>
<sub>12</sub> &#x3d; <italic>w</italic>
<sub>21</sub> (or <italic>w</italic>
<sub>11</sub>/2 &#x3d; <italic>w</italic>
<sub>22</sub>/2 for intra-block edges), and (ii) assign each end of an edge to a vertex in the corresponding block with probability proportional to <italic>&#x3b8;</italic>
<sub>
<italic>i</italic>
</sub>.</p>
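<p>A minimal Python sketch of this generator is given below. The degree parameters follow the shifted half-normal distribution above, with standard deviation 0.5. The Poisson means are <italic>w</italic><sub><italic>rs</italic></sub> times the sums of <italic>&#x3b8;</italic> over blocks <italic>r</italic> and <italic>s</italic>, halved when <italic>r</italic> &#x3d; <italic>s</italic>. These means are our reading of the Karrer&#x2013;Newman construction, and the function names are hypothetical. The result is a multigraph that may contain self-loops.</p>

```python
import math
import random


def draw_theta(n, rng):
    """theta = |Normal(0, 0.25)| + 1 - 1/sqrt(2*pi); variance 0.25 gives E(theta) = 1."""
    shift = 1.0 - 1.0 / math.sqrt(2.0 * math.pi)
    return [abs(rng.gauss(0.0, 0.5)) + shift for _ in range(n)]


def poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals up to time lam."""
    count, t = 0, rng.expovariate(1.0)
    while lam >= t:
        count += 1
        t += rng.expovariate(1.0)
    return count


def dcsbm_edges(sizes, w, rng):
    """Return (theta, edge list) for a DCSBM with the given block sizes and matrix w."""
    blocks, start = [], 0
    for size in sizes:
        blocks.append(list(range(start, start + size)))
        start += size
    theta = draw_theta(start, rng)
    edges = []
    for r in range(len(sizes)):
        for s in range(r, len(sizes)):
            mass_r = sum(theta[i] for i in blocks[r])
            mass_s = sum(theta[i] for i in blocks[s])
            mean = w[r][s] * mass_r * mass_s / (2.0 if r == s else 1.0)
            m = poisson(mean, rng)                 # step (i): number of edges
            # step (ii): each end goes to a vertex of its block with prob. ~ theta
            ends_r = rng.choices(blocks[r], weights=[theta[i] for i in blocks[r]], k=m)
            ends_s = rng.choices(blocks[s], weights=[theta[i] for i in blocks[s]], k=m)
            edges.extend(zip(ends_r, ends_s))
    return theta, edges
```

<p>With <monospace>sizes&#x3d;[500, 500]</monospace> and <monospace>w&#x3d;[[0.2, w12], [w12, 0.2]]</monospace>, this corresponds to the balanced setting with <italic>N</italic> &#x3d; 1000. Each block then receives about 0.2 &#xd7; 500 &#xd7; 500/2 &#x3d; 25,000 intra-block edges on average.</p>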
<p>To explore different levels of community structure within the generated networks, we systematically increased the value of <italic>w</italic>
<sub>12</sub>(&#x3d;<italic>w</italic>
<sub>21</sub>) from 0.02 to 0.2 in increments of 0.02. Error bars on the <italic>p</italic>-values are computed from 100 independent runs. A larger <italic>p</italic>-value indicates that the test finds the graph closer to the null model, i.e., to a graph without community structure. As shown in <xref ref-type="fig" rid="F1">Figure 1A</xref>, the <italic>p</italic>-value increases with <italic>w</italic>
<sub>12</sub>, reflecting the weakening block structure of the network. <xref ref-type="fig" rid="F1">Figure 1B</xref> shows the adjacency matrix for <italic>w</italic>
<sub>12</sub> &#x3d; 0.02, with rows and columns ordered according to the community structure. The block structure detected by our algorithm closely matches the planted one.</p>
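<p>The sweep over <italic>w</italic><sub>12</sub> and the error bars can be organized as in the following Python sketch. Here <monospace>run_test</monospace> is a hypothetical callable that generates one DCSBM realization for a given <italic>w</italic><sub>12</sub> and seed and returns the Stage I <italic>p</italic>-value. Using the sample standard deviation over runs as the error bar is our assumption.</p>

```python
import statistics


def sweep_w12(run_test, runs=100):
    """Mean and standard deviation of the p-value for w12 = 0.02, 0.04, ..., 0.20."""
    grid = [round(0.02 * k, 2) for k in range(1, 11)]   # avoids float drift
    summary = {}
    for w12 in grid:
        pvals = [run_test(w12, seed) for seed in range(runs)]
        summary[w12] = (statistics.mean(pvals), statistics.stdev(pvals))
    return summary
```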
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Simulation results of the hypothesis test algorithm for the balanced two-block DCSBM: <italic>p</italic>-value as a function of the inter-block parameter <italic>w</italic>
<sub>12</sub> <bold>(A)</bold> and adjacency matrix for <italic>w</italic>
<sub>12</sub> &#x3d; 0.02 <bold>(B)</bold>. The dashed line marks the significance level <italic>&#x3b1;</italic> &#x3d; 0.05.</p>
</caption>
<graphic xlink:href="fphy-11-1251319-g001.tif"/>
</fig>
<p>We next apply our algorithm to the DCSBM with unbalanced blocks, in which the two blocks have different sizes, <italic>n</italic>
<sub>1</sub> and <italic>n</italic>
<sub>2</sub>, respectively. To investigate the impact of community size, we set <italic>w</italic>
<sub>12</sub> &#x3d; <italic>w</italic>
<sub>21</sub> &#x3d; 0.02 and <italic>w</italic>
<sub>11</sub> &#x3d; <italic>w</italic>
<sub>22</sub> &#x3d; 0.2.</p>
<p>
<xref ref-type="fig" rid="F2">Figure 2A</xref> shows the <italic>p</italic>-value as <italic>n</italic>
<sub>1</sub> increases from 50 to 100. The <italic>p</italic>-value decreases consistently with <italic>n</italic>
<sub>1</sub>, as expected, because a larger planted block is easier to detect. In particular, the block structure is clearly detected for <italic>n</italic>
<sub>1</sub> &#x2265; 77. In <xref ref-type="fig" rid="F2">Figure 2B</xref>, we fix <italic>n</italic>
<sub>1</sub> &#x3d; 100 and plot the <italic>p</italic>-value against <italic>w</italic>
<sub>12</sub>. The <italic>p</italic>-value rises steadily with <italic>w</italic>
<sub>12</sub>, again as expected, since the graph gradually loses its block structure; this loss becomes evident for <italic>w</italic>
<sub>12</sub> &#x2265; 0.068.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>
<italic>p</italic>-value as a function of <italic>n</italic>
<sub>1</sub> <bold>(A)</bold> and <italic>w</italic>
<sub>12</sub> <bold>(B)</bold> for the unbalanced two-block DCSBM. The dashed lines mark the significance level <italic>&#x3b1;</italic> &#x3d; 0.05.</p>
</caption>
<graphic xlink:href="fphy-11-1251319-g002.tif"/>
</fig>
</sec>
<sec id="s4">
<title>4 Application to empirical networks</title>
<p>We now apply our algorithm to real-world networks. The first example is the network of a karate club at an American university. This network consists of 34 nodes, and the relationships among its members were recorded by Zachary [<xref ref-type="bibr" rid="B38">38</xref>] over a span of 2&#xa0;years. After a disagreement between the instructor (node 0) and the administrator (node 33) over class fees, the club split into two groups. Because the membership of each group is known, the karate club network is a standard benchmark for community detection.</p>
<p>The results of applying our algorithm to this network are shown in <xref ref-type="fig" rid="F3">Figure 3A</xref>, where solid circles and squares denote the groups of the instructor and the administrator, respectively. Our algorithm divides the vertices in accordance with the known communities, except for two vertices (nodes 8 and 9) on the boundary between the two groups, which are misclassified. In addition, <xref ref-type="fig" rid="F3">Figure 3B</xref> presents a density plot of the adjacency matrix, which confirms the block structure of the network.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Performance of the hypothesis test algorithm on the karate club network: community division <bold>(A)</bold> and density plot of the adjacency matrix <bold>(B)</bold>.</p>
</caption>
<graphic xlink:href="fphy-11-1251319-g003.tif"/>
</fig>
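<p>The density image in Figure 3B conveys block structure visually. As a rough numeric counterpart, the following Python sketch computes the edge density within and between the communities of a given partition; block structure shows up as within-community densities well above the between-community ones. It is an illustrative sketch, not the procedure used to draw the figure; the function name block_densities and the edge-list input format are assumptions made for this example.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict
from itertools import combinations_with_replacement


def block_densities(n, edges, labels):
    """Edge density within and between communities of an undirected simple graph.

    n      -- number of nodes, labelled 0..n-1
    edges  -- iterable of (u, v) pairs, each undirected edge listed once
    labels -- labels[i] is the (sortable) community label of node i
    Returns {(r, s): density} with r not greater than s.
    """
    sizes = defaultdict(int)
    for i in range(n):
        sizes[labels[i]] += 1

    # Count observed edges for each unordered pair of communities.
    counts = defaultdict(int)
    for u, v in edges:
        r, s = sorted((labels[u], labels[v]))
        counts[(r, s)] += 1

    dens = {}
    for r, s in combinations_with_replacement(sorted(sizes), 2):
        if r == s:
            possible = sizes[r] * (sizes[r] - 1) // 2  # pairs inside block r
        else:
            possible = sizes[r] * sizes[s]  # pairs across blocks r and s
        dens[(r, s)] = counts[(r, s)] / possible if possible else 0.0
    return dens
```
</preformat>
<p>For two triangles joined by a single edge, for instance, both within-block densities equal 1 while the between-block density is 1/9.</p>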
<p>As a second real-world example, we consider the American college football network [<xref ref-type="bibr" rid="B39">39</xref>]. Each node represents a team, and two nodes are connected if the corresponding teams played each other during the season. The dataset covers the 2000 season of American college football Division I-A and includes 115 teams organized into 12 conferences. Because games are played more often between members of the same conference than between teams from different conferences, the network has a recognizable community structure.</p>
<p><xref ref-type="fig" rid="F4">Figure 4A</xref> shows the community structure obtained by applying our algorithm to this network. Most teams are grouped with the other teams of their own conference, and a few independent teams are assigned to the conferences with which they are most closely associated. Overall, the partition agrees closely with the ground-truth community structure. The density plot of the adjacency matrix in <xref ref-type="fig" rid="F4">Figure 4B</xref> makes this structure visible.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Community division <bold>(A)</bold> and density plot of the adjacency matrix <bold>(B)</bold> for the American college football network.</p>
</caption>
<graphic xlink:href="fphy-11-1251319-g004.tif"/>
</fig>
<p>To quantitatively compare the results of our algorithm with the ground truth and with those of state-of-the-art methods, we use two measures: the adjusted Rand index <italic>S</italic>
<sub>AR</sub> and the <italic>F</italic>
<sub>1</sub> score. Given two partitions <italic>P</italic>
<sub>
<italic>a</italic>
</sub> and <italic>P</italic>
<sub>
<italic>b</italic>
</sub> of the same <italic>n</italic> nodes, we denote by <italic>w</italic>
<sub>11</sub> the number of node pairs placed in the same community in both partitions, by <italic>w</italic>
<sub>10</sub> the number placed together in <italic>P</italic>
<sub>
<italic>a</italic>
</sub> but in different communities in <italic>P</italic>
<sub>
<italic>b</italic>
</sub>, by <italic>w</italic>
<sub>01</sub> the number placed in different communities in <italic>P</italic>
<sub>
<italic>a</italic>
</sub> but together in <italic>P</italic>
<sub>
<italic>b</italic>
</sub>, and by <italic>w</italic>
<sub>00</sub> the number placed in different communities in both. Every pair falls into exactly one of these classes, so <inline-formula id="inf26">
<mml:math id="m33">
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>01</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>00</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>M</mml:mi>
</mml:math>
</inline-formula>, the total number of node pairs. The adjusted Rand index is defined as follows [<xref ref-type="bibr" rid="B40">40</xref>]:
<disp-formula id="e8">
<mml:math id="m34">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>AR</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>01</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>01</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>01</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:math>
<label>(8)</label>
</disp-formula>
</p>
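<p>Eq. (8) can be evaluated directly from the four pair counts. The following Python sketch enumerates all <italic>M</italic> node pairs, which costs O(<italic>n</italic><sup>2</sup>) time and is adequate for networks the size of the karate club and football examples. Partitions are given as lists of community labels indexed by node, an input format assumed here, and the function names are hypothetical. Returning 1.0 when the denominator vanishes (both partitions are a single community, or both are all singletons) is an assumed convention, not part of Eq. (8).</p>
<preformat preformat-type="code">
```python
from itertools import combinations


def pair_counts(pa, pb):
    """Return (w11, w10, w01, w00) for two partitions given as label lists."""
    if len(pa) != len(pb):
        raise ValueError("partitions must label the same nodes")
    w11 = w10 = w01 = w00 = 0
    for i, j in combinations(range(len(pa)), 2):
        same_a = pa[i] == pa[j]
        same_b = pb[i] == pb[j]
        if same_a and same_b:
            w11 += 1
        elif same_a:
            w10 += 1
        elif same_b:
            w01 += 1
        else:
            w00 += 1
    return w11, w10, w01, w00


def adjusted_rand_index(pa, pb):
    """Adjusted Rand index S_AR computed as in Eq. (8)."""
    w11, w10, w01, w00 = pair_counts(pa, pb)
    m = w11 + w10 + w01 + w00  # M = n(n-1)/2
    if m == 0:
        raise ValueError("need at least two nodes")
    a = w11 + w10  # pairs grouped together in P_a
    b = w11 + w01  # pairs grouped together in P_b
    expected = a * b / m       # (1/M)(w11 + w10)(w11 + w01)
    max_index = 0.5 * (a + b)  # (1/2)[(w11 + w10) + (w11 + w01)]
    if max_index == expected:
        return 1.0  # assumed convention for the degenerate 0/0 case
    return (w11 - expected) / (max_index - expected)
```
</preformat>
<p>Because <italic>S</italic><sub>AR</sub> depends only on which pairs are grouped together, relabelling the communities leaves it unchanged: the label lists [0, 0, 1, 1] and [1, 1, 0, 0] give <italic>S</italic><sub>AR</sub> &#x3d; 1.</p>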
<p>Another measure for comparing <italic>P</italic>
<sub>
<italic>a</italic>
</sub> and <italic>P</italic>
<sub>
<italic>b</italic>
</sub> is the <italic>F</italic>
<sub>1</sub> score, defined as follows [<xref ref-type="bibr" rid="B41">41</xref>]:<disp-formula id="e9">
<mml:math id="m35">
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mtext>precision</mml:mtext>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mtext>recall</mml:mtext>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mtext>precision</mml:mtext>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mtext>recall</mml:mtext>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:math>
<label>(9)</label>
</disp-formula>with precision(<italic>P</italic>
<sub>
<italic>a</italic>
</sub>, <italic>P</italic>
<sub>
<italic>b</italic>
</sub>) &#x3d; &#x7c;<italic>P</italic>
<sub>
<italic>a</italic>
</sub> &#x2229; <italic>P</italic>
<sub>
<italic>b</italic>
</sub>&#x7c;/&#x7c;<italic>P</italic>
<sub>
<italic>b</italic>
</sub>&#x7c; and recall(<italic>P</italic>
<sub>
<italic>a</italic>
</sub>, <italic>P</italic>
<sub>
<italic>b</italic>
</sub>) &#x3d; &#x7c;<italic>P</italic>
<sub>
<italic>a</italic>
</sub> &#x2229; <italic>P</italic>
<sub>
<italic>b</italic>
</sub>&#x7c;/&#x7c;<italic>P</italic>
<sub>
<italic>a</italic>
</sub>&#x7c;.</p>
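<p>The set notation in the precision and recall definitions admits more than one reading. One common reading, assumed in the following Python sketch, is pair-counting: &#x7c;<italic>P</italic><sub><italic>a</italic></sub> &#x2229; <italic>P</italic><sub><italic>b</italic></sub>&#x7c; &#x3d; <italic>w</italic><sub>11</sub> counts node pairs grouped together in both partitions, while &#x7c;<italic>P</italic><sub><italic>a</italic></sub>&#x7c; &#x3d; <italic>w</italic><sub>11</sub> &#x2b; <italic>w</italic><sub>10</sub> and &#x7c;<italic>P</italic><sub><italic>b</italic></sub>&#x7c; &#x3d; <italic>w</italic><sub>11</sub> &#x2b; <italic>w</italic><sub>01</sub>. Under this assumption Eq. (9) reduces to 2<italic>w</italic><sub>11</sub>/(2<italic>w</italic><sub>11</sub> &#x2b; <italic>w</italic><sub>10</sub> &#x2b; <italic>w</italic><sub>01</sub>). Community-matching definitions of <italic>F</italic><sub>1</sub> would give different values; the function name pair_f1 is hypothetical.</p>
<preformat preformat-type="code">
```python
from itertools import combinations


def pair_f1(pa, pb):
    """Pair-counting F1 score between two partitions given as label lists.

    Assumption: |P_a intersect P_b| is the number of node pairs grouped together
    in both partitions (w11); |P_a| and |P_b| are the numbers of pairs grouped
    together in P_a (w11 + w10) and in P_b (w11 + w01).
    """
    if len(pa) != len(pb):
        raise ValueError("partitions must label the same nodes")
    together_a = together_b = together_both = 0
    for i, j in combinations(range(len(pa)), 2):
        same_a = pa[i] == pa[j]
        same_b = pb[i] == pb[j]
        together_a += int(same_a)
        together_b += int(same_b)
        together_both += int(same_a and same_b)
    if together_both == 0:
        return 0.0  # no shared pairs, so precision and recall are zero or undefined
    precision = together_both / together_b  # |P_a intersect P_b| / |P_b|
    recall = together_both / together_a     # |P_a intersect P_b| / |P_a|
    return 2 * precision * recall / (precision + recall)
```
</preformat>
<p>For the label lists [0, 0, 0, 1, 1, 1] and [0, 0, 1, 1, 2, 2], for example, precision is 2/3, recall is 1/3, and the pair-counting <italic>F</italic><sub>1</sub> is 4/9.</p>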
<p>As shown in <xref ref-type="table" rid="T1">Table 1</xref>, our method performs best on both real networks. It detects two communities in the karate club network, matching the ground truth, and 11 communities in the football network, closer to the 12 conferences than any competing method. It also attains the highest values of <italic>S</italic>
<sub>AR</sub> and <italic>F</italic>
<sub>1</sub> on both networks, indicating the closest agreement with the ground-truth communities.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
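<p>Comparison of the hypothesis test algorithm with state-of-the-art algorithms on the karate club and college football networks. <italic>S</italic>
<sub>AR</sub> and <italic>F</italic>
<sub>1</sub> are computed with respect to the ground-truth communities.</p>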
</caption>
<table>
<thead valign="top">
<tr>
<th align="center"/>
<th colspan="3" align="center">Karate club</th>
<th colspan="3" align="center">College football</th>
</tr>
<tr>
<th align="left">Method</th>
<th align="center">Communities</th>
<th align="center">
<italic>S</italic>
<sub>AR</sub>
</th>
<th align="center">
<italic>F</italic>
<sub>1</sub>
</th>
<th align="center">Communities</th>
<th align="center">
<italic>S</italic>
<sub>AR</sub>
</th>
<th align="center">
<italic>F</italic>
<sub>1</sub>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">Hypothesis test</td>
<td align="center">2</td>
<td align="center">0.7717</td>
<td align="center">0.9410</td>
<td align="center">11</td>
<td align="center">0.8927</td>
<td align="center">0.8697</td>
</tr>
<tr>
<td align="center">Motif-based k-means</td>
<td align="center">2</td>
<td align="center">0.6682</td>
<td align="center">0.9117</td>
<td align="center">10</td>
<td align="center">0.7939</td>
<td align="center">0.8120</td>
</tr>
<tr>
<td align="center">Modularity-based</td>
<td align="center">3</td>
<td align="center">0.5684</td>
<td align="center">0.5189</td>
<td align="center">6</td>
<td align="center">0.4741</td>
<td align="center">0.3711</td>
</tr>
<tr>
<td align="center">Louvain</td>
<td align="center">4</td>
<td align="center">0.4646</td>
<td align="center">0.3033</td>
<td align="center">10</td>
<td align="center">0.8035</td>
<td align="center">0.6961</td>
</tr>
<tr>
<td align="center">Infomap</td>
<td align="center">3</td>
<td align="center">0.5906</td>
<td align="center">0.5666</td>
<td align="center">10</td>
<td align="center">0.8165</td>
<td align="center">0.6940</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec sec-type="conclusion" id="s5">
<title>5 Conclusion</title>
<p>As prominent models for the analysis of network data, the SBM and its variants, particularly the DCSBM, have attracted considerable attention in community detection [<xref ref-type="bibr" rid="B42">42</xref>]. The DCSBM is especially effective for networks with highly skewed degree distributions. In this paper, we introduced a novel hypothesis test for community detection in complex networks, with contributions on both the modeling and the algorithmic side. On the modeling side, we introduced a graph dissimilarity measure that combines the vertex distance distribution, the clustering coefficient distribution, and the alpha-centrality distribution, and we used the dissimilarity between the DCSBM and the DCERG to construct a hypothesis test statistic. On the algorithmic side, we devised a two-stage algorithm: we first tested whether the original network is consistent with the DCERG and, if not, iteratively bipartitioned it until every subgraph was consistent with the DCERG. The bipartitioning uses a new criterion that combines edge betweenness and the edge clustering coefficient. We applied the algorithm to both synthetic and real networks. On the benchmarks considered, the proposed method outperforms the state-of-the-art approaches we compared it with, and it detects communities in networks with broad degree distributions even when the number of communities is unknown.</p>
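<p>The control flow of the two-stage procedure summarized above can be sketched as a recursion. In the following Python sketch, the hypothesis test and the bipartitioning criterion are passed in as functions, because their details (the dissimilarity statistic, the DCERG null model, and the combination of edge betweenness with the edge clustering coefficient) are not restated here. The names recursive_bipartition, is_null and bipartition, and the min_size cutoff below which parts are not tested, are hypothetical.</p>
<preformat preformat-type="code">
```python
from typing import Callable, Hashable, List, Tuple

Nodes = List[Hashable]


def recursive_bipartition(
    nodes: Nodes,
    is_null: Callable[[Nodes], bool],
    bipartition: Callable[[Nodes], Tuple[Nodes, Nodes]],
    min_size: int = 2,
) -> List[Nodes]:
    """Split `nodes` until every part is accepted by `is_null`.

    is_null     -- hypothetical test: True if the subgraph induced by the
                   nodes is consistent with the null model (e.g. the DCERG)
    bipartition -- hypothetical splitting rule returning two parts
    """
    # Stage 1: stop if the part is too small to test or the test accepts it.
    if min_size > len(nodes) or is_null(nodes):
        return [list(nodes)]
    # Stage 2: otherwise split it in two and recurse on each half.
    left, right = bipartition(nodes)
    if not left or not right:
        # Guard against a splitting rule that fails to make progress.
        return [list(nodes)]
    return (recursive_bipartition(left, is_null, bipartition, min_size)
            + recursive_bipartition(right, is_null, bipartition, min_size))
```
</preformat>
<p>With a stand-in test that accepts a node set only when all its members belong to one planted block of four nodes, and a rule that simply halves the node list, calling recursive_bipartition on nodes 0 to 7 returns [[0, 1, 2, 3], [4, 5, 6, 7]].</p>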
<p>Several promising directions for future research remain. One is to explore alternative measures of graph dissimilarity, which remains an open problem. In particular, for networks with higher-order architecture, measures that go beyond pairwise interactions could enhance the model&#x2019;s capacity [<xref ref-type="bibr" rid="B43">43</xref>]. In addition, while the Gaussian kernel is the usual choice for kernel density estimation, other kernels, such as the Epanechnikov kernel widely used in financial data analysis, may suit specific applications. On the computational side, deriving the theoretical distribution of the dissimilarity would substantially reduce the computational overhead. Finally, the proposed framework could be extended to more sophisticated block models, such as exponential [<xref ref-type="bibr" rid="B44">44</xref>], multilevel [<xref ref-type="bibr" rid="B45">45</xref>], and dynamic [<xref ref-type="bibr" rid="B46">46</xref>] models.</p>
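<p>To make the kernel choice concrete, the following Python sketch builds a kernel density estimate <italic>f</italic>(<italic>x</italic>) &#x3d; (1/(<italic>nh</italic>)) &#x3a3;<sub><italic>i</italic></sub> <italic>K</italic>((<italic>x</italic> &#x2212; <italic>x</italic><sub><italic>i</italic></sub>)/<italic>h</italic>) with either the Gaussian kernel or the Epanechnikov kernel <italic>K</italic>(<italic>u</italic>) &#x3d; (3/4)(1 &#x2212; <italic>u</italic><sup>2</sup>) for &#x7c;<italic>u</italic>&#x7c; &#x2264; 1 and 0 otherwise. The bandwidth <italic>h</italic>, the sample values and the function names are illustrative assumptions, not settings taken from this study.</p>
<preformat preformat-type="code">
```python
import math
from typing import Callable, Sequence

Kernel = Callable[[float], float]


def gaussian_kernel(u: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(u: float) -> float:
    """Epanechnikov kernel: 0.75 (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if 1.0 >= abs(u) else 0.0


def kde(samples: Sequence[float], h: float, kernel: Kernel) -> Callable[[float], float]:
    """Return the density estimate x -> (1 / (n h)) * sum_i kernel((x - x_i) / h)."""
    if not samples:
        raise ValueError("need at least one sample")
    if not h > 0:
        raise ValueError("bandwidth h must be positive")
    n = len(samples)

    def density(x: float) -> float:
        return sum(kernel((x - xi) / h) for xi in samples) / (n * h)

    return density
```
</preformat>
<p>The Epanechnikov kernel has compact support, so each sample only influences points within distance <italic>h</italic> of it, whereas every Gaussian term contributes everywhere.</p>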
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s7">
<title>Author contributions</title>
<p>X-JX and JM conceived and designed the study. CC developed the code and performed the simulations. X-JX and JM interpreted the results and wrote the manuscript. All authors contributed to the article and approved the submitted version.</p>
</sec>
<ack>
<p>X-JX and CC acknowledge financial support from NSFC 12071281 and STCSM 22JC1401401. JM acknowledges financial support from the project I3N and FCT/MEC UIDB/50025/2020 and UIDP/50025/2020.</p>
</ack>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Barab&#xe1;si</surname>
<given-names>A-L</given-names>
</name>
</person-group>. <source>Network science</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name> (<year>2015</year>).</citation>
</ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fortunato</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>Community detection in graphs</article-title>. <source>Phys Rep</source> (<year>2010</year>) <volume>486</volume>:<fpage>75</fpage>&#x2013;<lpage>174</lpage>. <pub-id pub-id-type="doi">10.1016/j.physrep.2009.11.002</pub-id>
</citation>
</ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Maqbool</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Babri</surname>
<given-names>HA</given-names>
</name>
</person-group>. <article-title>The weighted combined algorithm: a linkage algorithm for software clustering</article-title>. In: <source>Proceedings of the 8th European conference on software maintenance and reengineering</source> (<year>2004</year>). p. <fpage>15</fpage>&#x2013;<lpage>24</lpage>.</citation>
</ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Newman</surname>
<given-names>MEJ</given-names>
</name>
</person-group>. <article-title>Modularity and community structure in networks</article-title>. <source>Proc Natl Acad Sci</source> (<year>2006</year>) <volume>103</volume>(<issue>23</issue>):<fpage>8577</fpage>&#x2013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0601602103</pub-id>
</citation>
</ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Clauset</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Newman</surname>
<given-names>MEJ</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>C</given-names>
</name>
</person-group>. <article-title>Finding community structure in very large networks</article-title>. <source>Phys Rev E</source> (<year>2004</year>) <volume>70</volume>:<fpage>066111</fpage>. <pub-id pub-id-type="doi">10.1103/physreve.70.066111</pub-id>
</citation>
</ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blondel</surname>
<given-names>VD</given-names>
</name>
<name>
<surname>Guillaume</surname>
<given-names>J-L</given-names>
</name>
<name>
<surname>Lambiotte</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lefebvre</surname>
<given-names>E</given-names>
</name>
</person-group>. <article-title>Fast unfolding of communities in large networks</article-title>. <source>J Stat Mech</source> (<year>2008</year>) <volume>2008</volume>:<fpage>P10008</fpage>. <pub-id pub-id-type="doi">10.1088/1742-5468/2008/10/p10008</pub-id>
</citation>
</ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Palla</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Der&#xe9;nyi</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Farkas</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Vicsek</surname>
<given-names>T</given-names>
</name>
</person-group>. <article-title>Uncovering the overlapping community structure of complex networks in nature and society</article-title>. <source>Nature</source> (<year>2005</year>) <volume>435</volume>:<fpage>814</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1038/nature03607</pub-id>
</citation>
</ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rosvall</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bergstrom</surname>
<given-names>CT</given-names>
</name>
</person-group>. <article-title>Maps of random walks on complex networks reveal community structure</article-title>. <source>Proc Natl Acad Sci</source> (<year>2008</year>) <volume>105</volume>(<issue>4</issue>):<fpage>1118</fpage>&#x2013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0706851105</pub-id>
</citation>
</ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hric</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Darst</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Fortunato</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>Community detection in networks: structural communities versus ground truth</article-title>. <source>Phys Rev E</source> (<year>2014</year>) <volume>90</volume>:<fpage>062805</fpage>. <pub-id pub-id-type="doi">10.1103/physreve.90.062805</pub-id>
</citation>
</ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Algesheimer</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Tessone</surname>
<given-names>CJ</given-names>
</name>
</person-group>. <article-title>A comparative analysis of community detection algorithms on artificial networks</article-title>. <source>Sci Rep</source> (<year>2016</year>) <volume>6</volume>:<fpage>30750</fpage>. <pub-id pub-id-type="doi">10.1038/srep30750</pub-id>
</citation>
</ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Holland</surname>
<given-names>PW</given-names>
</name>
<name>
<surname>Laskey</surname>
<given-names>KB</given-names>
</name>
<name>
<surname>Leinhardt</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>Stochastic blockmodels: first steps</article-title>. <source>Soc Netw</source> (<year>1983</year>) <volume>5</volume>:<fpage>109</fpage>&#x2013;<lpage>37</lpage>. <pub-id pub-id-type="doi">10.1016/0378-8733(83)90021-7</pub-id>
</citation>
</ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Decelle</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Krzakala</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Zdeborov&#xe1;</surname>
<given-names>L</given-names>
</name>
</person-group>. <article-title>Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications</article-title>. <source>Phys Rev E</source> (<year>2011</year>) <volume>84</volume>:<fpage>066106</fpage>. <pub-id pub-id-type="doi">10.1103/physreve.84.066106</pub-id>
</citation>
</ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Abbe</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Sandon</surname>
<given-names>C</given-names>
</name>
</person-group>. <article-title>Community detection in general stochastic block models: fundamental limits and efficient algorithms for recovery</article-title>. In: <source>Proceedings of the 56th annual symposium on foundations of computer science</source> (<year>2015</year>). p. <fpage>670</fpage>&#x2013;<lpage>88</lpage>.</citation>
</ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abbe</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Bandeira</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hall</surname>
<given-names>G</given-names>
</name>
</person-group>. <article-title>Exact recovery in the stochastic block model</article-title>. <source>IEEE Trans Inform Theor</source> (<year>2016</year>) <volume>62</volume>(<issue>1</issue>):<fpage>471</fpage>&#x2013;<lpage>87</lpage>. <pub-id pub-id-type="doi">10.1109/tit.2015.2490670</pub-id>
</citation>
</ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rohe</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Chatterjee</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>B</given-names>
</name>
</person-group>. <article-title>Spectral clustering and the high-dimensional stochastic block model</article-title>. <source>Ann Statist</source> (<year>2011</year>) <volume>39</volume>(<issue>4</issue>):<fpage>1878</fpage>&#x2013;<lpage>915</lpage>. <pub-id pub-id-type="doi">10.1214/11-aos887</pub-id>
</citation>
</ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sarkar</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Bickel</surname>
<given-names>P</given-names>
</name>
</person-group>. <article-title>Role of normalization in spectral clustering for stochastic blockmodels</article-title>. <source>Ann Statist</source> (<year>2015</year>) <volume>43</volume>(<issue>3</issue>):<fpage>962</fpage>&#x2013;<lpage>90</lpage>. <pub-id pub-id-type="doi">10.1214/14-aos1285</pub-id>
</citation>
</ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gu&#xe9;don</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Vershynin</surname>
<given-names>R</given-names>
</name>
</person-group>. <article-title>Community detection in sparse networks via Grothendieck&#x2019;s inequality</article-title>. <source>Probab Theor Relat Fields</source> (<year>2016</year>) <volume>165</volume>:<fpage>1025</fpage>&#x2013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1007/s00440-015-0659-z</pub-id>
</citation>
</ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bickel</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>A</given-names>
</name>
</person-group>. <article-title>A nonparametric view of network models and Newman-Girvan and other modularities</article-title>. <source>Proc Natl Acad Sci</source> (<year>2009</year>) <volume>106</volume>(<issue>50</issue>):<fpage>21068</fpage>&#x2013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0907096106</pub-id>
</citation>
</ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Amini</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bickel</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Levina</surname>
<given-names>E</given-names>
</name>
</person-group>. <article-title>Pseudo likelihood methods for community detection in large sparse networks</article-title>. <source>Ann Statist</source> (<year>2013</year>) <volume>41</volume>(<issue>4</issue>):<fpage>2097</fpage>&#x2013;<lpage>122</lpage>. <pub-id pub-id-type="doi">10.1214/13-aos1138</pub-id>
</citation>
</ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peixoto</surname>
<given-names>TP</given-names>
</name>
</person-group>. <article-title>Parsimonious module inference in large networks</article-title>. <source>Phys Rev Lett</source> (<year>2013</year>) <volume>110</volume>:<fpage>148701</fpage>. <pub-id pub-id-type="doi">10.1103/physrevlett.110.148701</pub-id>
</citation>
</ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peixoto</surname>
<given-names>TP</given-names>
</name>
</person-group>. <article-title>Model selection and hypothesis testing for large-scale network models with overlapping groups</article-title>. <source>Phys Rev X</source> (<year>2015</year>) <volume>5</volume>:<fpage>011033</fpage>. <pub-id pub-id-type="doi">10.1103/physrevx.5.011033</pub-id>
</citation>
</ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karrer</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Newman</surname>
<given-names>MEJ</given-names>
</name>
</person-group>. <article-title>Stochastic blockmodels and community structure in networks</article-title>. <source>Phys Rev E</source> (<year>2011</year>) <volume>83</volume>:<fpage>016107</fpage>. <pub-id pub-id-type="doi">10.1103/physreve.83.016107</pub-id>
</citation>
</ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Levina</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>Consistency of community detection in networks under degree-corrected stochastic block models</article-title>. <source>Ann Statist</source> (<year>2012</year>) <volume>40</volume>(<issue>4</issue>):<fpage>2266</fpage>&#x2013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.1214/12-aos1036</pub-id>
</citation>
</ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>Convexified modularity maximization for degree-corrected stochastic block models</article-title>. <source>Ann Statist</source> (<year>2018</year>) <volume>46</volume>(<issue>4</issue>):<fpage>1573</fpage>&#x2013;<lpage>602</lpage>. <pub-id pub-id-type="doi">10.1214/17-aos1595</pub-id>
</citation>
</ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gao</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>AY</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>HH</given-names>
</name>
</person-group>. <article-title>Community detection in degree-corrected block models</article-title>. <source>Ann Statist</source> (<year>2018</year>) <volume>46</volume>(<issue>5</issue>):<fpage>2153</fpage>&#x2013;<lpage>85</lpage>. <pub-id pub-id-type="doi">10.1214/17-aos1615</pub-id>
</citation>
</ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rosvall</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bergstrom</surname>
<given-names>CT</given-names>
</name>
</person-group>. <article-title>An information-theoretic framework for resolving community structure in complex networks</article-title>. <source>Proc Natl Acad Sci</source> (<year>2007</year>) <volume>104</volume>(<issue>18</issue>):<fpage>7327</fpage>&#x2013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0611034104</pub-id>
</citation>
</ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Burnham</surname>
<given-names>KP</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>DR</given-names>
</name>
</person-group>. <source>Model selection and multi-model inference: a practical information-theoretic approach</source>. <publisher-loc>Colorado</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name> (<year>2004</year>).</citation>
</ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Handcock</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Raftery</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Tantrum</surname>
<given-names>JM</given-names>
</name>
</person-group>. <article-title>Model-based clustering for social networks</article-title>. <source>J Roy Stat Soc A</source> (<year>2007</year>) <volume>170</volume>:<fpage>301</fpage>&#x2013;<lpage>54</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-985x.2007.00471.x</pub-id>
</citation>
</ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Levina</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>Community extraction for social networks</article-title>. <source>Proc Natl Acad Sci</source> (<year>2011</year>) <volume>108</volume>(<issue>18</issue>):<fpage>7321</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1006642108</pub-id>
</citation>
</ref>
<ref id="B30">
<label>30.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bickel</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Sarkar</surname>
<given-names>P</given-names>
</name>
</person-group>. <article-title>Hypothesis testing for automated community detection in networks</article-title>. <source>J Roy Stat Soc B</source> (<year>2016</year>) <volume>78</volume>:<fpage>253</fpage>&#x2013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1111/rssb.12117</pub-id>
</citation>
</ref>
<ref id="B31">
<label>31.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bui</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Chaudhuri</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Leighton</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Sipser</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>Graph bisection algorithms with good average case behavior</article-title>. <source>Combinatorica</source> (<year>1987</year>) <volume>7</volume>:<fpage>171</fpage>&#x2013;<lpage>91</lpage>. <pub-id pub-id-type="doi">10.1007/bf02579448</pub-id>
</citation>
</ref>
<ref id="B32">
<label>32.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shang</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>Characterization of expansion-related properties of modular graphs</article-title>. <source>Disc Appl Math</source> (<year>2023</year>) <volume>338</volume>:<fpage>135</fpage>&#x2013;<lpage>44</lpage>. <pub-id pub-id-type="doi">10.1016/j.dam.2023.06.002</pub-id>
</citation>
</ref>
<ref id="B33">
<label>33.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Erd&#x151;s</surname>
<given-names>P</given-names>
</name>
<name>
<surname>R&#xe9;nyi</surname>
<given-names>A</given-names>
</name>
</person-group>. <article-title>On the evolution of random graphs</article-title>. <source>Publ Math Inst Hung Acad Sci</source> (<year>1960</year>) <volume>5</volume>:<fpage>17</fpage>&#x2013;<lpage>61</lpage>.</citation>
</ref>
<ref id="B34">
<label>34.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Emmert-Streib</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Dehmer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>Fifty years of graph matching, network alignment and network comparison</article-title>. <source>Inform Sci</source> (<year>2016</year>) <volume>346&#x2013;347</volume>:<fpage>180</fpage>&#x2013;<lpage>97</lpage>. <pub-id pub-id-type="doi">10.1016/j.ins.2016.01.074</pub-id>
</citation>
</ref>
<ref id="B35">
<label>35.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schieber</surname>
<given-names>TA</given-names>
</name>
<name>
<surname>Carpi</surname>
<given-names>L</given-names>
</name>
<name>
<surname>D&#xed;az-Guilera</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Pardalos</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Masoller</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ravetti</surname>
<given-names>MG</given-names>
</name>
</person-group>. <article-title>Quantification of network structural dissimilarities</article-title>. <source>Nat Commun</source> (<year>2017</year>) <volume>8</volume>:<fpage>13928</fpage>. <pub-id pub-id-type="doi">10.1038/ncomms13928</pub-id>
</citation>
</ref>
<ref id="B36">
<label>36.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>X-J</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Mendes</surname>
<given-names>JFF</given-names>
</name>
</person-group>. <article-title>Quantifying dissimilarities between heterogeneous networks with community structure</article-title>. <source>Physica A</source> (<year>2022</year>) <volume>588</volume>:<fpage>126574</fpage>. <pub-id pub-id-type="doi">10.1016/j.physa.2021.126574</pub-id>
</citation>
</ref>
<ref id="B37">
<label>37.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parzen</surname>
<given-names>E</given-names>
</name>
</person-group>. <article-title>On estimation of a probability density function and mode</article-title>. <source>Ann Math Stat</source> (<year>1962</year>) <volume>33</volume>:<fpage>1065</fpage>&#x2013;<lpage>76</lpage>. <pub-id pub-id-type="doi">10.1214/aoms/1177704472</pub-id>
</citation>
</ref>
<ref id="B38">
<label>38.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zachary</surname>
<given-names>W</given-names>
</name>
</person-group>. <article-title>An information flow model for conflict and fission in small groups</article-title>. <source>J Anthropol Res</source> (<year>1977</year>) <volume>33</volume>:<fpage>452</fpage>&#x2013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1086/jar.33.4.3629752</pub-id>
</citation>
</ref>
<ref id="B39">
<label>39.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Newman</surname>
<given-names>MEJ</given-names>
</name>
</person-group>. <article-title>Communities, modules and large-scale structure in networks</article-title>. <source>Nat Phys</source> (<year>2012</year>) <volume>8</volume>:<fpage>25</fpage>&#x2013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1038/nphys2162</pub-id>
</citation>
</ref>
<ref id="B40">
<label>40.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Vinh</surname>
<given-names>NX</given-names>
</name>
<name>
<surname>Epps</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>Information theoretic measures for clusterings comparison: is a correction for chance necessary?</article-title> In: <source>Proceedings of the 26th annual international conference on machine learning</source> (<year>2009</year>). p. <fpage>1073</fpage>&#x2013;<lpage>80</lpage>.</citation>
</ref>
<ref id="B41">
<label>41.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Larsen</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Aone</surname>
<given-names>C</given-names>
</name>
</person-group>. <article-title>Fast and effective text mining using linear-time document clustering</article-title>. In: <source>Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining</source> (<year>1999</year>). p. <fpage>16</fpage>&#x2013;<lpage>22</lpage>.</citation>
</ref>
<ref id="B42">
<label>42.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Snijders</surname>
<given-names>TAB</given-names>
</name>
<name>
<surname>Nowicki</surname>
<given-names>K</given-names>
</name>
</person-group>. <article-title>Estimation and prediction for stochastic blockmodels for graphs with latent block structure</article-title>. <source>J Classification</source> (<year>1997</year>) <volume>14</volume>(<issue>1</issue>):<fpage>75</fpage>&#x2013;<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1007/s003579900004</pub-id>
</citation>
</ref>
<ref id="B43">
<label>43.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lacasa</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Stramaglia</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Marinazzo</surname>
<given-names>D</given-names>
</name>
</person-group>. <article-title>Beyond pairwise network similarity: exploring mediation and suppression between networks</article-title>. <source>Commun Phys</source> (<year>2021</year>) <volume>4</volume>:<fpage>136</fpage>. <pub-id pub-id-type="doi">10.1038/s42005-021-00638-9</pub-id>
</citation>
</ref>
<ref id="B44">
<label>44.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schweinberger</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>Consistent structure estimation of exponential-family random graph models with block structure</article-title>. <source>Bernoulli</source> (<year>2020</year>) <volume>26</volume>(<issue>2</issue>):<fpage>1205</fpage>&#x2013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.3150/19-bej1153</pub-id>
</citation>
</ref>
<ref id="B45">
<label>45.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chabert-Liddell</surname>
<given-names>S-C</given-names>
</name>
<name>
<surname>Barbillon</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Donnet</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lazega</surname>
<given-names>E</given-names>
</name>
</person-group>. <article-title>A stochastic block model approach for the analysis of multilevel networks: an application to the sociology of organizations</article-title>. <source>Comput Statist Data Anal</source> (<year>2021</year>) <volume>158</volume>:<fpage>107179</fpage>. <pub-id pub-id-type="doi">10.1016/j.csda.2021.107179</pub-id>
</citation>
</ref>
<ref id="B46">
<label>46.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bartolucci</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Pandolfi</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>An exact algorithm for time-dependent variational inference for the dynamic stochastic block model</article-title>. <source>Pattern Recognit Lett</source> (<year>2020</year>) <volume>138</volume>:<fpage>362</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1016/j.patrec.2020.07.014</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>