Abstract
In this study, we simulate degree and betweenness node attacks on a large set of 200 real-world networks from different areas of science. We adopt an initial node attack approach, in which the node centrality rank is computed at the beginning of the simulation and is not updated during the node removal process. We quantify the network damage by tracing the largest connected component (LCC) and evaluate the network robustness with the “percolation threshold” qc, i.e., the fraction of nodes removed for which the size of the LCC is quasi-zero. We correlate qc with 20 network structural indicators (NSIs) from the literature using single linear regression (SLR) models, multiple linear regression (MLR) models, and the Pearson correlation coefficient test. The NSIs cover most of the essential structural features proposed in network science to describe real-world networks. We find that the Estrada heterogeneity index, which measures the network degree heterogeneity based on the difference of functions of node degrees for all pairs of linked nodes, best predicts qc. The qc value decreases as a function of the Estrada heterogeneity index, unveiling that heterogeneous real-world networks with a higher variance in the degree of connected nodes are more vulnerable to node attacks.
1 Introduction
Networks can model many real-world complex systems, where nodes (vertices) represent the constituent components and links (edges) describe the relationships among the node components [1, 2]. A paramount issue in complex network science is to determine the robustness of the overall system to the failure or attack of its nodes [3–10]. On the other hand, the robustness in complex networks is a problem closely related to understanding which kind of node removal (attack) strategy is the most effective in damaging the network [3, 11–14]. The node attack may model different real-world problems of high interest, such as the nodes/species extinction in ecological networks [15–17], the aging of nodes/chromophores in the photosynthetic network [18], the vaccination of nodes/individuals in social networks [19–22], or the malfunctioning of nodes/routers in computer networks [23, 24].
Network robustness to node attack may change in real-world networks with different structures [11]. Iyer et al. [3] studied network robustness as a function of the node clustering coefficient (or node transitivity). Their study demonstrates that networks with higher clustering coefficients are more robust, with the most critical effect for the node degree and node betweenness attacks. Nguyen and Trang [25] studied the Facebook social network. They found that networks with higher modularity, i.e., networks presenting communities of nodes that are highly connected among themselves, have lower robustness to node removal. Zhou et al. [26] observed that increasing the assortativity of a network makes the network more robust against node removal and the network less stable. Nguyen et al. [27] showed that machine learning approaches unveil the degree assortativity, global closeness, and average node degree as the most critical factors in predicting the robustness of real-world social networks.
Network science research shows contrasting outcomes about the role of the network structure in affecting its robustness to node attacks. On one hand, these studies are often based on small datasets of real-world networks, and they need more (robust) statistical analyses. On the other hand, research outcomes generally restrict the investigation, focusing on a few structural features of the networks, thus lacking a wide comparison of network structural indicators (NSIs) to forecast network robustness. For these reasons, understanding which structural features of real-world networks affect their robustness to node removal is still an urgent problem in network science.
In this research, we implement two well-known node attack strategies, i.e., the degree and betweenness node removal over a large set of 200 real-world networks from different areas of science.
We quantify the network functioning damage along the node attack sequence using the largest connected component (LCC) indicator [3, 11, 28]. To evaluate the network robustness against the node attack, we adopt the “percolation threshold” (qc), i.e., the fraction of nodes removed at which the network becomes disconnected or, in other terms, the fraction of nodes removed for which the size of the LCC is quasi-zero [29].
Then, to understand how the network structure affects the network robustness (and the node attack efficacy), we correlate qc with 20 NSIs from the literature. To study this correlation, we perform single linear regression (SLR) models, multiple linear regression (MLR) models, and the Pearson correlation coefficient test to find the best NSI predictors of the target variable qc.
We find that the Estrada heterogeneity index [30] best predicts qc in both the SLR and MLR models. The qc value decreases as a function of this index, which measures network degree heterogeneity based on the difference in functions of node degrees for all pairs of linked nodes [30]. This result indicates that the degree heterogeneity of linked nodes may negatively affect the real-world network robustness to node attack, specifically the network robustness against removing the most connected and highest betweenness nodes. Our outcomes shed light on the role of the real-world network structure in shaping network robustness and can help assemble more robust network structures.
2 Methods
2.1 The node attack strategies
We simulated two classic node attack (removal) strategies. The first is the removal of nodes according to their degree (DEG), i.e., the number of links to the node [3, 4, 31]. The DEG strategy removes nodes in decreasing order of connectivity, i.e., the most connected nodes (hubs) are removed first. The second node attack strategy removes nodes in decreasing order of betweenness centrality (BET) [3, 7, 32]. The betweenness centrality is a node centrality based on the shortest paths between node pairs (also called geodesic paths). A shortest path between two nodes is a path with the minimum number of links required to travel from one node to the other [33]. The betweenness centrality of a node counts the shortest paths between the node pairs of the network that pass through that node. The betweenness of node $i$ is $b_i = \sum_{j \neq k \neq i} \sigma_{jk}(i) / \sigma_{jk}$, where $\sigma_{jk}$ is the total number of shortest paths between nodes $j$ and $k$, $\sigma_{jk}(i)$ is the number of these shortest paths passing through node $i$, and $j$ and $k$ run over the $N$ nodes of the network.
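As an illustration of the betweenness definition above, the following minimal sketch computes unnormalized betweenness on an undirected, unweighted graph with Brandes' accumulation scheme. It is not the simulation code used in this study (which relies on igraph in R); the function name `betweenness` and the adjacency-dictionary input are assumptions made for the example.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness of every node of an undirected, unweighted graph.

    adj maps each node to the set of its neighbours (hypothetical input format).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered node pair was counted from both of its endpoints.
    return {v: b / 2.0 for v, b in bc.items()}

# Star graph with one hub (node 0) and four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
star_bc = betweenness(star)
```

In the star graph, every shortest path between two leaves passes through the hub, so the hub collects one unit per leaf pair while the leaves collect none.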
We perform an “initial node attack approach,” i.e., the node centrality rank is computed at the beginning of the simulation, and it is not updated along the node removal process [11]. The “initial node attack approach” differs from the recalculated (also named adaptive) node attack, in which node centralities are updated after node removals [11, 28]. The initial node attack describes the case where it is not possible to collect information about node features during the node removal process, such as vaccinating nodes/individuals in a social contact network with limited resources (limited time or vaccines) [34] or attacking nodes/routers in a computer network with a simultaneous node attack [28].
For both node attack strategies, in the case of ties, i.e., nodes with equal ranking, we randomly sort their sequence. We perform 10^3 simulations for each node attack strategy. We implemented the node attack simulations using the igraph package for R. The simulations are carried out on the high-performance computing (HPC) cluster of the “Università degli Studi di Parma.”
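The initial attack procedure described above can be sketched as follows. This is a minimal pure-Python illustration, not the R/igraph code used for the simulations; the helper names (`initial_attack_order`, `lcc_curve`, `largest_component_size`) and the toy graph are hypothetical.

```python
import random

def largest_component_size(adj, removed):
    """Size of the largest connected component among nodes not in `removed`."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:                      # depth-first visit of one component
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

def initial_attack_order(scores, rng):
    """Rank nodes once by decreasing score; ties are ordered at random."""
    nodes = list(scores)
    rng.shuffle(nodes)                               # random order among equal scores
    return sorted(nodes, key=lambda v: -scores[v])   # stable sort preserves it

def lcc_curve(adj, order):
    """LCC size after removing 0, 1, ..., N nodes; the rank is never updated."""
    removed, curve = set(), [largest_component_size(adj, set())]
    for v in order:
        removed.add(v)
        curve.append(largest_component_size(adj, removed))
    return curve

# Toy graph (hypothetical): hub 0 joined to a linked pair (1, 2) and two leaves.
adj = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
degree = {v: len(nbrs) for v, nbrs in adj.items()}
order = initial_attack_order(degree, random.Random(1))
curve = lcc_curve(adj, order)
```

Passing betweenness scores instead of `degree` gives the BET variant of the same procedure; repeating the call with different random seeds reproduces the random handling of ties across simulations.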
2.2 Real-world networks
We analyzed a large dataset of real-world network systems composed of 200 networks from different fields of science. The real-world networks analyzed here come from social, biological, Internet, road, transportation, neuronal, and ecological networks. The networks analyzed here are undirected (i.e., do not account for link directionality) and unweighted (do not account for link weight). The number of network nodes ranges from N = 25 to N = 75,811; the average is N = 4,955.6. The real-world network datasets analyzed in this study are available in the “Netzschleuder” repository [https://networks.skewed.de/], in the “Stanford Large Network Dataset Collection” repository [https://snap.stanford.edu/data/index.html], and in “the Colorado Index of Complex Networks (ICON)” repository [https://icon.colorado.edu/#!/]. The complete list of the real-world networks is provided in Supplementary Table A1 in Supplementary Appendix A1.
2.3 Network structure indexes
We considered 20 different NSIs from the network science literature, graph theory, and chemical graph theory to predict qc in a large real-world network dataset. The NSIs adopted in this work cover most of the salient structural features of real-world networks proposed in the network science literature, such as the node connectivity level [35], presence of a community structure [36, 37], degree heterogeneity [30, 38], node assortativity [39], node transitivity (or clustering) [3, 40], distance among nodes [41], and different notions of node centrality [42]. The list of NSIs is provided in Table 1.
TABLE 1
| ID | Full name | Definition | Reference |
|---|---|---|---|
| 1 | Node number | Number of nodes in the network | |
| 2 | Link number | Number of links in the network | |
| 3 | Connectance | Computed from the number of links and the number of nodes | [15] |
| 4 | Average node degree | Average of the node degrees over the number of nodes | [1] |
| 5 | Node degree standard deviation | Standard deviation of the node degrees around the average node degree | [49] |
| 6 | Albertson index | Sum, over the network link set, of the absolute degree difference of the two nodes joined by each link | [46] |
| 7 | Normalized Albertson index | Albertson index averaged over the number of links | [49] |
| 8 | Estrada heterogeneity index | Computed over the network link set from the degrees of the two nodes joined by each link and from the node number | [30] |
| 9 | Network assortativity | Computed from the standard deviation of the excess degree distribution, the fraction of links connecting nodes of two given degrees, and the excess degrees of those nodes | [39] |
| 10 | Average node distance | Computed from the distances between node pairs and the node number | [41] |
| 11 | Network eccentricity | Computed from the node eccentricities and the node number | [41] |
| 12 | Network diameter | Computed from the distances between node pairs | [41] |
| 13 | Network radius | Computed from the node eccentricities | [41] |
| 14 | Network efficiency | Computed from the distances between node pairs and the node number | [52] |
| 15 | Average node transitivity | Computed from the node transitivities and the node number | [3] |
| 16 | Average node betweenness | Computed from the node betweenness values and the number of nodes | [53] |
| 17 | Average normalized node betweenness | Computed from the normalized node betweenness values and the number of nodes | [53] |
| 18 | Average node closeness | Computed from the node closeness values and the node number | [54] |
| 19 | Average normalized node closeness | Computed from the normalized node closeness values and the node number | [49] |
| 20 | Network modularity | Computed from the total number of links in the network; the elements of the adjacency matrix (1 if two nodes are connected, 0 otherwise); the degrees of the two nodes; and the modules (or communities) of the two nodes, with an indicator equal to 1 if the two nodes belong to the same module and 0 otherwise | [36] |
Network structural indicator (NSI) list with a short definition and reference.
2.4 The network robustness
To evaluate the networks’ response to node attack, we trace the LCC as a function of the fraction of nodes removed q. The LCC (also named the giant component) is the largest set of connected nodes [1]. In other terms, the LCC is the maximal set of nodes in the network such that a path connects each node pair. The LCC is the most commonly used measure to evaluate the network response to node removal [11]. Then, to evaluate the network robustness to node attack, we use qc, which represents the fraction of nodes to remove to reduce the LCC to quasi-zero [29]. This work defines qc as the fraction of nodes removed that reduces the LCC to 0.05 or less of its initial size. The lower the qc value, the lower the network robustness (Figure 1). Furthermore, the lower the qc value, the higher the efficacy of the node attack strategies in dismantling the network [29].
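The threshold definition above can be read off an LCC trace as in the following minimal sketch. The function name `percolation_threshold`, the list-based input, and the two example traces are hypothetical; the 0.05 cutoff is the one stated in the text.

```python
def percolation_threshold(lcc_sizes, cutoff=0.05):
    """Smallest removed fraction q at which LCC <= cutoff * initial LCC.

    lcc_sizes[i] is the LCC size after removing i nodes, so the list has
    N + 1 entries for a network of N nodes.
    """
    n = len(lcc_sizes) - 1
    limit = cutoff * lcc_sizes[0]
    for removed, size in enumerate(lcc_sizes):
        if size <= limit:
            return removed / n
    return 1.0

# Hypothetical LCC traces for two networks of 40 nodes each.
fragile = [40, 20, 8, 2] + [1] * 37                               # fast collapse
robust = [40 - i for i in range(20)] + [20, 10, 4, 2] + [1] * 17  # slow decay
qc_fragile = percolation_threshold(fragile)
qc_robust = percolation_threshold(robust)
```

The first trace crosses the cutoff after 3 of 40 removals and the second after 23 of 40, so the second network has the higher threshold and is the more robust of the two.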
FIGURE 1

LCC as a function of the fraction of nodes removed (q). The percolation threshold qc corresponds to the q-value at which the LCC is quasi-zero. A higher percolation threshold denotes a slower LCC decrease and, consequently, a more robust network. The red line presents a lower qc, describing a more vulnerable network response to node attack than the black line. In other words, the black line denotes a more robust network response to a node attack.
2.5 The linear regression models
We perform regression model analyses to understand the relationship between the NSIs and the qc value of the real-world networks. First, we perform SLR. The SLR model between qc and an NSI $x$ is expressed by the linear equation $q_c = a + b\,x$, where $a$ is the intercept and $b$ is the slope. Among the significant SLRs, we choose the one with the highest R-squared as the best SLR model and, consequently, the best predictor. In linear regression, R-squared (R2), also named the coefficient of determination, measures how close the data points are to the fitted line. Higher R2 denotes better regression fitting models [48].
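For a single predictor, the intercept, slope, and R2 have closed forms, sketched below in plain Python. The study itself uses the lm function of R; the function name `simple_linear_regression` and the five data points are hypothetical and serve only to illustrate the computation.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b, R2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                     # slope
    a = mean_y - b * mean_x           # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical data: an NSI value and the qc of five networks.
nsi = [0.1, 0.2, 0.3, 0.4, 0.5]
qc = [0.60, 0.52, 0.41, 0.33, 0.20]
a, b, r2 = simple_linear_regression(nsi, qc)
```

A negative slope `b` with an R2 close to one would indicate that qc decreases almost linearly with the indicator.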
Then, we perform MLR models. MLR is an extension of SLR to multiple predictor variables $x_1, \dots, x_p$. The linear equation between the qc value and the NSIs becomes $q_c = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$, where $\beta_0, \dots, \beta_p$ are coefficients obtained with the ordinary least squares (OLS) method and $x_1, \dots, x_p$ are the NSIs. The coefficient $\beta_j$ quantifies the association between the $j$-th NSI (variable) and qc (response). We interpret $\beta_j$ as the average effect on qc of a one-unit increase in the $j$-th NSI, holding all other NSI predictors fixed [48]. In practice, we often have more than one predictor, and the MLR model, unlike SLR, can directly accommodate multiple predictors. To identify the best predictor in the MLR model, we choose the significant NSI with the highest absolute t-value. The t-value used in MLR is the Student’s t statistic from a two-sided t-test. The larger the absolute value of the t-test statistic, the less likely the results occurred by chance [48]. For this reason, larger absolute t-values are associated with better predictors (NSIs).
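For reference, the textbook OLS expressions behind the coefficient estimates and their t-values can be written as follows. The notation is introduced here for illustration and is an assumption: $X$ is the design matrix with one row per network and a leading column of ones, $\mathbf{y}$ collects the qc values, $n$ is the number of networks, and $p$ is the number of NSIs.

```latex
\hat{\boldsymbol{\beta}} = (X^{\top}X)^{-1}X^{\top}\mathbf{y},
\qquad
\hat{\sigma}^{2} = \frac{\lVert \mathbf{y} - X\hat{\boldsymbol{\beta}} \rVert^{2}}{n - p - 1},
\qquad
t_j = \frac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^{2}\,\bigl[(X^{\top}X)^{-1}\bigr]_{jj}}}
```

Under this reading, $\hat{\sigma}$ corresponds to the residual standard error (RSE) reported with the MLR outcomes.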
We use the lm function of R to fit the SLR and MLR models. The fitting process uses the OLS method, which estimates the coefficients by minimizing the sum of squared residuals [49].
Last, we compute the Pearson correlation coefficient to test the strength of the correlation between each NSI and qc. The Pearson coefficient is the most common way of measuring the strength of a linear correlation [50]. It is a number between −1 and 1 that measures the strength and direction of the relationship between two variables. To identify the best correlation, we choose the significant NSI with the highest absolute t-value. Last, we report the p-value to show the statistical significance of each model.
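The link between a Pearson coefficient and its t-value can be illustrated with the standard test statistic on n − 2 degrees of freedom. The sketch below is an assumption about how the t-values are obtained (it is the usual form of the Pearson correlation test); the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic of a Pearson coefficient r estimated from n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

r_example = pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])   # perfectly anti-correlated
t_example = t_statistic(-0.753, 200)                      # estimate from 200 networks
```

With 200 networks, an estimate of −0.753 gives a t-value of about −16.1, close to the value reported for the DEG strategy in Table 4; the small difference is compatible with rounding of the estimate.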
3 Results
Figure 2 shows the scatterplots of qc vs. NSIs for the DEG node attack strategy. Figure 3 shows the scatterplots of qc vs. NSIs for the BET node attack strategy.
FIGURE 2

Scatterplots of the percolation threshold (qc) vs. the network structural indicators (NSIs) for the DEG node attack strategy, removing nodes with higher degrees first.
FIGURE 3

Scatterplots of the percolation threshold (qc) vs. the network structural indicators (NSIs) for the BET node attack strategy, removing nodes with higher betweenness first.
Table 2 shows the outcomes of the SLR model. The best NSI to fit an SLR model with qc is the Estrada heterogeneity index for both DEG (p-value <10–4, R2 = 0.567) and BET (p-value <10–4, R2 = 0.671) strategies. For this index, SLR returns the lowest p-values and the highest R2 for both node attack strategies (Table 2). The fitting slopes are negative, indicating that qc decreases as a function of the Estrada heterogeneity index, i.e., the robustness of the network is negatively correlated with this index for both node attack strategies (Figures 2, 3).
TABLE 2
| NSI | DEG Intercept | DEG Slope | DEG p-value | DEG R2 | BET Intercept | BET Slope | BET p-value | BET R2 |
|---|---|---|---|---|---|---|---|---|
| | 0.515 | 0.000 | <10–4 *** | 0.149 | 0.519 | 0.000 | <10–4 *** | 0.149 |
| | 0.467 | 0.000 | 0.903 | 0.000 | 0.477 | 0.000 | 0.713 | 0.001 |
| | 0.413 | 1.887 | <10–4 *** | 0.217 | 0.431 | 1.502 | <10–4 *** | 0.159 |
| | 0.493 | 0.000 | <10–4 *** | 0.079 | 0.497 | 0.000 | <0.001** | 0.070 |
| | 0.446 | 4.184 | 0.05 | 0.019 | 0.443 | 5.792 | <0.05* | 0.041 |
| | 0.429 | 56.559 | <10–4 *** | 0.146 | 0.441 | 49.661 | <10–4 *** | 0.130 |
| | 0.213 | 0.939 | <10–4 *** | 0.195 | 0.305 | 0.628 | <10–4 *** | 0.101 |
| | 0.558 | −0.019 | <10–4 *** | 0.089 | 0.536 | −0.013 | <0.05* | 0.048 |
| | 0.560 | −0.008 | <10–4 *** | 0.105 | 0.539 | −0.005 | <0.001** | 0.059 |
| | 0.565 | −0.015 | <10–4 *** | 0.094 | 0.540 | −0.010 | <0.05* | 0.049 |
| | 0.290 | 0.866 | <10–4 *** | 0.330 | 0.330 | 0.703 | <10–4 *** | 0.251 |
| | 0.424 | 0.002 | <10–4 *** | 0.114 | 0.446 | 0.002 | <0.001** | 0.059 |
| | 0.460 | 0.000 | 0.401 | 0.004 | 0.479 | 0.000 | 0.642 | 0.001 |
| | 0.454 | 0.520 | <10–4 *** | 0.294 | 0.460 | 0.532 | <10–4 *** | 0.355 |
| | 0.595 | −0.234 | <0.05* | 0.037 | 0.530 | −0.100 | 0.214 | 0.008 |
| | 0.471 | 0.000 | 0.449 | 0.003 | 0.480 | 0.000 | 0.185 | 0.009 |
| **Estrada heterogeneity index** | 0.665 | −0.940 | <10–4 *** | 0.567 | 0.674 | −0.952 | <10–4 *** | 0.671 |
| | 0.556 | −0.010 | <10–4 *** | 0.097 | 0.535 | −0.007 | <10–4 *** | 0.053 |
| | 0.183 | 0.963 | <10–4 *** | 0.237 | 0.274 | 0.679 | <10–4 *** | 0.136 |
| | 0.524 | −0.001 | <10–4 *** | 0.200 | 0.534 | −0.001 | 0.000 | 0.261 |
Single linear regression model outcomes. The best significant predictor with the highest R2 value is in bold.
Table 3 shows the outcomes of the MLR model. The best NSI to predict qc with the MLR model is the Estrada heterogeneity index for both DEG (t-value = −11.8, p-value <10–23) and BET (t-value = −11.9, p-value <10–23) strategies. MLR estimates a negative correlation between qc and the Estrada heterogeneity index for both node attack strategies (negative correlation estimate, Table 3).
TABLE 3
| NSI | DEG Estimate | DEG t-value | DEG p-value | BET Estimate | BET t-value | BET p-value |
|---|---|---|---|---|---|---|
| | 1.36·10−06 | 0.574 | 0.567 | 1.326·10−06 | 0.589 | 0.556 |
| | 6.537·10−07 | 3.475 | <0.001** | 6.145·10−07 | 3.439 | <0.001** |
| | −0.767 | −3.422 | <0.001** | −0.614 | −2.885 | <0.05* |
| | −6.725·10−07 | −1.029 | 0.305 | −8.068·10−07 | −1.299 | 0.196 |
| | −1.387 | −0.975 | 0.331 | −1.016 | −0.752 | 0.453 |
| | −5.647 | −0.927 | 0.355 | −9.027 | −1.560 | 0.121 |
| | −3.551 | −3.552 | <0.001** | −4.915 | −5.173 | <10–4 *** |
| | 0.036 | 3.188 | <0.05* | 0.029 | 2.788 | <0.05* |
| | −0.02 | −2.705 | <0.05* | −0.017 | −2.378 | <0.05* |
| | 0.0006 | 0.049 | 0.961 | 0.015 | 1.384 | 0.168 |
| | −0.199 | −3.179 | <0.05* | −0.315 | −5.290 | <10–4 *** |
| | −0.0006 | −0.757 | 0.449 | −0.001 | −1.493 | 0.137 |
| | 0.001 | 1.257 | 0.210 | 0.003 | 1.752 | 0.081 |
| | 0.203 | 5.023 | <10–4 *** | 0.222 | 5.789 | <10–4 *** |
| | 0.015 | 0.283 | 0.778 | 0.004 | 0.085 | 0.932 |
| | −1.33·10−09 | −2.635 | <0.05* | −1.243·10−09 | −2.597 | <0.05* |
| **Estrada heterogeneity index** | −0.915 | −11.836 | <10–4 *** | −0.874 | −11.904 | <10–4 *** |
| | 0.0121 | 0.934 | 0.352 | 0.002 | 0.125 | 0.900 |
| | 4.945 | 5.254 | <10–4 *** | 6.033 | 6.747 | <10–4 *** |
| | −1.47·10−07 | −0.001 | 0.999 | −4.193·10−05 | −0.432 | 0.666 |
| Intercept | 1.505·10−01 | | 0.089 | 2.13·1002 | | 0.012 |
| Outcome | RSE: 0.06; multiple R2: 0.94; p-value: <0.001 | | | RSE: 0.06; multiple R2: 0.94; p-value | | |
Multiple linear regression model outcomes. The best significant predictor with the highest absolute t-value is in bold.
Table 4 summarizes the Pearson correlation coefficient test outcomes. The NSI that best correlates with qc is the Estrada heterogeneity index for both DEG (estimate = −0.753, t-value = −16.063, p-value <10–4) and BET (estimate = −0.819, t-value = −20.035, p-value <10–4) strategies. The Pearson coefficient indicates a negative correlation between qc and the Estrada heterogeneity index for both node attack strategies (Table 4).
TABLE 4
| NSI | DEG Estimate | DEG t-value | DEG p-value | BET Estimate | BET t-value | BET p-value |
|---|---|---|---|---|---|---|
| | −0.386 | −5.876 | <10–4 *** | −0.386 | −5.88 | <10–4 *** |
| | 0.009 | 0.123 | 0.903 | −0.026 | −0.369 | 0.713 |
| | 0.466 | 7.388 | <10–4 *** | 0.398 | 6.092 | <10–4 *** |
| | −0.281 | −4.102 | <10–4 *** | −0.264 | −3.841 | 0.001 |
| | 0.137 | 1.928 | 0.06 | 0.203 | 2.896 | 0.004 |
| | 0.382 | 5.805 | <10–4 *** | 0.36 | 5.42 | <10–4 *** |
| | 0.442 | 6.917 | <10–4 *** | 0.318 | 4.7 | <10–4 *** |
| | −0.299 | −4.398 | <10–4 *** | −0.218 | −3.141 | 0.002 |
| | −0.324 | −4.807 | <10–4 *** | −0.243 | −3.512 | 0.001 |
| | −0.307 | −4.527 | <10–4 *** | −0.22 | −3.17 | 0.002 |
| | 0.575 | 9.859 | <10–4 *** | 0.501 | 8.128 | <10–4 *** |
| | 0.337 | 5.022 | <10–4 *** | 0.243 | 3.514 | 0.001 |
| | 0.06 | 0.842 | 0.401 | −0.033 | −0.466 | 0.642 |
| | 0.542 | 9.059 | <10–4 *** | 0.596 | 10.408 | <10–4 *** |
| | −0.192 | −2.752 | 0.007 | −0.088 | −1.246 | 0.214 |
| | −0.054 | −0.759 | 0.448 | −0.094 | −1.329 | 0.185 |
| **Estrada heterogeneity index** | −0.753 | −16.063 | <10–4 *** | −0.819 | −20.035 | <10–4 *** |
| | −0.311 | −4.599 | <10–4 *** | −0.23 | −3.312 | 0.001 |
| | 0.487 | 7.825 | <10–4 *** | 0.369 | 5.569 | <10–4 *** |
| | −0.447 | −7.02 | <10–4 *** | −0.511 | −8.349 | <10–4 *** |
Pearson correlation coefficient test outcomes. The best significant predictor with the highest absolute t-value is in bold.
4 Discussion
The Estrada heterogeneity index is the best predictor of qc in our NSI set. Estrada [30] proposed the index as a unique characterization of network degree heterogeneity based on the difference in functions of node degrees for all pairs of linked nodes. It quantifies the degree heterogeneity of the network as a quadratic form of the Laplacian matrix of the network. It takes the value of zero if all nodes have the same degree, as happens in regular networks, and it grows as the degree difference between linked nodes increases. The index has two limit structures, i.e., it is equal to zero for any regular network (where all nodes present the same degree) and equal to one only for star graphs, i.e., networks in which N − 1 nodes are directly connected to a single central node [30]. We find that qc decreases as a function of the Estrada heterogeneity index (Figures 2, 3). This finding indicates that heterogeneous real-world networks with a higher variance in the degree of connected nodes are more vulnerable to node attacks.
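The two limit structures can be checked numerically. The sketch below assumes the normalized form of the index given by Estrada [30], i.e., the sum over links of the squared difference of the inverse square roots of the end-node degrees, divided by N − 2√(N − 1); the function name and the edge-list input are hypothetical, and isolated nodes are assumed to be absent.

```python
import math

def estrada_heterogeneity(edges, normalized=True):
    """Degree heterogeneity of an undirected graph given as a list of links.

    Assumes no isolated nodes: the node number is taken from the link list.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Squared difference of k^(-1/2) across each link.
    rho = sum((degree[u] ** -0.5 - degree[v] ** -0.5) ** 2 for u, v in edges)
    if not normalized:
        return rho
    n = len(degree)
    return rho / (n - 2.0 * math.sqrt(n - 1))

star = [(0, i) for i in range(1, 10)]            # 10 nodes, one hub
ring = [(i, (i + 1) % 10) for i in range(10)]    # 10 nodes, all of degree 2
star_value = estrada_heterogeneity(star)
ring_value = estrada_heterogeneity(ring)
```

The star graph reaches the upper limit of one, whereas the ring, being regular, gives zero.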
The Estrada heterogeneity index is conceived as a refinement of the Albertson index, which computes the sum of the absolute value of the degree difference of the connected nodes [44]. The Albertson index, its normalized version, and the node degree standard deviation are all indicators we used to quantify the network degree heterogeneity. The statistical analyses we performed, i.e., SLR, MLR, and the Pearson coefficient test, indicate that these NSIs are not good predictors of qc. The node degree standard deviation did not return a significant fit in any statistical model (Tables 2–4). It evaluates the overall node degree heterogeneity, neglecting whether the node degree variance occurs among connected nodes. In contrast, the Estrada heterogeneity index measures the degree difference among connected nodes [30]. For this reason, we can argue that the node degree heterogeneity plays a significant role in affecting the network robustness only if it is located (and evaluated) among connected nodes.
The third and fourth ring roads of Beijing City, the capital of China, are the real-world networks with the lowest Estrada heterogeneity index in our dataset (0.008 and 0.009). In these networks, nodes represent the road intersections and links depict the roads connecting nodes [51]. The connected nodes present homogeneous degrees, and for this reason, removing higher-degree road intersections causes a slower network fragmentation with very high qc values (qc = 0.6 and 0.56), indicating lower network damage. On the contrary, the academia US faculty hiring network shows the highest Estrada heterogeneity index value (0.73). In this network, a node is a Ph.D.-granting institution, and a link from node i to node j indicates that a person received their Ph.D. from node i and was tenure-track faculty at node j [52]. This network presents the highest degree heterogeneity of connected nodes, i.e., famous higher-degree nodes/institutions are connected with many lower-degree institutions. Therefore, the removal of the highest degree nodes, i.e., the removal of famous institutions sending many Ph.D. graduates to other institutions, can cause a quick network disconnection. Accordingly, the academia US faculty hiring network returns a lower qc value (qc = 0.13), indicating more significant network damage.
The normalized Albertson index is computed by averaging the original Albertson index over the number of links in the network. It can be viewed as the average degree difference among connected nodes [43]. The normalized Albertson index shows significant fitting for SLR (Table 2). Nonetheless, R2 of SLR is much higher for the Estrada heterogeneity index than for the normalized Albertson index (0.567 and 0.200, respectively, Table 2), indicating that the Estrada heterogeneity index can better explain the data. The Albertson index returns significant fitting for MLR (Table 3), but its absolute t-value is much lower than that of the Estrada heterogeneity index (Table 3). Furthermore, the Albertson index did not return a significant coefficient test (Table 4). These statistical results indicate that only the Estrada heterogeneity index correlates the nodes’ degree heterogeneity of the networks with their robustness to the attack of connected nodes. On the other hand, these results suggest that networks presenting, on average, similar node degrees of the connected nodes should be robust to node attack. For this reason, networks with a lower Estrada heterogeneity index should show higher robustness to node attack and higher qc.
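The Albertson index and its normalized version, as described above, can be sketched in a few lines. The function name `albertson_indices` and the two toy graphs are hypothetical.

```python
def albertson_indices(edges):
    """Return (Albertson index, Albertson index divided by the number of links)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Sum of absolute degree differences over all links.
    total = sum(abs(degree[u] - degree[v]) for u, v in edges)
    return total, total / len(edges)

star = [(0, i) for i in range(1, 5)]      # hub of degree 4 with four leaves
path = [(0, 1), (1, 2), (2, 3), (3, 4)]   # chain of five nodes
star_albertson = albertson_indices(star)
path_albertson = albertson_indices(path)
```

Each link of the star joins degrees 4 and 1, so the average degree difference per link is 3, whereas only the two end links of the chain contribute a difference of 1.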
The assortativity coefficient A (Table 1) measures how nodes tend to be connected with nodes of similar degrees [39]. “Assortative networks” present a preference for a network’s nodes to attach to others with similar node degrees [39, 53]. On the contrary, a network is “disassortative” when, on average, high-degree nodes are connected to nodes with a lower degree and low-degree nodes are connected to nodes with a higher degree. Positive values of A indicate a correlation between nodes of similar degrees, while negative values indicate relationships between nodes of different degrees [39].
Given a certain node degree heterogeneity, assortative networks should have, on average, a lower Estrada heterogeneity index than disassortative networks. The linear regression indicates a negative correlation (p-value < 0.001) in our real-world network dataset and confirms this hypothesis, i.e., higher values of A are associated with lower values of the Estrada heterogeneity index (Figure 4).
FIGURE 4

Scatterplot of the assortativity coefficient (A) vs. the Estrada heterogeneity index. The black line represents the significant linear regression (p-value < 0.001).
Consequently, assortative networks should show higher robustness to node attack and higher qc. In line with this hypothesis, we find that qc increases as a function of A (Figures 2, 3), and all models, i.e., SLR and MLR (Tables 2, 3, respectively) and the Pearson coefficient (Table 4), return a positive significant fitting between A and qc. Results from the literature corroborate this finding, unveiling that increasing the assortativity of a network makes the network more robust against node removal [26] and that a moderate assortativity increase positively affects the network’s robustness against targeted node attacks [54]. Therefore, real-world networks with higher degree differences of connected nodes are likely to present lower qc.
To further investigate the relationship between node degree heterogeneity and network robustness, we perform an MLR model holding only A and the Estrada heterogeneity index as predictors of qc, i.e., we fit a model of qc that is linear in these two NSIs. The outcomes of this analysis are shown in Table 5. The Estrada heterogeneity index is highly significant for the DEG strategy and presents the lowest t-value, whereas A is not a significant predictor. The Estrada heterogeneity index is also highly significant for the BET strategy and presents a smaller t-value than A. This finding supports the Estrada heterogeneity index as the NSI that can correlate with the qc of real-world networks.
TABLE 5
| NSI | DEG Estimate | DEG t-value | DEG p-value | BET Estimate | BET t-value | BET p-value |
|---|---|---|---|---|---|---|
| Assortativity (A) | 0.093 | 1.584 | 0.115 | 0.103 | 2.175 | 0.031 * |
| Estrada heterogeneity index | −0.862 | −11.300 | <10–4 *** | −0.865 | −14.051 | <10–4 *** |
| Outcome | RSE: 0.16; multiple R2: 0.57; p-value: <0.001 | | | RSE: 0.12; multiple R2: 0.68; p-value: <0.001 | | |
Multiple linear regression model
5 Conclusion
Investigating node attack strategies provides valuable insights into enhancing network robustness by anticipating potential threats and identifying components that need protection. On the other side of the coin, node attack research plays a crucial role when the aim is to perform a fast network disruption, such as halting the spread of a disease or stopping the diffusion of a computer virus. Here, we investigate the relationship between the network structure and its robustness to node attack in a large dataset of real-world networks. Our results indicate that the degree heterogeneity of connected nodes negatively affects the network robustness. Specifically, the Estrada heterogeneity index, which evaluates the node degree heterogeneity among connected nodes, is the best predictor of qc in our NSI set. This result unveils that heterogeneous real-world networks presenting higher differences in the degree of connected nodes are more vulnerable to node attacks. These results may help quantify real-world networked systems’ robustness and build more robust networks.
This paper presents some limitations that may open new lines of research. First, we perform linear regression models only. The relationship between NSIs and the percolation threshold qc of the real-world networks may follow nonlinear models. Therefore, a natural extension of this research may consider nonlinear regression models, such as logistic, monomolecular, or exponential functions, to describe the relationship between the structure and the percolation threshold of real-world networks. Second, we adopt an initial node attack approach to study network robustness. Future research may analyze the robustness of real-world networks using recalculated node attacks, in which node ranking is updated after each node removal. Last, it would be interesting to investigate how NSIs correlate with other robustness indexes besides qc, such as the network robustness index R proposed by Schneider et al. [55]. The R measure considers the size of the LCC during the whole node attack process, not only at the point where the network collapses. Therefore, adopting R may unveil a new correlation pattern between NSIs and network robustness.
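For reference, the robustness index of Schneider et al. is commonly written as below. The notation is an assumption introduced here: $N$ is the number of nodes and $s(Q)$ is the fraction of nodes in the LCC after removing $Q$ nodes.

```latex
R = \frac{1}{N}\sum_{Q=1}^{N} s(Q)
```

Because the sum runs over the whole removal sequence, two networks with the same qc can still differ in $R$ when their LCC decays at different rates before the collapse.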
Statements
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.
Author contributions
MB, RA, and DC conceived the research. MB wrote the simulation codes. MB and RA performed the simulations. MB performed statistical analyses. All authors contributed to the article and approved the submitted version.
Funding
This research is funded by a grant from the Italian Ministry of Foreign Affairs and International Cooperation, by the Ecosister project, funded under the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.5 - Call for tender No. 3277 of 30/12/2021 of the Italian Ministry of University and Research, funded by the European Union - NextGenerationEU, Award Number: Project code ECS00000033, Concession Decree No. 1052 of 23/06/2022 adopted by the Italian Ministry. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 816313). This work is supported by Vietnam’s Ministry of Science and Technology (MOST) under the Vietnam-Italy scientific and technological cooperation program for the period 2021–2023. This work is supported by the Vietnam National University Ho Chi Minh City (VNU-HCM), Ho Chi Minh City, Vietnam, under grant number B2018-42-01.
Acknowledgments
MB, MT, DC, and RA acknowledge the Italian Ministry of Foreign Affairs and International Cooperation. The authors are very grateful to Van Lang University, Vietnam, for providing the budget for this study. This research has benefited from the high-performance computing (HPC) cluster of the Università degli Studi di Parma. The authors thank Fabio Sartori for revising the first draft of the manuscript and Prof. Stefano Poletti for the stimulating discussions about this research.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphy.2023.1245564/full#supplementary-material
References
1. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU. Complex networks: Structure and dynamics. Phys Rep (2006) 424:175–308. doi: 10.1016/j.physrep.2005.10.009
2. Bellingeri M, Bevacqua D, Scotognella F, Alfieri R, Nguyen Q, Montepietra D, et al. Link and node removal in real social networks: A review. Front Phys (2020) 8:8. doi: 10.3389/fphy.2020.00228
3. Iyer S, Killingback T, Sundaram B, Wang Z. Attack robustness and centrality of complex networks. PLoS One (2013) 8:e59613. doi: 10.1371/journal.pone.0059613
4. Nguyen Q, Vu T, Dinh H, Cassi D, Scotognella F, Alfieri R, et al. Modularity affects the robustness of scale-free model and real-world social networks under betweenness and degree-based node attack. Appl Netw Sci (2021) 6:82. doi: 10.1007/s41109-021-00426-y
5. Da Cunha BR, González-Avella JC, Gonçalves S. Fast fragmentation of networks using module-based attacks. PLoS One (2015) 10. doi: 10.1371/journal.pone.0142824
6. Cerqueti R, Ciciretti R, Dalò A, Nicolosi M. A new measure of the resilience for networks of funds with applications to socially responsible investments. Phys A Stat Mech Its Appl (2022) 593:126976. doi: 10.1016/j.physa.2022.126976
7. Lekha DS, Balakrishnan K. Central attacks in complex networks: A revisit with new fallback strategy. Phys A Stat Mech Its Appl (2020) 549:124347. doi: 10.1016/j.physa.2020.124347
8. Shang Y. Random lifts of graphs: Network robustness based on the Estrada index (2012). Available at: http://www.math.nthu.edu.tw/amen/.
9. Shang YL. Local natural connectivity in complex networks. Chin Phys Lett (2011) 28:068903. doi: 10.1088/0256-307X/28/6/068903
10. Shang Y. Biased edge failure in scale-free networks based on natural connectivity. Indian J Phys (2012) 86:485–8. doi: 10.1007/s12648-012-0084-4
11. Wandelt S, Sun X, Feng D, Zanin M, Havlin S. A comparative analysis of approaches to network-dismantling. Sci Rep (2018) 8:13513. doi: 10.1038/s41598-018-31902-8
12. Tian L, Bashan A, Shi DN, Liu YY. Articulation points in complex networks. Nat Commun (2017) 8:14223. doi: 10.1038/ncomms14223
13. Bellingeri M, Lu ZM, Cassi D, Scotognella F. Analyses of the response of a complex weighted network to nodes removal strategies considering links weight: The case of the Beijing urban road system. Mod Phys Lett B (2018) 32:1850067. doi: 10.1142/S0217984918500677
14. Cuadra L, Salcedo-Sanz S, Del Ser J, Jiménez-Fernández S, Geem ZW. A critical review of robustness in power grids using complex networks concepts. Energies (2015) 8:9211–65. doi: 10.3390/en8099211
15. Bellingeri M, Vincenzi S. Robustness of empirical food webs with varying consumer’s sensitivities to loss of resources. J Theor Biol (2013) 333:18–26. doi: 10.1016/j.jtbi.2013.04.033
16. Calizza E, Costantini ML, Rossi L. Effect of multiple disturbances on food web vulnerability to biodiversity loss in detritus-based systems. Ecosphere (2015) 6:art124–20. doi: 10.1890/ES14-00489.1
17. Dunne JA, Williams RJ, Martinez ND. Network structure and biodiversity loss in food webs: Robustness increases with connectance. Ecol Lett (2002) 5:558–67. doi: 10.1046/j.1461-0248.2002.00354.x
18. Montepietra D, Bellingeri M, Ross AM, Scotognella F, Cassi D. Modelling photosystem I as a complex interacting network. J R Soc Interf (2020) 17:20200813. doi: 10.1098/rsif.2020.0813
19. Sartori F, Turchetto M, Bellingeri M, Scotognella F, Alfieri R, Nguyen NKK, et al. A comparison of node vaccination strategies to halt SIR epidemic spreading in real-world complex networks. Sci Rep (2022) 12:21355. doi: 10.1038/s41598-022-24652-1
20. Wang Z, Zhao DW, Wang L, Sun GQ, Jin Z. Immunity of multiplex networks via acquaintance vaccination. EPL (2015) 112:48002. doi: 10.1209/0295-5075/112/48002
21. Gallos LK, Liljeros F, Argyrakis P, Bunde A, Havlin S. Improving immunization strategies. Phys Rev E - Stat Nonlinear, Soft Matter Phys (2007) 75:045104. doi: 10.1103/PhysRevE.75.045104
22. Hartnett GS, Parker E, Gulden TR, Vardavas R, Kravitz D. Modelling the impact of social distancing and targeted vaccination on the spread of COVID-19 through a real city-scale contact network. J Complex Networks (2021) 9:cnab042. doi: 10.1093/comnet/cnab042
23. Wang J, Jiang C, Qian J. Robustness of Internet under targeted attack: A cascading failure perspective. J Netw Comput Appl (2014) 40:97–104. doi: 10.1016/j.jnca.2013.08.007
24. Doyle JC, Alderson DL, Li L, Low S, Roughan M, Shalunov S, et al. The “robust yet fragile” nature of the Internet. Proc Natl Acad Sci U S A (2005) 102:14497–502. doi: 10.1073/pnas.0501426102
25. Nguyen Q, Le T-T. Structure and robustness of Facebook’s pages networks. In: Proceeding of the 2019 the 10th conference on network modeling and analysis; November 06 - 08, 2019; Dijon, France (2019).
26. Zhou D, Stanley HE, D’Agostino G, Scala A. Assortativity decreases the robustness of interdependent networks. Phys Rev E - Stat Nonlinear, Soft Matter Phys (2012) 86:066103. doi: 10.1103/PhysRevE.86.066103
27. Nguyen N-K-K, Nguyen Q, Pham H-H, Le T-T, Nguyen T-M, Cassi D, et al. Predicting the robustness of large real-world social networks using a machine learning model. Complexity (2022) 2022:1–16. doi: 10.1155/2022/3616163
28. Bellingeri M, Cassi D, Vincenzi S. Efficiency of attack strategies on complex model and real-world networks. Phys A Stat Mech Its Appl (2014) 414:174–80. doi: 10.1016/j.physa.2014.06.079
29. Holme P, Kim BJ, Yoon CN, Han SK. Attack vulnerability of complex networks (2002).
30. Estrada E. Quantifying network heterogeneity. Phys Rev E - Stat Nonlinear, Soft Matter Phys (2010) 82:066102. doi: 10.1103/PhysRevE.82.066102
31. Nie T, Guo Z, Zhao K, Lu ZM. New attack strategies for complex networks. Phys A Stat Mech Its Appl (2015) 424:248–53. doi: 10.1016/j.physa.2015.01.004
32. Nguyen Q, Pham HD, Cassi D, Bellingeri M. Conditional attack strategy for real-world complex networks. Phys A Stat Mech Its Appl (2019) 530:121561. doi: 10.1016/j.physa.2019.121561
33. Freeman HE. A set of measures of centrality based on betweenness. Sociometry (1977) 40:35. doi: 10.2307/3033543
34. Sartori F, Turchetto M, Bellingeri M, Scotognella F, Alfieri R, Nguyen N-K-K, et al. A comparison of node vaccination strategies to halt SIR epidemic spreading in real-world complex networks. Res Sq (2022). doi: 10.21203/rs.3.rs-1870717/v1
35. Albert R, Barabási A. Statistical mechanics of complex networks. Rev Mod Phys (2002) 74:47–97. doi: 10.1103/revmodphys.74.47
36. Clauset C, Newman MJ, Moore C. Finding community structure in very large networks. Phys Rev E (2004) 70:066111. doi: 10.1103/physreve.70.066111
37. Salathe M, James J. Dynamics and control of diseases in networks with community structure. Plos Comput Biol (2010) 6:e1000736. doi: 10.1371/journal.pcbi.1000736
38. Estrada E. The many facets of the Estrada indices of graphs and networks. Sema J (2022) 79:57–125. doi: 10.1007/s40324-021-00275-w
39. Noldus R, Mieghem PV. Assortativity in complex networks. J Complex Networks (2014) 3:507–42. doi: 10.1093/comnet/cnv005
40. Gleeson JP, Melnik S, Hackett A. How clustering affects the bond percolation threshold in complex networks. Phys Rev E - Stat Nonlinear, Soft Matter Phys (2010) 81:066114. doi: 10.1103/PhysRevE.81.066114
41. Buckley F, Harary F. Distance in graphs. Redwood City, CA: Addison-Wesley Publishing Company (1990). doi: 10.1201/b16132-64
42. Lü L, Chen D, Ren XL, Zhang QM, Zhang YC, Zhou T. Vital nodes identification in complex networks. Phys Rep (2016) 650:1–63. doi: 10.1016/j.physrep.2016.06.007
43. Bellingeri M, Bevacqua D, Turchetto M, Scotognella F, Alfieri R, Nguyen NKK, et al. Network structure indexes to forecast epidemic spreading in real-world complex networks. Front Phys (2022) 10:10. doi: 10.3389/fphy.2022.1017015
44. Albertson MO. The irregularity of a graph. Ars Comb (1997) 46:219–25.
45. Latora V, Marchiori M. Efficient behavior of small-world networks. Phys Rev Lett (2001) 87:198701–4. doi: 10.1103/PhysRevLett.87.198701
46. Barthélemy M. Betweenness centrality in large complex networks. Eur Phys J B (2004) 163–8. doi: 10.1140/epjb/e2004-00111-4
47. Rochat Y. Closeness centrality extended to unconnected graphs: The harmonic centrality index. Appl Soc Netw Anal (2009) 117.
48. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning with applications in R. Cham: Springer (2013).
49. Goldberger AS. Econometric theory. New Jersey, United States: Wiley (1964).
50. Schober P, Schwarte LA. Correlation coefficients: Appropriate use and interpretation. Anesth Analg (2018) 126:1763–8. doi: 10.1213/ANE.0000000000002864
51. Bellingeri M, Bevacqua D, Scotognella F, Lu Z-M, Cassi D. Efficacy of local attack strategies on the Beijing road complex weighted network. Phys A Stat Mech Its Appl (2018) 510:316–28. doi: 10.1016/j.physa.2018.06.127
52. Wapman KH, Zhang S, Clauset A, Larremore DB. Quantifying hierarchy and dynamics in US faculty hiring and retention. Nature (2022) 610:120–7. doi: 10.1038/s41586-022-05222-x
53. Newman MEJ. Mixing patterns in networks. Phys Rev E - Stat Physics, Plasmas Fluids Relat Interdiscip Top (2003) 67:026126. doi: 10.1103/PhysRevE.67.026126
54. Trajanovski S, Martín-Hernández J, Winterbach W, Van Mieghem P. Robustness envelopes of networks. J Complex Networks (2013) 1:44–62. doi: 10.1093/comnet/cnt004
55. Schneider CM, Moreira AA, Andrade JS Jr, Havlin S, Herrmann HJ. Mitigation of malicious attacks on networks. Proc Natl Acad Sci (2011) 108:3838–41. doi: 10.1073/pnas.1009440108
Summary
Keywords
complex network, network robustness and resilience, machine learning, node attack sequence, statistical physics
Citation
Bellingeri M, Turchetto M, Scotognella F, Alfieri R, Nguyen N-K-K, Nguyen Q and Cassi D (2023) Forecasting real-world complex networks’ robustness to node attack using network structure indexes. Front. Phys. 11:1245564. doi: 10.3389/fphy.2023.1245564
Received
23 June 2023
Accepted
22 September 2023
Published
11 October 2023
Volume
11 - 2023
Edited by
Nuno Crokidakis, Fluminense Federal University, Brazil
Reviewed by
Divya Sindhu Lekha, Indian Institute of Information Technology, India
Jihui Han, Zhengzhou University of Light Industry, China
Copyright
© 2023 Bellingeri, Turchetto, Scotognella, Alfieri, Nguyen, Nguyen and Cassi.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Michele Bellingeri, michele.bellingeri@unipr.it