Network structure indexes to forecast epidemic spreading in real-world complex networks

Bellingeri, Michele; Bevacqua, Daniele; Turchetto, Massimiliano; Scotognella, Francesco; Alfieri, Roberto; Nguyen, Ngoc-Kim-Khanh; Le, Thi Trang; Nguyen, Quang; Cassi, Davide

doi:10.3389/fphy.2022.1017015

ORIGINAL RESEARCH article

Front. Phys., 02 November 2022

Sec. Social Physics

Volume 10 - 2022 | https://doi.org/10.3389/fphy.2022.1017015

Michele Bellingeri ^1,2,3,*^
Daniele Bevacqua ^4^
Massimiliano Turchetto ^2,3^
Francesco Scotognella ^1,5^
Roberto Alfieri ^2,3^
Ngoc-Kim-Khanh Nguyen ^6^
Thi Trang Le ^7^
Quang Nguyen ^7,8,9^
Davide Cassi ^4,5^

1. Dipartimento di Fisica, Politecnico di Milano, Milano, Italy
2. Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università di Parma, Parma, Italy
3. INFN, Gruppo Collegato di Parma, Parma, Italy
4. PSH, UR 1115, INRAE, Avignon, France
5. Center for Nano Science and Technology@PoliMi, Istituto Italiano di Tecnologia, Milan, Italy
6. Faculty of Fundamental Sciences, Van Lang University, Ho Chi Minh City, Vietnam
7. John von Neumann Institute, Vietnam National University, Ho Chi Minh City, Vietnam
8. Institute of Fundamental and Applied Sciences, Duy Tan University, Ho Chi Minh City, Vietnam
9. Faculty of Natural Sciences, Duy Tan University, Da Nang, Vietnam

Abstract

Complex networks are the preferred framework for modeling spreading dynamics in many real-world complex systems. They can describe the contacts between individuals that are responsible for disease spreading in real-world systems. Understanding how the network structure affects an epidemic outbreak is therefore of great importance for evaluating the vulnerability of a network and optimizing disease control. Here we argue that the best network structure indexes (NSIs) for predicting the extent of disease spreading in real-world networks are based on the notion of node distance rather than on network connectivity, as commonly believed. We numerically simulated, via an SIR-type model, epidemic outbreaks spreading on 50 real-world networks. We then tested which of 40 NSIs could best predict a priori the fate of the disease. We found that the “average normalized node closeness” and the “average node distance” are the best predictors of the initial spreading pace, whereas indexes of the “topological complexity” of the network are the best predictors of both the value of the epidemic peak and the final extent of the spreading. Furthermore, most of the commonly used NSIs are not reliable predictors of the extent of disease spreading in real-world networks.

Introduction

The fundamental role of networks in epidemiology has been recognized in recent years [1–12]. Disease spreading can be modeled on a network where nodes (vertices) represent the individuals (i.e., the hosts) and links (edges) indicate the social contacts among them [1–9]. Real-world complex networks display many structural connectivity patterns, such as heavy-tailed degree distributions, the small-world effect, high clustering coefficients, self-similarity, assortativity, and community structure [1, 13–18]. These patterns may affect the evolution of the spreading process [1, 5, 18–21]. Knowing the relationship between network structure indexes (NSIs) and the spreading dynamics is crucial to prevent and control diseases [17].

Field measurements and analyses of real-world complex networks can be extremely costly in terms of both money and time. It is therefore necessary to know which features of the network structure should be measured first to assess the network vulnerability to disease and consequently optimize the control [1, 18–21]. To address this issue, we gathered a dataset of 50 real-world complex systems. They represent archetypical examples of network structures in different domains, including social, computer, internet, transportation, biological, and ecological networks (see Supplementary Materials S1.2 for details). We explicitly simulated a disease spreading over them via a classical compartmental susceptible–infected–recovered (SIR) model [1–5].

We derived three indicators of the speed and magnitude of the disease spread: 1) the number of time steps needed for the disease to strike 15% of the network nodes (time to 15%); 2) the overall number of nodes eventually affected by the disease (total infected); and 3) the maximum disease prevalence, i.e., the maximum number of nodes concurrently infected (infected peak). The first is a measure of the speed of the spreading process. The second is a measure of the impact of the disease on the population and is likely to correlate with the number of severe, and possibly fatal, cases. The third is a measure of the peak and can be used, e.g., to predict the pressure on the care structures.

We considered 40 different NSIs and tested, using four regression models, which of them were the best predictors of the epidemic vulnerability simulated by the SIR model. We considered classic NSIs from the network science, graph theory, and chemical graph theory literature, as well as original NSIs conceived in the present work (see Table 2 in the Methods and Supplementary Material S1.1). For the relationship between each of the three disease spread indicators (the dependent variable Y) and each of the 40 candidate NSIs (the independent variable X), we considered 1) linear, 2) quadratic, 3) exponential, and 4) monomolecular regressions.

To select the best NSI predictor among the 40, and the best regression type among the 4, we ranked the 40 × 4 = 160 candidate models via the Akaike information criterion (AIC). AIC aims to select the model with the best goodness of fit to the data while discouraging overparameterization and model complexity [31]. Finally, for each model, we computed the fraction of variance unexplained (FVU). FVU measures the goodness of fit of the model, tending to zero for “ideal” models that explain the entire variability in the observations.

Results

The best results of the model selection procedures and the best model performances are reported in Table 1. The forms and fitting of the best regression models, for different spreading indicators and values of transmissibility, are reported in Figure 1. The spreading indicators vs. NSI scatterplots are in Supplementary Figures S3–S7. All the results of the model selection procedures and performances are in Supplementary Tables S2–S5.

TABLE 1


Rank	Time to 15% (β = 0.03, γ = 0.04)	Time to 15% (β = 0.06, γ = 0.04)	Infected peak (β = 0.03, γ = 0.04)	Infected peak (β = 0.06, γ = 0.04)	Total infected (β = 0.03, γ = 0.04)	Total infected (β = 0.06, γ = 0.04)
1	Exp (0; 0.026)	Linear (0; 0.02)	Mono (0; 0.078)	Mono (0; 0.091)	Mono (0; 0.091)	Mono (0; 0.181)
2	Para (20.84; 0.037)	Exp (34.28; 0.031)	Mono (5.57; 0.088)	Mono (2.31; 0.096)	Mono (0.93; 0.093)	Mono (1.12; 0.185)
3	Exp (33.64; 0.05)	Para (42.93; 0.037)	Mono (40.08; 0.175)	Exp (31.46; 0.177)	Exp (27.46; 0.165)	Mono (3.66; 0.194)
4	Para (42.61; 0.058)	Exp (43.19; 0.061)	Mono (40.08; 0.175)	Exp (34.22; 0.188)	Para (42.5; 0.214)	Mono (4.01; 0.196)
5	Exp (43.19; 0.061)	Para (49.82; 0.042)	Mono (45.89; 0.196)	Mono (38.03; 0.195)	Mono (43.44; 0.218)	Mono (4.01; 0.196)
6	Para (45.33; 0.061)	Exp (59.62; 0.053)	Exp (52.21; 0.230)	Mono (40.32; 0.204)	Mono (43.44; 0.218)	Mono (23.95; 0.291)
7	Exp (72.03; 0.108)	Para (73.89; 0.069)	Exp (52.53; 0.233)	Mono (41.62; 0.210)	Mono (44.77; 0.223)	Mono (24.00; 0.29)
8	Linear (72.76; 0.105)	Linear (85.19; 0.093)	Mono (55.40; 0.237)	Mono (41.62; 0.210)	Mono (44.92; 0.224)	Exp (26.84; 0.321)
9	Linear (75.36; 0.115)	Para (95.73; 0.107)	Exp (59.48; 0.061)	Exp (44.93; 0.233)	Mono (46.98; 0.234)	Exp (28.63; 0.333)
10	Mono (86.12; 0.143)	Exp (108.58; 0.141)	Mono (59.68; 0.259)	Para (45.07; 0.232)	Linear (52.11; 0.259)	Mono (30.03; 0.329)

The best ten NSIs to predict epidemic spreading for each spreading index.

The values in brackets are the delta AIC, i.e., the AIC difference from the minimum AIC (the best model has delta AIC = 0), and the fraction of variance unexplained (FVU) of the regression model, for the SIR parameters simulating low epidemic transmission (β = 0.03, γ = 0.04) and high epidemic transmission (β = 0.06, γ = 0.04). Exp, exponential; Para, quadratic; Mono, monomolecular.

FIGURE 1

The pace of the disease

When considering the initial pace of the disease (the time to 15%), the best models use as explanatory variables the average normalized node closeness (in an exponential form, Figure 1A) for low epidemic transmission (β = 0.03), and the average node distance (linear relationship, Figure 1B) for high epidemic transmission (β = 0.06). The “distance” between two nodes u and v is the length of the shortest path joining them, i.e., the minimum number of links to travel between them [14]. The average node distance, also called characteristic path length, measures the mean number of links along the shortest paths between node pairs in the network [14]. Figure 1B shows, for the higher epidemic transmission rate, the strong positive linear relationship between the average node distance and the time to 15%: the higher the average node distance, the longer it takes to infect 15% of the network nodes.

The node closeness (or closeness centrality) is a measure of centrality in a network, calculated as the reciprocal of the sum of the distances (shortest path lengths) between the node and all other nodes in the network [32]. The node closeness is usually normalized by dividing it by the term 1/(N − 1), i.e., multiplying it by N − 1, where N is the number of network nodes. It follows that the normalized node closeness of node i is the inverse of the average distance from node i to all other nodes (see Supplementary Material S1.1). Therefore, the new NSI we propose in this study, the “average normalized node closeness”, can be viewed as a measure of how close the network nodes are to each other, and it is an alternative indicator of node distance in the network. For these reasons, even for a lower epidemic transmission rate, a strong negative relationship emerges between the distance among network nodes and the pace of the spreading (Figure 1B). Notably, both the average normalized node closeness and the average node distance yield very good fits, explaining almost the entire variability in the observations (FVU∼2%, see Table 1). Taken together, these results show that the most important network structural factor for predicting the initial spreading speed is node distance.
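The two distance-based indexes can be illustrated on a toy graph. The following is a minimal Python sketch, not the authors' code: it assumes an undirected, connected network stored as an adjacency dictionary, and the function names and the 4-node path graph are hypothetical. It computes the average node distance as the mean shortest-path length over node pairs, and the average normalized closeness as the mean over nodes of (N − 1) divided by the sum of distances to all other nodes.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_node_distance(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    n = len(adj)
    total = sum(sum(bfs_distances(adj, u).values()) for u in adj)
    return total / (n * (n - 1))


def average_normalized_closeness(adj):
    """Mean over nodes of (N - 1) / (sum of distances to all other nodes)."""
    n = len(adj)
    values = [(n - 1) / sum(bfs_distances(adj, u).values()) for u in adj]
    return sum(values) / n


# Hypothetical 4-node path graph: 0 - 1 - 2 - 3 (assumed connected).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
avg_distance = average_node_distance(path)
avg_closeness = average_normalized_closeness(path)
```

On this path graph the average node distance is 5/3 and the average normalized closeness is 0.625; a more compact graph on the same nodes would lower the first and raise the second.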

The infected peak

When considering the maximum number of concurrently infected nodes (the infected peak), the best models use as predictor the ratio of the average node degree to the average node distance (NSI 39 in Table 2) in a monomolecular form for both low and high epidemic transmission (Figures 1C,D). The accuracy of the regression model is high, explaining more than 90% of the variability in the observations for both low and high epidemic transmission (FVU < 10%, see Table 1). The infected peak rises quickly with this index and reaches a plateau at higher values. The index, defined as the ratio of the average node degree (i.e., the average number of links per node) to the average node distance, was introduced in mathematical graph theory to capture the topological complexity of the network [15]. Thus, the peak of infected individuals in the network, that is, the peak prevalence of the epidemic, is positively related to network connectivity (average node degree) and decreases with the node distance.

The total infected

When considering the overall number of nodes that have been infected during an epidemic (the total infected), for low epidemic transmission (β = 0.03) the best predictor is the centricity index (NSI 40 in Table 2) in a monomolecular form (Figure 1E). This index was introduced by Bonchev and Buck [15] to refine the degree-to-distance ratio described above, and it follows the same rationale, accounting for the ratio between the node degree and a measure of the node distance (i.e., the farness) in the network. The “farness” of node i is the sum of the distances between node i and every other node j; the index is then built from the ratio between the degree of each node i and its farness. We find that the total infected follows a saturating function of this index, showing how the total number of infected individuals may increase with network connectivity (node degree) and decrease with the node distance (here measured by the farness).

For high epidemic transmission (β = 0.06) the best predictor is the average node coreness, in a monomolecular form (Figure 1F). Node coreness (or coreness centrality) is a node centrality measure that assigns the nodes to nested sub-networks called k-cores. The k-core of a network is a maximal sub-network in which each node has degree at least k [5]. In other words, the coreness of a node is k if it belongs to the k-core but not to the (k + 1)-core. Kitsak et al. [5] showed that nodes of higher coreness are “influential spreaders” in the network, i.e., the nodes located in the network core determine a higher speed of network spreading. Moreover, an epidemic starting in the network core may reach a large number of nodes, and the coreness centrality may be an efficient measure to identify the nodes acting as efficient spreaders [5]. In this research, we introduce the average value of node coreness as an index to evaluate the global network spreading. We can interpret networks with higher average coreness as compact structures, where nodes of higher degree are also located in the core of the network. We find that the total infected is well fitted by a saturating function of the average node coreness, showing how the total number of infected individuals may increase in networks of higher average node coreness. Nonetheless, we note that the average node coreness performs only slightly better than the second-ranked index, and the two regression models return almost equal goodness of fit, with almost the same AIC and FVU (Table 1).
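The k-core definition above translates into a simple peeling procedure. The sketch below is illustrative only and is not taken from the study: it assumes an undirected network stored as an adjacency dictionary, the function name is hypothetical, and the quadratic-time minimum search is chosen for readability rather than speed. It repeatedly removes a node of minimum remaining degree; the coreness of that node is the largest minimum degree seen so far.

```python
def node_coreness(adj):
    """Coreness of each node, obtained by peeling minimum-degree nodes."""
    degree = {u: len(neigh) for u, neigh in adj.items()}
    remaining = set(adj)
    coreness = {}
    k = 0
    while remaining:
        # Pick a node with the smallest degree among the nodes still present.
        u = min(remaining, key=degree.get)
        # The core level never decreases while peeling.
        k = max(k, degree[u])
        coreness[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return coreness


# Hypothetical example: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
coreness = node_coreness(graph)
average_coreness = sum(coreness.values()) / len(coreness)
```

In this example the triangle nodes have coreness 2 and the pendant node has coreness 1, so the average node coreness is 1.75.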

Discussion

Our results show that, to predict network spreading, considering the distance among nodes is more important than focusing on their connectivity level. The most common NSI for evaluating the connectivity level of the network, i.e., the average node degree [13], returns a poor prediction of the network spreading for all three spreading indicators used in this study (Table 2). Specifically, the average node degree is strongly ineffective at explaining the initial speed of the spreading (FVU∼0.5, Supplementary Tables S2, S4).

TABLE 2

ID	Full name	Definition	References
1	Linkage density	L is the link number and N is the number of nodes.	[22]
2	Connectance	L is the number of links and N is the number of nodes.	[22]
3	Average node degree	k_i is the degree of the node i, and N is the nodes number	[14]
4	Node degree harmonic mean	k_i is the degree of the node i, and N is the number of network nodes.	New—from 3
5	Node degree variance	k_i is the degree of the node i, the average node degree, and N is the nodes number.	New—from 3
6	Node degree standard deviation	k_i is the degree of the node i, the average node degree, and N is the nodes number.	New—from 3
7	Node degree normalized standard deviation	is the standard deviation of the node degree and the average node degree.	New—from 3
8	Degree 1	is the number of nodes of degree k = 1, and N is the nodes number.	New—from 3
9	Degree 2	is the number of nodes of degree k ≤ 2, and N is the nodes number.	New—from 3
10	Hub index	where is the sum of the degree of the 1% of the most connected nodes, and is sum of the degree of all nodes in the networks.	New—from 3
11	Albertson index	i,j is the link connecting nodes i and j, k_i is the degree of the node i, k_j the degree of the node j, and L is the network links set.	[23]
12	Normalized Albertson index	is the Albertson index and L the number of links.	New—from 12
13	Estrada index	i,j is the link connecting nodes i and j, k_i is the degree of the node i, k_j is the degree of the node j, and L is the network links set.	[24]
14	Node degree Shannon index	k_i is the degree of the nodes i and N is the number of nodes.	[25]
15	Network assortativity	is the standard deviation of the excess degree distribution, is the fraction of links connecting nodes of degree j and k, and , are the excess degree of nodes of degree j and k.	[18]
16	Average node distance	is the distance between nodes i and j and N the nodes number.	[14]
17	Node distance harmonic mean	is the distance between i and j and N nodes number	New—from 16
18	Node distance standard deviation	is the distance between node i and node j, is the average node distance and N is nodes number	New—from 16
19	Node distance normalized standard deviation	is the node distance standard deviation, is the average node distance and N is the nodes number.	New—from 18
20	Wiener index	is the distance between node i and node j, is the average node distance and N is the nodes number.	[26]
21	Network eccentricity	is the eccentricity of the node i and N is the nodes number.	[14]
22	Normalized network eccentricity	is the average network eccentricity and the average node distance	New—from 21
23	Network diameter	is the distance between i and j and N the nodes number	[14]
24	Normalized diameter	is the network diameter and the average node distance.	New—from 23
25	Network radius	ɛ(i) is the eccentricity of the node i.	[14]
26	Normalized network radius	is the network diameter and the average node distance.	New—from 25
27	Radius-diameter ratio	ɛ(i) is the eccentricity of the node i.	New—from 23 to 25
28	Radius-diameter normalized ratio	ɛ(i) is the eccentricity of the node i and the average node distance.	New—from 27
29	Network efficiency	is the distance between node i and node j and N the nodes number	[27]
30	Network communicability	N is the number of network nodes, and is the communicability between node p and q.	[28]
31	Network communicability logarithm	is the communicability of the network	New—from 30
32	Average node transitivity	is the transitivity of the node i, and N is the nodes number	[13]
33	Average node betweenness	N is the number of nodes and the betweenness of the node i.	[29]
34	Average normalized node betweenness	N is the number of nodes, is the normalized betweenness of the node i.	[29]
35	Average node closeness	is the closeness of the node i and N is the nodes number.	[29]
36	Average normalized node closeness	is the normalized closeness of the node i and N is the nodes number.	New—from 35
37	Average node coreness	is the coreness of the node i and N is the number of nodes.	[5]
38	Network modularity	L is the number of links, a_ij is the element of the adjacency matrix A in row i and column j, k_i is the degree of i, k_j is the degree of j, c_i is the module (or community) of node i, c_j that of j, the sum goes over all i and j pairs of nodes, and δ(x, y) is 1 if x = y and 0 otherwise.	[30]
39	index	is the average node degree, is the average node distance.	[15]
40	Centricity index	is the degree of node i, is the farness of node i.	[15]

Network structural indexes (NSIs) list. For NSIs from the literature, the reference is given in square brackets; for the new NSIs, “New” is followed by the number of the NSI from which they are derived.

This seems counter-intuitive, since higher connectivity levels correlate, on average, with lower node distance in the network [1–13].

Focusing on the degree-to-distance ratio index and the centricity index (NSIs 39 and 40 in Table 2), and on ideal types of network structure, we can see how the network connectivity level alone (i.e., the density of network links) may induce misleading predictions of the network epidemic spreading.

Both indexes increase from the chain network (lower complexity), through the star network (medium complexity), to the complete network (maximum complexity) (Figure 2). Following this simplified scheme, it is possible to rank classes of real-world networks by the extent of their epidemic spreading: it would be lowest in a chain-like network, which has the smallest average node degree and the highest average node distance (or farness); intermediate in the star network, which has an average node degree similar to that of the chain network but a lower average node distance; and highest in the complete network, which maximizes the average node degree and minimizes the average node distance (or farness).
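The chain, star, and complete comparison can be checked numerically. The sketch below is a hedged illustration rather than the authors' computation: the builder functions, the choice of five nodes, and the adjacency-dictionary representation are assumptions made for the example. For each ideal type it computes the average node degree, the average node distance, and their ratio.

```python
from collections import deque


def chain(n):
    """Path graph: node i is linked to i - 1 and i + 1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def star(n):
    """Star graph: node 0 is the hub linked to every other node."""
    adj = {0: list(range(1, n))}
    adj.update({i: [0] for i in range(1, n)})
    return adj


def complete(n):
    """Complete graph: every node is linked to every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def average_degree(adj):
    return sum(len(neigh) for neigh in adj.values()) / len(adj)


def average_distance(adj):
    """Mean shortest-path length over ordered node pairs (connected graph)."""
    n = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


# (average degree, average distance, degree-to-distance ratio) per ideal type.
summary = {}
for name, build in [("chain", chain), ("star", star), ("complete", complete)]:
    g = build(5)
    k, d = average_degree(g), average_distance(g)
    summary[name] = (k, d, k / d)
```

With five nodes, the chain and the star have the same average degree (1.6) but average distances of 2.0 and 1.6, so the ratio grows from 0.8 (chain) to 1.0 (star) to 4.0 (complete), while the average degree alone cannot separate the first two.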

FIGURE 2

In particular, the higher spreading in the star network with respect to the chain network explains why network connectivity alone may not be a reliable predictor of the extent of spreading: although these ideal types show similar network connectivity, they present very different node distances, so networks of similar connectivity level may show very different spreading extents. Moreover, our outcomes show that the de-correlation between connectivity and node distance in real-world networks may be large enough to make network connectivity alone a poor predictor of epidemic spreading.

This outcome is particularly important in the context of epidemic spreading research, such as that on SARS-CoV-2. Important recent research by Thurner and colleagues [11] on SIR epidemic spreading on networks showed that classic epidemiological models formulated as differential equations and based on the mean-field approximation (assuming that every node/individual can in principle infect any other) can produce misleading predictions of the real extent of epidemic spreading. Consequently, Thurner et al. [11] questioned the applicability of standard compartmental models, which neglect the network structure, to describe real epidemic spreading and the SARS-CoV-2 containment phase.

On the one hand, the outcomes of our research strongly support the main statement of Thurner et al. [11], showing how neglecting the network structure may lead to erroneous predictions of real epidemic spreading. On the other hand, our results extend the outcomes of Thurner et al. [11]. We show that network epidemic models of SARS-CoV-2 spreading that focus on the network connectivity density as the main structural feature for parameterizing epidemic spreading, as done by Thurner et al. [11] and many recent network epidemic models [6, 8], may produce incomplete or even unrealistic spreading predictions.

Further, most of the non-pharmaceutical interventions (NPIs) implemented to curb the SARS-CoV-2 epidemic follow the rationale of reducing social interactions [33, 34], that is, decreasing the number of network links. Our analyses suggest that implementing NPIs with the aim of spacing out the nodes, i.e., increasing the node distance in the network, would be a more effective strategy to halt the epidemic. This would translate into a reduced peak of infected individuals and, consequently, a lower number of infected individuals at the end of the epidemic.

Last, we note that many of the NSIs conceived in complex network science to capture important network features that may lead to different spreading extents are not able to reliably predict SIR epidemic spreading in real-world networks. The modularity, the transitivity, the degree assortativity, and the different degree heterogeneity indicators of the network, which are assumed to influence network spreading [1, 2, 18, 20], return very poor model fits (Supplementary Tables S2, S4). For example, the degree assortativity returns FVU > 0.8 for all the spreading indicators, and the network modularity shows FVU > 0.45 for all the spreading indicators. We argue that the weak outcomes of these NSIs may be due to two main reasons. On the one hand, in real-world networks, the aforementioned NSIs may present non-linear relationships among them, with contrasting effects in determining the extent of network spreading. For example, Volz et al. [17] showed that the average node transitivity alone is not always sufficient to determine the full epidemiological dynamics, since the epidemic spreading depends not only on the node transitivity, but also on the nature of its interactions with other network structural features [17].

On the other hand, our results show that the node distance is the most important factor affecting the network spreading. The aforementioned NSIs may not correlate with node distance, and, as explained above for the relationship between average node degree and node distance (Figure 2), real-world networks with the same value of these NSIs may present different node distances. For example, the relationship between degree assortativity and node distance in the network is not linear, with contrasting effects on the epidemic spreading [18]. For this reason, real-world networks with similar values of these NSIs may show different spreading extents.

Materials and methods

Network structural indexes

In Table 2 we list the network structural indexes (NSIs) used in this study, with a short definition and their reference. For the NSIs coming from the literature, we indicate the literature reference. For the new ones formulated in the present study by modifying or combining notions or indicators from the literature, we list the indicators from which the new ones are derived. In Supplementary Material S1.1, we provide the extended definition of each network structural index.

Real-world complex networks database

We analyzed a set of 50 high-quality real-world networks from different fields of science (see Supplementary Material S1.2). The number of real-world networks for different areas of science is: road transportation 6, airports transportation 2, cargo-ship transportation 1, biological 4, ecological 2, social 13, citation 2, phone 2, internet 5, financial 1, computers 9, email 3. The complete list of real-world networks with network type and reference is in Supplementary Table S1.

The susceptible–infectious–recovered dynamic epidemics model

We used a susceptible–infected–recovered (SIR) model to numerically simulate the spreading over real-world networks. SIR-type models can successfully predict the dynamics of many infectious diseases; see Keeling and Rohani [35] for an overview. When considering SIR models over a network, at any time a node can be in one of three possible compartments: susceptible (S), infected (I), and recovered (R). An infected node/individual infects the susceptible nodes linked to it with a transmission rate β. An infected node/individual stays infectious on average for γ⁻¹ consecutive days, i.e., it recovers with a rate equal to γ. A recovered node/individual can no longer infect others and its state no longer changes, which is equivalent to assuming that immunization does not vanish in the considered time horizon. We initialized the system by setting all nodes/individuals as susceptible except one, randomly chosen, whose state is set as infected. The system dynamics can then be solved to model the evolution of the epidemic over time. To simulate the SIR spreading on a network we used the NDlib Python library presented in Rossetti et al. [36]. We set the SIR parameters to β = 0.03 or β = 0.06, and γ = 0.04. The two values of β describe low and high epidemic transmission: higher values of β represent epidemics with higher transmissibility. Following Kitsak et al. [5], we chose relatively small values for β so that the infected percentage of the population remains small and the simulation can reveal the role of the network structure in the spreading. For larger β values, the spreading would cover almost the whole network in a few time steps, thus hiding the role of the topological structure in the pace of the spreading. For each real-world network, we ran 10³ independent SIR simulations, each with a different node/individual initially infected.
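To make the simulated dynamics concrete, the following standard-library sketch implements a discrete-time SIR process on a network. It is not NDlib and not the authors' code: the synchronous update rule, the function name, the `max_steps` safeguard, and the 20-node ring used as input are all assumptions made for illustration. At each step every infected node infects each susceptible neighbour with probability β and then recovers with probability γ.

```python
import random


def simulate_sir(adj, beta, gamma, seed_node, rng, max_steps=100_000):
    """Discrete-time SIR on a network; returns per-step (S, I, R) counts."""
    state = {u: "S" for u in adj}
    state[seed_node] = "I"
    history = []
    for _ in range(max_steps):
        values = list(state.values())
        counts = (values.count("S"), values.count("I"), values.count("R"))
        history.append(counts)
        if counts[1] == 0:
            break  # no infected nodes left: the epidemic is over
        new_state = dict(state)
        for u, status in state.items():
            if status != "I":
                continue
            # Each susceptible neighbour is infected with probability beta.
            for v in adj[u]:
                if state[v] == "S" and rng.random() < beta:
                    new_state[v] = "I"
            # The infected node recovers with probability gamma.
            if rng.random() < gamma:
                new_state[u] = "R"
        state = new_state
    return history


# Hypothetical input: a ring of 20 nodes, one initially infected node.
n_nodes = 20
ring = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}
history = simulate_sir(ring, beta=0.06, gamma=0.04, seed_node=0,
                       rng=random.Random(42))
```

Repeating the call with different seed nodes and random generators, and averaging the resulting indicators, would mimic the repeated independent simulations described above.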

The pace of the epidemic spreading can be evaluated by the time to infect a given part of the population [37]. We define the time to 15% as the number of time steps of the SIR simulation needed for 15% of the network nodes to have been infected (counting both the currently infected nodes and the recovered ones). The shorter the time to infect this fixed fraction of nodes/individuals, the faster the epidemic spreading.

Then, we assessed the extent of the epidemic spreading by the total number of individuals that have been infected at the end of the simulation (total infected), i.e., when there are no more infected nodes [5, 12], and by the maximum number of infected nodes in a given day (infected peak) [12]. The total infected corresponds to the cumulative sum of new cases, which is equivalent to the number of recovered nodes at the end of the dynamics when, by model construction, no more nodes can be infected. This indicator was used by Kitsak et al. [5] to quantify the influence of a given node of the network in an SIR spreading process, and to evaluate the efficacy of link removal strategies to curb SIR spreading in social networks [12]. It provides an estimate of the spread of the disease within a population and is likely to correlate with the number of severe, and possibly fatal, cases. The infected peak, besides evaluating the spreading pace, provides an estimate of the pressure on the healthcare system, which might collapse when a critical threshold is exceeded, thus causing a higher mortality probability for infected individuals [12]. Since in epidemiology “prevalence” is the fraction of a population currently infected [35], the infected peak can be defined as the maximum prevalence occurring during the epidemic simulations.

The list of the spreading indicators with their definition is in Table 3.

TABLE 3

Indicator key	Name	Definition
	Time to 15%	Time steps of the simulation to reach the 15% of infected nodes (both recovered and infected).
	Total infected	Fraction of total infected nodes (both infected and recovered) at the end of the simulation.
	Infected peak	Maximum fraction of currently infected nodes occurring along the simulation. It represents the maximum epidemics prevalence.

Spreading indicators used in this study.
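The three indicators of Table 3 can be read directly off a simulated trajectory. The sketch below is illustrative: it assumes the trajectory is a list of per-step (S, I, R) counts, and the function name and the example trajectory are hypothetical rather than data from the study.

```python
def spreading_indicators(history, threshold=0.15):
    """Return (time to threshold, total infected, infected peak).

    history is a list of (S, I, R) node counts, one tuple per time step.
    The last two indicators are fractions of the network size.
    """
    n = sum(history[0])
    time_to_threshold = None  # stays None if the threshold is never reached
    for step, (_, infected, recovered) in enumerate(history):
        if (infected + recovered) / n >= threshold:
            time_to_threshold = step
            break
    _, last_infected, last_recovered = history[-1]
    total_infected = (last_infected + last_recovered) / n
    infected_peak = max(infected for _, infected, _ in history) / n
    return time_to_threshold, total_infected, infected_peak


# Hypothetical trajectory on a 10-node network.
example = [(9, 1, 0), (8, 2, 0), (6, 3, 1), (5, 2, 3), (5, 0, 5)]
t15, total, peak = spreading_indicators(example)
```

For this trajectory the 15% threshold is first reached at step 1, the total infected fraction is 0.5, and the infected peak is 0.3.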

The regression models

To estimate the goodness of the relationship between the spreading indicator value (response variable Y) and the network structural indicator value (independent variable or predictor X) we performed four types of regression models: linear, quadratic, exponential, and monomolecular.

Linear: $Y = a + bX$. It represents the simplest relationship between two variables, i.e., one increases/decreases proportionally to the other.

Exponential: $Y = a\,e^{bX}$. It is used to model situations in which 1) the response of one variable to the change of another begins slowly and then accelerates rapidly without bound, or 2) its decay begins rapidly and then slows down to get closer and closer to zero. A multitude of situations can be modeled by exponential functions, such as investment growth, radioactive decay, atmospheric pressure changes, temperatures of a cooling object, etc.

Quadratic: $Y = a + bX + cX^{2}$. It represents those cases in which the maximum (or minimum) value of a variable is obtained at intermediate values of the independent variable. In biology, the growth rate of organisms is often modelled as a quadratic function of temperature. Such a pattern can arise when the disease spread depends on the interaction of two processes which respond differently to the same NSI.

Monomolecular (also known as Brody or Mitscherlich function): $Y = a\,(1 - b\,e^{-cX})$, where $b$ and $c$ are growth parameters, and $a$ is the asymptotic size [38]. The monomolecular is a special case of the generalised logistic function and it is a widely used growth curve model for saturating biological phenomena [38]. This typically occurs when other elements of the system interfere with the effect of the considered NSI and smooth its effect.
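The four candidate relationships can be written as plain functions. This is a sketch under assumptions: the parameter names a, b, c and the exact parameterizations (in particular for the monomolecular curve) are common textbook forms chosen for illustration and may differ from those used in the study. The dictionary records the number of estimated parameters of each form, which enters the model selection penalty.

```python
import math


def linear(x, a, b):
    """Straight line: intercept a, slope b."""
    return a + b * x


def quadratic(x, a, b, c):
    """Parabola with an extremum at intermediate x when c != 0."""
    return a + b * x + c * x * x


def exponential(x, a, b):
    """Unbounded growth (b > 0) or decay towards zero (b < 0)."""
    return a * math.exp(b * x)


def monomolecular(x, a, b, c):
    """Saturating curve that approaches the asymptote a as x grows (c > 0)."""
    return a * (1.0 - b * math.exp(-c * x))


# Number of estimated parameters per functional form (assumed forms above).
N_PARAMS = {"linear": 2, "quadratic": 3, "exponential": 2, "monomolecular": 3}
```

With b = 1 the monomolecular curve starts at zero for x = 0 and saturates at a, which is the qualitative shape of the saturating fits described in the Results.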

The model selection criterion

We selected the best model using the Akaike information criterion (AIC) [31]: $\mathrm{AIC} = 2k - 2\ln(\hat{L})$, where $k$ is the number of estimated parameters in the regression model (2 or 3 according to the regression model), and $\hat{L}$ is the maximum value of the likelihood function for the model [31]. Given a set of candidate models for the data, the best model is the one with the minimum AIC value. Thus, AIC rewards goodness of fit (as assessed by the likelihood function), but it also includes a penalty that is an increasing function of the number of estimated parameters. The penalty discourages overfitting, which is desirable because increasing the number of parameters in the model almost always improves the goodness of fit. Minimization was performed using the R program function nlm (Gauss-Newton algorithm).
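The selection rule can be illustrated with a short sketch. It assumes independent Gaussian errors, so that the maximised log-likelihood follows from the residual sum of squares, and it counts only the regression parameters in `k`; both are assumptions, and the function names and the residual values are hypothetical.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximised log-likelihood of a least-squares fit, assuming Gaussian errors."""
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    return -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L_hat)."""
    return 2.0 * k - 2.0 * log_likelihood


def select_best(candidates, n):
    """candidates maps model name -> (rss, k); return (best_name, all_scores)."""
    scores = {
        name: aic(gaussian_log_likelihood(rss, n), k)
        for name, (rss, k) in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical fits over n = 50 networks: the quadratic model reduces the
# residuals only slightly, so its extra parameter is not worth the penalty.
candidates = {"linear": (1.00, 2), "quadratic": (0.99, 3)}
best, scores = select_best(candidates, n=50)
```

In this example the small gain in fit from the third parameter does not offset the penalty term, so the simpler model has the lower AIC.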

Finally, to provide an easily interpretable measure of how well the fitted model performs for each network structural index (predictor), we computed the fraction of variance unexplained (FVU), calculated as $\mathrm{FVU} = \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$, where $y_i$ is the observed value of the variable (i.e., the observed spreading indicator value for network $i$), $\hat{y}_i$ is the estimated value of the variable (i.e., the value of the spreading indicator estimated by the fitted model for network $i$), $\bar{y}$ is the average observed value of the spreading indicator over the whole set of networks, and $N$ is the total number of networks.
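A minimal sketch of the FVU computation follows, as the ratio of the residual sum of squares to the total sum of squares. The function name and the sample values are hypothetical.

```python
def fraction_of_variance_unexplained(observed, estimated):
    """Residual sum of squares divided by total sum of squares."""
    if not observed or len(observed) != len(estimated):
        raise ValueError("need equal-length, non-empty sequences")
    mean_observed = sum(observed) / len(observed)
    # Squared errors of the fitted model.
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, estimated))
    # Squared deviations of the observations from their mean.
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values have zero variance")
    return ss_residual / ss_total


# Hypothetical spreading-indicator values for four networks.
observed = [0.2, 0.4, 0.6, 0.8]
estimated = [0.25, 0.35, 0.65, 0.75]
fvu = fraction_of_variance_unexplained(observed, estimated)
```

An FVU of 0 means the model reproduces the observations exactly, while a model that only predicts the mean of the observations has an FVU of 1; lower values therefore indicate a better predictor.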

Statements

Data availability statement

Publicly available datasets were analyzed in this study. The datasets are available in the “Netzschleuder” repository [https://networks.skewed.de/], in the “Stanford Large Network Dataset Collection” repository [https://snap.stanford.edu/data/index.html], and in “The Colorado Index of Complex Networks (ICON)” repository [https://icon.colorado.edu/#!/].

Author contributions

BM, CD, AR, and BD conceived the research. BM, AR, and TM performed the analyses. All the authors wrote the manuscript.

Funding

This research is funded by a grant from the Italian Ministry of Foreign Affairs and International Cooperation. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 816313). This work is supported by Vietnam’s Ministry of Science and Technology (MOST) under the Vietnam-Italy scientific and technological cooperation program for the period 2021–2023. This work is supported by the Vietnam National University Ho Chi Minh City (VNU-HCM), Ho Chi Minh City, Vietnam under grant number B2018-42-01.

Acknowledgments

BM, TM, CD, and AR acknowledge the Italian Ministry of Foreign Affairs and International Cooperation. We are greatly thankful to Van Lang University, Vietnam for providing the budget for this study. We thank F. Sartori for helpful discussions.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphy.2022.1017015/full#supplementary-material

References

1. Pastor-Satorras R, Castellano C, Van Mieghem P, Vespignani A. Epidemic processes in complex networks. Rev Mod Phys (2015) 87:925–79. doi:10.1103/RevModPhys.87.925
2. Pastor-Satorras R, Vespignani A. Immunization of complex networks. Phys Rev E (2002) 65:036104. doi:10.1103/PhysRevE.65.036104
3. Newman M. Spread of epidemic disease on networks. Phys Rev E (2002) 66:016128. doi:10.1103/PhysRevE.66.016128
4. Chen Y, Paul G, Havlin S, Liljeros F, Stanley H. Finding a better immunization strategy. Phys Rev Lett (2008) 101:058701. doi:10.1103/PhysRevLett.101.058701
5. Kitsak M, Gallos LK, Havlin S, Liljeros F, Muchnik L, Stanley HE, et al. Identification of influential spreaders in complex networks. Nat Phys (2010) 6:888–93. doi:10.1038/nphys1746
6. Firth JA, Hellewell J, Klepac P, Kissler S, Jit M, Atkins KE, et al. Using a real-world network to model localized COVID-19 control strategies. Nat Med (2020) 26:1616–22. doi:10.1038/s41591-020-1036-8
7. Amaral MA, Oliveira MMd, Javarone MA. An epidemiological model with voluntary quarantine strategies governed by evolutionary game dynamics. Chaos Solitons Fractals (2021) 143:110616. doi:10.1016/j.chaos.2020.110616
8. Nishi A, Dewey G, Endo A, Neman S, Iwamoto SK, Ni MY, et al. Network interventions for managing the COVID-19 pandemic and sustaining economy. Proc Natl Acad Sci U S A (2020) 117:30285–94. doi:10.1073/pnas.2014297117
9. Hu Y, Ji S, Jin Y, Feng L, Eugene Stanley H, Havlin S. Local structure can identify and quantify influential global spreaders in large scale social networks. Proc Natl Acad Sci U S A (2018) 115:7468–72. doi:10.1073/pnas.1710547115
10. Pei S, Makse HA. Spreading dynamics in complex networks. J Stat Mech (2013) 2013:P12002. doi:10.1088/1742-5468/2013/12/P12002
11. Thurner S, Klimek P, Hanel R. A network-based explanation of why most COVID-19 infection curves are linear. Proc Natl Acad Sci U S A (2020) 117:22684–9. doi:10.1073/pnas.2010398117
12. Bellingeri M, Turchetto M, Bevacqua D, Scotognella F, Alfieri R, Nguyen Q, et al. Modeling the consequences of social distancing over epidemics spreading in complex social networks: From link removal analysis to SARS-CoV-2 prevention. Front Phys (2021) 9:1–7. doi:10.3389/fphy.2021.681343
13. Boccaletti S, Vito L, Moreno Y, Chavez M, Hwang D. Complex networks: Structure and dynamics. Phys Rep (2006) 424:175–308. doi:10.1016/j.physrep.2005.10.009
14. Buckley F, Harary F. Distance in graphs. Redwood City, CA: Addison-Wesley Publishing Company (1990). doi:10.1201/b16132-64
15. Bonchev D, Buck GA. Quantitative measures of network complexity. Complex Chem Biol Ecol (2005) 2005:191–235. doi:10.1007/0-387-25871-X_5
16. De Domenico M, Granell C, Porter MA, Arenas A. The physics of spreading processes in multilayer networks. Nat Phys (2016) 12:901–6. doi:10.1038/nphys3865
17. Volz EM, Miller JC, Galvani A, Meyers L. Effects of heterogeneous and clustered contact patterns on infectious disease dynamics. Plos Comput Biol (2011) 7:e1002042. doi:10.1371/journal.pcbi.1002042
18. Noldus R, Mieghem PV. Assortativity in complex networks. J Complex Netw (2015) 3:507–42. doi:10.1093/comnet/cnv005
19. Miller JC. Spread of infectious disease through clustered populations. J R Soc Interf (2009) 6:1121–34. doi:10.1098/rsif.2008.0524
20. Salathe M, James J. Dynamics and control of diseases in networks with community structure. Plos Comput Biol (2010) 6:e1000736. doi:10.1371/journal.pcbi.1000736
21. Badham J, Stocker R. The impact of network clustering and assortativity on epidemic behaviour. Theor Popul Biol (2010) 77:71–5. doi:10.1016/j.tpb.2009.11.003
22. Bellingeri M, Vincenzi S. Robustness of empirical food webs with varying consumer’s sensitivities to loss of resources. J Theor Biol (2013) 333:18–26. doi:10.1016/j.jtbi.2013.04.033
23. Albertson MO. The irregularity of a graph. Ars Comb (1997) 46:219–25.
24. Estrada E. Quantifying network heterogeneity. Phys Rev E (2010) 82:066102–8. doi:10.1103/PhysRevE.82.066102
25. Spellerberg IF, Fedor P. A tribute to Claude Shannon (1916–2001) and a plea for more rigorous use of species richness, species diversity and the ‘Shannon–Wiener’ Index. Glob Ecol Biogeogr (2003) 12:177–9. doi:10.1046/j.1466-822x.2003.00015.x
26. Rouvray D. The rich legacy of half a century of the wiener index. Topology Chem (2002) 2002:16–37. doi:10.1533/9780857099617.16
27. Latora V, Marchiori M. Efficient behavior of small-world networks. Phys Rev Lett (2001) 87:198701. doi:10.1103/PhysRevLett.87.198701
28. Estrada E, Hatano N. Communicability in complex networks. Phys Rev E (2008) 77:036111. doi:10.1103/physreve.77.036111
29. Freeman HE. A set of measures of centrality based on betweenness. Sociometry (1977) 40:35–41. doi:10.2307/3033543
30. Clauset C, Newman MJ, Moore C. Finding community structure in very large networks. Phys Rev E (2004) 70:066111. doi:10.1103/physreve.70.066111
31. Burnham KP, Anderson DR. Multimodel inference: Understanding AIC and BIC in model selection. Sociol Methods Res (2004) 33:261–304. doi:10.1177/0049124104268644
32. Marchiori M, Latora V. Harmony in the small-world. Physica A: Stat Mech its Appl (2000) 285:539–46. doi:10.1016/s0378-4371(00)00311-3
33. Flaxman S, Mishra S, Gandy A, Unwin HJT, Mellan TA, Coupland H, et al. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature (2020) 584:257–61. doi:10.1038/s41586-020-2405-7
34. Perra N. Non-pharmaceutical interventions during the COVID-19 pandemic: A review. Phys Rep (2021) 913:1–52. doi:10.1016/j.physrep.2021.02.001
35. Matt K, Pejman R. Modeling infectious diseases in humans and animals. New Jersey, United States: Princeton University Press (2008).
36. Rossetti G, Milli L, Rinzivillo S, Sîrbu A, Pedreschi D, Giannotti F. NDlib: A python library to model and analyze diffusion processes over complex networks. Int J Data Sci Anal (2018) 5:61–79. doi:10.1007/s41060-017-0086-6
37. Chen D, Lü L, Shang MS, Zhang YC, Zhou T. Identifying influential nodes in complex networks. Physica A: Stat Mech its Appl (2012) 391:1777–87. doi:10.1016/j.physa.2011.09.017
38. Thornley JHM, France J. Mathematical models in agriculture: Quantitative methods for the plant, animal and ecological sciences. Wallingford, United Kingdom: Cabi (2007).

Summary

Keywords

complex networks, network spreading, network epidemics, network structural characteristics, SIR (susceptible infected recovered) model

Citation

Bellingeri M, Bevacqua D, Turchetto M, Scotognella F, Alfieri R, Nguyen N-K-K, Le TT, Nguyen Q and Cassi D (2022) Network structure indexes to forecast epidemic spreading in real-world complex networks. Front. Phys. 10:1017015. doi: 10.3389/fphy.2022.1017015

Received

11 August 2022

Accepted

19 October 2022

Published

02 November 2022

Volume

10 - 2022

Edited by

Ayse Peker-Dobie, Istanbul Technical University, Turkey

Reviewed by

Divya Sindhu Lekha, Indian Institute of Information Technology, Kottayam, India

Önder Mehmet Pekcan, Kadir Has University, Turkey

Copyright

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Michele Bellingeri, michele.bellingeri@polimi.it

This article was submitted to Social Physics, a section of the journal Frontiers in Physics
