An effective method for epidemic suppression by edge removing in complex network

Since the birth of human beings, the spread of epidemics such as COVID-19 has heavily affected our lives, and the related studies have become hot topics. Countries around the world are trying to develop effective prevention and control measures. As a discipline that can simulate the transmission process, complex network theory has been applied to epidemic suppression, where the common approaches remove important edges and nodes to control the spread of infection. However, the naive removal of nodes and edges from the complex network of an epidemic would be practically infeasible or incur huge costs. By focusing only on the effect of epidemic suppression, the existing methods ignore network connectivity, which leads to two serious problems. On the one hand, when we remove a node, the edges connected to it are also removed, so the node becomes isolated and the connectivity is quickly reduced. On the other hand, although removing edges is less detrimental to network connectivity than removing nodes, existing methods still cause great damage to network performance in reality. Here, we propose a method to measure edge importance that can protect network connectivity while suppressing the epidemic. In the real world, our method can not only lower the government's spending on epidemic suppression but also sustain economic growth and protect people's livelihoods to some extent. The proposed method promises to be an effective tool for maintaining the functionality of networks while controlling the spread of diseases, for example, diseases that spread through contact networks.


Introduction
To facilitate their investigation, practical systems in fields such as biology, ecology, physics and medicine are usually modeled as complex networks [1][2][3][4][5][6][7][8]. A complex network usually consists of nodes and edges, which represent individuals and the connections among individuals, respectively. With these complex network models, various phenomena can be revealed by studying the properties of nodes, edges and communities. Among existing studies, determining edge/node importance is of great significance because it can provide valuable guidance in addressing practical problems. For instance, in the aviation network, the important nodes correspond to important airports, so the related studies are critical.
Since the birth of network science, scholars have devoted themselves to studying the heterogeneity of nodes. Various approaches have been presented to measure the importance of nodes, such as degree centrality [9,10], betweenness centrality [11,12], eigenvector centrality [13], k-shell [14], etc.
In recent years, many infectious diseases have spread globally, and countries around the world are actively looking for effective countermeasures to suppress epidemics. Studying the heterogeneity of nodes and edges may play an important role in reducing the impact of a virus. For example, the nodes and edges to be removed in aviation networks correspond to airports to be isolated and routes to be canceled, respectively. With these measures, some undesired effects might be limited. However, removing nodes has a more serious impact on the connectivity of the network than removing edges; furthermore, greater cost and impact might also be incurred in practice. To suppress epidemics effectively and protect network connectivity simultaneously, we propose a method, $E_{BD}$, that measures edge importance by combining edge degree centrality and edge betweenness centrality. After that, extensive experiments are conducted to verify its effectiveness.
The remainder of our work is organized as follows. In Section 2, some existing edge importance methods are reviewed. Then, in Section 3, our proposed method is explained in detail and the differences between our method and existing ones are presented. Extensive experiments are then conducted, and the obtained results are provided in Section 4, where the performance of our method is compared with that of existing ones and discussed. Finally, Section 5 concludes this work and provides some future research directions.

Related work
Firstly, we suppose a complex network is formulated as G(V, E), where V and E stand for the node set and edge set, respectively, $n = |V|$ denotes the number of nodes, and $m = |E|$ represents the number of edges. For brevity, we review several edge importance measures as follows.

EBC
In [28], Freeman presents node betweenness centrality, and Girvan and Newman then extend the definition of betweenness centrality from nodes to edges [16]. Later, Newman implements it in [17]. The concept of edge betweenness centrality indicates that the edges that are more important than the others are likely to be central. We use $C_B$ to denote the quantitative value of edge importance, which is defined as:
$$C_B(e) = \sum_{i \neq j \in V} \frac{s_{ij}(e)}{S_{ij}} \tag{1}$$
where $e \in E$, $S_{ij}$ represents the number of shortest paths from node i to node j, and $s_{ij}(e)$ indicates the number of shortest paths from node i to node j that pass through edge e. If there exists no path linking i and j, we set $s_{ij}(e)/S_{ij} = 0$ for convenience. According to Eq. 1, a larger $C_B$ usually indicates better control ability over the network. For instance, in the aviation network, if the key routes obtained by $C_B$ are canceled, the aviation network is likely to be decomposed into a number of small ones. This is important for epidemic suppression, which will be discussed in detail in Section 3.
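To make the definition of edge betweenness concrete, the following is a minimal sketch that computes the unnormalized sum over node pairs with a Brandes-style breadth-first accumulation. It assumes an unweighted, undirected graph stored as a dictionary of neighbor sets; the function name `edge_betweenness` and this data layout are illustrative choices, not taken from the cited works.

```python
from collections import deque


def edge_betweenness(adj):
    """Unnormalized edge betweenness for an unweighted, undirected graph.

    `adj` maps each node to the set of its neighbors. Each unordered
    node pair {i, j} contributes once to the score of every edge.
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # Breadth-first search that counts shortest paths (sigma).
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        preds = {v: [] for v in adj}
        sigma[source], dist[source] = 1, 0
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path shares from the farthest nodes back to the source.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair was visited from both of its endpoints.
    return {e: s / 2.0 for e, s in score.items()}
```

On a path with four nodes, the middle edge lies on the shortest paths of four node pairs and each end edge on three, so the middle edge receives the largest score, in line with the intuition that central edges control the network.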

WKP
De Meo et al. apply random walks to measure edge importance [15]. A total of m − 1 rounds are carried out, with 10 random walks performed in each round from a random node (m indicates the number of edges in the network). The number of times each edge is traversed is recorded. After that, the edges are sorted in descending order of the recorded traversal counts (i.e., $C_W$) to obtain the edge importance sequence.
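The counting procedure can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the length of each walk or whether the start node is redrawn for every walk, so `walk_length` is a hypothetical parameter and one start node is drawn per round here. The function name `wkp_edge_counts` is likewise illustrative.

```python
import random


def wkp_edge_counts(adj, walk_length=5, walks_per_round=10, seed=0):
    """Count edge traversals over (m - 1) rounds of simple random walks.

    `adj` maps each node to the set of its neighbors. `walk_length`
    is an assumed parameter; the source text does not state it.
    """
    rng = random.Random(seed)
    nodes = sorted(v for v in adj if adj[v])  # skip isolated nodes
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    counts = {frozenset((u, v)): 0 for u in adj for v in adj[u]}
    for _ in range(m - 1):
        start = rng.choice(nodes)  # one random start node per round
        for _ in range(walks_per_round):
            current = start
            for _ in range(walk_length):
                nxt = rng.choice(sorted(adj[current]))
                counts[frozenset((current, nxt))] += 1
                current = nxt
    return counts


def wkp_ranking(adj, **kwargs):
    """Edges sorted by descending traversal count (the C_W ordering)."""
    counts = wkp_edge_counts(adj, **kwargs)
    return sorted(counts, key=counts.get, reverse=True)
```

Because the counts depend on the random draws, different seeds can give different rankings, which matches the high randomness attributed to WKP in this section.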

LI
Aiming to suppress epidemics effectively under the SIS model, Joan T. Matamalas et al. propose a method for assessing the importance of edges during the spreading process, called Link Importance (LI) [18]. Specifically, they first define a model named ELE, which is similar to the SIS model. With the ELE model, they run the disease spreading process until the system converges to a steady state. Then, if nodes i and j are directly connected, they can obtain the probability $P(\varpi_i = S, \varpi_j = I)$ (i.e., the probability that node i is susceptible and node j is infectious), which is represented by $\Phi_{ij}$, and the probability $P(\varpi_i = \varpi_j = I)$ (given by $\Theta^{I}_{ij}$). Here, $\varpi_i$ represents the state of node i, and S and I represent the susceptible and infectious states, respectively. The edge importance is calculated by Eq. 2, where $I_{ij}$ stands for the importance of edge $e_{ij}$ and A represents the adjacency matrix of network G. According to Eq. 2, once a node is infected by one of its neighbors, we can evaluate the extent of the impact that the newly infected node has on its neighboring nodes. For instance, assume that the two nodes connected by an edge $e_{ij}$ are in different states, i.e., susceptible and infectious, respectively. When the susceptible node is infected by the infectious one, the authors suppose that the newly infected node will infect more nodes if it has a larger number of neighbors.
According to the analysis of existing studies, EBC tends to achieve better spreading control than the others; however, it incurs greater damage to network connectivity. WKP has high randomness; hence, it is not effective in either spreading control or network connectivity protection. LI can protect network connectivity well, but its spreading control is worse than that of EBC. In summary, existing methods cannot achieve excellent performance in both epidemic suppression and network connectivity protection. However, both aspects are of great significance. Taking the aviation network as an example, on the one hand, it is necessary to suppress the epidemic to reduce the spreading range; thus, it is very important to reduce the necessary overhead as much as possible. On the other hand, to meet the requirements of world-wide transportation, it is also of great significance to ensure the accessibility between countries. Therefore, we propose a method (denoted as $E_{BD}$) that is effective in spreading control and in protecting network connectivity simultaneously.
Frontiers in Physics frontiersin.org

Methods
In this work, we propose a new method to measure the importance of edges through combining degree centrality and EBC. With our proposed approach, we can control the spreading phenomena and protect the network connectivity effectively after removing some of the most important edges.
In this paper, we adopt the widely used SIS model to simulate the spreading process. Accordingly, each node is in either the S (susceptible) or the I (infectious) state. A node in the I state infects each neighboring node in the S state with probability μ and recovers with probability γ; the overall state transition process is depicted in Figure 1. According to [29], we can derive the epidemic threshold $\lambda_c$ (i.e., if $\mu/\gamma > \lambda_c$, the virus will spread forever; otherwise, the virus will disappear quickly) in degree-uncorrelated networks, which is given by:
$$\lambda_c = \frac{\langle k \rangle}{\langle k^2 \rangle} \tag{3}$$
where $\langle k \rangle$ stands for the average degree of all nodes and $\langle k^2 \rangle$ represents the mean of the squared degrees of all nodes. According to the definition of $\lambda_c$, we can control the virus spreading process by increasing the epidemic threshold. A theoretical analysis shows that removing any edge ($e_{ij} \in E$, $i, j \in V$) reduces $\langle k \rangle$ by 2/n, whichever edge is chosen; thus, we can only increase $\lambda_c$ by reducing $\langle k^2 \rangle$ as much as possible. Before removing $e_{ij}$, we denote the degrees of nodes i and j by $k_i$ and $k_j$, respectively. After removing $e_{ij}$, both degrees drop by one, so $\langle k^2 \rangle$ is reduced by
$$\frac{k_i^2 - (k_i - 1)^2 + k_j^2 - (k_j - 1)^2}{n} = \frac{2(k_i + k_j - 1)}{n} \tag{4}$$
and the threshold after removal is $\lambda_c' = (\langle k \rangle - 2/n) / (\langle k^2 \rangle - 2(k_i + k_j - 1)/n)$. As in Eq. 4, a larger $(k_i + k_j)$ usually indicates a larger $\lambda_c'$. Then, we should find the edge $e_{ij}$ with the largest $k_i + k_j$. This means that an edge with more neighbors is more important than the others and plays a greater role in the epidemic diffusion process. Similar to $k_i + k_j$, $k_i \times k_j$ can also indicate the number of neighbors connected by $e_{ij}$. Van Mieghem et al. verify that using $k_i \times k_j$ performs much better than using $k_i + k_j$ in the epidemic suppression process [30]. To verify this, we conduct some experiments.
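The effect of a single edge removal on the threshold can be checked numerically. The sketch below assumes the degree-uncorrelated form of the threshold, i.e., the ratio of the mean degree to the mean squared degree, and works directly on a list of node degrees; the function names are illustrative.

```python
def epidemic_threshold(degrees):
    """Threshold <k> / <k^2> for a degree sequence."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return mean_k / mean_k2


def threshold_after_removal(degrees, i, j):
    """Threshold after removing the edge between nodes i and j."""
    updated = list(degrees)
    updated[i] -= 1  # both endpoints lose one neighbor
    updated[j] -= 1
    return epidemic_threshold(updated)


def mean_square_reduction(k_i, k_j, n):
    """Drop in <k^2> caused by removing an edge with endpoint degrees k_i, k_j."""
    return 2 * (k_i + k_j - 1) / n


# A star with one hub (degree 4) and four leaves (degree 1).
star = [4, 1, 1, 1, 1]
before = epidemic_threshold(star)
after = threshold_after_removal(star, 0, 1)
```

For this star, the threshold rises from 0.4 to 0.5 after one hub-leaf edge is removed, and the mean squared degree falls by 2(4 + 1 − 1)/5 = 1.6, which agrees with the closed-form reduction.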
Firstly, we combine $k_i + k_j$ with $k_i \times k_j$ through an adjustable parameter $\eta \in [0, 1]$ to obtain a new criterion for measuring edge importance, which is given by
$$MA(e_{ij}) = \eta \, \frac{k_i \times k_j - \upsilon_M}{\sigma_M} + (1 - \eta) \, \frac{(k_i + k_j) - \upsilon_A}{\sigma_A} \tag{5}$$
where $\upsilon_M$ and $\sigma_M$ represent the average value and standard deviation of $k_i \times k_j$ over all edges, respectively; $\upsilon_A$ and $\sigma_A$ stand for the average value and standard deviation of $k_i + k_j$ over all edges, respectively; υ and σ are used to normalize $k_i + k_j$ and $k_i \times k_j$. As in Eq. 5, the influence of the two aspects in MA can be adjusted by changing η, and we set η = {0, 0.2, 0.4, 0.6, 0.8, 1}. We then repeatedly remove the edge with the largest MA for each η, and obtain the fraction of infectious nodes $I_{prop}$ in the steady state by simulating the SIS model after every 0.04m edges are removed in the usAir97 network. The experimental results are shown in Figure 2. According to Figure 2, when the fraction of removed edges (β) is less than 0.45, spreading control improves as η decreases; however, many nodes are still infected for every η. When 0.65 > β > 0.45, a large η leads to better spreading control and a smaller spread range of the virus. In addition, spreading control for η = 0 is much worse than for η = 1 when 0.65 > β > 0.45. Note that, since the lines in Figure 2 are relatively dense, the error bars are not provided. Moreover, the maximum standard deviation of the data points on each line in Figure 2 is in [0.03, 0.04]. From the above analysis, compared with $k_i + k_j$, adopting $k_i \times k_j$ can reduce the epidemic on a large scale while removing fewer edges. Therefore, we adopt
$$C_D(e_{st}) = k_s \times k_t \tag{6}$$
to represent the importance of an edge, where $C_D(e_{st})$ represents the degree centrality of edge $e_{st}$.

FIGURE 1
Illustration of the adopted SIS model.
In addition to the spreading ability of edges, we also consider the effect of edge removal on network connectivity when studying the edge sorting algorithm. Generally speaking, $C_D(e_{st})$ is large when s and t have many neighbors. Thus, when we remove an edge $e_{st}$ with a large $C_D$, its endpoints are likely to remain connected through their other neighbors, because nodes s and t have large degrees. As in Figure 3A, the edge $e_{58}$ has the largest $C_D$. When we remove it from the network, nodes 5 and 8 are still connected via other paths, such as {5, 6, 8}. Then, we continuously remove the edge that has the largest $C_D$ in the remaining network. After removing four edges, the network is still connected, as in Figure 3B. However, if the degrees of s and t are relatively small, the probability that s and t end up in two different connected components after $e_{st}$ is removed increases greatly, as for edge $e_{89}$ in Figure 3A. If we instead remove edges in order of largest EBC, removing the most important edge ($e_{23}$) already disconnects the network. To further verify network connectivity under different edge removal strategies, we conduct experiments on the Global airline network, and the obtained results are provided in Figure 4. As in Figure 4, we use the number of connected components ($C_N$) to reflect network connectivity (i.e., better methods yield fewer connected components when the same number of edges are removed) and compare the effects of different methods (i.e., ADCM, ABC and Random). Here, ADCM and ABC repeatedly remove the edge with the highest $C_D$ and $C_B$, respectively. Random selects an edge to remove at random each time. As in Figure 4, when the proportion of removed edges is less than 0.7, ADCM has a limited effect on network connectivity, whereas ABC and Random have already decomposed the network into many connected components. Among these methods, ABC has the greatest impact on network connectivity. These results further support the effectiveness of $C_D$ in protecting network connectivity.

FIGURE 4
The effect of edge removal on network connectivity under different methods. We show the number of connected components ($C_N$) as a function of the fraction of removed edges (β) in the Global airline network. We compare three different approaches: ADCM, ABC and Random.
From another perspective, since infectious diseases always originate from one or a few people, the spreading range is usually limited to the connected components in which the seed nodes are located. Therefore, when some edges are removed, a method that dismantles the network into many connected components is effective in epidemic suppression, while $C_D$ is less effective in epidemic suppression (the corresponding experimental results will be provided in Section 4). To solve this problem, it is necessary to find a method that incorporates $C_D$ and increases the effect of epidemic suppression when fewer edges are removed. Furthermore, individual aggregation is a common phenomenon in practical networks. For example, in the aviation network, there are close connections between airports belonging to the same country, but fewer routes between airports in different countries; as a result, modules are usually connected by few edges. According to [31], edges with higher $C_B$ are usually the links connecting different modules. Therefore, removing edges with high $C_B$ first can quickly decompose the network into small connected components. Figure 4 also shows that the EBC-based ABC can quickly decompose the network into many connected components.
Based on the above discussions, we propose our method $E_{BD}$ by combining EBC ($C_B$) with $C_D$, and a new edge importance evaluation index $C_{BD}$ is defined as:
$$C_{BD}(e) = \alpha \, \frac{C_B(e)}{\max(C_B(e))} + (1 - \alpha) \, \frac{C_D(e)}{\max(C_D(e))} \tag{7}$$
where e is an element of the edge set E and α stands for an adjustable parameter ($\alpha \in [0, 1]$). Since $C_D$ and $C_B$ belong to different orders of magnitude (e.g., $C_D(e_{58}) = 16$ and $C_B(e_{58}) = 0.25$ in Figure 3A), these two indicators should be normalized before fusion. In this paper, we adopt max normalization (i.e., division by $\max(C_D(e))$ and $\max(C_B(e))$). Here, $\max(C_D(e))$ represents the largest $C_D$ among all the edges in the current network, and $\max(C_B(e))$ is defined analogously. The influence of the two aspects in $C_{BD}$ can be changed by adjusting the parameter α.
In this article, we dynamically update $C_{BD}$ to derive the edge importance ranking sequence (i.e., $S_{BD}$). The updating process is as follows:
• Step 1: Initialization. We select a complex network G(V, E) and set an empty edge ranking sequence $S_{BD} = \{\}$;
• Step 2: Calculate $C_{BD}$ of each edge in the current network G according to Eq. 7;
• Step 3: Remove the edge with the largest $C_{BD}$ from G (if multiple edges share the same $C_{BD}$, we randomly select one of them and remove it); then, the removed edge is added to the end of sequence $S_{BD}$;
• Step 4: If there still exist edges in G, go back to Step 2;
• Step 5: Determine the edge importance ranking sequence $S_{BD}$.
Through the above steps, an edge ranking sequence sorted in descending order of the edge importance index $C_{BD}$ is obtained ($S_{BD} = \{e_1, e_2, \ldots, e_m\}$). Removing edges according to this sequence can protect network connectivity efficiently while suppressing the epidemic.
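The updating steps can be sketched in a few dozen lines. The code below is a minimal illustration rather than a reference implementation: it assumes an unweighted, undirected graph without self-loops and with comparable node labels, takes $C_D$ as the product of the endpoint degrees, weights the max-normalized betweenness by α, and breaks ties by node order instead of randomly so that the output is reproducible. The names `edge_betweenness` and `ebd_ranking` are illustrative.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness (C_B) for an unweighted graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths
        dist = dict.fromkeys(adj, -1)
        preds = {v: [] for v in adj}
        sigma[source], dist[source] = 1, 0
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return {e: s / 2.0 for e, s in score.items()}


def ebd_ranking(edges, alpha=0.5):
    """Return the edges in removal order (the sequence S_BD)."""
    adj = {}                                    # Step 1: build G
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    ranking = []
    while any(adj.values()):                    # Step 4: edges remain
        c_b = edge_betweenness(adj)             # Step 2: recompute scores
        c_d = {e: len(adj[min(e)]) * len(adj[max(e)]) for e in c_b}
        max_b, max_d = max(c_b.values()), max(c_d.values())
        c_bd = {e: alpha * c_b[e] / max_b + (1 - alpha) * c_d[e] / max_d
                for e in c_b}
        # Step 3: remove the top edge. Ties are broken by node order
        # here; the procedure in the text picks randomly among ties.
        best = max(sorted(c_bd, key=sorted), key=c_bd.get)
        u, v = sorted(best)
        adj[u].discard(v)
        adj[v].discard(u)
        ranking.append((u, v))
    return ranking                              # Step 5: S_BD
```

On a four-node clique {0, 1, 2, 3} with a tail 3-4-5, setting α = 1 removes the bridge (3, 4) first, whereas α = 0 first removes a clique edge incident to the hub node 3, which leaves the network connected. Recomputing betweenness after every removal is costly, so this sketch is only practical for small networks.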

Results and analysis

Baseline models
To evaluate the performance of our proposed method, several baselines are adopted, including ADCM, Random, WKP, ABC and LI. For ADCM, we repeatedly remove the edge $e_{ij}$ with the largest $k_i \times k_j$ from the remaining network (here, $k_i$ indicates the degree of node i). For Random, we remove edges at random. For WKP, we repeatedly remove the edge with the largest $C_W$ from the remaining network. For ABC [32], we repeatedly remove the edge with the largest $C_B$ from the remaining network. For LI, the edge with the largest $I_{ij}$ is repeatedly removed from the remaining network. Similarly, in our approach $E_{BD}$, we repeatedly remove the edge with the largest $C_{BD}$ from the remaining network.

Data description
In the following experiments, we mainly consider seven practical networks: usAir97 (UA) [33], Global airline (GA), Facebook combined (FC) [34], Ca netscience (CN) [33], Soc hamsterster (SH) [33], ca CondMat (CC) [33] and email EU (EE) [33]. The UA network is a small aviation network constructed from the US aviation network in 1997 (nodes and edges stand for airports and routes, respectively). The GA network is obtained from OpenFlights (https://openflights.org), in which airports are denoted by nodes and airline routes between airports by the corresponding edges. FC consists of "circles" (or "friends lists") from Facebook; the data were collected from survey participants using this Facebook app (nodes stand for Facebook users, and an edge is added between two users if they are friends). CN and CC are collaboration networks of researchers (nodes stand for researchers, and an edge is added between two researchers if they have collaborated on an article). SH is a social network (nodes stand for people, and an edge exists between two people if they are friends). EE is an email network (nodes stand for email users, and an edge exists between two users if they have exchanged an email). Table 1 lists the characteristics of the adopted networks, where n, m, $\langle k \rangle$, c and r stand for the number of nodes, the number of edges, the average degree of all nodes, the average clustering coefficient, and the assortativity, respectively.
To evaluate the performance, two indicators are adopted (i.e., $I_{prop}$ and $C_N$). Firstly, we use the proportion of infectious nodes in the stable state ($I_{prop}$) to represent the effect of epidemic suppression. Here, the SIS model is considered; with the provided network topology, we can simulate the spreading phenomena accordingly. After a sufficient number of iterations of the SIS model, the numbers of infectious and susceptible nodes in the network become steady. A smaller $I_{prop}$ indicates that the method is more effective in epidemic suppression. Secondly, as in Section 3, we also consider network connectivity after removing edges. As in Figure 4, we use the number of connected components in the network ($C_N$) to evaluate the performance of different methods.
A small $C_N$ indicates that the corresponding method is less harmful to network connectivity. Like epidemic suppression, network connectivity also has great practical significance. For example, if the aviation network is decomposed into many connected components due to epidemic suppression, a large number of eliminated routes must be added back whenever an emergency necessitates traveling from one connected component to another. However, if there are few connected components in the network, then only a few routes need to be added. To sum up, when the same number of edges are removed, the smaller the values of $I_{prop}$ and $C_N$, the better the performance of the method.
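Both indicators are straightforward to compute. The following is a minimal sketch, assuming a discrete-time SIS process with synchronous updates, a single random seed node, and a steady-state estimate taken as the average over the last `tail` steps; the function names and the parameters `steps` and `tail` are illustrative assumptions rather than settings reported here.

```python
import random
from collections import deque


def count_components(adj):
    """Number of connected components (C_N), found by breadth-first search."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return components


def sis_infected_fraction(adj, mu, gamma, steps=200, tail=50, seed=0):
    """Average infectious fraction over the last `tail` steps (I_prop)."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    infected = {rng.choice(nodes)}  # a single random seed node
    history = []
    for _ in range(steps):
        nxt = set(infected)
        for v in infected:
            for w in adj[v]:
                if w not in infected and rng.random() < mu:
                    nxt.add(w)       # S -> I with probability mu
            if rng.random() < gamma:
                nxt.discard(v)       # I -> S with probability gamma
        infected = nxt
        history.append(len(infected) / len(nodes))
    recent = history[-tail:]
    return sum(recent) / len(recent)
```

A single run is noisy, so in practice the result would be averaged over many runs with different seeds, for example the 100 repetitions used in the performance evaluation.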

Performance Evaluation
To evaluate the performance of $E_{BD}$, the corresponding results are compared with those obtained by ADCM, Random, WKP, ABC and LI. Firstly, we use the comparison methods and our method to obtain their respective edge importance sequences. Secondly, we remove edges according to these sequences. Each time, we remove a fixed portion of the edges (i.e., 4%) until all edges are removed, and we run 100 SIS simulations after each removal. Finally, we calculate the average $I_{prop}$ and the standard deviation over the 100 simulation results after each removal. Note that when we simulate the propagation process with the SIS model, we randomly select only one node as the initial seed node, and the number of iterations is large enough for the system to converge. $C_N$ is calculated as follows: we repeatedly remove the next 0.04m edges according to the edge importance sequence until all edges are removed, and calculate $C_N$ after each removal.
Based on the above preparation, we first study the effect of different values of α on our method in the GA network; the results are shown in Figure 5. According to Figure 5, for the same number of removed edges, $I_{prop}$ decreases continuously as α increases, whereas $C_N$ increases steadily. In addition, the standard deviation remains relatively stable and does not change significantly. Thus, these results confirm our assumption. As α increases, the influence of betweenness centrality grows, so network connectivity is destroyed more severely and epidemic suppression becomes more effective when only a few edges are removed. When α = 0.5, controlling the spread of the virus is relatively effective and network connectivity is not significantly affected. When α is less than 0.5, even though network connectivity protection is improved, epidemic suppression is ineffective; when α is greater than 0.5, epidemic suppression becomes better, but network connectivity cannot be protected. As we normalize $C_B$ and $C_D$ before the fusion process, the above phenomena can be explained as follows. Firstly, when α is greater than 0.5, EBC dominates $E_{BD}$. Since EBC selects the most central edge to remove, the network is quickly decomposed into a number of small connected components, which leads to better performance in epidemic suppression at the cost of connectivity protection. Secondly, when α is less than 0.5, ADCM dominates $E_{BD}$. As ADCM selects the edge with the largest degree product for removal, $E_{BD}$ with a small α has an advantage in connectivity protection at the cost of epidemic suppression when some edges are removed. Note that since increasing the epidemic threshold is the starting point of ADCM, when α decreases, fewer edges need to be removed to achieve total epidemic control (i.e., $I_{prop} \to 0$). Thus, we set α = 0.5 to achieve a trade-off between the two aspects.
In the following experiments, we set α = 0.5 in Eq. 7. In Figure 6, Figure 7 and Table 2, $E_{BD}$ achieves better performance in both epidemic suppression and protection of network connectivity. According to Figure 6 and Table 2, we can draw the following conclusions. Firstly, the performance of $E_{BD}$ with α = 0.5 in epidemic suppression is better than that of LI, WKP, ADCM and Random, which supports the effectiveness of our proposed method. Secondly, the error bars of $E_{BD}$ fall within an acceptable range (the maximum standard deviations of $E_{BD}$ in FC, GA, CN, SH, EE and CC are 0.012, 0.015, 0.011, 0.025, 0.022 and 0.027, respectively), which indicates that the results are reliable. Finally, according to Figure 7, $E_{BD}$ with α = 0.5 protects network connectivity well, while the other methods with a similar effect on connectivity are much worse than $E_{BD}$ at epidemic suppression. The authors of the LI method emphasize the advantages of connectivity protection at the cost of epidemic suppression. Our method has both of the above advantages.
In addition to the practical networks, we also conduct experiments on generated networks (i.e., BA and ER networks). As shown in Figure 8, the performance of our proposed method in epidemic suppression is similar to that of three methods (i.e., ABC, ADCM and LI) and much better than that of the others. In the ER network, since the properties of all edges are relatively comparable, the performance of these methods in epidemic suppression is similar.

FIGURE 8
Performance comparison of our method $E_{BD}$ (α = 0.5) and the other methods in epidemic suppression. We show the spread range ($I_{prop}$) as a function of the fraction of removed edges (β) in two generated networks (BA network and ER network). For the BA network, the number of nodes is 5,000 and the corresponding average degree is 6; for the ER network, the number of nodes is 5,000 and the average degree is 12. For both networks, the recovery probability γ is set to 0.5; the infection probability μ is set to 0.2 and 0.06 for the BA and ER networks, respectively.

FIGURE 9
The correlation heat map for the four indices of edge importance over the BA network. Here, B, D, L and E represent ABC, ADCM, LI and $E_{BD}$, respectively.
In the BA network, we design a sequence similarity verification experiment to explore the potential power of the proposed method in epidemic suppression. We adopt the Jaccard index [35] to reflect the correlations between the edge ranking sequences derived by different methods. The specific calculation steps are as follows. Firstly, the correlation between two sequences is calculated as:
$$J(a) = \frac{|S_1(a) \cap S_2(a)|}{|S_1(a) \cup S_2(a)|} \tag{8}$$
where a indicates a parameter ($a \in [0, 1]$), and $S_1(a)$ and $S_2(a)$ represent the sets formed by the first $a \times m$ elements of sequence $S_1$ and sequence $S_2$, respectively. Then, we calculate the correlation for the whole sequence:
$$J = \frac{1}{100} \sum_{l=1}^{100} J\left(\frac{l}{100}\right) \tag{9}$$
where 100 indicates that we divide the sequence into 100 short sequences of the same length, incrementally calculate the correlation of the sequences, and eventually obtain the average. According to Eq. 8 and Eq. 9, we conduct some experiments. As shown in Figure 9, the Jaccard indexes between our method and ABC, ADCM, and LI in the BA network are all greater than 0.79, which reveals that the edge sequence obtained by our method is similar to those obtained by the other methods. This is consistent with the four methods achieving similarly good performance in epidemic suppression.
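The sequence correlation can be sketched as follows, assuming that the prefix Jaccard index is the size of the intersection of the two prefix sets divided by the size of their union, and that the overall score averages this index over evenly spaced prefix fractions. The function names and the rounding of the prefix length are illustrative choices.

```python
def jaccard_at(seq1, seq2, a):
    """Jaccard index of the first a * m elements of two rankings."""
    m = len(seq1)
    top = max(1, int(round(a * m)))  # use at least one element
    s1, s2 = set(seq1[:top]), set(seq2[:top])
    return len(s1 & s2) / len(s1 | s2)


def sequence_correlation(seq1, seq2, parts=100):
    """Average prefix Jaccard index over `parts` growing prefixes."""
    total = sum(jaccard_at(seq1, seq2, l / parts)
                for l in range(1, parts + 1))
    return total / parts
```

Two identical rankings give a correlation of 1, while a ranking and its reverse give a low value because their prefixes only start to overlap once more than half of the edges are included.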

Conclusion
In summary, we propose a method to rank edges according to the degrees of the connected nodes and the betweenness centrality of the edge. From the perspectives of epidemic suppression and connectivity protection, we first consider using the product of the degrees of the two nodes to increase the epidemic threshold and protect network connectivity. However, when only a portion of the edges are removed, this criterion performs poorly in epidemic suppression. We therefore propose a new method (i.e., $E_{BD}$) that combines it with EBC to eliminate this weakness. The experiments demonstrate that the proposed approach achieves better results than the other methods when some edges are removed. Furthermore, our method not only protects the connectivity of networks but also achieves much better epidemic suppression. In the future, we will study and extend our method to other spreading models (e.g., SIR and SEIR) and other types of networks (e.g., time-varying networks and double-layer networks) to increase its generality.

Author contributions
GL wrote the original manuscript. XC and PZ proposed the idea, supervised the research work, and revised the manuscript. GL and PZ discussed and analyzed the results.