Network Reconstruction in Terms of the Priori Structure Information

Fu, Jia-Qi; Guo, Qiang; Yang, Kai; Liu, Jian-Guo

doi:10.3389/fphy.2021.732835

ORIGINAL RESEARCH article

Front. Phys., 11 August 2021

Sec. Social Physics

Volume 9 - 2021 | https://doi.org/10.3389/fphy.2021.732835

This article is part of the Research Topic: Network Resilience and Robustness: Theory and Applications

Jia-Qi Fu¹

Qiang Guo¹

Kai Yang²

Jian-Guo Liu³,⁴*

¹Research Center of Complex Systems Science, University of Shanghai for Science and Technology, Shanghai, China
²College of Information Engineering, Yangzhou University, Yangzhou, China
³Institute of Accounting and Finance, Shanghai University of Finance and Economics, Shanghai, China
⁴Shanghai Engineering Research Center of Finance Intelligence, Shanghai University of Finance and Economics, Shanghai, China

In this paper, we investigate the reconstruction of networks based on priori structure information by the Element Elimination Method (EEM). We first generate four types of synthetic networks: small-world networks, random networks, regular networks and Apollonian networks. Then, we randomly delete a fraction of the links in the original networks. Finally, we employ EEM, the resource allocation (RA) method and the structural perturbation method (SPM) to reconstruct the four types of synthetic networks with 90% priori structure information. The experimental results show that, compared with RA and SPM, EEM achieves higher reconstruction accuracy on all four types of synthetic networks. We also compare the reconstruction performance of EEM with that of RA and SPM on four empirical networks. EEM achieves higher reconstruction accuracy, measured by local indices of success rates, with improvements of 64.11% and 47.81% over RA and SPM, respectively.

1 Introduction

Reconstructing a network based on priori structure information has attracted considerable attention in network science [1]. Prior information about the connectivity patterns or potential interactions of a network is accessible via public databases [2, 3], high-throughput experiments [4], or data mining of interaction knowledge [5–7]. A wide variety of methods based on priori structure information have been developed for the problem of network reconstruction [1, 8, 9]. Among these models, a few can provide a reliable estimate of a network’s structure from priori structure information. Link prediction is a typical approach, which uses the accessible structure to estimate the likelihood that unobserved links exist or to identify spurious links in a network [10, 11]. The unknown structure of a network is then reconstructed from the predicted links. Several link prediction models have been validated on both synthetic and empirical networks, including local similarity indices [12–14], maximum likelihood methods [11, 15] and methods based on predictability [16, 17].

Another approach uses accessible structure information to reconstruct a class of networks hosting evolutionary games [18, 19]. This model, known as the compressive sensing reconstruction (CSR) model, was initially proposed to solve the problem of global network reconstruction [20–22]. The CSR method provides a theoretical framework for reconstructing networks purely from measured time-series information. To reconstruct a network with N nodes, the CSR method reconstructs the adjacency matrix column by column, and each column is a vector with N elements [23, 24]. The Element Elimination Method (EEM) reconstructs the adjacency matrix in a similar fashion, but, unlike the CSR method, the number of elements in the i-th column may be N_i (N_i ≤ N, i = 1, 2, … , N) because EEM first eliminates the coupled nodes that are already known from the priori structure information. Exploiting the natural sparsity of these vectors, pioneering work applied EEM to successfully reconstruct scale-free networks with a small fraction of hubs [25]. However, many real-world networks are not scale-free [26], e.g., the collaboration network of film actors [27, 28], the neural network of the worm Caenorhabditis elegans [26], the power grid of the western United States [29, 30], and drug trafficking networks [31]. In addition, a distinctive structure can be observed in world airline networks [32, 33] and Apollonian networks [34–36], which are scale-free and also satisfy the basic features of small-world networks. The use of EEM for reconstructing networks characterized by other features has not been fully explored. In particular, we are interested in how much time-series information EEM requires, given the priori structure information, to achieve a successful reconstruction. This motivates us to investigate the application of EEM to networks characterized by different features.

In this paper, we investigate the reconstruction of general networks, represented by four types of synthetic networks: small-world networks, random networks, regular networks and Apollonian networks. The reconstruction accuracy of EEM is evaluated on each of the four types. We show that EEM is characterized by low information requirements and high reconstruction accuracy. Experiments on the four synthetic networks demonstrate that, compared with the resource allocation (RA) method [12] and the structural perturbation method (SPM) [16], EEM can effectively enhance the reconstruction accuracy. Further, three local indices of success rates demonstrate that EEM achieves similar reconstruction accuracy on three separate local structures of a network. In addition, experiments on four empirical networks demonstrate that EEM outperforms RA and SPM: its reconstruction accuracy, measured by local indices of success rates, is higher by 64.11% and 47.81%, respectively.

2 Methods and Models

2.1 The Procedure of the Network Reconstruction

Uncovering a network’s structure has many potential applications: it allows us to assess the system’s resilience [37–39], understand the dynamical mechanisms [40], identify significant nodes in a network [41, 42], detect community structure [43], locate diffusion sources [44, 45], and analyze the network’s properties [46–48]. In this paper, the Element Elimination Method (EEM) [25] is employed to reconstruct the structure of networks. The procedure for reconstructing synthetic networks with EEM consists of three steps: 1) Generate synthetic networks. 2) Extract time-series information from observed data. 3) Reconstruct the networks with EEM. Because the adjacency relationships between nodes in the network are sparse and do not change over time, we can explore the causal relationships in the nodes’ time-series information. Consequently, we can uncover the unknown link set E^P of the network by EEM based on the priori link set E^T.

Figure 1 illustrates the procedure of network reconstruction. Suppose that the relationships between node 2 and the other 5 nodes are to be reconstructed, and only one adjacency relationship (the blue line in Figure 1A) is known. Many different networks are compatible with this partial knowledge, so the original network cannot be identified from the known structure alone. Meanwhile, the network evolves over time, and some time-series information about the nodes’ strategies and payoffs can be obtained. We then build a model that links node 2’s strategies to its payoffs, as illustrated in Figure 1B. Consequently, we can use EEM to reconstruct the network’s structure and obtain the adjacency relationships shown in Figure 1C.

FIGURE 1. (Color Online) An illustration of reconstructing the hidden structure of a node based on priori structure information. (A) Original adjacency relationships of a node. For node 2 (red) with two neighbors, node 3 and node 6 (purple), we can observe one priori relationship, represented by a blue line, between node 2 and node 3. (B) EEM. We establish the vector G₂ and the matrix Φ₂ in the reconstruction form G₂ = Φ₂ ⋅ A₂ from time-series information, where the vector A₂ captures the adjacency relationships between node 2 and the other nodes. After subtracting the time-series information determined by node 2 and its priori neighbor, node 3, from both sides of the equation, the unknown connections of node 2 can be reconstructed by solving the reduced equation $G_{2}^{'} = Φ_{2}^{'} \cdot A_{2}^{'}$ using EEM. (C) The reconstructed adjacency matrix. The unknown neighbors of node 2 are uncovered by EEM. Golden blocks represent reconstructed links.

2.2 Generation of Synthetic Network

In order to evaluate the reconstruction performance of EEM on small-world networks and on networks characterized by other features, we generate four types of synthetic networks. Since the small-world network is a model that can be tuned between a random network and a regular network [26], we also consider networks whose connection topology is completely regular or completely random. Besides, the performance of EEM on Apollonian networks has seldom been evaluated. We therefore generate four types of synthetic networks: small-world networks, random networks, regular networks and Apollonian networks. Previous findings indicate that the assortative coefficient has a direct influence on the accuracy of network reconstruction [49]. Therefore, some statistical properties have to be tuned when the networks are generated.

Suppose a network is composed of N nodes and |E| links. To minimize the influence of differences in network structure, we fix a default mean assortative coefficient $⟨r⟩$ for three types of synthetic networks, excluding Apollonian networks. Given the wiring rules between nodes, we can generate many different networks with the given number of nodes N. Initially, each generated synthetic network should contain more than |E| links. Then we randomly delete links until the number of residual links is equal to |E|. In this way, the generated synthetic networks have N nodes and |E| links. From this set of synthetic networks we select one whose mean assortative coefficient is close to the default $⟨r⟩$ (the absolute error is less than 10^–3). The other types of synthetic networks are generated with their own wiring rules in a similar way. In practice, only a limited number of generated networks have statistical properties close to the default values. Moreover, the generation procedure of the regular network and the Apollonian network results in only one realization. In this paper, only one realization of each synthetic network is used in the experiments.

Due to privacy or confidentiality issues, the complete structure of a network is often not accessible. In addition, it is practically impossible to record the nodes’ complete time-series information. In spite of these difficulties, some priori information about the adjacency relationships between a few nodes, and discrete records of the nodes’ time-series information, might be available. Even with such limited information, the connective relationships between nodes have a direct effect on each individual node, contributing to its attitude or selection at the next time step. This dependence of the nodes’ interactions on the network’s structure allows us to use the nodes’ time-series information to describe the adjacency relationships behind them [24, 50].

2.3 The Model of the Evolutionary Game

The main challenge is that the structure of the network is inaccessible and only limited time-series information about the nodes is available. Since the time-series information is closely related to the connective relationships between nodes, we can reconstruct the unknown structure from the limited time-series information.

We use an evolutionary game model, the Prisoner’s Dilemma Game (PDG) model, to describe the nodes’ dynamics [51–53]. In each round of the game, each node weighs the benefits against the risks and selects a strategy. We use SY_i(t) to denote the strategy of node i in round t: SY_i(t) = (1,0)^T represents a cooperation strategy, while SY_i(t) = (0,1)^T represents a defection strategy. Here, T stands for ‘transpose’.

When node i and node j play a game, the payoff of node i depends on the strategies of both nodes and on a uniform payoff matrix P, which is defined as:

$$P = \begin{pmatrix} 1 & 0 \\ b & 0 \end{pmatrix} \tag{1}$$

where b (1 < b < 2) is a parameter characterizing the payoff that node i gains when it selects a defection strategy against a cooperator. In round t, node i plays with all of its neighbors using the same strategy. When node i encounters a neighbor j, node i gains from node j the payoff

$$F_{ij}(t) = SY_{i}^{T}(t) \cdot P \cdot SY_{j}(t). \tag{2}$$

In the same round, node i’s total payoff G_i is the sum of the payoffs obtained from all of node i’s neighbors.
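As a minimal sketch of Eqs 1, 2 and of the total payoff G_i, the following Python snippet evaluates the payoffs for one round. The function names and the dictionary-based data layout are illustrative assumptions, not part of the original method; the value b = 1.2 is the one quoted in the figure captions.

```python
# Payoff matrix of Eq. 1; b = 1.2 is the value quoted in the figure captions.
B = 1.2
P = [[1.0, 0.0],
     [B, 0.0]]

COOPERATE = (1, 0)  # SY = (1, 0)^T
DEFECT = (0, 1)     # SY = (0, 1)^T


def pair_payoff(sy_i, sy_j, payoff=P):
    """F_ij = SY_i^T . P . SY_j (Eq. 2): payoff that node i gains from node j."""
    return sum(sy_i[r] * payoff[r][c] * sy_j[c]
               for r in range(2) for c in range(2))


def total_payoff(i, strategies, neighbors, payoff=P):
    """G_i: sum of F_ij over all neighbors j of node i in one round."""
    return sum(pair_payoff(strategies[i], strategies[j], payoff)
               for j in neighbors[i])


# Hypothetical round: node 0 defects; neighbors 1 and 3 cooperate, 2 defects.
strategies = {0: DEFECT, 1: COOPERATE, 2: DEFECT, 3: COOPERATE}
neighbors = {0: [1, 2, 3]}
g_0 = total_payoff(0, strategies, neighbors)  # b + 0 + b
```

In this toy round the defector collects b from each cooperating neighbor and nothing from the defecting one, which is why a node's payoff carries information about who its neighbors are.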

In a new round, node i attempts to maximize its payoff by updating its strategy. According to the Fermi rule [54], node i randomly selects a node j from its neighbors after round t. In round t + 1, node i then adopts node j’s strategy with the probability

$$W(SY_{i}(t + 1) \leftarrow SY_{j}(t)) = \frac{1}{1 + \exp[(TG_{i}(t) - TG_{j}(t)) / κ]}, \tag{3}$$

where TG_i(t) is node i’s cumulative payoff from round 1 to round t, and TG_j(t) is defined similarly. The parameter κ characterizes a node’s rationality when it updates its strategy. The limit κ → 0 corresponds to fully rational selection behavior of nodes.
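The update rule of Eq. 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names, the data layout and the value κ = 0.1 used in the example are hypothetical, since the text does not report the value of κ.

```python
import math
import random


def adoption_probability(tg_i, tg_j, kappa):
    """Eq. 3: probability that node i adopts node j's strategy in round t + 1."""
    x = (tg_i - tg_j) / kappa
    if x > 700.0:  # math.exp would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def update_strategy(i, strategies, cumulative, neighbors, kappa, rng=random):
    """Pick a random neighbor j and imitate it with the probability of Eq. 3."""
    j = rng.choice(neighbors[i])
    if rng.random() < adoption_probability(cumulative[i], cumulative[j], kappa):
        return strategies[j]
    return strategies[i]


# Equal cumulative payoffs give a coin flip; a much richer neighbor is
# imitated almost surely when kappa is small (kappa = 0.1 is an assumed value).
p_equal = adoption_probability(3.0, 3.0, kappa=0.1)
p_richer_neighbor = adoption_probability(0.0, 10.0, kappa=0.1)
```

As κ decreases, the probability approaches a step function of the payoff difference, which is the deterministic imitation limit mentioned above.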

Since games occur only between connected nodes, information about the adjacency relationships between nodes is hidden in the dynamical records of their strategies and payoffs. We can therefore uncover a network’s structure once we collect the time-series information about the strategies and payoffs of nodes. When we reconstruct a network, the limited time-series information is usually a random sample drawn from the sufficient time-series information.

2.4 Element Elimination Method

Given limited time-series information about the nodes, EEM can be applied to reconstruct a network based on priori structure information. EEM is a variant of the CSR method that uses priori structure information to exclude the known connective relationships before reconstruction. Suppose that the relationships between nodes in a network can be represented by an adjacency matrix A with dimensions N × N, where N is the number of nodes in the network. EEM decomposes the process of reconstructing the entire network into many subnetwork recovery problems, and the network structure, namely the adjacency matrix A, is reconstructed column by column [55, 56]. The adjacency vector A_i describes the adjacency relationships between node i (i = 1, 2, … , N) and the other N − 1 nodes in the network, which contains no self-loops. The adjacency vector is $A_{i} = {(a_{i 1}, a_{i 2}, \dots, a_{i, i - 1}, a_{i, i + 1}, \dots, a_{i, N})}^{T}$, with element a_ij = 1 when node i and node j are connected and a_ij = 0 otherwise. Suppose that N_i (N_i ≤ N − 1) nodes in the adjacency vector A_i have undetermined relationships with node i. EEM is employed to find node i’s direct neighbors among these N_i candidate nodes, namely to recover a shorter adjacency vector $A_{i}^{'} = {(a_{i 1}^{'}, a_{i 2}^{'}, \dots, a_{i, N_{i}}^{'})}^{T}$ of node i (i = 1, 2, … , N).

The priori (training) set E^T determines the priori neighbor set $Γ_{i}^{K}$ of node i, which contains (N − N_i − 1) nodes. We can then calculate the sum of payoffs $G_{Γ_{i}^{K}}^{'}$ that node i obtains from the priori neighbors in the set $Γ_{i}^{K}$ according to Eq. 2. Subtracting the payoffs $G_{Γ_{i}^{K}}^{'}$ from G_i, we obtain the residual payoffs $G_{i}^{'}$ of node i. The payoffs $G_{i}^{'}$ encode the hidden adjacency relationships between node i and the N_i other nodes, because node i gains payoffs only from its neighbors.
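The elimination step described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function `eliminate_priori`, its arguments and the list-based data layout are hypothetical, and b = 1.2 is taken from the figure captions.

```python
def pair_payoff(sy_i, sy_j, b=1.2):
    """F_ij of Eq. 2 with the payoff matrix of Eq. 1."""
    payoff = ((1.0, 0.0), (b, 0.0))
    return sum(sy_i[r] * payoff[r][c] * sy_j[c]
               for r in range(2) for c in range(2))


def eliminate_priori(i, rounds, payoffs_i, priori_neighbors, n_nodes, b=1.2):
    """Remove the contribution of known neighbors from node i's records.

    rounds: one strategy list (length n_nodes) per recorded round t_m.
    payoffs_i: the total payoff G_i(t_m) of node i in the same rounds.
    Returns the candidate nodes, the residual payoffs G'_i and the rows of
    the reduced sensing matrix (one column per candidate).
    """
    candidates = [j for j in range(n_nodes)
                  if j != i and j not in priori_neighbors]
    g_reduced, phi_reduced = [], []
    for strategies, g in zip(rounds, payoffs_i):
        known = sum(pair_payoff(strategies[i], strategies[j], b)
                    for j in priori_neighbors)
        g_reduced.append(g - known)
        phi_reduced.append([pair_payoff(strategies[i], strategies[j], b)
                            for j in candidates])
    return candidates, g_reduced, phi_reduced


# Toy example: node 0 has true neighbors {1, 2}; only neighbor 1 is known.
C, D = (1, 0), (0, 1)
rounds = [[C, C, C, D], [D, C, D, C]]
payoffs_0 = [2.0, 1.2]  # G_0 = F_01 + F_02 in each round
cands, g_red, phi_red = eliminate_priori(0, rounds, payoffs_0, {1}, 4)
```

In the toy example the residual payoffs are explained by the single unknown neighbor, node 2, so the reduced system has fewer unknowns than the full one.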

Most real-world networks are characterized by natural sparsity, so the adjacency vector A_i of node i is sparse, i.e., it has only a few nonzero elements (a_ij = 1). Since every element of node i’s priori adjacency vector $A_{Γ_{i}^{K}}$ is 1, removing $A_{Γ_{i}^{K}}$ from A_i leaves the number of zero elements unchanged and reduces the number of nonzero elements, so the vector $A_{i}^{'}$ is still sparse. The sparsity of $A_{i}^{'}$ makes EEM applicable. The nodes’ strategies and payoffs are recorded in discrete rounds t₁, t₂, … , t_M. Since the residual payoffs are obtained from the games between node i and the N_i candidate nodes, we can build the model in Eq. 4. The sparse vector $A_{i}^{'}$ can then be reconstructed by solving the following convex optimization problem [57, 58]:

$$\begin{aligned} &\min ‖ A_{i}^{'} ‖_{1} \\ &\text{s.t. } G_{i}^{'} = Φ_{i}^{'} \cdot A_{i}^{'}, \end{aligned} \tag{4}$$

where $‖ A_{i}^{'} ‖_{1} = \sum_{j = 1}^{N_{i}} | a_{i j}^{'} |$ is the L₁ norm of the vector $A_{i}^{'}$. The available dynamical payoffs of node i can be expressed by $G_{i}^{'} = {(G_{i}^{'} (t_{1}), G_{i}^{'} (t_{2}), \dots, G_{i}^{'} (t_{M}))}^{T}$. The payoffs of node i obtained from the corresponding nodes in the limited rounds can be expressed by an M × N_i sensing matrix $Φ_{i}^{'}$ (M ≪ N_i). In particular, we write

$$Φ_{i}^{'} = \begin{pmatrix} F_{i,1}(t_{1}) & F_{i,2}(t_{1}) & \dots & F_{i,N_{i}}(t_{1}) \\ F_{i,1}(t_{2}) & F_{i,2}(t_{2}) & \dots & F_{i,N_{i}}(t_{2}) \\ \vdots & \vdots & \ddots & \vdots \\ F_{i,1}(t_{M}) & F_{i,2}(t_{M}) & \dots & F_{i,N_{i}}(t_{M}) \end{pmatrix}.$$

The elements of the matrix $Φ_{i}^{'}$ are calculated using Eq. 2. By solving the convex optimization problem in Eq. 4, we obtain the adjacency vector $A_{i}^{'} = {(a_{i 1}^{'}, a_{i 2}^{'}, \dots, a_{i, N_{i}}^{'})}^{T}$. We then obtain the complete adjacency vector $A_{i} = {(a_{i 1}, a_{i 2}, \dots, a_{i N})}^{T}$ by combining the reconstructed vector $A_{i}^{'}$ with the priori adjacency vector $A_{Γ_{i}^{K}}$ of node i. In a similar fashion, the neighbor-connection vectors of all the other nodes can be obtained, yielding the network’s adjacency matrix A = (A₁, A₂, … , A_N).
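One common way to solve a problem of the form of Eq. 4 is to rewrite it as a linear program. The sketch below does this with NumPy and SciPy's `linprog`; it is an illustration under assumptions, not the solver used in the paper. The helper `toy_system`, the problem sizes and the 0.5 threshold used to turn the solution into binary links are all hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def l1_reconstruct(phi, g, threshold=0.5):
    """Solve min ||a||_1 s.t. g = phi @ a (the form of Eq. 4) as a linear program."""
    phi = np.asarray(phi, dtype=float)
    g = np.asarray(g, dtype=float)
    n = phi.shape[1]
    # Split a = u - v with u, v >= 0, so ||a||_1 = sum(u) + sum(v) at the optimum.
    cost = np.ones(2 * n)
    a_eq = np.hstack([phi, -phi])
    res = linprog(cost, A_eq=a_eq, b_eq=g, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    a = res.x[:n] - res.x[n:]
    # Assumed decision rule: entries above the threshold are declared links.
    return a, (a > threshold).astype(int)


def toy_system(n_i=30, m=15, n_links=3, b=1.2, seed=0):
    """Random PDG-like records for one node with n_links hidden neighbors."""
    rng = np.random.default_rng(seed)
    a_true = np.zeros(n_i)
    a_true[rng.choice(n_i, size=n_links, replace=False)] = 1.0
    own = rng.integers(0, 2, size=m)            # 0 = cooperate, 1 = defect
    other = rng.integers(0, 2, size=(m, n_i))   # strategies of the candidates
    # F_ij is 1 (both cooperate), b (i defects against a cooperator) or 0.
    gain = np.where(own == 0, 1.0, b)[:, None]
    phi = gain * (other == 0)
    return phi, phi @ a_true, a_true


phi, g, a_true = toy_system()
a_hat, links = l1_reconstruct(phi, g)
```

The solution always satisfies the payoff constraints and never has a larger L₁ norm than the true vector; whether it coincides with the true vector depends on how many rounds M are available relative to the sparsity, which is the dependence the experiments below examine through η.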

3 Experimental Results

3.1 Datasets

In order to understand the performance of EEM in reconstructing synthetic networks, the experiments are conducted on four types of networks. The basic statistical properties of the synthetic networks are presented in Table 1. N and |E| are the numbers of nodes and links, $⟨k⟩$ is the mean degree, $⟨r⟩$ is the mean assortative coefficient, $⟨C⟩$ is the mean clustering coefficient, and $⟨D⟩$ is the mean shortest distance. Here, we use the abbreviations WS, RM, RG and AP to represent small-world networks, random networks, regular networks and Apollonian networks, respectively.

TABLE 1. The statistical properties of four synthetic networks.

We treat the strategies and payoffs of all nodes in a given round t as one piece of time-series information. In the experiments, we use M pieces of accessible time-series information, obtained from the discrete rounds t₁ to t_M, to reconstruct different networks. In this paper, we set N, the number of nodes in the network, as the maximum value of M. We then use an index of information sufficiency η (η ≡ M/N) to represent the amount of time-series information used in the network reconstruction. The time-series information is regarded as sufficient when M = N, i.e., η = 1, and as insufficient when 0 < M < N, i.e., 0 < η < 1. The reconstruction models are also applied to reconstruct networks with different amounts of priori information about the structure, measured by a probability P_s (0 ≤ P_s ≤ 1).
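The relation η ≡ M/N fixes how many rounds are drawn from the recorded time series. A minimal sketch of this sampling step is given below; the function name, the rounding of ηN to an integer and the use of a seeded generator are assumptions made for illustration.

```python
import random


def sample_measurements(records, eta, n_nodes, seed=None):
    """Randomly pick M = round(eta * N) rounds from the recorded time series."""
    m = max(1, round(eta * n_nodes))
    if m > len(records):
        raise ValueError("not enough recorded rounds for the requested eta")
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(len(records)), m))
    return [records[t] for t in picked]


# With N = 120 nodes and eta = 0.4, M = 48 rounds are used.
records = list(range(500))  # stand-in for 500 recorded rounds
sample = sample_measurements(records, eta=0.4, n_nodes=120, seed=0)
```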

In addition, the performance of EEM is evaluated in reconstructing empirical networks. Table 2 shows the basic statistical properties of the four empirical networks. These networks are chosen because they are characterized by large clustering coefficients and short distances.

TABLE 2. The statistical properties of four empirical networks.

3.2 Metrics

To test the accuracy of EEM, the set of original existent links, E, is randomly divided into two parts: the priori set E^T and the probe set E^P. Clearly, E = E^T ∪ E^P and E^T ∩ E^P = ∅. In this paper, the priori set always contains a fraction P_s of the links, and the remaining fraction 1 − P_s constitutes the probe set. We apply four standard indices to quantify the reconstruction accuracy: the success rates of existent links SR and the success rates of nonexistent links SN [24], precision PRE [61, 62] and the area under the receiver operating characteristic curve AUC [63]. In addition, we apply local indices of success rates in the experiments.
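The division of E into E^T and E^P can be sketched as a random split. The function name, the rounding of P_s|E| to an integer and the seeded generator are illustrative assumptions.

```python
import random


def split_links(links, p_s, seed=None):
    """Randomly split the link set E into a priori set E^T and a probe set E^P."""
    rng = random.Random(seed)
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_priori = round(p_s * len(shuffled))
    return set(shuffled[:n_priori]), set(shuffled[n_priori:])


# Toy example: a path with 20 links and 90% priori structure information.
links = [(i, i + 1) for i in range(20)]
e_t, e_p = split_links(links, p_s=0.9, seed=1)
```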

Both the success rates of existent links SR and the success rates of nonexistent links SN estimate the similarity of the reconstructed networks and the original networks. The success rates of existent links SR denotes the ratio of the number of links reconstructed by the reconstruction models to the number of real existent links in the network. The success rates of nonexistent links SN denotes the ratio of the number of nonexistent links distinguished by the reconstruction models to the number of real nonexistent links in the network. We obtain

$$SR = \frac{1}{N} \sum_{i = 1}^{N} \frac{| Γ_{io} \cap Γ_{ir} |}{| Γ_{io} |} \tag{5}$$

$$SN = \frac{1}{N} \sum_{i = 1}^{N} \frac{| \bar{Γ}_{io} \cap \bar{Γ}_{ir} |}{| \bar{Γ}_{io} |} \tag{6}$$

where Γ_io and Γ_ir denote the real neighbor set of node i and the neighbor set of node i reconstructed by the reconstruction models, respectively, and |⋅| denotes the number of elements in a set. $\bar{Γ}_{io}$ and $\bar{Γ}_{ir}$ are the complements of the sets Γ_io and Γ_ir: each node in $\bar{Γ}_{io}$ is not adjacent to node i in the original network, and each node in $\bar{Γ}_{ir}$ is not adjacent to node i in the reconstructed network. A successful reconstruction is achieved when the success rates of existent links SR (0 ≤ SR ≤ 1) and the success rates of nonexistent links SN (0 ≤ SN ≤ 1) are both close to 1.
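Eqs 5, 6 can be evaluated directly from the real and reconstructed neighbor sets. The sketch below assumes that every node has at least one neighbor and at least one non-neighbor, so that no denominator vanishes, and that the complement of a neighbor set excludes node i itself; the function name and dictionary layout are hypothetical.

```python
def success_rates(true_nbrs, recon_nbrs):
    """Return (SR, SN) of Eqs 5 and 6 from per-node neighbor sets."""
    nodes = set(true_nbrs)
    n = len(nodes)
    sr = sn = 0.0
    for i in nodes:
        others = nodes - {i}
        real = set(true_nbrs[i])
        rec = set(recon_nbrs.get(i, ()))
        non_real = others - real   # real non-neighbors of node i
        non_rec = others - rec     # reconstructed non-neighbors of node i
        sr += len(real & rec) / len(real)
        sn += len(non_real & non_rec) / len(non_real)
    return sr / n, sn / n


# Path 0-1-2-3; the reconstruction misses the link between nodes 1 and 2.
true_nbrs = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
recon_nbrs = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
sr, sn = success_rates(true_nbrs, recon_nbrs)
```

In this example no nonexistent link is wrongly reported, so SN is 1, while the missed link lowers SR for the two nodes it touches.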

Precision PRE is defined as the ratio of the number of existent links correctly reconstructed by the models to the number of all unknown existent links. To calculate the precision, we rank all the unknown links in decreasing order of the existence likelihood computed by the reconstruction models and focus on the top-L links (here L = |E^P|). If H of these links are successfully reconstructed, then

$$PRE = \frac{H}{L} \tag{7}$$
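A minimal sketch of Eq. 7 follows. The function name and the dictionary of scores are illustrative assumptions; ties in the scores are broken by the sort order, which the text does not specify.

```python
def precision(scores, probe_links):
    """PRE = H / L (Eq. 7), with L = |E^P| and H the probe links in the top-L."""
    top_l = len(probe_links)
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = sum(1 for link in ranked[:top_l] if link in probe_links)
    return hits / top_l


# Four unknown links with hypothetical scores; two of them are probe links.
scores = {(0, 1): 0.9, (0, 2): 0.8, (1, 3): 0.4, (2, 3): 0.1}
probe = {(0, 1), (1, 3)}
pre = precision(scores, probe)
```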

The area under the receiver operating characteristic curve AUC evaluates the performance of the reconstruction models over the whole list of unknown links. Given the existence likelihood of all unknown links, AUC can be interpreted as the probability that a randomly chosen unknown existent link is given a higher existence likelihood than a randomly chosen nonexistent link. In the implementation, the value of AUC is calculated with the Matlab function perfcurve.
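The probabilistic interpretation of AUC can be computed by comparing every pair of an existent and a nonexistent link. The sketch below is a plain Python illustration of that interpretation, not the Matlab perfcurve routine used in the paper; counting ties as one half is an assumption.

```python
def auc(existent_scores, nonexistent_scores):
    """Probability that an existent link outscores a nonexistent one."""
    wins = 0.0
    for s_e in existent_scores:
        for s_n in nonexistent_scores:
            if s_e > s_n:
                wins += 1.0
            elif s_e == s_n:
                wins += 0.5  # assumed convention: a tie counts as half a win
    return wins / (len(existent_scores) * len(nonexistent_scores))


# Hypothetical scores: three of the four pairs are ordered correctly.
value = auc([0.9, 0.4], [0.8, 0.1])
```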

Clearly, a higher value of the success rates of existent links SR, the success rates of nonexistent links SN, precision PRE or the area under the receiver operating characteristic curve AUC means a higher reconstruction accuracy. We run 50 independent simulations and average the indices of reconstruction accuracy to obtain the mean success rates of existent links $⟨SR⟩$, the mean success rates of nonexistent links $⟨SN⟩$, the mean precision $⟨PRE⟩$ and the mean area under the receiver operating characteristic curve $⟨AUC⟩$.

To understand the performance of EEM when reconstructing the local structure of a network, we divide the structure of each type of network into separate local structures. Suppose that the roles of nodes in the network are leaders, brokers and peripheral executors. We define leaders as nodes with small degrees, and the number of leaders in each type of network is 6. In addition, the subnetwork composed of leaders is a connected subgraph. Brokers are the nodes that are connected with leaders, and the remaining nodes are peripheral executors. The sets of leaders, brokers and peripheral executors do not overlap. We use the letters L, B and P to represent the adjacency relationships between leaders, the adjacency relationships between leaders and brokers, and the adjacency relationships among peripheral executors and brokers, respectively. Then, we obtain the success rates of existent links of each local structure, normalized by the number of real existent links |Γ_io| of each node:

$$SR_{Lr} = \frac{1}{N} \sum_{i = 1}^{N} \frac{| Γ_{Lio} \cap Γ_{Lir} |}{| Γ_{io} |} \tag{8}$$

$$SR_{Br} = \frac{1}{N} \sum_{i = 1}^{N} \frac{| Γ_{Bio} \cap Γ_{Bir} |}{| Γ_{io} |} \tag{9}$$

$$SR_{Pr} = \frac{1}{N} \sum_{i = 1}^{N} \frac{| Γ_{Pio} \cap Γ_{Pir} |}{| Γ_{io} |} \tag{10}$$

The sum of the three local success rates of existent links is equal to the global success rates of existent links:

$$SR = SR_{Lr} + SR_{Br} + SR_{Pr} \tag{11}$$

Correspondingly, when the original network is successfully reconstructed, the three local success rates of existent links reach their maximum values

$$SR_{Lo} = \frac{1}{N} \sum_{i = 1}^{N} \frac{| Γ_{Lio} |}{| Γ_{io} |} \tag{12}$$

$$SR_{Bo} = \frac{1}{N} \sum_{i = 1}^{N} \frac{| Γ_{Bio} |}{| Γ_{io} |} \tag{13}$$

$$SR_{Po} = \frac{1}{N} \sum_{i = 1}^{N} \frac{| Γ_{Pio} |}{| Γ_{io} |} \tag{14}$$

To quantify the success rates of the three different local structures, we define local indices of success rates as follows:

$$APP_{SRL} = \frac{SR_{Lr}}{SR_{Lo}} \tag{15}$$

$$APP_{SRB} = \frac{SR_{Br}}{SR_{Bo}} \tag{16}$$

$$APP_{SRP} = \frac{SR_{Pr}}{SR_{Po}} \tag{17}$$

Similarly, a higher value of the local index of success rates APP_SRL, APP_SRB or APP_SRP means a higher reconstruction accuracy. We run 50 independent simulations and average the results to obtain the mean local indices of success rates $⟨APP_{SRL}⟩$, $⟨APP_{SRB}⟩$ and $⟨APP_{SRP}⟩$.
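The local indices of Eqs 8–17 can be sketched as follows. The role labels, the function names and the rule that any link not of type L or B is counted as type P are assumptions made for illustration.

```python
def link_class(i, j, roles):
    """Classify a link as L (leader-leader), B (leader-broker) or P (other)."""
    pair = {roles[i], roles[j]}
    if pair == {"leader"}:
        return "L"
    if pair == {"leader", "broker"}:
        return "B"
    return "P"


def local_success_indices(true_nbrs, recon_nbrs, roles):
    """Return (APP, SR_r): the local indices (Eqs 15-17) and rates (Eqs 8-10)."""
    n = len(true_nbrs)
    sr_r = {"L": 0.0, "B": 0.0, "P": 0.0}  # reconstructed, Eqs 8-10
    sr_o = {"L": 0.0, "B": 0.0, "P": 0.0}  # maximum values, Eqs 12-14
    for i, real in true_nbrs.items():
        for j in real:
            c = link_class(i, j, roles)
            weight = 1.0 / (n * len(real))
            sr_o[c] += weight
            if j in recon_nbrs.get(i, ()):
                sr_r[c] += weight
    app = {c: sr_r[c] / sr_o[c] for c in sr_o if sr_o[c] > 0}
    return app, sr_r


# Toy network: leaders 0 and 1, broker 2, peripheral executor 3.
roles = {0: "leader", 1: "leader", 2: "broker", 3: "peripheral"}
true_nbrs = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
recon_nbrs = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}  # link 2-3 is missed
app, sr_r = local_success_indices(true_nbrs, recon_nbrs, roles)
```

In the toy network the leader and broker links are fully recovered while the peripheral link is missed, and the three local rates add up to the global SR as stated in Eq. 11.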

3.3 Experimental Results on Synthetic and Empirical Networks

In order to understand the performance of EEM, four types of synthetic networks hosting a PDG dynamical process are considered. Figure 2A depicts the reconstruction accuracy for a synthetic small-world network, measured by the mean success rates of existent links $⟨SR⟩$, based on 90% priori structure information. $⟨SR⟩$ increases monotonically as the index of information sufficiency η varies from 0.1 to 0.4, and it reaches the maximum value of 1 at η = 0.4. Over this range, $⟨SR⟩$ increases by 9.97%. $⟨SR⟩$ then stays at 1 when η is larger than 0.4. As shown in Figures 2B–H, $⟨SR⟩$ likewise increases monotonically when η is less than 0.4 and reaches 1 for the different types of synthetic networks when η exceeds 0.4.

FIGURE 2. (Color Online) The mean success rates of existent links $⟨SR⟩$ of reconstructing four types of networks: (A) small-world network with 120 nodes, (B) random network with 120 nodes, (C) regular network with 120 nodes, (D) Apollonian network with 124 nodes, (E) small-world network with 250 nodes, (F) random network with 250 nodes, (G) regular network with 250 nodes, (H) Apollonian network with 367 nodes, hosting a PDG dynamical process. The lines with circle, triangle and inverted triangle symbols are the mean success rates of existent links $⟨SR⟩$ obtained by RA, SPM and EEM based on 90% priori structure information. The mean reconstruction accuracy indices are obtained by averaging over 50 independent experimental results. For each experiment, measurements are randomly picked from a time series of the temporal evolution. The index of information sufficiency η indicates the amount of available time-series information used in the reconstruction. The payoff parameter for the PDG is b = 1.2.

Moreover, we compare the experimental results of EEM with those of two link prediction models, the resource allocation (RA) method and the structural perturbation method (SPM). Figures 2A–H show that when the index of information sufficiency η is low (i.e., η = 0.1), the mean success rates of existent links $⟨SR⟩$ obtained by EEM on the small-world, random, regular and Apollonian networks reach 0.9093, 0.9085, 0.9021 and 0.9361 for the networks with 120–124 nodes, and 0.9823, 0.9897, 0.9402 and 0.9982 for the networks with 250–367 nodes, respectively. On the networks with 120–124 nodes, EEM’s $⟨SR⟩$ is higher than that of RA and SPM by at least 8.07% and 12.22%, respectively. On the networks with 250–367 nodes, EEM’s $⟨SR⟩$ is higher than that of RA and SPM by at least 17.53% and 22.81%, respectively. The results in Figure 2 indicate that EEM offers a good trade-off: it provides high reconstruction accuracy while requiring less time-series information.

Intuitively, a network’s structure can be reconstructed more accurately when more priori information about the structure of the network is available. Figure 3 shows the dependence of $⟨SR⟩$ on the probability P_s, which measures the priori information about the structure. For lower values of the index of information sufficiency η (η ≤ 0.4), $⟨SR⟩$ increases monotonically as P_s increases. On the other hand, $⟨SR⟩$ approaches the maximum value of 1 when η is larger than 0.4. In terms of the probability P_s, the highest performance is achieved for the highest P_s. The intuitive reason for the relatively superior performance on the four synthetic networks lies in the sufficiency of the available information about the networks’ structure.

FIGURE 3. (Color Online) The mean success rates of existent links $⟨SR⟩$ of reconstructing four types of networks: (A) small-world network with 120 nodes, (B) random network with 120 nodes, (C) regular network with 120 nodes, (D) Apollonian network with 124 nodes, hosting a PDG dynamical process. The lines with different symbols are the mean success rates of existent links $⟨SR⟩$ obtained by EEM when the index of information sufficiency η takes different values. The mean reconstruction accuracy indices are obtained by averaging over 50 independent experimental results. For each experiment, measurements are randomly picked from a time series of the temporal evolution. The priori information of the structure, measured by a probability P_s, indicates the amount of available priori information of the structure used in the reconstruction. The payoff parameter for the PDG is b = 1.2.

In the following, we verify the performance of EEM on the local structure of the networks. We divide the structure of each type of network into three separate local structures, denoted by the subscripts L, B and P. Figure 4A depicts the reconstruction success rates of a small-world network, measured by the mean local indices of success rates $⟨APP_{SRL}⟩$, $⟨APP_{SRB}⟩$ and $⟨APP_{SRP}⟩$, based on 90% priori structure information.

FIGURE 4. (Color Online) The mean local indices of success rates $⟨A P P_{S R L}⟩$, $⟨A P P_{S R B}⟩$ and $⟨A P P_{S R P}⟩$ for reconstructing four types of networks hosting a PDG dynamical process: (A) small-world network with 120 nodes, (B) random network with 120 nodes, (C) regular network with 120 nodes and (D) Apollonian network with 124 nodes. The lines with circle, triangle and inverted triangle symbols are the mean local indices of success rates $⟨A P P_{S R L}⟩$, $⟨A P P_{S R B}⟩$ and $⟨A P P_{S R P}⟩$ obtained by RA, SPM and EEM based on 90% priori structure information. The mean reconstruction accuracy indices are obtained by averaging over 50 independent experiments. For each experiment, measurements are randomly picked from a time series of the temporal evolution. The index of information sufficiency η indicates the amount of available time-series information used in the reconstruction. The payoff parameter for the PDG is b = 1.2.

As illustrated in the main graph in Figure 4A, the mean local index of success rates $⟨A P P_{S R L}⟩$ obtained by EEM is higher than that obtained by RA or SPM. In particular, $⟨A P P_{S R L}⟩$ obtained by EEM reaches 96.41% when the index of information sufficiency η = 0.1, while the values obtained by RA and SPM are both 88.95%. The mean local indices of success rates $⟨A P P_{S R B}⟩$ and $⟨A P P_{S R P}⟩$ obtained by EEM are 93.95 and 86.68%, respectively, when η = 0.1, as shown in subgraphs (α)–(β) in Figure 4A. Correspondingly, $⟨A P P_{S R B}⟩$ and $⟨A P P_{S R P}⟩$ obtained by RA are 62.67 and 79.51%, and those obtained by SPM are 62.67 and 73.06%. Similar experimental results are found for the random network, regular network and Apollonian network in Figures 4B–D, which indicates that EEM can achieve higher reconstruction accuracy than RA or SPM with little time-series information.

The underlying reason why EEM obtains higher reconstruction accuracy than RA or SPM might be twofold. Firstly, EEM is applicable to reconstructing networks with sparse connections, because Wang et al. developed a paradigm [19, 24, 25] for network reconstruction problems and Candès et al. provided the theoretical framework for this paradigm [57, 58]. Both EEM and the two link prediction models use identical priori structure information of the network to obtain direct information about the unknown structure. In addition, EEM links the nodes’ payoffs to their strategies by means of time-series information, because a node’s payoffs can only be obtained from its neighbors. EEM can then extract indirect information about the unknown structure from these relationships, which strengthens the reliability of the experimental results. RA and SPM can also extract valuable indirect information about the unknown structure, but that information still originates from the priori structure information of the network, owing to the lack of a universal theoretical framework.

Secondly, the reconstruction accuracy of the local structure and that of the global structure obtained by EEM are highly consistent. As illustrated in Figure 4, the absolute error between the three mean local indices of success rates $⟨A P P_{S R L}⟩$, $⟨A P P_{S R B}⟩$ and $⟨A P P_{S R P}⟩$ obtained by EEM on each network is less than 0.1, which indicates that the reconstruction accuracy obtained by EEM on the three separate local structures is almost the same. Consequently, the global and local reconstruction accuracies are highly consistent, because the global reconstruction accuracy is a linear combination of the three mean local indices of success rates: $⟨S R⟩ = S R_{L o} \cdot ⟨A P P_{S R L}⟩ + S R_{B o} \cdot ⟨A P P_{S R B}⟩ + S R_{P o} \cdot ⟨A P P_{S R P}⟩$, where SR_Lo, SR_Bo and SR_Po are constant for each network. The high reconstruction accuracy of the three separate local structures contributes to a high reconstruction accuracy of the global structure. We also observe that the reconstruction accuracy obtained by RA or SPM fluctuates across the three separate local structures. In particular, in the reconstruction experiments on synthetic random networks, the maximum absolute error between the three mean local indices of success rates obtained by RA or SPM reaches 0.3837. The experimental results indicate that the reconstruction accuracy obtained by RA and SPM depends largely on the priori structure information of the network. The reconstruction accuracy of RA or SPM is high when the local priori structure is consistent with the global structure, and low otherwise.
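The linear combination above, together with the gap between the three local indices, can be checked with a short calculation. The following is a minimal sketch in Python, not the authors’ code: the local indices are the values quoted above for the small-world network at η = 0.1, while the weights standing in for SR_Lo, SR_Bo and SR_Po are hypothetical placeholders, since their actual values are not given here.

```python
# Mean local indices of success rates for the small-world network at eta = 0.1,
# as quoted in the text (order: L, B, P).
LOCAL_INDICES = {
    "EEM": (0.9641, 0.9395, 0.8668),
    "RA": (0.8895, 0.6267, 0.7951),
    "SPM": (0.8895, 0.6267, 0.7306),
}

# HYPOTHETICAL weights standing in for SR_Lo, SR_Bo, SR_Po (not from the paper).
WEIGHTS = (0.5, 0.3, 0.2)


def global_success_rate(local, weights):
    """Weighted sum of the three local indices (the linear combination in the text)."""
    return sum(w * x for w, x in zip(weights, local))


def max_absolute_gap(local):
    """Largest absolute difference between any two of the three local indices."""
    return max(local) - min(local)


if __name__ == "__main__":
    for method, local in LOCAL_INDICES.items():
        sr = global_success_rate(local, WEIGHTS)
        gap = max_absolute_gap(local)
        print(f"{method}: global = {sr:.4f}, max gap = {gap:.4f}")
```

With these inputs the gap for EEM stays below 0.1, in line with the statement above, whereas the gaps for RA and SPM are larger. The global values depend entirely on the placeholder weights.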

Finally, we test the methods on four empirical networks. As shown in Table 3, we reconstruct the network structure by EEM, RA and SPM with 90% priori structure information. The empirical results indicate that the four indices of reconstruction accuracy obtained by EEM are higher than those obtained by RA and SPM for the four empirical networks when the index of information sufficiency η = 0.1. Compared with RA, EEM improves the reconstruction accuracy, measured by the mean success rate of existent links $⟨S R⟩$, by 355.54, 456.38, 96.37 and 64.11% on FWMW, FWFW, Jazz musicians and the neural network of C. elegans, respectively. Compared with SPM, EEM improves $⟨S R⟩$ by 355.54, 154.07, 47.81 and 69.38% on the same four networks, respectively. These results indicate that the empirical networks reconstructed by EEM are closer to the original networks than those reconstructed by RA and SPM.
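The percentages above are presumably relative gains of EEM over each baseline. A minimal sketch of that calculation follows, assuming the improvement is defined as (EEM − baseline)/baseline × 100; this definition is an assumption and is not stated explicitly here, and the success rates used are hypothetical rather than the values in Table 3.

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent (assumed definition)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # HYPOTHETICAL mean success rates, for illustration only.
    baseline_sr = 0.50
    eem_sr = 0.75
    print(f"improvement: {relative_improvement(eem_sr, baseline_sr):.2f}%")
```

Under this definition, an improvement above 100% means the success rate more than doubled relative to the baseline.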

TABLE 3. The value of four indices of reconstruction accuracy for four empirical networks.

3.4 Conclusion

In summary, we have investigated the performance of EEM for reconstructing four types of synthetic networks (small-world networks, random networks, regular networks and Apollonian networks) based on priori structure information. The mean success rate of existent links $⟨S R⟩$ obtained by EEM reaches at least 0.9021 when the index of information sufficiency η is 0.1. Compared with RA and SPM, EEM improves $⟨S R⟩$ by 8.07 and 12.22%, respectively, on the networks with 120–124 nodes, and by 17.53 and 22.81%, respectively, on the networks with 250–367 nodes. The experimental results also indicate that the separate local structures in each type of network can be accurately reconstructed by EEM. In addition, EEM’s reconstruction accuracy is evaluated on four empirical networks, where EEM improves $⟨S R⟩$ by at least 64.11 and 47.81% compared with RA and SPM, respectively. The reason why EEM obtains higher reconstruction accuracy than RA or SPM might be that EEM uses time-series information to strengthen the reliability of the experimental results, and that its reconstruction accuracies for the local structure and the global structure are highly consistent. The evaluation of EEM on both synthetic and empirical networks suggests that EEM is applicable to networks with sparse connections and achieves high reconstruction accuracy with low information requirements.

Although the efficiency of EEM in reconstructing network structure has been measured on both synthetic and empirical networks, many questions remain to be considered. For example, the results show that EEM gives remarkably higher reconstruction accuracy on a network hosting a PDG dynamical process, but the performance of EEM has not been validated under other dynamical processes. Although EEM could also be extended to large-scale networks, the computing time might increase exponentially. In addition, EEM’s capability to identify spurious links has not been explored. Because EEM can capture adjacency relationships well from limited information and thus give more accurate reconstruction, it is appealing for reconstructing general networks with extremely low data requirements. Despite these challenges, we will continue our research on the problems of network reconstruction.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author Contributions

J-QF proposed the topic and wrote the paper. QG, KY and J-GL guided, discussed and revised the manuscript. All authors contributed to the manuscript and approved the submitted version.

Funding

This work is supported by the National Natural Science Foundation of China (Grant Nos. 71771152 and 61773248), the National Social Science Fund of China (No. 16BJY158), the Major Program of National Fund of Philosophy and Social Science of China (Nos. 20ZDA060 and 18ZDA088), and the Scientific Research Project of Shanghai Science and Technology Committee (Grant No. 19511102202).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

The authors acknowledge the valuable discussions with Huan-Mei Qin, Guang Liang, Ren-De Li, Hong-Yi Ding and Shao-Yong Han.

References

1. Liao JC, Boscolo R, Yang Y-L, Tran LM, Sabatti C, Roychowdhury VP. Network Component Analysis: Reconstruction of Regulatory Signals in Biological Systems. Proc Natl Acad Sci (2003) 100:15522–7. doi:10.1073/pnas.2136632100

2. Matys V, Fricke E, Geffers R, Gößling E, Haubrock M, Hehl R. TRANSFAC(R): Transcriptional Regulation, from Patterns to Profiles. Nucleic Acids Res (2003) 31:374–8. doi:10.1093/nar/gkg108

3. Keseler IM, Collado-Vides J, Gama-Castro S, Ingraham J, Paley S, Paulsen IT. Ecocyc: A Comprehensive Database Resource for Escherichia Coli. Nucleic Acids Res (2004) 33:D334–D337. doi:10.1093/nar/gki108

4. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK. Transcriptional Regulatory Networks in Saccharomyces Cerevisiae. Science (2002) 298:799–804. doi:10.1126/science.1075090

5. Dong GG, Wang F, Shekhtmane LM, Danziger MM, Fan JF, Du RJ. Optimal Resilience of Modular Interacting Networks. Proc Natl Acad Sci USA (2021) 118:e1922831118. doi:10.1073/pnas.1922831118

6. Bussemaker HJ, Li H, Siggia ED. Building a Dictionary for Genomes: Identification of Presumptive Regulatory Sites by Statistical Analysis. Proc Natl Acad Sci (2000) 97:10096–100. doi:10.1073/pnas.180265397

7. Bussemaker HJ, Li H, Siggia ED. Regulatory Element Detection Using Correlation With Expression. Nat Genet (2001) 27:167–71. doi:10.1038/84792

8. Chang C, Ding Z, Hung YS, Fung PCW. Fast Network Component Analysis (Fastnca) for Gene Regulatory Network Reconstruction From Microarray Data. Bioinformatics (2008) 24:1349–58. doi:10.1093/bioinformatics/btn131

9. Cugueró-Escofet MÀ, Quevedo J, Alippi C, Roveri M, Puig V, García D. Model- vs. Data-Based Approaches Applied to Fault Diagnosis in Potable Water Supply Networks. J Hydroinformatics (2016) 18:831–50. doi:10.2166/hydro.2016.218