Network Reconstruction in Terms of the Priori Structure Information

In this paper, we investigate the reconstruction of networks based on priori structure information by the Element Elimination Method (EEM). We first generate four types of synthetic networks: small-world networks, random networks, regular networks and Apollonian networks. Then, we randomly delete a fraction of the links in the original networks. Finally, we employ EEM, the resource allocation (RA) index and the structural perturbation method (SPM) to reconstruct the four types of synthetic networks with 90% priori structure information. The experimental results show that, compared with RA and SPM, EEM achieves higher reconstruction accuracy on all four types of synthetic networks. We also compare the reconstruction performance of EEM with that of RA and SPM on four empirical networks. EEM achieves higher reconstruction accuracy, measured by local indices of success rates, with improvements of 64.11% and 47.81% over RA and SPM, respectively.


INTRODUCTION
Reconstructing a network based on priori structure information has attracted much attention in network science [1]. Prior information about the connectivity patterns or potential interactions of networks is accessible via public databases [2,3], high-throughput experiments [4], or data mining of interaction knowledge [5][6][7]. A wide variety of methods based on priori structure information have been developed for the problem of network reconstruction [1,8,9]. Among these, a few reconstruction models provide a reliable estimate of a network's structure from priori structure information. Link prediction is a typical method, which uses the accessible structure to estimate the likelihood that unobserved links exist or to identify spurious links in a network [10,11]. The unknown structure of a network is then reconstructed by link prediction. Several families of link prediction models have been validated on both synthetic and empirical networks: local similarity indices [12][13][14], maximum likelihood methods [11,15] and methods based on predictability [16,17].
The other method uses accessible structure information to reconstruct a class of networks with evolutionary games [18,19]. This model, known as the compressive sensing reconstruction (CSR) model, was initially proposed to solve problems of global network reconstruction [20][21][22]. The CSR method provides a theoretical framework for reconstructing networks purely from measured time-series information. To reconstruct a network with N nodes, the CSR method reconstructs the adjacency matrix column by column, and each column is a vector with N elements [23,24]. The Element Elimination Method (EEM) reconstructs the adjacency matrix in a similar fashion but, in contrast to the CSR method, the number of elements in different columns may be $N_i$ ($N_i \le N$, $i = 1, 2, \ldots, N$), because EEM first eliminates coupling nodes based on priori structure information. Uncovering a network's structure has many potential applications: we can assess the system's resilience [37][38][39], understand the dynamical mechanisms [40], identify significant nodes in a network [41,42], detect community structure [43], locate diffusion sources [44,45], and analyze the networks' properties [46][47][48]. In this paper, the Element Elimination Method (EEM) [25] is employed to reconstruct the structure of networks. The procedure for reconstructing synthetic networks with EEM is as follows: 1) Generate synthetic networks. 2) Extract time-series information from observed data. 3) Reconstruct the networks with EEM. Noting that the adjacent relationships between nodes in the network are sparse and do not change over time, we can explore the causal relationships in the nodes' time-series information. Consequently, we can uncover the unknown link set $E^P$ of the networks by EEM based on the priori link set $E^T$.
Figure 1 illustrates the procedure of network reconstruction. Suppose that the relationships between node 2 and the other 5 nodes are to be reconstructed, and that only one adjacent relationship (the blue line in Figure 1A) is known. We cannot tell which of the many networks with possible connective relationships is the original one. Meanwhile, the network evolves over time, and some time-series information about the nodes' strategies and payoffs can be obtained. We then build a model to bridge node 2's strategies and its payoffs, as illustrated in Figure 1B. Consequently, we can use EEM to reconstruct the network's structure and obtain the adjacent relationships shown in Figure 1C.

Generation of Synthetic Network
To evaluate the reconstruction performance of EEM on small-world networks and on networks characterized by other features, we generate four types of synthetic networks. Since the small-world network is a model that can be tuned between a random network and a regular network [26], we also consider networks whose connection topology is completely regular or completely random. Besides, the performance of EEM on Apollonian networks has seldom been evaluated. We therefore generate four types of synthetic networks: small-world networks, random networks, regular networks and Apollonian networks. Previous findings indicate that the assortative coefficient has a direct influence on the accuracy of network reconstruction [49]. Therefore, some statistical properties have to be tuned when the networks are generated.
Suppose a network is composed of N nodes and |E| links. To minimize the influence of different network structures, we fix a default mean assortative coefficient 〈r〉 for three types of synthetic networks, excluding Apollonian networks. Given the wiring rules between nodes, we can generate many different networks with the given number of nodes N. Initially, each generated synthetic network should contain more than |E| links. We then randomly delete links until the number of residual links equals |E|. In this way, the generated synthetic networks have N nodes and |E| links. From this set of synthetic networks we select one whose mean assortative coefficient is close to the default 〈r〉 (the absolute error is less than $10^{-3}$). The other types of synthetic networks are generated with other wiring rules in a similar way. In practice, only a limited number of the generated synthetic networks have statistical properties close to the default values. Moreover, the generation procedures of the regular network and the Apollonian network yield only one realization each. In this paper, the experiments use a single realization of each synthetic network.
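The generation procedure described above (start from a network with more than |E| links, delete links at random until |E| remain, then keep a realization whose assortative coefficient is within $10^{-3}$ of the default) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the complete graph used as a stand-in for a wiring rule, and the values N = 30, |E| = 60 and 〈r〉 = 0 are all assumptions. The assortative coefficient is computed here as the Pearson correlation between the degrees at the two ends of each link.

```python
import random
from itertools import combinations


def prune_to_link_count(links, target, rng):
    """Randomly delete links until exactly `target` links remain."""
    links = list(links)
    if target > len(links):
        raise ValueError("the initial network must have at least |E| links")
    return rng.sample(links, target)


def assortativity(links):
    """Pearson correlation between the degrees found at the two ends of
    every link (each undirected link is counted in both directions)."""
    degree = {}
    for u, v in links:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in links:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return None  # undefined when every node has the same degree
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    return cov / var


def select_network(candidates, r_target, tol=1e-3):
    """Return the first candidate whose assortativity is within `tol` of the target."""
    for links in candidates:
        r = assortativity(links)
        if r is not None and abs(r - r_target) < tol:
            return links
    return None  # no realization was close enough


# Hypothetical usage: N = 30 nodes, |E| = 60 links, default <r> = 0.0
rng = random.Random(42)
dense = list(combinations(range(30), 2))  # stand-in for a wiring rule
candidates = (prune_to_link_count(dense, 60, rng) for _ in range(2000))
chosen = select_network(candidates, r_target=0.0)
```

Because few realizations fall inside the tolerance, `select_network` may return `None`, which mirrors the remark that suitable synthetic networks are limited.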
Due to privacy or confidentiality issues, the complete structure of a network is often not accessible. In addition, it is practically impossible to record the nodes' complete time-series information. Despite these difficulties, some priori information about the adjacent relationships between a few nodes, together with discrete records of the nodes' time-series information, might be available. Even with such limited information, the connective relationships between nodes have a direct effect on each individual node and contribute to the node's attitude or selection at the next time step. This dependence of the nodes' interactions on the network's structure allows us to use the nodes' time-series information to describe the adjacent relationships behind them [24,50].

The Model of the Evolutionary Game
The main challenge is that the structure of the network is inaccessible and that only limited time-series information about the nodes is available. Since the time-series information is closely related to the connective relationships between nodes, we can reconstruct the unknown structure from the limited time-series information.
We use an evolutionary game model, the Prisoner's Dilemma Game (PDG) model, to describe the nodes' dynamics [51][52][53]. In each round of the game, a node weighs the benefits against the risks and selects a strategy. Here, we use $SY_i(t)$ to denote the strategy of node i. The vector $SY_i(t) = (1, 0)^T$ represents a cooperation strategy, while $SY_i(t) = (0, 1)^T$ represents a defection strategy. Here, T stands for 'transpose'.
When node i and node j play a game, the payoff of node i depends on both nodes' strategies and on a uniform payoff matrix P, which contains a parameter characterizing the volume of payoff when node i selects a defection strategy. In round t, node i plays with all of its neighbors using the same strategy. When node i encounters a neighbor j, node i gains a payoff from node j that is determined by the two strategies and P. In the same round, node i's total payoff $G_i$ is calculated as the sum of the payoffs from all of node i's neighbors.
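The payoff computation in one round can be sketched as below. The explicit payoff matrix is not recoverable from this text, so the sketch assumes the weak prisoner's dilemma form that is common in the evolutionary-game reconstruction literature, P = ((1, 0), (b, 0)) with a defection parameter b, and the bilinear pairwise payoff $SY_i^T \cdot P \cdot SY_j$. Both the matrix form and the value b = 1.2 are assumptions made for illustration.

```python
COOPERATE = (1, 0)
DEFECT = (0, 1)


def payoff_matrix(b):
    """Assumed weak prisoner's dilemma matrix; b is the payoff of a
    defector that meets a cooperator (hypothetical parameter name)."""
    return ((1.0, 0.0), (b, 0.0))


def pair_payoff(s_i, s_j, P):
    """Payoff that node i gains from node j: SY_i^T * P * SY_j."""
    return sum(s_i[a] * P[a][c] * s_j[c] for a in range(2) for c in range(2))


def total_payoff(i, strategies, neighbors, P):
    """G_i: sum of node i's payoffs over all of its neighbors in one round."""
    return sum(pair_payoff(strategies[i], strategies[j], P) for j in neighbors[i])


# Hypothetical round: node 0 defects against two cooperators and one defector
P = payoff_matrix(1.2)
strategies = {0: DEFECT, 1: COOPERATE, 2: COOPERATE, 3: DEFECT}
neighbors = {0: [1, 2, 3]}
g0 = total_payoff(0, strategies, neighbors, P)
```

Only neighbors contribute to `g0`, which is the property that later lets the payoffs reveal the adjacency.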
In a new round, node i attempts to maximize its payoff by updating its strategy. According to the Fermi rule [54], node i randomly selects a node j from its neighbors after round t. In round t + 1, node i then adopts node j's strategy with the probability $1 / \{1 + \exp[(TG_i(t) - TG_j(t))/\kappa]\}$, where $TG_i(t)$ is node i's cumulative payoff from round 1 to round t, and $TG_j(t)$ is defined similarly. The parameter κ characterizes a node's rationality when it updates its strategy; κ = 0 corresponds to rational selection behavior.
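The strategy update can be sketched as follows, assuming the standard Fermi function $1/\{1 + \exp[(TG_i - TG_j)/\kappa]\}$ for the adoption probability. The function names, the treatment of κ = 0 as the deterministic limit and the overflow guard are choices made for this illustration only.

```python
import math
import random


def adoption_probability(tg_i, tg_j, kappa):
    """Probability that node i copies node j under the Fermi rule."""
    diff = tg_i - tg_j
    if kappa == 0:
        # Deterministic limit: copy j only if j has earned more than i
        return 1.0 if diff < 0 else 0.0 if diff > 0 else 0.5
    x = diff / kappa
    if x > 700:  # exp(x) would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def update_strategy(i, strategies, cumulative, neighbors, kappa, rng):
    """Pick a random neighbor j and adopt its strategy with the Fermi probability."""
    j = rng.choice(neighbors[i])
    if rng.random() < adoption_probability(cumulative[i], cumulative[j], kappa):
        return strategies[j]
    return strategies[i]
```

A small κ makes a node almost always copy a more successful neighbor, while a large κ makes the choice close to random.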
Since games occur only between connected nodes, information about the adjacent relationships between nodes is hidden in the dynamical records of their strategies and payoffs. We can therefore use this information to uncover a network's structure once we have collected time-series information about the strategies and payoffs of the nodes. When we reconstruct a given network, the limited time-series information is usually a random sample of the sufficient time-series information.

Element Elimination Method
Given limited time-series information of nodes, EEM can be applied to reconstruct a network based on priori structure information. EEM is a variant of the CSR method that uses priori structure information to exclude the known connective relationships before reconstruction. Suppose that the relationships between nodes in a network can be represented by an adjacency matrix A with dimensions N × N, where N is the number of nodes in the network. EEM decomposes the process of reconstructing the entire network into many subnetwork recovery problems, and the network structure, namely the adjacency matrix A, is reconstructed column by column [55,56]. An adjacency vector $A_i$ describes the adjacent relationships between node i ($i = 1, 2, \ldots, N$) and the other N − 1 nodes in the network, which contains no loops. The element $a_{ij} = 1$ when node i and node j are connected, and $a_{ij} = 0$ otherwise. Suppose that $N_i$ ($N_i \le N - 1$) nodes in the adjacency vector $A_i$ have undetermined relationships with node i. EEM is employed to find the direct neighbors of node i ($i = 1, 2, \ldots, N$) among these $N_i$ possible nodes, namely to recover a shorter adjacency vector $A_i'$. The training set $E^T$ sheds light on the priori neighbor set $\Gamma_i^K$ of node i, which contains ($N - N_i - 1$) nodes. We can then calculate the sum of payoffs $G'_{\Gamma_i^K}$ that node i obtains from the priori neighbors in $\Gamma_i^K$. Subtracting this sum from node i's total payoff leaves the payoffs $G_i'$, which carry the hidden adjacent relationships between node i and the $N_i$ other nodes, because node i gains payoffs only from its neighbors.
FIGURE 1 | (Color Online) An illustration of reconstructing the hidden structure of a node based on priori structure information. (A) Original adjacent relationships of a node. For node 2 (red) with two neighbors, node 3 and node 6 (purple), we can observe a priori relationship, represented by a blue line, between node 2 and node 3. (B) EEM. We establish the vector $G_2$ and the matrix $\Phi_2$ in the reconstruction form $G_2 = \Phi_2 \cdot A_2$ from time-series information, where the vector $A_2$ captures the adjacent relationships between node 2 and the other nodes. After subtracting the time-series information determined by node 2 and its priori neighbor, node 3, from both sides of the equation, the unknown connections of node 2 can be reconstructed by solving the equation $G_2' = \Phi_2' \cdot A_2'$ using EEM. (C) A reconstructed adjacency matrix. The unknown neighbors of node 2 are uncovered by EEM. In the adjacency matrix, golden blocks represent reconstructed links.
Frontiers in Physics | www.frontiersin.org August 2021 | Volume 9 | Article 732835
Most real-world networks are characterized by natural sparsity, so the adjacency vector $A_i$ of node i is sparse; that is, $A_i$ has only a few nonzero elements (i.e., $a_{ij} = 1$). Since every element of node i's priori adjacency vector $A_{\Gamma_i^K}$ equals 1, the vector $A_i'$ obtained by removing $A_{\Gamma_i^K}$ from $A_i$ is still sparse: the number of zero elements is unchanged, while the number of nonzero elements has decreased. The sparsity of $A_i'$ makes EEM applicable. The nodes' strategies and payoffs are recorded in discrete rounds $t_1, t_2, \ldots, t_M$. Since the remaining payoffs are obtained from the games between node i and the $N_i$ nodes, we can build a model of the form $G_i' = \Phi_i' \cdot A_i'$, as in Eq. 4. The sparse vector $A_i'$ can then be reconstructed by solving a convex optimization problem [57,58]. Here, the available dynamical payoffs of node i in the M recorded rounds are expressed by the vector $G_i'$, and the payoffs that node i would obtain from the corresponding $N_i$ nodes in these rounds are expressed by an M × $N_i$ matrix $\Phi_i'$. The elements of $\Phi_i'$ can be calculated using the formula shown in Eq. 2. According to Eq. 4, we obtain the adjacency vector $A_i'$ by solving the convex optimization problem. We then obtain the complete adjacency vector $A_i = (a_{i1}, a_{i2}, \ldots, a_{iN})^T$ by combining the reconstructed vector $A_i'$ with the priori adjacency vector $A_{\Gamma_i^K}$ of node i. In a similar fashion, the neighbor-connection vectors of all the other nodes can be obtained, yielding the network's adjacency matrix $A = (A_1, A_2, \ldots, A_N)$.
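The elimination step and the sparse recovery for a single node can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the convex problem is replaced by an $\ell_1$-penalized least-squares problem solved with plain iterative soft thresholding (ISTA), and the function names, the penalty `lam`, the iteration count and the 0.5 cutoff for declaring a link are hypothetical choices.

```python
import math


def soft_threshold(value, t):
    """Shrink `value` toward zero by `t` (proximal step of the l1 penalty)."""
    return math.copysign(max(abs(value) - t, 0.0), value)


def ista(Phi, g, lam=0.01, iters=300):
    """Approximately minimise 0.5*||Phi x - g||^2 + lam*||x||_1 with ISTA."""
    rows, cols = len(Phi), len(Phi[0])
    # The squared Frobenius norm bounds the largest eigenvalue of Phi^T Phi,
    # so its inverse is a safe (conservative) step size.
    lipschitz = sum(v * v for row in Phi for v in row) or 1.0
    step = 1.0 / lipschitz
    x = [0.0] * cols
    for _ in range(iters):
        residual = [g[m] - sum(Phi[m][j] * x[j] for j in range(cols))
                    for m in range(rows)]
        moved = [x[j] + step * sum(Phi[m][j] * residual[m] for m in range(rows))
                 for j in range(cols)]
        x = [soft_threshold(v, lam * step) for v in moved]
    return x


def eem_reconstruct_node(Phi, g, prior_neighbors, lam=0.01, iters=300, cutoff=0.5):
    """Reconstruct one adjacency vector after eliminating the known neighbors.

    Phi[m][j]: payoff node i would gain from candidate j in recorded round m.
    g[m]: total payoff of node i in recorded round m.
    """
    n = len(Phi[0])
    known = set(prior_neighbors)
    unknown = [j for j in range(n) if j not in known]
    # Subtract the payoffs contributed by the priori neighbors ...
    g_res = [g[m] - sum(Phi[m][j] for j in known) for m in range(len(g))]
    # ... and drop their columns, leaving a shorter unknown vector.
    Phi_res = [[row[j] for j in unknown] for row in Phi]
    x = ista(Phi_res, g_res, lam, iters)
    a = [0] * n
    for j in known:
        a[j] = 1
    for j, xj in zip(unknown, x):
        a[j] = 1 if xj > cutoff else 0
    return a


# Tiny hypothetical example: candidates 1 and 2 are true neighbors, 1 is known
Phi = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
g = [1, 2, 0]
adjacency = eem_reconstruct_node(Phi, g, prior_neighbors=[1])
```

Eliminating known neighbors shortens the unknown vector and lowers the number of nonzero entries to be found, which is why fewer recorded rounds can suffice.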

Datasets
To understand the performance of EEM in reconstructing synthetic networks, the experiments are conducted on four types of networks. The basic statistical properties of the synthetic networks are presented in Table 1, where N and |E| are the numbers of nodes and links, 〈k〉 is the mean degree, 〈r〉 is the mean assortative coefficient, 〈C〉 is the mean clustering coefficient, and 〈D〉 is the mean shortest distance. We use the abbreviations WS, RM, RG and AP for small-world networks, random networks, regular networks and Apollonian networks, respectively.
We regard the strategies and payoffs of each node in a given round t as one piece of time-series information. In the experiments, we use M pieces of accessible time-series information, obtained from discrete rounds $t_1$ to $t_M$, to reconstruct different networks. We set N, the number of nodes in the network, as the maximum value of M, and use an index of information sufficiency η (η ≡ M/N) to represent the amount of time-series information used in the network reconstruction. Intuitively, the time-series information is sufficient when M = N and insufficient when 0 < M < N; equivalently, it is sufficient when η = 1 and insufficient when 0 < η < 1. The reconstruction models are also applied to reconstruct networks with different amounts of priori structure information, measured by a probability $P_s$ ($0 \le P_s \le 1$).
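The two experimental knobs, η and $P_s$, can be illustrated with a short sketch. It assumes that the M = ηN rounds are drawn uniformly at random from the recorded rounds and that a fraction $P_s$ of the links is kept as priori information; the function names and the rounding of ηN and $P_s|E|$ to integers are assumptions of this example.

```python
import random


def sample_rounds(recorded_rounds, n_nodes, eta, rng):
    """Draw M = eta * N recorded rounds at random (eta = M / N)."""
    m = round(eta * n_nodes)
    return sorted(rng.sample(list(recorded_rounds), m))


def split_links(links, p_s, rng):
    """Keep a fraction p_s of the links as priori information; the rest is unknown."""
    links = list(links)
    rng.shuffle(links)
    n_prior = round(p_s * len(links))
    return links[:n_prior], links[n_prior:]


# Hypothetical setting: N = 20 nodes, eta = 0.4, P_s = 0.9
rng = random.Random(1)
rounds = sample_rounds(range(1, 101), n_nodes=20, eta=0.4, rng=rng)
prior, unknown = split_links([(i, i + 1) for i in range(10)], 0.9, rng)
```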
In addition, the performance of EEM is evaluated in reconstructing empirical networks. Table 2 shows the basic statistical properties of all four networks. These networks are chosen because they are characterized by a large clustering coefficient and a short distance.

Metrics
To test the accuracy of EEM, the original set of existent links, E, is randomly divided into two parts: the priori set $E^T$ and the probe set $E^P$. Clearly, $E = E^T \cup E^P$ and $E^T \cap E^P = \emptyset$. In this paper, the priori set always contains a fraction $P_s$ of the links, and the remaining fraction $1 - P_s$ constitutes the probe set. We apply four standard indices to quantify the reconstruction accuracy: the success rates of existent links SR, the success rates of nonexistent links SN [24], precision PRE [61,62] and the area under the receiver operating characteristic curve AUC [63]. In addition, we apply local indices of success rates in the experiments. Both SR and SN estimate the similarity between the reconstructed networks and the original networks. SR denotes the ratio of the number of links reconstructed by the reconstruction models to the number of real existent links in the network. SN denotes the ratio of the number of nonexistent links distinguished by the reconstruction models to the number of real nonexistent links in the network. We obtain $SR = |\Gamma_{io} \cap \Gamma_{ir}| / |\Gamma_{io}|$ and $SN = |\bar{\Gamma}_{io} \cap \bar{\Gamma}_{ir}| / |\bar{\Gamma}_{io}|$, where $\Gamma_{io}$ and $\Gamma_{ir}$ denote the real neighbor set of node i and the neighbor set of node i reconstructed by the reconstruction models, respectively, and |·| denotes the number of elements in a set. $\bar{\Gamma}_{io}$ and $\bar{\Gamma}_{ir}$ are the complements of $\Gamma_{io}$ and $\Gamma_{ir}$: each node in $\bar{\Gamma}_{io}$ is not adjacent to node i in the original network and, correspondingly, each node in $\bar{\Gamma}_{ir}$ is not adjacent to node i in the reconstructed network. A successful reconstruction is achieved when SR ($0 \le SR \le 1$) and SN ($0 \le SN \le 1$) are both close to 1. Precision PRE is defined as the ratio of the number of existent links reconstructed by the models to the total number of unknown existent links.
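For a single node, the two success rates can be sketched as set ratios. This reads the verbal definitions above as "correctly recovered neighbors over real neighbors" and "correctly rejected non-neighbors over real non-neighbors", which is an interpretation; the function name and the convention of returning 1.0 for an empty denominator are assumptions.

```python
def success_rates(true_neighbors, reconstructed_neighbors, candidates):
    """Return (SR, SN) for one node.

    candidates: all other nodes that could be adjacent to the node.
    """
    real = set(true_neighbors)
    found = set(reconstructed_neighbors)
    others = set(candidates)
    non_real = others - real    # nodes truly not adjacent to the node
    non_found = others - found  # nodes reconstructed as not adjacent
    sr = len(real & found) / len(real) if real else 1.0
    sn = len(non_real & non_found) / len(non_real) if non_real else 1.0
    return sr, sn


# Hypothetical example for node 2: real neighbors {3, 6}, reconstruction {3, 4}
sr, sn = success_rates({3, 6}, {3, 4}, candidates={1, 3, 4, 5, 6})
```

Here one of two real neighbors is recovered and two of three real non-neighbors are rejected.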
In our case, to calculate precision we rank all the unknown links in decreasing order of the existence likelihoods computed by the reconstruction models. We then focus on the top-L links (here $L = |E^P|$). If H of these links are successfully reconstructed, then PRE = H/L. The area under the receiver operating characteristic curve AUC evaluates a reconstruction model's performance over the whole list of unknown links. Given the existence likelihoods of all unknown links, AUC can be interpreted as the probability that a randomly chosen unknown existent link is given a higher existence likelihood than a randomly chosen nonexistent link. In the implementation, the value of AUC is calculated with the MATLAB function perfcurve.
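Both ranking-based indices can be sketched directly from their definitions. The paper computes AUC with MATLAB's perfcurve; the version below instead uses the pairwise-comparison interpretation given above and counts ties as one half, which is a common convention assumed here. The function names and the toy scores are hypothetical.

```python
def precision_top_l(scores, probe_links):
    """PRE = H / L with L = |E^P|: fraction of the L top-ranked unknown
    links that belong to the probe set."""
    size = len(probe_links)
    ranked = sorted(scores, key=scores.get, reverse=True)[:size]
    hits = sum(1 for link in ranked if link in probe_links)
    return hits / size


def auc(scores, probe_links):
    """Probability that a probe (existent) link outscores a nonexistent link."""
    pos = [s for link, s in scores.items() if link in probe_links]
    neg = [s for link, s in scores.items() if link not in probe_links]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical likelihoods for four unknown links, two of which really exist
scores = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.3, ("c", "d"): 0.1}
probe = {("a", "b"), ("b", "c")}
pre_value = precision_top_l(scores, probe)
auc_value = auc(scores, probe)
```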
Clearly, a higher value of SR, SN, PRE or AUC means a higher reconstruction accuracy. We conduct 50 independent simulations and average the indices of reconstruction accuracy to obtain the mean success rates of existent links 〈SR〉, the mean success rates of nonexistent links 〈SN〉, the mean precision 〈PRE〉 and the mean area under the receiver operating characteristic curve 〈AUC〉.
To understand the performance of EEM in reconstructing the local structure of a network, we divide the structure of each type of network into separate local structures. Suppose that the nodes in the network play the roles of leaders, brokers and peripheral executors. We define leaders as nodes with small degrees, and the number of leaders in each type of network is 6. In addition, the subnetwork composed of leaders is a connected subgraph. Brokers are nodes that are connected with leaders, and the remaining nodes are peripheral executors. The sets of leaders, brokers and peripheral executors do not overlap. We use the letters L, B and P to represent the adjacent relationships between leaders, the adjacent relationships between leaders and brokers, and the adjacent relationships among peripheral executors and brokers, respectively. We can then obtain the success rates of existent links of each local structure, $SR_{Lr}$, $SR_{Br}$ and $SR_{Pr}$, normalized by the number of real existent links $|\Gamma_{io}|$ of the network.
The sum of the three local success rates of existent links is equal to the global success rates of existent links:
$SR = SR_{Lr} + SR_{Br} + SR_{Pr}$ (11)
Correspondingly, the three local success rates of existent links reach their maxima, $SR_{Lo}$, $SR_{Bo}$ and $SR_{Po}$, when the original network is successfully reconstructed. To quantify the success rates of the three local structures, we define the local indices of success rates as $APP_{SRL} = SR_{Lr}/SR_{Lo}$, $APP_{SRB} = SR_{Br}/SR_{Bo}$ and $APP_{SRP} = SR_{Pr}/SR_{Po}$. Similarly, a higher value of the local index of success rates $APP_{SRL}$, $APP_{SRB}$ or $APP_{SRP}$ means a higher reconstruction accuracy. We conduct 50 independent simulations and average the indices of success rates to obtain $\langle APP_{SRL} \rangle$, $\langle APP_{SRB} \rangle$ and $\langle APP_{SRP} \rangle$.
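The bookkeeping behind the local indices can be sketched as below. It assumes that each local index is the local success rate divided by its maximum, $APP_{SRX} = SR_{Xr}/SR_{Xo}$ for X in {L, B, P}, a form inferred from Eq. 11 and from the linear combination of local indices used later in the results. The function name, the integer link identifiers and the role labels attached to links are hypothetical.

```python
def local_success_rates(true_links, reconstructed_links, role_of_link):
    """Return (SR_r, SR_o, APP) as dictionaries keyed by 'L', 'B', 'P'.

    SR_r[X]: recovered links of type X / all real links.
    SR_o[X]: real links of type X / all real links (maximum of SR_r[X]).
    APP[X] : SR_r[X] / SR_o[X].
    """
    total = len(true_links)
    real = {"L": 0, "B": 0, "P": 0}
    found = {"L": 0, "B": 0, "P": 0}
    for link in true_links:
        role = role_of_link[link]
        real[role] += 1
        if link in reconstructed_links:
            found[role] += 1
    sr_o = {k: real[k] / total for k in real}
    sr_r = {k: found[k] / total for k in found}
    app = {k: found[k] / real[k] for k in real if real[k] > 0}
    return sr_r, sr_o, app


# Hypothetical network with 10 real links: 2 of type L, 3 of type B, 5 of type P
roles = {0: "L", 1: "L", 2: "B", 3: "B", 4: "B",
         5: "P", 6: "P", 7: "P", 8: "P", 9: "P"}
sr_r, sr_o, app = local_success_rates(list(roles), {0, 1, 2, 3, 5, 6, 7, 8}, roles)
global_sr = sum(sr_r.values())                       # Eq. 11
combined = sum(sr_o[k] * app[k] for k in app)        # weighted local indices
```

In this example `global_sr` and `combined` agree, which illustrates why consistent local indices imply a consistent global success rate.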

Experimental Results on Synthetic and Empirical Networks
To understand the performance of EEM, we consider four types of synthetic networks hosting a PDG dynamical process. Figure 2A depicts the reconstruction accuracy for a synthetic small-world network, measured by the mean success rates of existent links 〈SR〉, based on 90% priori structure information. 〈SR〉 increases monotonically as the index of information sufficiency η varies from 0.1 to 0.4, and reaches its maximum value of 1 at η = 0.4; the increment rate of 〈SR〉 over this range is 9.97%. 〈SR〉 then stays at 1 when η is larger than 0.4. As shown in Figures 2B-H, 〈SR〉 likewise increases monotonically when η is less than 0.4 and reaches 1 for the different types of synthetic networks when η exceeds 0.4.
Moreover, we compare the experimental results of EEM with those of two link prediction models, the resource allocation (RA) index and the structural perturbation method (SPM). Figures 2A-H show that when the index of information sufficiency is low (i.e., η = 0.1), the mean success rates of existent links 〈SR〉 obtained by EEM on small-world networks, random networks, regular networks and Apollonian networks reach 0.9093, 0.9085, 0.9021, 0.9361, 0.9823, 0.9897, 0.9402 and 0.9982, respectively. On the networks with 120-124 nodes, the 〈SR〉 of EEM is higher than that of RA and SPM by at least 8.07% and 12.22%, respectively. On the networks with 250-367 nodes, it is higher by at least 17.53% and 22.81%, respectively. The results in Figure 2 indicate that EEM offers a good tradeoff: it provides high reconstruction accuracy while requiring less time-series information.
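For reference, the RA baseline scores an unconnected pair of nodes by the resources they exchange through common neighbors: in the link prediction literature the index is the sum of 1/k_z over all common neighbors z, where k_z is the degree of z. The sketch below computes it on the known (priori) structure; the function name and the toy network are hypothetical, and SPM is not sketched here.

```python
def resource_allocation_scores(neighbors, candidate_pairs):
    """RA index of each candidate pair: sum over common neighbors z of 1 / k_z.

    neighbors: dict mapping each node to the set of its known neighbors.
    """
    scores = {}
    for x, y in candidate_pairs:
        common = neighbors[x] & neighbors[y]
        scores[(x, y)] = sum(1.0 / len(neighbors[z]) for z in common)
    return scores


# Hypothetical known structure with five nodes
known = {0: {2, 3}, 1: {2, 3}, 2: {0, 1, 4}, 3: {0, 1}, 4: {2}}
ra = resource_allocation_scores(known, [(0, 1), (0, 4), (3, 4)])
```

Because the score depends only on the known structure, the ranking of unknown links is unaffected by how much time-series information is recorded.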
Intuitively, a network's structure can be reconstructed more accurately when more priori information about the structure is available. Figure 3 shows the dependence of 〈SR〉 on the probability $P_s$, the amount of priori structure information. For lower values of the index of information sufficiency (η ≤ 0.4), 〈SR〉 increases monotonically as $P_s$ increases. On the other hand, 〈SR〉 approaches the maximum value of 1 when η is larger than 0.4. In terms of $P_s$, the highest performance is achieved for the highest $P_s$. The intuitive reason for the relatively superior performance on the four synthetic networks lies in the sufficiency of the available information about the networks' structure.
In the following, we verify the performance of EEM on the local structure of the networks. We divide the structure of each type of network into three separate local structures, labeled with the subscripts L, B and P. Figure 4A depicts the reconstruction success rates for a small-world network, measured by the mean local indices of success rates $\langle APP_{SRL} \rangle$, $\langle APP_{SRB} \rangle$ and $\langle APP_{SRP} \rangle$, based on 90% priori structure information.
As illustrated in the main graph of Figure 4A, the mean local index of success rates $\langle APP_{SRL} \rangle$ obtained by EEM is higher than that obtained by RA or SPM. In particular, $\langle APP_{SRL} \rangle$ obtained by EEM reaches 96.41% when the index of information sufficiency η = 0.1, while the values obtained by RA and SPM are both 88.95%. $\langle APP_{SRB} \rangle$ and $\langle APP_{SRP} \rangle$ obtained by EEM are 93.95% and 86.68% when η = 0.1, as shown in subgraphs (α)-(β) of Figure 4A. The mean local indices of success rates in Figures 4B-D likewise indicate that EEM can achieve higher reconstruction accuracy than RA or SPM with little time-series information. The underlying reason that EEM obtains higher reconstruction accuracy than RA or SPM might be twofold. Firstly, EEM is applicable to reconstructing networks with sparse connective relationships: Wang et al. developed a paradigm [19,24,25] to address network reconstruction problems, and Candès et al. provided the theoretical framework for this paradigm [57,58]. Both EEM and the two link prediction models use identical priori structure information of the network to obtain direct information about the unknown structure. In addition, EEM bridges the relationships between the nodes' payoffs and strategies by virtue of time-series information, because payoffs can be obtained only from a node's neighbors. EEM can then extract indirect information about the unknown structure from these relationships, which strengthens the reliability of the experimental results. RA and SPM can also extract valuable indirect information about the unknown structure, but this information still originates from the priori structure information of the network, owing to the lack of a universal theoretical framework.
Secondly, the reconstruction accuracy of the local structure and that of the global structure obtained by EEM are highly consistent. As illustrated in Figure 4, the absolute error between the three mean local indices of success rates $\langle APP_{SRL} \rangle$, $\langle APP_{SRB} \rangle$ and $\langle APP_{SRP} \rangle$ obtained by EEM on each network is less than 0.1, which indicates that the reconstruction accuracy of EEM on the three separate local structures is almost the same. Consequently, the global and local reconstruction accuracies are highly consistent, because the global reconstruction accuracy is a linear combination of the three mean local indices of success rates: $\langle SR \rangle = SR_{Lo} \cdot \langle APP_{SRL} \rangle + SR_{Bo} \cdot \langle APP_{SRB} \rangle + SR_{Po} \cdot \langle APP_{SRP} \rangle$, where $SR_{Lo}$, $SR_{Bo}$ and $SR_{Po}$ are constant for each network. The high reconstruction accuracy on the three separate local structures contributes to a high reconstruction accuracy of the global structure. We also observe that the reconstruction accuracy on the three separate local structures obtained by RA or SPM fluctuates. In particular, in the reconstruction experiments on synthetic random networks, the maximum absolute error between the three mean local indices of success rates obtained by RA or SPM reaches 0.3837. The experimental results indicate that the reconstruction accuracy obtained by RA and SPM depends largely on the priori structure information of the network. The reconstruction accuracy of RA or SPM is high when the local priori structure is consistent with the global structure, and low otherwise.
Finally, we test the results for four empirical networks. As shown in Table 3, we reconstruct the network structure by EEM, RA and SPM with 90% priori structure information. The empirical results indicate that four indices of reconstruction accuracy obtained by EEM are higher than RA and SPM for four empirical networks when the index of information sufficiency rate η 0.

CONCLUSION
In summary, we have investigated the performance of EEM in reconstructing four types of synthetic networks (small-world networks, random networks, regular networks and Apollonian networks) based on priori structure information. The mean success rates of existent links 〈SR〉 obtained by EEM reach at least 0.9021 when the index of information sufficiency η is 0.1. Compared with RA and SPM, the 〈SR〉 of EEM is higher by at least 8.07% and 12.22%, respectively, on the networks with 120-124 nodes, and by at least 17.53% and 22.81%, respectively, on the networks with 250-367 nodes. The experimental results also indicate that the separate local structures in each type of network can be accurately reconstructed by EEM. In addition, the reconstruction accuracy of EEM is evaluated on four empirical networks, where its 〈SR〉 is higher than that of RA and SPM by 64.11% and 47.81%, respectively. The reason that EEM obtains higher reconstruction accuracy than RA or SPM might be that EEM uses time-series information to strengthen the reliability of the results and that its accuracy on the local structure and on the global structure is highly consistent. The evaluation of EEM on both synthetic and empirical networks suggests that EEM is applicable to networks with sparse connective relationships and achieves high reconstruction accuracy with low information requirements. Although the performance of EEM has been measured on both synthetic and empirical networks, many questions remain to be considered.
For example, the results show that EEM gives remarkably higher reconstruction accuracy on a network hosting a PDG dynamical process, but the performance of EEM has not been validated under other dynamical processes. Although EEM could also be extended to large-scale networks, the computing time might increase exponentially. In addition, EEM's capability to identify spurious links has not been explored. Because EEM captures the adjacent relationships well from limited information and thus gives more accurate reconstructions, it is appealing for reconstructing general networks with extremely low data requirements. Despite these challenges, we will continue our research on the problems of network reconstruction.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.