Identification of the Key Regulators of Spina Bifida Through Graph-Theoretical Approach

Spina Bifida (SB) is a congenital spinal cord malformation. Efforts to discern the key regulators (KRs) of the SB protein-protein interaction (PPI) network are requisite for developing its successful interventions. The architecture of the SB network, constructed from 117 manually curated genes was found to self-organize into a scale-free fractal state having a weak hierarchical organization. We identified three modules/motifs consisting of ten KRs, namely, TNIP1, TNF, TRAF1, TNRC6B, KMT2C, KMT2D, NCOA3, TRDMT1, DICER1, and HDAC1. These KRs serve as the backbone of the network, they propagate signals through the different hierarchical levels of the network to conserve the network’s stability while maintaining low popularity in the network. We also observed that the SB network exhibits a rich-club organization, the formation of which is attributed to our key regulators also except for TNIP1 and TRDMT1. The KRs that were found to ally with each other and emerge in the same motif, open up a new dimension of research of studying these KRs together. Owing to the multiple etiology and mechanisms of SB, a combination of several biomarkers is expected to have higher diagnostic accuracy for SB as compared to using a single biomarker. So, if all the KRs present in a single module/motif are targetted together, they can serve as biomarkers for the diagnosis of SB. Our study puts forward some novel SB-related genes that need further experimental validation to be considered as reliable future biomarkers and therapeutic targets.


INTRODUCTION
The vigorous efforts of researchers and an increase in research techniques have paved the way for discoveries in the study of polygenic and other complex human diseases. However, there remain a large fraction of genetic diseases with their molecular basis unknown, Spina bifida (SB) being one of them. SB, a spinal cord malformation, falls into the category of Neural Tube Defects (NTDs). This birth defect occurs when the neural tube of an embryo fails to close completely during the 4th week of pregnancy. As a result, the vertebral column remains split (bifid). Myelomeningocele (MMC) is the most common and serious type of SB and is often used interchangeably with it.
MMC causes partial/complete loss of all the functions below the level of injury. Defective bladder and bowel functions, hindbrain herniation (Chiari II malformation), hydrocephalus, orthopedic abnormalities also exist in these patients (Copp et al., 2015;Mohd-Zin et al., 2017). Apart from MMC, other open and closed forms of SB are also there (Brei and Houtrow, 2017).
Etiologically, SB involves both genetic as well as environmental factors. Though the genetic component of SB is believed to be at 60-70%, yet, non-genetic risk factors like reduced folate intake, maternal anticonvulsant therapy, diabetes mellitus, and obesity, cannot be ruled out on account of some important studies, for example, the one which states that up to 70% of NTDs in the general population can be prevented by periconceptional maternal supplementation with folic acid. Although SB has received much attention over the last decades because of its high prevalence of one case per 1,000 births globally, its exact causation is yet to be deciphered. The mechanism by which folic acid protects against SB also remains a challenge for the researchers despite its unambiguous impact on the disorder (Beaudin and Stover, 2007;Copp et al., 2015). Individuals with SB need medical management throughout their lives apart from surgical treatment .
A disease is more often a consequence of the perturbations of the complex intracellular network of interactions between functionally related genes, rather than a single gene abnormality. This led to the systemic approach to biological problems which is based on the principle that to understand the contribution of various genes/proteins in disease initiation and progression, one has to look at the network of interactions of a living system as a whole. Here comes into picture the concept of Network medicine, which aims to explore the complexity of a disease through the systematic identification of the disease pathways and modules. Here, the protein interaction maps are developed and later analyzed through graph/network theory to understand the theoretical aspect of complex networks (Re et al., 2008;Barabási et al., 2011). According to the graph theory, analysis of the topological structure of a network provides important information of the network through which novel disease genes and pathways, biomarkers, and drug targets for complex diseases can be identified (Browne et al., 2018;Chen et al., 2019).
A recent study on the complex protein-protein interaction (PPI) network suggests its conformity to scale-free topology on a hierarchical scale. On these networks, the problem arises that the central lethality rule does not apply where the stability and dynamics of the network are disrupted but not completely disrupted when the hubs are targeted . This may be due to the hierarchical organization of modules/sub-modules in complex networks and other biological networks at various topological levels, where specific roles are associated with them (Farooqui et al., 2018;Malik et al., 2019;Mangangcha et al., 2019Mangangcha et al., , 2020. We followed similar methods while conducting this study. Thus, the focus of our study is on the Protein-Protein Interaction (PPI) network/graph of SB, constructed from manually curated genes with an aim to understand the topology or the architectural principle of the network/graph (random, small world, scale-free or hierarchical) which is a prerequisite to identify the main key drivers/regulators of the network. We also analyzed a rich-club structural ordering of the SB network that signifies an efficient higher-order organization indicating the existence of the rich-club nodes which increase the network's stability as a whole. We further extended our study to the identification of the modules/motifs consisting of the important key regulators of the network that have fundamental importance due to their activities and regulating mechanisms in the network based on previous successful applications of related methodologies.

Data Mining and Manual Curation of the Candidate Genes Involved in Spina Bifida (SB)
The candidate genes of human SB were manually curated from literature through databases i.e., gene, PubMed, and OMIM (Hamosh et al., 2005), housed in NCBI. We got a list of 117 experimentally verified genes that we used as seed genes in the construction of the SB network.

Construction of Protein-Protein Interaction Network of SB
The candidate genes were mapped to their respective proteins to construct the Protein-Protein Interaction (PPI) network of SB in the STRING database (The Search Tool for the Retrieval of Interacting Genes 1 ) (Szklarczyk et al., 2017) with an interaction score >0.50 as the threshold, following the concept of one geneone protein. To construct the primary PPI network from the seed genes, we added 500 interactors in the first shell and 500 interactors in the second shell. Subsequently, the network was visualized and analyzed in the Cytoscape software (version 3.6.1) (Shannon et al., 2003).
of new vertices and the new vertices attach preferentially to already well-connected sites. This led to the observation that many large networks have a common property that their vertex connectivities follow a scale-free power-law distribution, meaning that in large complex networks the probability that a vertex in the network interacts with k other vertices decays as a power law, following P(k) ∼ k −γ . The fitting parameter values are given by rˆ2 = 0.410. This result indicates that large networks selforganize into a scale-free state despite their continuous growth.

Degree Distribution
The degree k of a node is a local measure of centrality of that node, it is the number of edges by which that node is linked to other nodes in a network (Raman, 2010). Degree distribution P(k), is the distribution of the node degrees over the whole network. P(k) gives the probability that a randomly selected node has exactly k links and is calculated by dividing the number of nodes (n k ) having a degree k (with k = 1,2,3. . .) by the total number of nodes (N) in the network; P(k) decays as a power-law P(k)∼k −γ for a scale-free network, where, γ is the degree exponent, whereas, the value of γ becomes close to γ∼ 2.26 (mean-field value) for hierarchical networks and if a network is random then P(k) follows poisson distribution, thus, P(k) can be used to distinguish between various network topologies (Albert and Barabási, 2002).

Neighborhood Connectivity
The set of neighbors of a given node n is the node's neighborhood and the number of its neighbors is its connectivity. The neighborhood connectivity of the node n is defined as the average connectivity of all the nearest neighbors of n (Maslov and Sneppen, 2002). Neighborhood connectivity is given by, where, P q k is the conditional probability that a link belonging to a node with connectivity k points to a node with connectivity q. While C N (k) obeys power law in the case of a hierarchical network, C N (k) ∼ k −β with β∼ 0.5, for a scale-free network, C N (k) ∼ constant (Pastor-Satorras et al., 2001;Malik et al., 2017). Positive and negative power dependence of C N (k) could be the indicators of assortativity and disassortativity in the network topology, respectively (Barrat et al., 2004), meaning that if C N (k) follows power-law with a positive value of exponent β (i.e., C N (k) ∼ k +β ) then edges between highly connected nodes prevail in a network, this shows assortative nature of the network, whereas, if C N (k) follows power-law with a negative value of exponent β (i.e., C N (k) ∼ k −β ) then edges between lowly connected and highly connected nodes prevail in a network, this shows disassortative nature of the network.

Clustering Coefficient
The clustering coefficient, for a node n, is a notion of how connected the neighbors of that node are, in a network. The clustering coefficient for n is a ratio of the number of edges between the neighbors of n, and the maximum number of edges that could possibly exist between the neighbors of n. For an undirected network, clustering coefficient C(k n ) of a node n can be calculated by, where, e n is the number of connected pairs between all nearestneighbors of the node n, and k n is the degree of the node n (Ravasz and Barabási, 2003). The clustering coefficient, when applied to an entire network is called the average clustering coefficient and is defined as the average of the clustering coefficients for all nodes in that network, it gives a measure of the tendency of the nodes in that network to cluster together.

Betweenness Centrality
Betweenness centrality is a measure of the frequency of occurrence of a node on all shortest paths between all pairs of nodes in a network (Brandes, 2001). The betweenness centrality of a node indicates the amount of influence it has over the flow of information in a network by behaving like an interaction bridge between other nodes in the network. The betweenness centrality C B (n) of a node n is given by the expression: where, s and t are nodes in the network other than n, σ st is the total number of shortest paths from s to t, and σ st (n) is the number of those shortest paths from s to t on which n lies (Albert and Barabási, 2002;Maslov and Sneppen, 2002;Raman, 2010).

Closeness Centrality
Closeness centrality (C C ) is a measure of how close a node is, to all the other nodes reachable from it in a network, thus, it points out the nodes which are able to spread information very efficiently through a network (Canright and Engø-Monsen, 2004). C C (n) of a node n is defined as the reciprocal of the average length of the shortest paths between n and all other nodes connected to it in the network, and is given by, where, dij is the length of the shortest path between two nodes i and j, and N is the total number of nodes in the network which are connected to the node n.

Eigenvector Centrality
Eigenvector centrality is a measure of a node's power to facilitate the spread of information in a network. Eigenvector centrality C E (n) of a node n in a network is proportional to the sum of it's nearest neighbors centralities, and is defined by the equation, where, nn(n) indicates the nearest neighbors of node n in the network. In the eigenvalue equation, Av n = λv n , λ is the eigenvalue of the eigenvector v i and A is the adjacency matrix (connection matrix) of the network. The principal eigenvector of A, which corresponds to maximum eigenvalue λ max , is considered to have a positive eigenvector centrality score (Barrat et al., 2004;Malik et al., 2017).

Community/Modules Detection
Characterization of the modular framework of the constructed network was requisite to define its behavior as a hierarchical network having several levels of hierarchy (Traag et al., 2011(Traag et al., , 2013. To detect communities that are distributed in a hierarchical fashion within the constructed network, we followed Newman and Girvan's community finding algorithm (Newman and Girvan, 2004) and the method used was Leading Eigen Vector method (LEV) (Newman, 2006) in R using "igraph" package 2 , LEV computation is considered fairly reliable since it reckons the eigenvalue of each edge, thus, indicating the significance of edges rather than nodes. We kept breaking the network into modules and then sub-modules till the motif level came which is the last level of network organization after which the network can not be broken further. Identifying any submodule as a community was based on the criterion that it should be found to contain at least one triangular motif [defined by G(3, 3) i.e., 3 nodes and 3 edges].
Modularity is used to estimate the strength of the division of a network into communities (Newman and Girvan, 2004). If m denotes the total number of edges in a community, Aij is the adjacency matrix of size i × j, k represents degrees, and the δ function yields 1 if nodes i and j are in the same community, then Modularity (Q) of the community can be defined as, Functional Enrichment and Pathway Enrichment of the SB Network's Seed Genes In the current study, we used The Database for Annotation, Visualization and Integrated Discovery (DAVID) which extracts biological meaning from large lists of genes (Newman, 2005;Mason and Verwoerd, 2007) for the functional enrichment and pathway enrichment analysis of the seed genes (Huang et al., 2009a,b). The functions and pathways with adjusted Benjaminiadjusted P-value < 0.01 were considered statistically significant.
Applying Local-Community-Paradigm (LCP) Approach to Estimate the Network's Compactness The LCP theory can be used to measure the self-organizing ability of a network by means of providing information regarding the number, size, and compactness of communities in the network. Thus, to check for the self-organizing ability of the SB network at each level of the organization, the LCP technique was employed using MATLAB 8.2. A Local Community (LC) is composed of a cohort of Common Neighbors (CNs) of a given link and their cross-interactions-Local Community Links (LCLs) (Cacciola et al., 2019). The CN index between two nodes x and y, is the amount of overlap between their sets of first-node-neighbors S(x) and S(y) given by, CN = (Sx)∩S(y), a large value of CN indicates that these two nodes are more likely to interact, therefore, increase in CN reflects an increase in compactness in the network, which eventually indicates faster information processing in the network. The LCLs between x and y, whose upper bound is defined by, max(LCL) = 1/2CN(CN-1), are the number of internal links in the local-community (LC) (Cannistraci et al., 2013).
According to the LCP theory, the number of CNs of each link in a complex network is positively correlated with the respective number of LCLs, this led to a new network measure called local-community-paradigm correlation (LCP-corr). High LCPcorr values (usually >0.8) suggest rapid delivery of information across the various network modules and local processing by the formation of new links between CNs, thus, suggesting more dynamic self-reorganization in a network. On the other hand, low LCP-corr values (usually <0.4) characterize energetically expensive connections, thereby, indicating weak interactions between the nodes in non-LCP networks (Cacciola et al., 2019). The LCP correlation (LCP-corr) is the Pearson correlation coefficient between the variables CN and LCL and the formula for computing it is: with CN > 1, where cov(CN, LCL) is the covariance between CN and LCL, σ CN and σ LCL are standard deviations of CN and LCL, respectively.

Tracking of the Seed Genes Through the Networks
The path that each of the seed genes followed in the SB network, was tracked through various modules and sub-modules of the network and the ones which reached the motif level, as well as their interacting partners, were identified as the Key Regulators (KRs) of the network. In graph theory, KRs are considered significant as they serve as the backbone of a network from top to bottom organization and vice versa to maintain the network's stability.

Knock-Out Experiment of the Motifs Consisting of the Key Regulators
The deletion of the KRs might result in lethality on account of their regulatory role in a network. The regulation exerted by the KRs in a network can be understood theoretically, by studying the changes observed in the connections and topological parameters of that network after the KRs being knocked-out from it, compared to the corresponding unmodified network. In this regard, the motifs consisting of the KRs that were identified in the last level of the network organization were separately removed from the constructed primary network, and the topological properties of the modified network were again calculated to discern the perturbations led by the knock-out of these motifs from the network. We repeated the in silico knockout experiment of the motifs successively at each level of the network systematization to comprehend the regulatory role of the constituent KRs in the network.

Distribution of Energy in the Network: Hamiltonian Energy Calculation
At each level of the organization of a network, be it a global or a modular level, a certain level of energy is required to keep the network organized at that level. This can be measured by calculating Hamiltonian Energy (HE) of the network at that level within the formalism of the Constant Potts Model (Newman and Girvan, 2004;Canright and Engø-Monsen, 2006). HE of a network/module/sub-module can be calculated by, where, c is any community containing e c and n c number of edges and nodes, respectively, and γ is the resolution parameter acting as edge density threshold which is set to be 0.5. To demonstrate the regulation exerted by the KRs in the same combination as they were found in the motifs, HE calculation and comparison of the original and the motifs knockout networks were done .

Rich-Club Analysis
The "rich-club" organization is attributed to the preferable linking of high-degree nodes (rich nodes) among themselves to form tight and well-interlinked sub-graphs (clubs) in a network (Colizza et al., 2006). The rich-club organization of a network can be discerned by computing the rich-club coefficient φ(k) across the degree range. If N > k denotes the number of nodes having a degree higher than a given value k and E > k stands for the number of links connecting the N > k nodes then φ(k) can be defined as: The rich-club coefficient φ(k) measures the connectedness of the rich nodes, by giving the ratio of the actual to the maximum number of links possible between the N > k nodes. However, a monotonic increase of φ(k) can often be misleading as it may rather be a consequence of a higher probability of highdegree nodes to share edges as compared to low-degree nodes. Therefore, φ(k) should be normalized by comparing it with those obtained from the maximally random networks with similar size and degree distribution in order to precisely assess the richclub structural ordering. The normalized rich-club coefficient φ norm (k) is computed as: Where, φ rand (k) is the average rich-club coefficient of the maximally random networks (Nunes Amaral and Guimera, 2006). φ norm (k) > 1 is the signature of a rich-club organization in a network. In contrast, φ norm (k) < 1 signifies a lack of interconnectivity among the rich nodes. We evaluated the existence of a rich-club structural ordering in the SB network which is known to aid in increasing a network's stability (Mangangcha et al., 2019(Mangangcha et al., , 2020. All the graphs were drawn using OriginPro 8.5 and the figures using Adobe Illustrator CS6.

Data Acquisition Through Manual Curation
A list of 117 genes that were experimentally verified to be involved in Spina Bifida (SB) in humans was retrieved through manual curation of literature ( Table 1). These genes were used for the construction of the SB network to understand the molecular mechanism behind the occurrence of the malformation and to provide a new angle for its future biomarkers and therapeutic targets.

SB Network Architecture Reveals Hierarchical Scale-Free Features
To construct the SB Protein-Protein Interaction (PPI) network, the candidate genes of SB listed in Table 1, were fed into STRING as seed genes. 116 of our seed genes got incorporated in the constructed primary network composed of 1,116 nodes and 40,886 edges, leaving TRPM6, which failed to make its way into the network. To gain structural insights into the primary PPI network of SB, we examined features of its topology or architecture i.e., probability of degree distribution P(k), clustering coefficient C(k), neighborhood connectivity C N (k), and centrality measurements. To find if the network has the scale-free fractal attribute and is hierarchical, the topological properties of the network must obey power-law behaviors as a function of degree k ( Barabási and Albert, 1999) and it can be written as; Topological Property (TP) ∼ k ∧ exp, Where, degree distribution, neighborhood connectivity, clustering coefficient, betweenness centrality, closeness centrality, and eigenvector centrality are the topological properties and γ, β, α, µ, δ, O are their exp (exponents), respectively.
We followed the standard statistical fitting technique put forward by Clauset et al. (2009) to verify that the features of the graph's architecture follow power-law behavior. All statistical P-values for all data sets, calculated against 2,500 random samplings, are found to be larger than a critical value 0.1, and goodness of fits are found to be less than and equal to 0.33. The data points of all the topological parameters are found to fit power law when plotted against the degree k of the SB network (First row labeled as Level "0" in Figures 1A, 2A, 3A). The straight line in the graphs represents the non-linear curve-fitting on the formula Y = a0 * xˆa1. Here a1 is the coefficient of the topological showing probability of degree distribution P(k), clustering coefficient C(k), neighborhood connectivity C N (k), betweenness centrality C B (k), closeness centrality C C (k), and eigenvector centrality C E (k) as a function of degree (k) for primary original network (level 0) and TNIP1 motif knockout networks at different levels of the organization (level 1-4). (B) showing changes in the values of the topological properties' exponents of the TNIP1 motif knockout networks (colors corresponding to the ones used in the topological properties plots i.e., blue for P(k), red for C(k), yellow for C N (k), magenta for C B (k), green for C C (k) and turquoise for C E (k)) compared with the topological properties' exponents of the corresponding original networks (black) at different levels of the organization. γ, α, β, µ, δ, and O are the exponents of the degree distribution, clustering coefficient, neighborhood connectivity, betweenness centrality, closeness centrality, and eigenvector centrality, respectively.
properties and we plot degree (k) on the X-axis. If these data fit well as stated by Al Barabasi et al., it is concluded that the network is scale-free and is hierarchical. In our study, it was seen that the topological properties of the network obey the power-law, thus, we conclude that the SB network is hierarchical and has scalefree fractal attributes. The values of the power-law exponents for each of the topological properties of the complete network were calculated: The values of the exponents of P(k), C(k), and C N (k), i.e., γ, α, and β, lead to the conclusion that the network though not having a strong hierarchy, still falls into the category of a weak hierarchical scale-free network, reflecting the presence of welldefined successive interconnected communities with meagerly distributed hubs (nodes with a high degree of interaction) in the network (Pastor-Satorras et al., 2001;Ravasz et al., 2002;Nafis et al., 2015). The negative values of γ (γ < 2) and α, indicate the hierarchical nature of the SB network. The negative value of α means that as k is increasing, C(k) is decreasing, indicating that nodes with a high degree have a low tendency to cluster, further showing a hierarchy of hubs, in which the most densely showing probability of degree distribution P(k), clustering coefficient C(k), neighborhood connectivity C N (k), betweenness centrality C B (k), closeness centrality C C (k), and eigenvector centrality C E (k) as a function of degree (k) for primary original network (level 0) and TNRC6B motif knockout networks at different levels of the organization (level 1-3). (B) showing changes in the values of the topological properties' exponents of the TNRC6B motif knockout networks [colors corresponding to the ones used in the topological properties plots i.e., blue for P(k), red for C(k), yellow for C N (k), magenta for C B (k), green for C C (k), and turquoise for C E (k)] compared with the topological properties' exponents of the corresponding original networks (black) at different levels of the organization. γ, α, β, µ, δ, and O are the exponents of the degree distribution, clustering coefficient, neighborhood connectivity, betweenness centrality, closeness centrality, and eigenvector centrality, respectively. connected hub is linked to a small fraction of all other nodes. The power-law obeyed by P(k) is also a sign of the scale-free nature of the SB network since scale-free networks are defined by a power-law degree distribution, the negative value of γ means that a minor section of the nodes exhibits a high degree with most of the nodes having a low degree which is in accordance with the definition of a scale-free network (Barabási and Albert, 1999). The positive value of β indicates the assortative nature of the SB network, which means that edges between heavily connected nodes predominate in the network to regulate the latter (Pastor-Satorras et al., 2001).
Centrality measurements, namely betweenness centrality C B (k), closeness centrality C C (k), and eigenvector centrality C E (k) are also observed to exhibit power-law or fractal behavior: The positive values of the exponents µ, δ, and θ of the three distributions C B (k), C C (k), and C E (k), respectively, also show that the network exhibits hierarchical scale-free or fractal features (Bonacich, 1987). The positive values of these exponents mean that C B, C C , and C E when plotted against degree k, increase as k increases. The increasing value of C B as k increased indicates that nodes with a high degree have high C B, thus, these larger hubs have major influence over the information transmission in the network than the nodes with a low degree. Similarly, direct proportionality between C C and k points toward the high C C of the hubs further indicating that these high-degree nodes are the quick spreader of the information in the network. C E is also in favor of highly connected nodes as reflected by the positive value of θ, showing that nodes with a high degree have high C E as well, thus indicating that these nodes are more influential in the network on account of their power of spreading information in the network. The positive value of θ signifies the connectedness between the high degree nodes, this is in agreement with the assortative mixing in the network. showing probability of degree distribution P(k), clustering coefficient C(k), neighborhood connectivity C N (k), betweenness centrality C B (k), closeness centrality C C (k), and eigenvector centrality C E (k) as a function of degree (k) for primary original network (level 0) and TRDMT1 motif knockout networks at different levels of the organization (level 1-3). (B) showing changes in the values of the topological properties' exponents of the TRDMT1 motif knockout networks [colors corresponding to the ones used in the topological properties plots i.e., blue for P(k), red for C(k), yellow for C N (k), magenta for C B (k), green for C C (k) and turquoise for C E (k)] compared with the topological properties' exponents of the corresponding original networks (black) at different levels of the organization. γ, α, β, µ, δ, and O are the exponents of the degree distribution, clustering coefficient, neighborhood connectivity, betweenness centrality, closeness centrality, and eigenvector centrality, respectively.
Thus, through meticulous study of these topological properties, the SB network was found to self-organize into a scale-free fractal state, having a weakly hierarchical organization.

Key Regulators Uncovered Through Clustering and Tracing
Through clustering using Newman and Girvan's algorithm, the network got divided into communities and sub-communities distributed in six hierarchical levels ( Figure 4A). It appeared that both modularity (Q) and Hamiltonian Energy (HE) decrease from top to bottom organization when plotted against the level of organization (Figures 5A,B). When the seed genes were traced from top to bottom organization of the SB network through these levels of hierarchy, only three of them were found to reach the last sixth level (motif level) of the SB network, namely, TNIP1, TNRC6B, and TRDMT1. These three seed genes along with their interacting partners in the last level of the network organization, namely, TNF and TRAF1 (interacting partners of TNIP1); KMT2C, KMT2D, and NCOA3 (interacting partners of TNRC6B); and DICER1 and HDAC1 (interacting partners of TRDMT1) were revealed to be the KRs of the SB network, the criterion being their ability to make triangular motifs [defined by G(3, 3)] ( Figure 6A) and their presence at every topological level ( Figure 4B). This is in agreement with the definition of KRs, according to which, KRs are the genes/proteins which are deeply rooted from the top to bottom organization of the network. These KRs are considered to be the backbone in maintaining a network's stability as they capacitate the network to combat any unacceptable alterations in it. Five more of our seed genes namely, ITPK1, FKBP8, TLE3, FOLH1, and TP53, reached the sixth level but they could not be considered as KRs because of their shortcoming to form triangular motifs.
The regulating ability of each of the KRs was estimated by defining the probability P KR (x l ) of the KRs to regulate the networks at different levels ( Figure 6B)    the proportion of the edges count (x), the KR has in the network/community/sub-community compared to the total edges count (E) present in the network/community/subcommunity at l level of organization: P KR (x l ) of all the KRs was found to be increasing with an increase in the level of organization. This suggests that the regulatory role of the KRs is even more powerful at deeper levels, thus these behave as active workers at the grassroots level.

Low Popularity Maintained by the Identified Key Regulators
All of our KRs had a somewhat low degree in the main network which dramatically changed in the subsequent levels of the organization. Of our six KRs, TNF has the popularity-rank (rank on the basis of degree) of 42 in the main network, decreasing as we move to HDAC1(70), NCOA3(374), KMT2D(559), KMT2C(612), TNRC6B(652), DICER1(735), TRAF1(806), TNIP1(1072), and TRDMT1 (1099). Surprisingly, the first 41 leading hubs at the complete network-level, failed to make their way to the last level of the organization leading us to infer that the popularity-rank of leading hubs does not assure the emergence of the hubs as KRs (Figure 7A).

Emergence of Low Degree Nodes Accompanied by High Degree Node as Key Regulators
In a network, when a node's degree is low, the node gains what strength it has from its neighbors and thus the influence it has over the network is a function of its neighbors' degree. In  BAX  CSNK1G2 TNFRSF10A TRAD DUCHL1   GPX1   CYCS  IL10RA  PRDX1   TNFRSF10B  CASP2  CDC37   TRAF2  INTU  UCP2  TLR3GNB4  GNAS   TNFRSF1B  C12orf5   TNFAIP3   CRTC2   RIPK3  TAB2  MAOA  HK2 PPP1R13B   GNG2  PTPN11  PRKAR2B   SFN LRP6  SLC2A1  AXIN1  BCL2L11   APC  MAPK1  MTOR  TNF MAPK8  G6PD  GLI 3   GLI 2  TNFRSF1A  PRICKLE1 MCL1  SOCS3 XIAP  STAT5B NOS3 YES1  PPP2R5D   IKBKB   APAF1 ATI C   CFLAR   DIABLO  SMO  MDM4 PMAIP1   DVL2  TXNRD 2  FAS  GNG13  NTRK1 IL18  CASP1  TSC2  PSMD10  CASP9  PSMC6  FANCD 2  SKP2PPP2R5E  STAT1  PSMD1 1  PSMD14  VHL  UIMC1 PSMC1  PSMB9   CBLB  PPP2R4   PSMC5   CCND3   NCOR2 CCND2 INSR  IRF3   PPP2R2A  NEDD4L  NFKBI A  PPP2CB SUFU  BCL2L1  GNG7  PTEN   DHH  CCT5  CDH1  PSMA5  FOXO 3  CREB1  GSK3B CTNNB1   IGF1  JAK2  GRB2  LEF1  FOS  SHH   FOXO 1  FLNA   TNRC6B  RPS6KA1IL10 CFL1 CDK5R1  SMAD 3  PPARG  ACTB   SMARCD3  NCOA2  TBL1X  PPARGC1A SET SIRT3  addition, a low-degree bridge node, connecting two high-degree nodes, is very important in a network despite its lower degree (Liu et al., 2016). Thus, a node's degree is not the sole determinant of its essentiality, rather, it depends on the topological position of that node. This is reflected in our results, where, the three seed genes, i.e., TNIP1, TNRC6B, and TRDMT1 have a relatively low degree compared to their interacting partners in the primary network, but all three of these genes, as well as their interacting partners, are found out to be the KRs of the SB network based on their ability to make it to the last level of the network organization. TNIP1 which has quite a low degree of 6, formed a motif in the last level of the organization with TNF and TRAF1 which have degree 197 and 31, respectively; likewise, TNRC6B which has a degree of 49 formed motifs with KMT2C, KMT2D, and NCOA3 which have degree 52, 59, and 87, respectively; and TRDMT1 which has such a low degree of 3 formed a motif with DICER1 and HDAC1 which have degree 40 and 168, respectively, in the primary network. This shows that the three low degree seed genes were regulated by their relatively high degree interacting partners from top to bottom organization of the SB network indicating that the until now unverified interacting partners could be directly or indirectly related to the pathophysiology of SB in humans.

Perturbations Driven by the Knockout of the Motifs Consisting of the Key Regulators
The hierarchical topology of the SB network saves it from breaking down completely after the removal of the KRs from it, owing to the strong self-organizing nature of a hierarchical network (Ravasz et al., 2002). However, when KRs are removed from a network it may lead to certain local and global disturbances in the network which will propagate through various levels of the hierarchy starting from top to bottom or bottom to top, causing changes in the network's topology. Based on the property of a module that its function is different from other modules and that its nodes have more relations to each other than to members of other modules (Dong and Horvath, 2007), we knocked out the KRs-containing modules/motifs that were found in the last level of the SB network organization from the main network-level till the last level and then studied the changes in the topological properties of the SB network due to the motifs knockout to comprehend the regulatory role of the KRs in the same combination as they were found in the motifs.
In the case of the TNIP1 motif (consisting of TNIP1, TNF, and TRAF1 KRs) the network/modules/submodules keep adapting themselves functionally to cope up with the removal of this motif till the fourth level, from the fifth level its knockout causes the sub-communities it is present in, to almost fall apart. Whereas, speaking of TNRC6B motif (consisting of TNRC6B, KMT2C, KMT2D, and NCOA3) and TRDMT1 motif (consisting of TRDMT1, DICER1, and HDAC1) the submodules from the fourth level itself lose their ability to adapt functionally in response to the loss of these motifs and thus break down. In all the cases a considerable change in the network's topological parameters was noticed, however, somehow the network reorganizes itself thereby proving to have an error tolerance. In all the motifs knockout networks, the value of γ is found to turn positive in the deeper organization levels, reflecting the loss of scale-free topology of the communities in the deeper levels of the organization (Barabási and Albert, 1999). For all the motifs, a consistent pattern is seen in the value of α in the knockout networks, an initial increase in the value of α indicates increased compactness of the networks so as to rescue the networks from falling apart but with a decrease in the level of organization, α is also found to be decreasing which points toward decreasing compactness of the communities (Ravasz and Barabási, 2003). The value of β becoming negative in the deeper levels of the organization shows that the networks have become disassortative in nature (Pastor-Satorras et al., 2001). The value of µ is noted to be decreasing in the deeper levels of the organization upon knock-out of the motifs, indicating the decreasing significance of the remaining hubs' regulatory roles in the networks (Borgatti and Everett, 2006). The increase in the value of δ reveals faster information processing within the network upon removal of the motifs so as to reorganize the perturbed network and rescue it from falling apart (Canright and Engø-Monsen, 2004). Lastly, the value of θ which is found to be decreasing, shows that information transmission is reduced because the KRs-containing motifs are removed (Canright and Engø-Monsen, 2006; Figures 1A, 2A, 3A).
In all the motifs knockout experiments, the exponent values of all of the topological properties showed drastic changes in lower levels of the hierarchy, signifying that local perturbation increases as we go to deeper levels starting from top to bottom in the network (Figures 1B, 2B, 3B).
By calculating and comparing the Hamiltonian energy (HE) of the original network/community/sub-community with the corresponding modified network/community/sub-community that resulted after the motifs knockout experiments, we observed a slight reduction in HE in the knockout networks at each level of organization ( Figure 7B). This demonstrates that the KRs knockout experiments result in loss of wiring/rewiring energy at all levels of the network organization.

Functional Enrichment and Pathway Enrichment of the SB Network's Seed Genes
We identified different molecular functions, biological pathways, and cellular components as well as KEGG pathways in which the seed genes (candidate genes for SB) are significantly enriched using DAVID functional annotation tool, listed in Table 2. It was found that most of the seed genes are enriched in neural tube closure, folic acid-binding, and regulation of transcription. These functions and pathways may serve important roles in the pathogenesis of SB.

Evidence of Self-Organization: Local-Community-Paradigm (LCP) Approach
Local-community-paradigm correlation of all the communities/sub-communities at each level of the hierarchy was computed. The average values of the LCP-corr for all the modules at each level are found to be greater than 0.95 and these values do not vary with an error bar (modules with zero LCP-corr were excluded while calculating the average) ( Figure 5C). The high values of the calculated LCP-corr for all the modules/sub-modules reflect strong compactness of these modules/submodules, this means that these modules/submodules are composed of tightly connected nodes, which strongly favors the preservation of the network properties and adaptation of the network against any unfavorable changes in it to prevent the network from breaking down.

Rich-Club Organization in SB Network
When a PPI network related to a disease exhibits a richclub formation, it evinces the existence of a pathological powerhouse within the network, composed of the most influential components which have to their credit of providing the network robustness and stability (Alawieh et al., 2015). We evaluated the existence of a rich-club structural ordering in the SB network using brainGraph package in R and the network was found to exhibit a rich club structural ordering as indicated by an increasing rich-club coefficient φ(k) as k increased ( Figure 8A). Further, the relevance of the uncovered rich-club organization was examined by assessing the normalized φ(k) (φ norm (k)) ( Figure 8B). φ norm (k) was calculated for the lowermost degree (1) to the third-highest degree (364) in the SB network with a rich-club showing between degrees 25 to 364. The degree found to have the highest φ norm (k) is 240 and the node having the highest φ norm (k) corresponding to the degree 240 is CTNNB1. The subnetwork of nodes with degrees corresponding to the normalized rich-club coefficient ≥1.1 is shown in Figure 8C and defined as the rich-club nodes. Besides being the KRs of the SB network, TNF, TRAF1, KMT2C, KMT2D, NCOA3, TNRC6B, DICER1, and HDAC1 emerged as rich-club nodes as well.

DISCUSSION
A dearth of knowledge about the genetic etiology of Spina bifida (SB) demands exploration of its genetic aspects. Though several candidate genes for SB have been reported in recent years, however, the potential mechanism underlying SB development remains unclear. To the best of our knowledge, our in silico study is the first attempt toward investigating the Protein-Protein Interaction (PPI) network of SB and identifying the Key Regulators (KRs) of SB through topological and rich-club analysis that regulate the whole network.
The study of the topological properties of our primary network (SB) consisting of 1,116 nodes and 40,886 edges, suggests that the network follows a weak hierarchical and scale-free fractal nature. The hierarchical topology indicates the two-tier organization of the network with one level representing local clustering of mostly low degree nodes into well-defined successive communities or modules and the other level representing more global connectivity in which the hubs serve as higher-order communication points between the interconnected communities. The fractal state of the network signifies a self-similar organization of the network and scale-free nature has to its credit of making the network stable. These topological properties allow efficient information processing within the network. The topological properties analysis of the SB network was essential to know the behavior of the network and to validate that the network is following the criteria of a scale-free and hierarchical network which further was requisite for the identification of the most significant KRs of the network. After clustering, the selforganizing behavior of the SB networks was investigated through the LCP technique which led us to infer that the networks sustain self-organization based on their dynamic nature and are tightly packed with efficient information processing. Through Gene Ontology and pathway enrichment analysis it was found that the seed genes are enriched in neural tube closure and regulation of transcription. These functions and pathways may serve important roles in the pathogenesis of SB.
Key regulators are the regulatory entities within a network that are sufficient to induce a complete complex developmental pathway. Out of our 117 seed genes, 3 genes, namely, TNIP1, TNRC6B, and TRDMT1, were found to reach the motif level of the SB network hierarchical organization. Though all the seed genes are involved in SB, however, the above-mentioned three seed genes are specifically considered to be the backbone of the SB network with regard to maintaining the network's stability based on their ability to make it to the grassroots level. Also, the importance of their interacting partner genes, namely, TNF and TRAF1 (interacting partners of TNIP1); KMT2C, KMT2D, and NCOA3 (interacting partners of TNRC6B); and DICER1 and HDAC1 (interacting partners of TRDMT1) cannot be ruled out as they formed motifs with these seed genes in the last level of the hierarchical organization of the SB network which means that these seed genes work in combination with their interacting partners and hence these genes, as well as their interacting partners, were revealed to be the KRs of the network. Since there is no direct strong evidence in the pre-existing literature for these interactor genes' involvement in SB to date, our study is indicating the potential role of these genes in SB.
The TNF (tumor necrosis factor) gene encodes a protein named TNF-α which is an inflammatory cytokine. This cytokine is responsible for a diverse range of signaling events within cells through the NF-κB pathway that ultimately leads to necrosis or apoptosis (Idriss and Naismith, 2000). TNF-α induces TNFAIP3, tumor necrosis factor-α-induced-protein 3 which is also known as A20, TNFAIP3 interacts with TNIP1 gene-encoded TNFAIP3 interacting protein 1 (TNIP1 protein), TNFAIP3 and TNIP1 then in conjunction regulate NF-κB dependent gene expression negatively (Heyninck et al., 1999). When NF-κB is active, it turns on the expression of those genes that keep the cell proliferating. So, a constitutively active state of NF-κB leads to various types of human tumors. Since TNIP1 is a negative regulator of the NF-κB pathway, so when the TNIP1-TNFAIP3 complex inhibits NF-κB, anti-apoptotic genes are not activated by NF-κB and hence apoptosis happens. TRAF1 gene-encoded tumor necrosis factor receptor-associated factor 1 (TRAF1) is an adapter in signal transduction and has roles in apoptotic processes in different cell types as well as in immunity. TRAF1 is one of the anti-apoptotic genes that are induced by NF-κB and its encoded protein, TRAF1, is responsible for resistance to apoptosis downstream of CD30 which is a protein of the tumor necrosis factor family (Zhang et al., 2014;Edilova et al., 2018). Hence, defects in all these three genes TNF, TRAF1, and TNIP1 are related to the formation of cancer in one way or the other. Also, a clear positive association has been shown in a study between SB and pediatric cancer pointing toward an overlap between genes impacting both development and malignancy. Therefore, these genes may have important roles in SB . NCOA3, Nuclear receptor coactivator 3, a member of the p160 family of coactivators facilitates the upregulation of gene expression by functioning as a transcriptional coactivator protein. It is recruited to the DNA promotion sites by ligandactivated nuclear receptors, where it acylates histones making the downstream DNA more accessible to transcription (Zhang et al., 2018). Like NCOA3, KMT2C (lysine methyltransferase 2C) and KMT2D (lysine methyltransferase 2D) proteins also modify histones by functioning as H3K4me1/2 methyltransferases, these two proteins have pivotal roles in embryonic morphogenesis, FIGURE 8 | (A) raw rich-club coefficient (φ) of the SB network as a function of degree k. (B) normalized rich-club coefficient (φ norm ) of the SB network as a function of degree k. (C) rich club nodes of the SB network, the KR genes which emerged as rich-club nodes are highlighted in red.
central nervous system development, and post-natal survival (Lavery et al., 2020). TNRC6B, Trinucleotide Repeat Containing Adaptor 6B protein participates in RNA-mediated gene silencing through both miRNA-dependent translational repression and siRNA-dependent endonucleolytic cleavage of complementary mRNAs by argonaute family proteins. It needs to be mentioned here that the TNRC6B paralog TNRC6A protein has been found to interact with histone-modifying complexes (Hicks et al., 2017). Hence, it is evident that these four genes, namely, NCOA3, KMT2C, KMT2D, and TNRC6B are interrelated to each other in their function. Also, it has been reported that the TNRC6B gene shows an association with myelomeningocele in the Mexican American population (Hebert et al., 2020). Therefore, these genes might have roles in SB.
HDAC1 (Histone Deacetylase 1) protein localizes into the nucleus and regulates eukaryotic gene expression via deacetylation of all the four core histones. It forms a complex with retinoblastoma tumor-suppressor protein and controls cell proliferation and differentiation. Also, in conjunction with metastasis-associated protein-2, it deacetylates and destabilizes p53 and modulates its effect on cell growth and apoptosis (Milazzo et al., 2020). The DICER1 gene encodes the endoribonuclease Dicer protein of the ribonuclease III family. MicroRNAs (miRNAs) are created by the Dicer endoribonuclease protein, which are known to control gene expression by binding to specific mRNAs in order to inhibit the access of ribosomes to these mRNAs and their subsequent translation (Robertson et al., 2018). TRDMT1 gene codes for a methyltransferase, tRNA aspartic acid methyltransferase 1, that methylates as the name suggests, a specific RNA molecule, the aspartic acid transfer RNA (tRNAAsp). It has been reported in a study carried out in a Dutch population that the A allele of the disease-associated rs2295809 polymorphism in TRDMT1 was associated with an increased RBC folate in the control population, which is in accordance with its risk-reducing effect observed in this study (Goll et al., 2006;Franke et al., 2009). So it is quite evident that one thing is common in these three genes that all of these are gene expression regulators and one of these i.e., TRDMT1 has been shown to be involved in SB. Therefore, these genes can be considered to have roles in the occurrence of SB.
Low popularity exhibited by all of the KRs in the main network leads us to infer that these KRs outrun the leading hubs in their significance to propagate signals through the different hierarchical levels of the network in order to conserve the network's stability and intrinsic properties. Motifs knockout experiments imparted maximum local perturbation in the deeper levels of the hierarchy but their removal did not trigger off the networks to break down, this substantiates the self-organizing ability of the networks that somehow kept harmonizing and coping with the motifs (consisting of the KRs) knockout.
The essentiality of the identification of rich-club nodes in a network lies in the fact that these nodes create a sub-structure within the network that increases the stability of the whole network and improves the efficacy of information transmission among high degree nodes. Through rich-club analysis, it was found that TNF, TRAF1, KMT2C, KMT2D, NCOA3, TNRC6B, DICER1, and HDAC1, apart from being the KRs of the SB network, also emerged as rich-club nodes. This finding further adds significance to our study as these KRs have proven out to be involved in the formation of the rich-club organization within the SB network to boot. This evinces that these key genes take part in becoming a transit backbone of the network and increase the network's stability as a whole thus providing robustness to the network.

CONCLUSION
In this study, we have employed graph theory to identify important functional modules/motifs consisting of the novel key regulators of SB. While our study puts forward the most important genes from among the candidate seed genes of SB, namely, TNIP1, TNRC6B, and TRDMT1, it also uncovers novel genes, namely, TNF, TRAF1, KMT2C, KMT2D, NCOA3, DICER1, and HDAC1 which can be directly or indirectly related to SB as these genes have been found to regulate the above-mentioned seed genes till the last level of the SB network, hence all of these genes are concluded to be the KRs of SB. As it is well known that the proteins in the same module share the same function so if all the KRs of a single module/motif are targetted together, they can serve as biomarkers for the diagnosis of SB. Since SB has multiple etiology and mechanisms so the combination of several biomarkers may have high diagnostic accuracy for SB compared to using a single biomarker. Experimental validation of these hypotheses would confirm the credibility of the identified KRs which may serve as the key targets for therapeutic interventions and as putative prognostic biomarkers.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
RI and NT conceived the study design instructed on data analysis. NT curated data and performed statistical and network analysis and prepared the figures of the numerical results. NT and AF analyzed and interpreted the simulated results. NT wrote the manuscript. SYA, SAA, AAJ, AF, ST, NA, and RI reviewed the manuscript. RI and NA jointly supervised the study and approved the final draft. All authors contributed to the article and approved the submitted version.