Origin of Biomolecular Networks

Biomolecular networks have already found great utility in characterizing complex biological systems arising from pair-wise interactions amongst biomolecules. Here, we review how graph theoretical approaches can be applied not only for a better understanding of various proximate (mechanistic) relations, but also, ultimate (evolutionary) structures encoded in such networks. A central question deals with the evolutionary dynamics by which different topologies of biomolecular networks might have evolved, as well as the biological principles that can be hypothesized from a deeper understanding of the induced network dynamics. We emphasize the role of gene duplication in terms of signaling game theory, whereby sender and receiver gene players accrue benefit from gene duplication, leading to a preferential attachment mode of network growth. Information asymmetry between sender and receiver genes is hypothesized as a key driver of the resulting network topology. The study of the resulting dynamics suggests many mathematical/computational problems, the majority of which are intractable but yield to efficient approximation algorithms, when studied through an algebraic graph theoretic lens.


Origin of Biomolecular Networks
. The influence of information asymmetry on growth of a PPI network. Interactions between macromolecules are envisaged as a biomolecular signaling game whereby a sender gene expresses a macromolecule, the signal, that then binds specifically to a receiver macromolecule, which then undergoes an action (such as an enzymatic reaction, or conformational change), which produces utility (fitness). The signal consists of the three-dimensional conformation and physicochemical properties of the macromolecule (1). The sender gene may undergo duplication, which has a dosage effect on the expressed macromolecule, resulting in signal amplification (2). This mechanism is expected to lower the Shapley value of the gene players in the genome, as the signal is partially redundant and so inefficient. Subsequently, the sender gene duplicate may acquire a new function (evolve a new signal) although the majority would be expected to undergo pseudogenization (3). Both these scenarios represent the re-establishment of a Nash equilibrium. If a new signal macromolecule evolves, it is likely to bind to the same receiver macromolecule initially. This preferential attachment arises because gene duplicates have a tendency to bind to their interaction partner initially, and then subsequently undergo interaction turnover Zhang et al. (2005), and is illustrated to the right of the figure. A key problem is how a new action by the receiver arises as the result of the evolution of a new signal; the new action may co-evolve with the new signal, or may be necessary first before a new signal can evolve. The latter would imply that receiver gene duplication and action genesis facilitates the evolution of new signals and sender genes (an exception would be when there is a conflict of interest; here the sender is more likely to make the first move in evolving a novel deceptive signal, and then the receiver would respond with a better discriminative recognition mechanism). This key, and novel aspect of gene duplication might be deciphered via consideration of the topology of directed graph representations of biomolecular interactions as sender-receiver signaling games. Refinements to the illustrated scheme include situations where the original signal protein binds to a variety of receiver proteins, or where the gene that codes for the receiver protein undergoes duplication ( Figure 2).
However, situations may arise where a sender has a conflict of interest with the receiver. This kind of misalignment can occur when a sender gene is selfish, and would prefer to replicate itself at the expense of the rest of the genome. Such genes are termed 'selfish elements,' and come in a variety of forms, all marked by decoupled replication from the rest of the genome Burt and Trivers (2006). In a signaling game, when there is a conflict of interest between sender and receiver, then the sender is expected to adopt some degree of deceptive signaling Crawford and Sobel (1982). Consistent with this prediction, there are a range of examples of selfish elements that utilize molecular deception Massey and Mishra (2018).
Gene duplication is a fundamental evolutionary driver of organismal complexity Lespinet et al. (2002). The first step in the process of duplication of a sender gene may be viewed as one of signal enhancement. Because gene duplication results in gene dosage effects, it also results in amplification of the signal, the expressed gene product. This strategy can be viewed as lowering the overall utility of the genome, given that there is a cost involved in producing excessive signal. It is, thus, expected to lower the Shapley value Shapley (1969) of the gene players that cooperate within the genome. This conflict is usually resolved when the duplicated gene becomes pseudogenized, the usual fate of gene duplicates Innan and Kondrashov (2010). Subsequent to duplication, the gene duplicates will sometimes diverge in function, although the exact mechanism remains to be elucidated Innan and Kondrashov (2010). This process represents signal divergence if the gene is a sender gene, and action divergence if the gene codes for a receiver macromolecule. The genesis of a new sender gene with a new signal may then promote evolution of a novel action by the receiver macromolecule, potentially facilitating duplication of the receiver gene itself. Likewise, the duplication of a receiver gene may facilitate the diversification of macromolecular signals that interact with the two duplicated receiver macromolecules. The process modifies the GRN or PPI network in a non-obvious manner and it deviates considerably from the way evolution of random graphs is usually treated, following Erdös and Rényi, discussed in more detail in Section 3 Erdös and Rényi (1959).
Signal and action genesis via gene duplication may have features in common with a Pólya's urn model of signal genesis Alexander et al. (2012) (Pólya's urn models are statistical models that involve sampling with replacement influenced by the identity of the sampled element. These models can lead to a 'rich get richer' effect, of which 'preferential attachment' is an example, discussed in more detail in Subsection 3.2). In this model, reinforcement of signals (similar to reinforcement learning) may promote the invention of new synonyms. These considerations may provide parallels for how signals originate elsewhere, not dissimilar to how new words in a language can arise from existing words by a process of derivation Cotterell et al. (2017). Mechanistic commonalities in the process of signal genesis in these diverse systems as exhibited in GRNs remain to be explored. These models hint at a possibly new, but universal form of "preferential attachment" that drives the variations in biomolecular networks as well as the selectivity in Darwinian evolution.

Network Topology, Evolution by Duplication, and Preferential Attachments
Consequently, the topology of gene networks is non-deterministic and yet not memoryless, since it must encode layers of ripples produced earlier via the dynamics of gene duplication (paralogs and orthologs), as amplified during the network's history. Just as physicists infer the theories of origin of universe from the cosmic background radiation, we expect to enrich our understanding of the origin of machinery of life (e.g., codon evolution, evolution of multicellularity, evolution of sex etc.) from a rigorous analysis of the signaling games and their equilibria, which has rippled through the extant biomolecular networks. Taking this analogy further, we observe that the ripples in gravitational waves have been proposed to reflect the existence of parallel universes, whose presence created asymmetries in the initial conditions, giving rise to filamentary structures in the visible universe Hawking and Hertog (2018) This comparison is inspired by the notion of a 'protein big bang' from a single (or handful of) ur-protein(s) in the first complex life forms, evolving by gene duplication into the extant 'protein universe,' hinting at the information asymmetries fossilized in the GRN and PPI networks. Dokholyan et al. (2002).
Likewise, we point out that information asymmetry in macromolecular sender-receiver interactions may point to evolutionary paths that might have been abandoned unexplored; which may suggest new engineering approaches needed by synthetic biology, or in drug discovery, or immuno-therapy. Note that during the process of evolution of signaling, gene duplication and deletion contribute to a certain degree of non-determinism and "conventionality" to the Nash equilibria that stabilize and manifest as non-trivial anisotropies in gene network topology.
In summary, the process of gene duplication, tempered by signal and action genesis can be thought of as a driver of preferential attachment in shaping the topology of gene networks, in which information asymmetry between senders and receivers is expected to play an indelible role. Figure 1 illustrates a basic mechanism whereby signal genesis may lead to preferential attachment during the growth of a PPI network. Topological features expected to hint at this process include: (i) the degree distribution, (ii) hierarchicity, (iii) assortativity and many others; they require powerful statistical and algebraic tools -covered in the later sections, where it is assumed that genome evolution is a complex process involving diverse groups of mutations such as insertions, deletions, conversions, duplications, transpositions, translocations and recombinations, and that it is further affected by selective constraints and effective population size and other factors such as the environment. With recent understanding of large scale cellular networks (regulatory, metabolic, protein-protein interactions) one must now aim at investigation between the evolutionary rates of a gene mutations and its effects on the network topology using mathematical models and analytics: see Wagner (1994). For instance, combining sequence analysis in a single genome and its close relatives, one can infer the rate and tempo of the evolutionary dynamics acting on the genome, and the resulting effects on the network's algebraic structures. We provide an example of how evolution by duplication leads to a preferential attachment mode of gene network growth in Figure 2, using the duplication of the p53 gene, and its paralogs p63 and p73 -all transcription factors regulating pathways involved in related phenotypes of somatic or developmental surveillance and interacting with similar family of genes (e.g., MDM2 or MDMX), as illustration 1 .
Note that these abstract models generate refutable hypotheses that need experimental verification and support from mechanistic explanations. However, unfortunately, the biochemical processes involved in the hypothesized preferential attachment dynamics are not fully understood. For example, the duplication processes are often driven by Non-Homologous End Joining (NHEJ), a pathway that repairs doublestrand breaks in DNA. To guide repair, NHEJ typically uses short homologous DNA sequences called microhomologies, which are often present in single-stranded overhangs on the ends of double-strand breaks Chang et al. (2017). When the overhangs are perfectly compatible, NHEJ usually repairs the break accurately. However, imprecise repair can lead to inappropriate NHEJ resulting in translocations, duplications and rearrangements Rodgers and McVey (2016), which add to variations that are random but not memoryless. Perhaps some of such hypotheses may need to be carefully examined using cancer genome data such as TCGA, and models of tumor progression. This analysis may also explain efficacy of certain therapeutic interventions in cancer as well as their failures via drug and immuno resistance.

NETWORK ANALYSIS
In this section, we discuss fundamentals of graphs, a mathematical formalism used in the study of biomolecular networks, as well as other related important topics. Consider a set of entities, denoted V and a set of binary relations between the entities E ⊆ V × V . When V denotes biomolecules and E denotes interactions between them (e.g., regulations, proximity, synteny, etc.), the resulting graph represents a biomolecular network. One important advantage of graphs is that they have intuitive graphical 1) The common ancestor of p53/p63/p73, and its DNA binding site, as players in a signaling game 2) Gene duplication produced p53 and the common ancestor of p63/p73 3) Gene duplication then produced p63 and p73  Figure 2. Gene duplication of p53, p63 and p73 as a signaling game, and GRN growth. An illustrative example of a signaling games view of network growth is provided by the paralogs p53, p63 and p73, which code for transcription factors, p53 being of critical importance in many cancers Joerger and Fersht (2006). Here, p53 and the common ancestor of p63/p73 duplicated (2), followed by the duplication and divergence of p63 and p73 Lu et al. (2009) Belyi et al. (2010) (3). The signal is the DNA binding site, while the receivers are the p53, p63 and p73 proteins (here the sender is the protein coding gene downstream of the DNA binding site). The receiver protein undergoes an action upon binding to the DNA binding site (the signal), which consists of the recruitment of additional transcription factors, and contribution to the assembly of the transcription initiation complex Nogales et al. (2017). The gene products of p53, p63 and p73 mostly bind to the same DNA binding sites Smeenk et al. (2008), thus each signal (and ultimately sender gene) has acquired two new binding partners, in addition to the original interaction with the gene product of the common ancestor of p53/p63/p73. This is a form of preferential attachment, which should influence network topology as the number of genes increase by duplication, as illustrated to the right of the figure. The signaling games perspective allows us to better understand scenarios where there is a conflict of interest between the genome, and a selfish entity such as a selfish element, a cancer or a virus. When there is a conflict of interest, a deceptive signal is expected to be emitted by the sender Crawford and Sobel (1982) (the selfish entity). Here, the DNA binding site of the selfish entity will mimic that of canonical DNA binding sites associated with normal cellular function, 'tricking' a transcription factor to bind to it, and altering the transcription of the sender gene (or alternatively abolishing transcription factor binding). Examples include cis-regulatory mutations in cancer Poulos et al. (2015) representation. Such networks evolve over time with additions and deletions to the sets V and E. In order to create a bridge to algebraic approaches, we extend the standard combinatorial definition by endowing it with additional maps.
Formally, a graph is a pair of sets G = (V, E) where V are the vertices (nodes, points) and E ⊆ V × V are the edges (arcs) respectively. When E is a set of unordered pair of vertices the graph is said to be undirected or simple. In a directed graph G = (V, E, o, t), E consists of an ordered set of vertex pairs, i.e. for each edge e ∈ E, e → (o (e) , t (e)) where o (e) is called the origin of the edge e and t (e) is called the terminus of the edge e  and ]. A graph is weighted if there is a map (weighting function, w : E → R + ) assigning to each edge a positive real-valued weight.

Topological Properties
Network properties are governed by its topology, such as degree distribution, clustering coefficients, motifs, assortativity, etc. Comprehensive treatments can be found in Thulasiraman et al. (2015); Loscalzo and Barabási (2016), and for more in-depth treatment regarding biomedical networks in Loscalzo et al. (2017).

Degree Distribution
The degree of a vertex v, deg(v), is the number of edges that connect the vertex with other vertices. In other words, the degree is the number of immediate neighbors of a vertex. In directed graphs in-degree and out-degree of a vertex can be defined as the number of incoming and outgoing edges respectively. Let n k be the number of vertices of degree k and |V | = N , the total number of vertices in the graph and |E| = M , the total number of edges in the graph. Note that k n k = N and kn k = v∈V deg(v) = 2|E| = 2M . The degree distribution is the fraction of vertices of degree k, P (k) = n k /N , and two isomorphic networks will have same degree distributions (though not necessarily the converse). Thus, the degree distributions can tell a great deal about the structure of a family of networks. For example, if the degree distribution is singly peaked, following the Poisson (or its Gaussian approximation) distributions, the majority of the nodes can be described by the average degree k = k kP (k) = 2M/N . The graph is said to be sparse, N )). Biomolecular networks are usually sparse, which can be fruitfully exploited in their algorithmic analysis. We can talk of typical nodes of the networks as being those that have degree distribution as those within 1 to 2 standard deviations from the average, while, with probability decreasing exponentially, it is possible to find nodes with degree much different from the average. While power-law degree distributions follow a completely different pattern: they are fat-tailed; the majority of the nodes have only few neighbors, while many nodes have relatively large number of neighbors. The highly-connected nodes are known as hubs.

Distance Metrics
One of the most fundamental metrics is the distance on a graph. First we define a walk of length m in a graph G from a vertex u to v as a finite alternating sequence of vertices and edges v 0 , e 1 , v 1 , e 2 , . . . , e m , v m , such that o (e i ) = v i−1 and t (e i ) = v i , for 0 < i ≤ m, such that u = v 0 and v = v m . Then the number of edges traversed in the shortest walk joining u to v is called the distance in G between u and v denoted by d (u, v). If there is a walk from u to itself, then we say that the set of vertices (respectively edges) form a cycle. The smallest number of m edges in a walk from u to itself is called a cycle of length m. The girth g(G), is the shortest cycle in G. A walk whose vertices are distinct is called a (simple) path.
The concept of a walk allows us to define other properties of the graph. A graph G = (V, E, o, e) is said to be connected, if any two vertices are the extremities of at least one walk. The maximally connected subgraphs are called the connected components of G. A giant component is a connected component containing a significant fraction of the nodes. The maximum value of the distance function in a connected graph is called the diameter of the graph. Frequently real life networks have small diameter and are said to exhibit small world phenomenon. For many biomolecular networks the average distance between two nodes depends logarithmically on the number of vertices in the graph.
Additionally, a complete graph G is the undirected graph, in which each vertex is a neighbor of all other vertices; deg(v) = N − 1, ∀v ∈ V ; or equivalently, each distinct pair of vertices are connected (or are adjacent) by a unique edge. G is then denoted as K N . A clique in an undirected graph is a subset of vertices such that its induced subgraph is complete. Additional combinatorial invariants of graphs useful in the analysis of networks can be defined (see Supplementary material for details).

Expanding Constants
Let G = (V, E, ·, ·) be an undirected graph. Then for all F ⊂ V , the boundary ∂F is the set of edges connecting F to V \ F . The expanding constant, or isoperimetric constant of X is defined as, For molecular network, then, the invariant h(X) measures the quality of the network with respect to the flow of information within it, (e.g., via chemical reactions, or signaling). Larger h(X) implies better expansion, faster mixing, faster partitioning, and many other related properties that may give the network a selective advantage.
Using various combinatorial algorithms devised for the study and analysis of biomolecular networks, one may compute h(X) to determine their complexity. However precise characterization of h(X) itself is an intractable (i.e., NP-complete) problem. Isoperimetric inequalities give bounds on h(X) in terms of a related algebraic invariant, γ(X) -called its spectral gap, determination of which has complexity O(|V |) c , where c is at most 3; furthermore, c = 1 for many sparse graphs. We give isoperimetric bounds and results applicable to biomolecular networks in the Supplement, where we also introduce local Cheeger constant. We also introduce algebraic invariants in Section 2.2.

Clustering and Clustering Coefficients
Biological networks are modular, forming communities and hierarchies, likely to have been sculpted by EBD (Evolution by Duplication). To study these local structures in network science, one may perform community analysis, which aims to identify a group of nodes that have a higher probability of connecting to each other than to nodes from other communities (see for exmple Pellegrini (2019)). Various notions such as k-cliques, k-clubs and k-clans have been developed to detect communities, but they are ultimately closely connected to the problem of finding cliques and consequently, do not generally lend themselves to any reasonable algorithm other than brute-force enumeration. However, even detecting communities approximately may prove valuable for general evolutionary studies, since in these biological networks communities determine how specific biological functions are encoded in cellular networks -and thus subjected to Darwinian selective pressure, since these players are likely to have formed communities in the first place to carry out specific cellular functions. (see Hartwell, Hartwell et al. (1999)). Figure 4 highlights significant evidence that communities play important role human disease networks (see Loscalzo et al. (2017)).
Usually a simpler approach is commonly employed and deals with the problem of clustering in a graph, which seeks to partition the graph into disjoint subgraphs such that nodes in each such subgraph are "closer" to the other nodes in the same subgraph, while they are "farther" from the nodes of other subgraphs. Hierarchical clustering algorithms have been developed to uncover communities (approximately) in polynomial time and depend upon the similarity matrix (x ij ), where the entry x ij equals the distance between node i and node j. Among the classical algorithms are included those by Ravasz and by Girvan and Newman Girvan and Newman (2002). Other related algorithms include those for random-walk betweenness and network centrality.
The local clustering coefficient captures the degree to which the neighbors of a given node link to each other. In general, for undirected graphs, the local clustering coefficient C i of node i with degree k i is defined as where the numerator L i is the actual number of connections between k i immediate neighbors of i, and the denominator is the number of connections if the neighbors formed a complete graph (i.e. a clique). Note that an undirected complete graph K k i of k i nodes has k i (k i − 1)/2 edges. Thus, a fully clustered node will have C i = 1 and for completely isolated node C i = 0. We can define the (average) clustering coefficient of the whole network with N nodes as The clustering coefficients can be used to characterize a network's modularity, as discussed later (in Section 3) in details.

Subgraphs and Motifs
Biomolecular networks have been found to contain network motifs, representing elementary interaction patterns between small subgraphs that occur substantially more often than as predicted by a completely random network of similar size and connectivity. The presence of such motifs is usually explained by an evolutionary process that can quickly create (usually by a variation involving duplication) or eliminate (usually by a selection process that favors pseudogenization and complementation) regulatory interactions in a fast evolutionary time scale -relative to the rate at which individual genes mutate. It is usually hypothesized that the underlying evolutionary processes are convergent. Thus efficient algorithms to detect such motifs are important in the analysis of biomolecular networks. These algorithms focus on estimating how much more frequently a subgraph isomorphic to a motif graph (with n vertices and m edges) occurs relative to what would be expected by pure chance.
The number N mn of subgraphs with n nodes and m interactions expected of a network of N nodes can be estimated from the two key topological parameters of a complex network -namely the power-law exponent β and the hierarchical exponent α as we discuss in equations (1 and 2) below. In general the subgraph motifs can be classified in two types: Type I motifs are those where (m − n + 1)α − (n − β) < 0, and type II subgraph motifs are those that satisfy the reverse inequality. One can determine their numbers N I and N II approximately as a function of (m − n + 1)α − (n − β) and n max , the degree of the most connected node in the network. One can show that N I nm >> N II . One can also show that the relative number of Type II subgraphs is vanishingly small compared to Type I.

Algebraic Invariants and Spectrum
The intuitive pictorial/combinatorial representation of graphs is an extremely useful aid to their understanding. However, computing the topological properties of graphs combinatorially is computationally challenging especially when the size of the graph becomes large. As noted earlier, indeed, most combinatorial algorithms on biomolecular networks such as on PPI networks and GRNs are computationally complex problems (most of them fall in the NP-complete complexity class) Karp (2011). Therefore, in order to carry out any quantitative and computational analysis, graphs are better represented as algebraic objects. This representation allows us to use linear algebra and mathematical analysis techniques. The key to this representation is the adjacency matrix A(G). It is defined as {0, 1} n×n matrix in which, A ij = 1 if the vertices i and j are connected (∃e ∈ E, o(e) = i, t(e) = j) and 0 otherwise. The matrix is symmetric if the graph is undirected. For weighted graphs we can assign weights w ij for existing edges.
Algebraic properties provide us with tools to deduce various properties of the biomolecular networks. In particular, the spectral representation of the graph is of importance for a number of applications such as graph classification, etc. We can think of the adjacency matrix A as operating on the space V = C n of complex n-tuples written as column vectors x,y as follows Ax → y. It can be shown that there are directions left invariant in this space. That is to say, A i x i = λ i x i where λ i are the eigenvalues and corresponding x i the eigenvectors (spanning invariant directions) of the adjacency matrix for 1 ≤ i ≤ n. The spectrum of the graph G is defined as the collection of eigenvalues of the adjacency matrix Spec(G) = Spec(A) = λ 1 , .., λ n . Naturally, if A is a real symmetric matrix, then the eigenvalues of A are real.
In particular, one algebraic invariant of the graph is the spectral gap γ(G). It can be shown that the spectral gap gives excellent bounds on a combinatorial invariant, the Cheeger constant h(G) (see the Supplementary material).

NETWORK EVOLUTION
Starting with the seminal work of Erdös and Rényi Erdös and Rényi (1959), a number of mathematical frameworks have been developed to model the "evolution" of graphs, covering the family of biomolecular networks. These frameworks may prove useful in explaining why most biological networks have certain non-obvious properties: namely, (i) Small-world property; (ii) High clustering coefficients (varying with degree distribution); (iii) Emergence of "hubs." Such network models are ultimately expected to capture various observed properties of biomolecular networks, and the evolutionary trajectories leading up to them.

Erdös and Rényi Model
The Erdös and Rényi model of random graphs (ER-graphs, denoted G(n, p)) is characterized by two parameters, the number of vertices in the network N and the fixed probability of choosing edges p Erdös and Rényi (1959). The graph G is generated by choosing N vertices and connecting each pair of vertices with probability p. The model yields a network with approximately p N 2 = O(pN 2 ) randomly distributed edges. The probability of choosing a specified graph G with N vertices and e edges is therefore M e p e (1 − p) M −e , where M = N 2 = the maximum number of possible edges connecting N vertices. It can be shown that in such random graph the average vertex degree is k = p(N − 1) = O(pN ). The diameter of such graph is d = ln N/ ln k ≈ ln N/(ln N − ln(1/p)) which is small compared to the graph size. Thus, random graphs exhibit "the small world property." The degree distribution for ER graphs is Then the local clustering coefficient is C i = p is independent of the degree of the node and the average clustering coefficient C = p/N scales with the network size. Therefore, the standard ER random model seems not to capture either the properties of degree distribution or the clustering coefficient of biomolecular networks.
Typically, an ER random graph model is used as a "null model" for the evolutionary process. However, while deviations from randomness are frequently used as evidence for the direct action of natural selection, often non-randomness may reflect neutrally generated (non-adaptive) emergent phenomena Massey (2015). We emphasize here that many topological features of biomolecular networks are unlikely to be directly selected for, but instead are a side-product of network growth, and decay, captured by the dynamics of edge and node addition and removal.

Small World Model
The biomolecular networks have features that are not captured by the Erdös and Rényi random graph model. As we have seen, random graphs have low clustering coefficient and they do not account for formation of hubs. To rectify some of these shortcomings, the small world model or popularly known as the six degree of separation model was introduced as the next level of complexity for probabilistic model with features that are closer to the real world networks Watts and Strogatz (1998); Watts (1999). The evolution and dynamics of such networks have been discussed in detail Watts (2003), in particular in the diseases propagation literature Dodds and Watts (2005).
In this model the graph G of N nodes is constructed as a ring lattice, in which, (i) first, wire: that is, connect every node to K/2 neighbors on each side and (ii) second, rewire: that is, for every edge connecting a particular node, with probability p reconnect it to a randomly selected node.
The average number of such edges is pN K/2. The first step of the algorithm produces local clustering, while the second dramatically reduces the distance in the network. Unlike the random graph, the clustering coefficient of this network C = 3(K − 2)/4(K − 1) is independent of the system size. Thus, the small world network model displays the small world property and the clustering of real networks, however, it does not capture the emergence of hubby nodes (e.g., P53 in biomolecular networks).

Scale-free Network Models
Most biomolecular networks are hypothesized to have a degree distribution, described as scale-free. In a scale free network the number of nodes n k of degree k is proportional to a power of the degree, namely, the degree distribution of the nodes follows a power-law where β > 1 is a coefficient characteristic of the network Barabási and Albert (1999). Unlike in random networks, where the degree of all nodes is centered around a single value -with the probability of finding nodes with much larger (or smaller) degree decaying exponentially, in scale-free networks there are nodes of large degree with relatively higher probability (fat tail). In other words, since the power low distribution decreases much more slowly than exponential, for large k (heavy or fat tails), scale-free networks support nodes with extremely high number of connections called "hubs." Power law distribution has been observed in many large networks, such as the Internet, the phone-call maps, the collaboration networks, etc. Képès (2007); Barabási (2009); Loscalzo and Barabási (2016). A caveat to these reports is that inappropriate statistical techniques are often been used to infer power law distributions, and alternative heavy tailed distributions may fit the data better Clauset et al. (2009a). However, the power law is a useful approximation that allows mechanisms of network growth to be explored, such as Preferential Attachment, discussed next, while the examination of alternative heavy tailed distributions is set as an Open Problem.

Preferential Attachment
The original model of preferential attachment was proposed by Barabási-Albert Barabási and Albert (1999). The scheme consists of a local growth rule that leads to a global consequence, namely a power law distribution. The network grows through the addition of new nodes linking to nodes already present in the system. There is higher probability to preferentially link to a node with a large number of connections. Thus, this rule gives more preferences to those vertices that have larger degrees. For this reason it is often referred to as the "rich-get-richer" or "Matthew" effect.
With an initial graph G 0 and a fixed probability parameter p, the preferential attachment random graph model G(p, G 0 ) can be described as follows: at each step the graph G t is formed by modifying the earlier graph G t−1 in two steps -with probability p take a vertex-step; otherwise, take an edge-step: (i) Vertex step: Add a new vertex v and an edge {u, v} from v to u by randomly and independently choosing u proportional its degree; (ii) Edge step: Add a new edge {r, s} by independently choosing vertices r and s with probability proportional to their degrees.
That is, at each step, we add a vertex with probability p, while for sure, we add an additional edge. If we denote by n t and e t the number of vertices and edges respectively at step t, then e t = t + 1 and n t = 1 + t i=1 z i , where z i 's are Bernoulli random variables with probability of success = p. Hence the expected value of nodes is n t = 1 + pt.
It can be shown that exponentially (as t asymptotically approaches infinity) this process leads to a scale-free network. The degree distribution of G(p) satisfies a power law with the parameter for exponent being β = 2 + p 2−p . Scale-free networks also exhibit hierarchicity. The local clustering coefficient is proportional to a power of the node degree where α is called the hierarchy coefficient.
This distribution implies that the low-degree nodes belong to very dense sub-graphs and those sub-graphs are connected to each other through hubs. In other words, it means that the level of clustering is much larger than that in random networks.
Consequently, many of the network properties in a scale-free network are determined by the local structures -namely, by a relatively small number of highly connected nodes (hubs). A consequence of this structure of the scale-free network is its extreme robustness to failure, a property also displayed by biomolecular networks and their modular structures. Such networks are highly tolerant of random failures (perturbations); however, they remain extremely sensitive to targeted attacks.

Assortativity Network Model
Assortative mixing refers to the property exhibited by a preference of nodes to attach to similar (respectively, dissimilar) nodes; for example, high-degree vertices exhibit preference to attach to highdegree (resp. low-degree) vertices. Network models, discussed earlier and including the preferential attachment model, do not capture such important properties exhibited by real biomolecular networks Girvan and Newman (2002). Assortativity can be measured by the Pearson correlation coefficient r of degrees of linked nodes Girvan and Newman (2002). Positive correlation means connections between nodes of similar degree (assortativity) and negative correlation means connections between nodes with different degree (disassortativity). Unlike technological networks and social networks (showing assortative mixing), biological networks appear to evolve in a disassortative manner.
Many genetic networks, especially the DNA networks, lead to directed graphs. Assortative mixing can be generalized to directed biological graphs . For directed networks two new measures, in-assortativity and the out-assortativity , can be defined measuring the correlation between the in-degree r in and out-degree r out of the nodes respectively. Biological networks, which have been previously classified as disassortative, have been shown to be assortative with respect to these new measures. Also it has been shown that in directed biological networks, out-degree mixing patterns contain the highest amount of Shannon information, suggesting that nodes with high local out-assortativity (regulators) dominate the connectivity of the network . The occurrence of assortativity in social networks has been attributed to a process of homophily (that is people tend to associate with others on the basis of ethnicity, religion, sports preferences etc McPherson et al. (2001); Newman (2003a)). The mechanisms that give rise to assortativity in biomolecular networks likely arises by a similar proximate mechanism of like nodes forming edges with like nodes, but the ultimate cause(s) remains unclear.

Duplication Model
Our earlier discussions suggest that biomolecular networks exhibit power-law degree distribution. However, unlike other complex networks, such as the Internet, the growth exponent of biomolecular networks typically falls into a lower range 1 < β < 2, as opposed to β ≥ 2. This discrepancy has been suggested to have resulted from evolution by gene duplication dominating evolutionary mechanism Chung et al. (2003). Various biomolecular networks have been studied using a partial duplication process, which proceeds in the following manner: Let the initial graph G 0 have N 0 vertices. In each step, G t is constructed from its previous graph G t−1 as follows: A random vertex u is selected. Then a new vertex v is added in such a way that for each neighbor w of u, a new edge (u, w) is added with probability p. The process is then applied repeatedly. The full duplication model is simply the partial model with p = 1.
It has been shown that as the number N of vertices becomes infinitely large, the partial duplication model with selection probability p generates power-law graphs with the exponent satisfying the transcendental equation Chung et al. (2003) p(β − 1) = 1 − p β−1 , whose solution determines the scale-free exponent β as a function of p. In particular, if 1/2 < p < 1 then β < 2.
For illustrative purposes, we describe below an abstract gene network growth model incorporating the processes of gene duplication and deletion, as described above ( Mishra and Zhou (2004) and Zhou (2005)). Using a Markov chain model the following features were investigated: (i) the origination of the segmental duplication; (ii) the effect of the duplication on the genome structure; and (iii) the role of duplication and deletion process in the genomic evolutionary distance. Unlike standard models of stationary Markov chain models, most processes in evolutionary biology belong to the group of non-stationary Markov processes, in which the transition matrix changes over time, or depends upon the current state.
This model results in the neutral emergence of scale-free degree distributions. It shows that the genomes of different organisms exhibit different network properties, likely reflecting differences in the rates of gene duplication and deletion Mishra and Zhou (2004). This analysis provides an example of how network topology can be used to provide insight into fundamental molecular evolutionary (neutral/Markov) processes in different species. Note that the model is relatively idealized, as it does not account for higher order interactions in a population involving: effective population size and allelic fixations; sex, diploidy and sex-chromosomes (e.g., X and Y in mammals or W and Z in birds, etc.); surveillance and repair in somatic cells; embryonic lethality; homologous recombination, etc. The mathematical model explored here is kept simple to motivate the machinery from graph theory developed later.

Hierarchical Network Models
Another interesting model, introduced by Ravasz and Barabasi and dubbed hierarchical network model, simulates the characteristics of many real life complex models and may be relevant. The resulting networks have modularity, high degree of clustering, and scale-free property. Modularity refers to the network phenomenon where many sparsely inter-connected dense subgraphs can be identified -"one can easily identify groups of nodes that are highly interconnected with each other, but have only a few or no links to nodes outside of the group to which they belong to." (from Ravasz and Barabási (2003)).
A generative process for hierarchical network model may be described as follows: For instance, consider an initial network H 0 of c fully interconnected nodes (e.g., c = 5). As a next step, create (c − 1) replicas of this cluster H 0 and connect the peripheral nodes of each replica to the central node of the original cluster to create H 1 with c 2 (e.g., c 2 = 25 ) nodes. This step can be repeated recursively and indefinitely, thereby for any k steps the number of nodes generating the graph H k with c k+1 nodes. If the central nodes of H 0 is called a hub and other nodes peripheral, then each recursion replicates additional copies of hubs and peripheral nodes.
One can carry out carry out a recursive analysis and shows that one obtains a power-law (i.e. scale-free) network with exponent β = 1 + ln(c) ln (c−1) . The local clustering coefficients (for the hub-nodes) follow C(k) ≈ 2 k . Also, one can show that this duplication feature of evolution leads to hierarchical behavior of the network. The networks are expected to be fundamentally modular, in other words, the network can be seamlessly partitioned into collection of modules where each module performs an identifiable task, separate from the function(s) of other modules. One can also show that the average clustering coefficient on N nodes at any given stage is about C = .7419282.. (for c = 4), C = 0.741840 (for c = 5), and a constant for a fixed c, independent of N (see Ravasz and Barabási (2003), and for exact computations Noh (2003)). While for the preferential attachment model of Barbasi-Albert has the average clustering coefficient C on N nodes decreases as 1/N , in addition not exhibiting modularity.

OPEN PROBLEMS AND FUTURE CHALLENGES
The study of biomolecular networks is still a relatively young field and has thus far focused on a mechanistic perspective. As we begin to explore it from an evolutionary view point, we encounter a large array of promising areas of investigation -most of which focuses on how information asymmetries among the gene players ultimately sculpt the information flow, as necessary for an organism to navigate in a complex and fluctuating environment. In particular, at its core this program requires an explanation of how features of genome evolution and structure might be algorithmically inferred from a network science perspective.
The traditional approaches of phylogenetic study may be applied here, but examining specifically the family of species-specific biomolecular networks. Thus mathematically we would need the networks to be aligned, motifs to be mapped to each other and network-distances to be correlated to deep evolutionary time. In order to account for the evolution by duplications, orthologs and paralogs of a gene (or gene families) are to be identified and connected to their roles in biochemical pathways. Ultimately, this analysis could be targeted at extracting the origin of various information-asymmetric signaling games and how they stabilized in their Nash equilibria.
Network analysis is used in disease studies, but there have been more focused studies with applications to disease processes in cancer. In Figure 4 we show part of an interactome network useful in deciphering aberrant interactions in diseases (Figure 2.3 from Loscalzo et al. (2017)).

Algorithmic Complexity Issues
A key problem central to this program would be in detecting isomorphism mappings among pairs of graphs or subgraphs, a problem of infeasible algorithmic complexity (assuming P = N P .) We start with a discussion of these issues and cite heuristics that can tame the problem, albeit computing the solutions approximately.

Intractability: NP-Completeness
Many combinatorial optimization problems seem impossible to solve except by brute-force searches evaluating all possible configurations in the search space. They belong to a complexity class called NPcomplete and include such problems as whether a graph has a clique of size k. Since finding certain shared motifs in a class of networks shares many computational characteristics of the clique problem and since it could be central to discovering important evolutionary signatures (e.g., EBD), it seems unlikely that it would be possible to characterize the evolutionary trajectories precisely -especially when the number of genes involved are in the thousands. See the supplement for additional discussions on graph representations and to derive their algebraic invariants, that provide bounds on complexity of algorithms possibly leading to excellent approximate results in the study of sparse complex networks (see ; .

Problem 4.A
Classify various computational problems involved in detecting evolutionary trajectories of biomolecular networks and characterize their algorithmic complexity.

Problem 4.B
Explore PTAS (Polynomial Time Approximation Schemes) for these problems -Especially when the graphs satisfy certain sparsity, modularity and/or hierarchy properties.

Algebraic Approximation
As described earlier, many interesting topological features of a graph can be computed efficiently (on both sequential and parallel computers) from their descriptions in terms of adjacency matrices. The resulting spectral methods have found recent applications in complex networks (e.g., communication, social, Internet) (see Spielman (2018), , Spielman and Teng (2014), Spielman and Teng (2013), Spielman and Teng (2011a), Spielman and Teng (2004)  , , ), MacKay (2003). These methods are efficient (linear time complexity) for sparse graphs, whose number of edges is roughly of the same order as the number of vertices. Thus, they are well suited to biomolecular networks (for example for clustering, community detection, hubs, robustness, assortative mixing, spreading and mixing, closeness, isomorphism, among others).
Thus, spectral graph theory may be expected to have many applications in the analysis of biomolecular networks, most prominently, in clustering, graph similarity and graph approximation, but also in smoothing analysis and sparsification. One can envisage that many, if not most, classical network algorithms in biomolecular networks can be made faster by spectral methods. Indeed, since most biomolecular networks are sparse -both in terms of sparse connections, and in precise algebraic sense (see the supplementary section), these algorithms likely lead to linear time algorithms. The smoothing analysis methods, as well as sparsification approximations are worth exploring in these contexts.
Another fruitful direction is in parallelizing these algorithms. As an illustration, in several studies of biomolecular networks it would be useful to identify when two networks X 1 and X 2 are "close." We may wish to say that two networks are close if Spec(X 1 ) and Spec(X 2 ) are close -a computational problem that is polynomially computable (and efficiently parallelizable) (see Spielman and Teng (2013)). We can now give a mathematical formulation of this closeness, which can also be incorporated into phylogenetic studies. These biomolecular networks may be annotated with weights that are linear or quadratic approximation of relations, as common in these studies. These analyses may identify subnetworks that have been influenced by EBD, in concert with selection.

Problem 4.C
Classify various algebraic problems involved in detecting evolutionary trajectories of biomolecular networks and characterize their ability to approximate. Explore their practical implementations on sequential and parallel computers.

Design Principles via Motif Analysis
The study of Systems Biology postulates that there are important design principles of biological circuits that provide a great deal of insight. The connections of gene and protein interaction networks are assumed to provide the necessary robustness and control to achieve cellular function in the face of chemical noise. However, it remains unclear how random variations alone provide such robustness. A possible explanation may come from a game-theoretic model that lead to stable equilibria and is expected to have precipitated from duplication of genes, interactions and motifs.

Machine Learning
The biomolecular networks of interest are derived from highly noisy data e.g., CHIP-Chip, CHIP-Seq (for GRN) or co-localization or two-hybrid (for PPI) and consequently, the inferred edges of the network may miss certain genuine interactions or include several spurious interactions. Various machine learning algorithms (with fdr, false discovery rates, control and regularization techniques) have been devised in order to improve the accuracy of such models. Biomolecular networks from related species (with orthologs and paralogs analysis) are often combined to improve the accuracies and cross-validate results. The accuracies may be further ascertained via various local properties.
One important local property of networks are so-called network motifs, which are defined as recurrent and statistically significant sub-graphs or patterns. Thus, network motifs are sub-graphs that repeat themselves in a specific network or even among various networks. Each of these sub-graphs, defined by a particular pattern of interactions between vertices, may reflect a framework in which particular functions are achieved efficiently. Indeed, motifs are of notable importance largely because they may reflect functional properties. They have recently gathered much attention as a useful concept to uncover structural design principles of complex networks. Although network motifs may provide a deep insight into the network's functional abilities, their detection is computationally challenging. Thus an important challenge for both experimental and computational scientists would be to study the evolutionary dynamics starting with the experimental data ab initio, as well as in improving the accuracy and efficiency of both the experimental and algorithmic techniques simultaneously.

Problem 4.D
Classify the species distributions of the different forms of heavy tailed distributions (e.g. power law, exponential, power law with exponential decay, lognormal), in different types of biomolecular network, and infer the mechanistic causes during network growth, and ultimate molecular evolutionary origins

Problem 4.E
Characterize the motifs in the biomolecular networks of closely related species starting with the noisy experimental data. Explain the structure of the motifs via their effect on the information flow. For instance, one may focus on DOR (Dense Overlapping Regulons) motifs and how they might have evolved from a simpler ancestral regulon Alon (2006).

Problem 4.F
Study Subgraph Isomorphism Algorithms (and heuristics) for sparse graphs and identify special cases most suitable for studying evolutionary trajectories, while relating them to biomolecular design principles.

Network Alignment
Critical to the evolutionary studies, described above, is the topic of network alignment and subsequent network tree building. Networks may be aligned in a pairwise fashion to calculate similarity, and from this a distance matrix calculated, and used for the construction of a network tree, showing the relationships between multiple networks. For example, in the case of meta-metabolic networks, such studies will reveal relationships between the meta-metabolic networks of different microhabitats. A plausible prediction is that the network tree should show convergent evolution in microbial communities from microhabitats with similar conditions (e.g., anaerobic habitats). Thus this approach could lead to a tool to study convergent evolution of microbial community structure in similar habitats Goldford et al. (2018).
From an algorithmic point of view, one may employ any of the three types of network alignment approaches: 1. where node identity is known; 2. where node similarity can be determined (based on sequence similarity for example); and 3. where node identity is unknown, here only network topology is used for alignment.
The first is a straightforward edge alignment. However, a refined expression is required that incorporates similarities in edge widths in addition to the basic edge alignment (presence / absence of common edges between networks). There do exist some first generation heuristics that utilize the second and third types of alignment approach (i.e., sequence similarity and topology, and only topology) Kuchaiev and Przulj (2011), but the underlying graph isomorphism problem is known to be #P-complete. But these heuristics, as would be expected, do not work well -a straightforward test for this problem is applying them to align the social networks of the Gospels of Luke and Matthew (Figure 3) -the Jesus node should always align, as it is rather obvious topologically; but often leads to failure.

Problem 4.G
Classify and characterize the graph alignment algorithms. . Topological Alignment of Networks. Similar Biomolecular networks could be topologically aligned and compared in order to express an evolutionary distance, which may then augment the traditional approaches of phylogenetic study. In order to account for the evolution by gene duplications, genes (or gene families) are to be identified and connected to their roles in biochemical pathways. Such an approach would lead to a program to understand the critical role of information asymmetries in driving evolution. Network alignment, a core problem in this program, is computationally intractable. To sharpen our intuition, we illustrate the problem using the social networks of the Gospels of Matthew and Luke. These networks represent social interactions between characters in the gospels of Matthew (a) and Luke (b). These were chosen as a basic test for topological alignment procedures, given that they share a similar number of nodes, and the highly connected node of Jesus. A straightforward test for the efficacy of a topological alignment algorithm therefore constitutes aligning both networks and verifying that the Jesus node from both networks is matched

Somatic Evolution and Cancer
Cancer is a complex disease, but governed by somatic genomic evolution, as propelled by mutation. Thus as a consequence GRNs may be used to better understand cancer susceptibility, map its progression, design better tailored therapies, and better understand the evolution of endogenous anti-cancer strategies. Cancer genes are often network hubs Karimzadeh et al. (2018), as they are often involved in critical developmental pathways. But a better network analysis will shed light on many natural questions: Why is it so? How does this come about from the process of network growth over evolutionary time? What clues do they provide to understand the somatic evolution in cancer and its progression?
During cancer progression, the disease reduces a cell's healthy genome into an aberrant mutant, where cancer eventually leads to metastasis, ultimately resulting in death of the patient. The healthy cells in the patient may be thought to possess a normal network, that is a gene network that engenders health and well-being. Cancer progression is reflected by a dynamic change of the normal network into an aberrant network. The aberrant network manifests itself by tumorigenesis, and finally metastasis. There is a substantial literature enumerating the identity of oncogenes and tumor suppressor genes, which aberrantly gain function (e.g., amplification of copy number) or lose function (e.g., deletion in copy number, hemior homozygously), respectively. They modify the cell biology of cancer progression, effected via the dynamics of GRN and PPI networks in cancer progression -all remain to be fully characterized.
Of particular interest is the question whether there is an identifiable phase transition in network topology associated with metastasis. Figure 2 shows a simple model for how the evolution of p53 and its paralogs may affect GRN topology; such molecular evolutionary signaling games approaches may help to better understand the motifs associated with oncogenes in GRNs. An additional important factor in cancer is the pervasive occurrence of molecular deception Bhatia and Kumar (2013), which from a signaling games perspective is consistent with cancer's conflict of interest with somatic cells. The identity of deceptive macromolecular signals may be incorporated into the network, potentially shedding a novel light on the mechanism of carcinogenesis. The genesis of deceptive signals therefore is expected to impact and drive carcinogenesis.
An additional factor to understanding this biology are copy number variants (CNVs) -types of gene mutations where a number of large sections of genomic DNA may be duplicated (or deleted), resulting in dosage effects of the resident gene sequences, which are exactly duplicated (or deleted). The numbers of CNVs can commonly vary substantially within a population, and have been shown to have significant roles in the propensity to develop cancer Krepischi et al. (2012). An increase in the number of CNVs would have the effect of enhancing the weight of an edge, which represents the interaction of the CNV gene product with its macromolecular binding partner. Such a network variant represents an increased disposition to develop cancer, and can be understood as occupying a position in 'network space' (the space of all possible network topologies) in greater proximity to an aberrant network, than a normal network.

Problem 4.H
Study Cancer progression models in terms of GRN's and identify the role of driver and passenger genes in the somatically evolving networks.

Gene Regulation and 3D Networks
In the genome of the ancestral life form, once a number of genes with separate function had evolved, it then would have become beneficial to evolve gene regulation. Therefore, genes with the dedicated function of regulating other genes in the genome would have arisen (transcription factors). The combination of regulatory and functional genes would have comprised the first gene regulation network. Increases in organismal complexity have been facilitated by an increase in the complexity of the gene regulation network Burton (2014).
Recent work has outlined the importance of three-dimensional proximity of genes to genes on other chromosomes, in addition to their immediate neighborhood on their own chromosome Li et al. (2018). This effect implies that gene proximity and spatial relationships within the nucleus can be meaningfully represented as a network. Such a network would be comprised of two types of edge: 1) linear distance on the same chromosome (centimorgans), 2) physical distance with genes on other chromosomes (nanometers). Such networks may be termed 3D gene orientation networks.
Gene regulation and co-regulation may be better understood by the construction and analysis of 3D gene orientation networks. This is because the proximity of regulatory modules to a gene has an influence of gene expression. Most genes have a regulatory region 5' of the transcription start site, the promoter. In addition, regulatory enhancers and other regulatory elements may be located distant from the gene, generally on the same chromosome Gondor and Ohlsson (2018). It is thought that the bending and juxtaposition of chromosomes within the nucleus may bring such elements into physical proximity to the gene Gondor and Ohlsson (2018). Clearly, the physical distance, and frequency with which the element is brought into contact with the gene will influence the nature of its regulatory input. Using 3D gene orientation networks, additional information may be incorporated into edges, such as whether physical proximity is static, or has movement. If there is movement, this may be coordinated (or not) with other regulatory elements affecting the same gene. Likewise, interactions with regulatory elements may show some coordination between genes.

Problem 4.I
Describe the Gene Duplication process and their utilities in terms of the genome's 3D structure.

CONCLUSION
Here, we have outlined graph theoretical approaches that may reveal some novel aspects of the molecular evolutionary process, which become manifest at the level of the phenome. Further work is required to link the diverse features of network topology with network evolution and growth. While the evolutionary aspects shaping individual gene-gene interactions has been addressed by geneticists and molecular evolutionists, we believe that a multi-disciplinary effort combining game theory, graph theory, and algebraic/statistical analysis will provide a more informative omnigenic model of gene interactions, in contrast to the traditional homogenic view. Given our view that biomolecular networks may be modeled using evolutionary game theory, and game theoretical approaches in the study of social networks, we expect that some surprising similarities and convergences between the topologies of the two might be observed. Finally, we note that the field of statistics gained impetus from the consideration of biological problems, from workers such as Fisher, Haldane, Rao, Wright, Kimura, Crow and others, and so we suggest that consideration of the open problems listed here might also lead to a similar development of new mathematics.

ACKNOWLEDGMENTS
We acknowledge our colleagues in UPR and NYU, who have generously provided many constructive criticisms.

DATA AVAILABILITY STATEMENT
The datasets [GENERATED/ANALYZED] for this study can be found in the [NAME OF REPOSITORY] [LINK].

GRAPH THEORY AND ITS APPLICATION TO BIOMOLECULAR NETWORKS
In this section, we discuss fundamentals of graphs and networks as well as other important topics that are critical for the study and analysis of biomolecular networks.
Our discussions deal with an abstraction that facilitates reasoning about a set of entities, denoted V and a binary relation E ⊆ V × V : the binary relation is usually irreflexive, asymmetric and not necessarily transitive. It is often represented as a directed graph, with vertices V and edges E. When V denotes biomolecules and E denotes interactions (e.g., regulations, proximity, binding, synteny, etc.), the resulting graph is called a biomolecular network, object of our study. Such networks evolve over time with additions and deletions to the sets V and E.

GRAPHS FROM A COMBINATORIAL PERSPECTIVE
We collect here essential results from graph theory. For these results, we refer to  and . For some studies we need to provide an orientation of an undirected graph. An orientation of a graph X is a subset E + of the edges such that E is the disjoint union of E + and E + . DEFINITION 4. A graph is called bipartite if the vertex set can be partitioned into two parts V 1 and V 2 such that each edge has one vertex in V 1 and one vertex in V 2 .
The distance between two vertices u and v is the length of the shortest walk connecting them, if both vertices are in the same connected component (∞, otherwise). The shortest walk connecting u and v is called a geodesic.
Let n be an integer ≥ 1. Consider the oriented graph on Z/nZ, and the orientation is given by the edges DEFINITION 5. A subgraph Y of a graph X is called a circuit of length n if it is isomorphic to the circle graph on Z/nZ.
A circuit of length 1 is called a loop. If the relation E is irreflexive then the graph is loop-free. DEFINITION 6. A graph is called combinatorial or simple if it has no circuit of length ≤ 2.
DEFINITION 7. A non-empty connected graph T without circuits is called a tree.
DEFINITION 8. A weighted graph G has weights assigned with edges, by the weight function, w : v, u). We also define volume V of the graph as vol(G) := v d v .

GRAPHS FROM AN ALGEBRAIC PERSPECTIVE
Certain linear operators can be associated with a graph and can be given a physical meaning in terms of diffusion (of information) over the graph, as common in the signaling games over the biomolecular networks. Spectral analysis of such linear operators yields eigenvalues, eigenvectors, and spectra of graphs, playing important roles in determining various properties of the network -specifically, with respect to how information diffuses over them (see  and ).
DEFINITION 9. Algebraically a graph G (network) can be represented as an n × n adjacency matrix A(G), in which, A ij is 1 iff ∃e ∈ E, o(e) = i & t(e) = j; otherwise it is 0. The matrix is symmetric if the graph is undirected, i.e., e =ē, ∀e ∈ E. If the graph G is weighted, then A ij = w(i, j) for every edge (i, j) = e ∈ E, and 0, otherwise.
We can think of A as operating on the space V = C n of complex n-tuples written as column vectors X as follows: X → AX. X can be thought of as values of a function evaluated on the vertices. One can show that there exist lines through the origin, in V that are left invariant along those lines. That is to say, there exist scalars λ i (called eigenvalues), and corresponding non-zero vectors X i (called eigenvectors spanning invariant lines) that span invariant lines such that A i = λ i X, for 1 ≤ i ≤ n. The spectrum of the graph X is defined to be Spec(X) := Spec(A) := {λ 1 , · · · , λ n }, a collection of A's eigenvalues.
It can also be shown that if A is a real symmetric matrix, then the eigenvalues of A are real and its spectrum can be presented in decreasing order, i.e., {λ 1 ≥ λ 2 ≥ · · · λ n }. This fact is very important for our study of graphs and networks.
Let us consider a more general weighted graph as defined earlier. Let T be the diagonal matrix with d v along the diagonal. First, consider the stochastic matrix P = T −1 A, which may be thought of as describing the probabilities of certain "information" being moved from one node to a neighboring node by a diffusion process. Let {v 0 , e 0 , v 1 , e 1 , · · · , v s } be a random walk in the graph with (v i−1 , v i ) ∈ E(G), for all 1 ≤ i ≤ s, and determined by transition probabilities P (u, v) = P rob(x i+1 = v|x i = u) which are independent of i. Normally we take p(u, v) = w(u, v)/d u , as defined by the stochastic matrix P .
Then, let f : V → R with v f (v) = 1 be a probability distribution on V (G). Then v P (u, v) = 1. Then for any initial distribution f : V → R with v f (v) = 1, the distribution after k steps is P k f , where f is viewed as a column vector and P is the matrix of transition probabilities. In particular, a probability distribution satisfying the fixed point equation φ = P φ = P 2 φ = · · · = P k φ describes the stationary distribution of the diffusion process and can be described as an eigenvector of the corresponding matrix.
Thus intuitively, algebraic techniques allow thinking about the graph features in terms of a set of "blurrier" notions such as random walks (instead of walks), diffusion distances (instead of geodesic distances), ranks (instead of informational relevance), etc. However, because such spectral analysis is based on linear algebra, the underlying algorithms become tractable.
The adjacency matrix should be best viewed as an operator on functions of V (G). A modified operator, called the Laplacian operator is the most effective formulation. The Laplacian operator can be used in interpolation on graphs, graph clustering, resistance networks, rapid mixing, linear solving, linear optimization, and many other applications.
Thus, one may define L = T − A = T (I − P ), as the Laplacian Matrix of G, where L is defined as follows: L(u, v) = −w(u, v) (when u and v are distinct), and d v if u = v. Imagine assigning a scalar-valued rank function ρ : is minimized. Thus ρ has the meaning that if a gene in a GRN is important then the genes it regulates and the genes that regulate it are also important; one expects p53 to be labeled as an important gene because of its "hubbiness," but so also, MDM1, ATM, BRCA1, etc. as they are in the pathways directly regulating p53; and also making the genes such as p63 and p73 important as they are regulated by this cluster of genes (which may have preferentially attached themselves to p53 and its duplicates, which they continue to regulate). Note that solution to the optimization problem for Dirichlet sum is given by the following equation (under suitable conditions-see Gleich (2015); Easley and Kleinberg (2010); ; ): Thus functions such as ρ can be rapidly computed by iterating over the graph while performing weightedaveraging. An example of this process is seen in Google's PageRank algorithm based on the "Random Surfer (with Teleportation) Model,". This and other PageRank algorithms have been successfully applied mutatis mutandis to rank genes in a GRN (GeneRank), to rank Proteins in a PPI Networks (PPIRank), and in other biomolecular networks (see the survey Gleich (2015)).
From now, we assume that G is weight symmetric w(u, v) = w (v, u). Then the eigenvalues of L(G) are real, and indeed 0 = λ 0 ≤ λ 1 ≤ · · · ≤ λ n−1 . Then λ G := λ 1 is called the spectral gap of G. The spectral gap (and other eigenvalues) can be determined by the Courant-Fisher theorem. For example, if one considers L as an operator on the space of functions g : V (G) → R, then One can show that if the spectral gap λ G is large, and k is large enough any initial distribution f converges to the stationary distribution very rapidly.

EXPANSION PROPERTIES AND INFORMATION FLOWS IN GRAPHS
We shall consider graphs X = (V, E, ·, ·), where V is the set of vertices and E is the set of edges of X. We will assume that the graph is undirected and connected and we shall only consider finite graphs. For F ⊂ V , the boundary ∂F is the set of edges connecting F to V \ F . The expanding constant, or isoperimetric constant of X is defined as, Moreover if X is viewed as the graph of a communication network, then h(X) measures the quality of the network as a transmission network. In all applications, the larger the h(X) the better, so we seek graphs (or families of graphs) with h(X) as large as possible with some fixed parameters.
In Tanner (1984), M. Tanner introduced another notation for the expansion coefficient. Let as before X = (V, E, ·, ·), be a graph where V is the set of vertices and E is the set of edges of X. Let X ⊆ V with |X| ≤ α|V |, then It is well-known that the expansion properties of a graph are closely related to the eigenvalues of the adjacency matrix A of the graph X = (V, E); it is indexed by pairs of vertices x, y of X and A xy is the number of edges between x and y. When X has n vertices, A has n real eigenvalues, repeated according to multiplicities that we list in decreasing order It is also known that if X is D-regular, i.e. all vertices have degree D, then λ 0 = D and if moreover the graph is connected λ 1 < D. Also X is bipartite if and only if −λ 0 is an eigenvalue of A. We recall the following (see for example   ): THEOREM 1. Let X be a finite, connected, D-regular graph then And THEOREM 2. (see , ) Let (X m ) m≥1 be a family of finite connected, D-regular graphs with |V m | → +∞ as m → ∞. Then This leads to the following.
DEFINITION 10. A finite connected, D-regular graph X is Ramanujan if, for every eigenvalue λ of A other than ±D, one has λ ≤ 2 √ D − 1.
We will also need an important definition.
It is known that computing the expansion coefficient of arbitrary graphs is an NP-complete problem. Thanks to the work of Tanner, and Alon and Millman, one can derive bounds on the expansion coefficient in terms of λ. The complexity of determining λ, though in P, is still difficult if the number of vertices is large (for example of the order 10 4 -10 6 or more for biomolecular networks such as GRN or PPI).
There are useful bounds on λ for arbitrary (bipartite) graph X in terms of the number of edges, the maximum degree λ max , and the rank r χ of the adjacency matrix of X. Since effective upper bounds exist on λ max , and r χ , (see Hø holdt and Janwa (2012)) we thus obtain a bound that is easily computable.
An expander graph is a highly connected sparse graph (see, for example ). Expander graphs have numerous applications including those in communication science, computer science (especially complexity theory), network design, cryptography, combinatorics and pure mathematics (see the references under Bibliographic Notes below))). Expander graphs have played a prominent role in recent developments in coding theory (LDPC codes, expander codes, linear time encodable and decodable codes, codes attaining the Zyablov bound with low complexity of decoding (see the Bibliographic notes for references).
DEFINITION 12. A matrix A with rows and columns indexed by a set X is called irreducible when it is not possible to find a proper subset S of X so that A(x, y) = 0, whenever x ∈ S and y ∈ X \ S. Equivalently, A is not irreducible if and only if it is possible to apply a simultaneous row and column permutation on A to get a matrix in a square block form so that one of the blocks is a zero block. For the following lemma, see for example (Horn and Johnson (2013) p. 363). LEMMA 1. Let D be a finite graph. Then the adjacency matrix of A is irreducible if and only if D is connected.
We shall also need the following.
PROPOSITION 1 (Perron-Frobenius). Let A be an irreducible non-negative matrix. Then, there is up to scalar multiples, a unique non-negative eigenvector a := (a 1 , a 2 , · · · , a n ) all of whose coordinates a i are strictly positive. The corresponding eigenvalue λ 0 (called the dominant eigenvalue of A) has algebraic multiplicity 1 and λ 0 ≥ λ i for any eigenvalue λ i of A.
We recall the following special case of Courant-Fisher theorem (also, called the Raleigh-Ritz Theorem) (see for example, (Horn and Johnson (2013), Theorem 4.2.2)) THEOREM 3. Let A be an n × n Hermitian matrix over the complex field C, then it is known that all its eigenvalues are real, with maximum eigenvalue λ max (i.e. the spectral radius of A ). For 0 = X ∈ C n , define the Raleigh quotient R X := X * T AX X * T X .
Then λ max = max X =0 R X . Furthermore, R X ≤ λ max with equality if and only if X is an eigenvector corresponding to the eigenvalue λ max .

GROUPS ACTING ON GRAPHS
To understand symmetries of graphs, and to understand original motivation of graphs as objects associated with topology and algebraic topology, we briefly discuss groups acting on graphs (as groups acting on topological spaces). This converts graph isomorphism problems into much more manageable group isomorphism problem. It is needed in alignment and motif detection, We say that a group G acts on a graph X(V, E, o, t) if it acts on V and E such that the actions are compatible with φ 1 and φ 2 , i.e., it commutes with φ 1 and φ 2 , i.e., g(φ 1 (e)) = φ 1 (g(e)), o(g(v)) = g(o(v)), t(g(v)) = g(t(v)) and g(φ 2 (e)) = φ 2 (g(e)), ∀g ∈ G.
An inversion is a pair consisting of an element g ∈ G and an edge e ∈ E such that g(e) = e. We will say that a group acts freely on X if it acts without inversion and g = 1 is the only element having a fixed point. For the following result, we refer to Serre (Serre, 1980, Page 27).
THEOREM 4. If G acts freely on a tree, then G is a free group. The group G acts on X(G, S) by left multiplication. This action preserves orientation; and its action is free on the set of vertices and on the set of edges.
PROPOSITION 2. i) X(G, S) is connected if and only if S generates G. In fact, the connected components correspond in a 1-1 fashion to the cosets of H = S (the group generated by S).
ii) X contains a loop if and only if 1 ∈ S. iii) For X to be combinatorial, it is necessary and sufficient that S ∩ S −1 = ∅.
The adjacency matrix provides significant amount of information about the graph. For example A r gives the number of walks of length r between vertices. DEFINITION 14. The girth of X is the smallest positive integer r such that T race(A r ) > 0. Let d(X) be the smallest integer (if it exists) so that for every pair of vertices (u, v) there is a walk of length at most d from u to v. Then d(X) is called the diameter of the graph X.
Important results about diameter, girth and other combinatorial invariants (important for biomolecular networks) and bounds on them in terms of spectral invariants can be found for general graphs in Bollobas , pp. 156-157).
LEMMA 2. Let X be a k-regular graph. Then ii) λ 0 = k and m(λ 0 ) equals the number of connected components of X. iii) λ n−1 = −k if and only if X is bipartite. iv) For a bipartite graph X, if λ is an eigenvalue with multiplicity m(λ), then so is −λ with multiplicity m(λ).
The adjacency matrix A can be considered as the matrix of a Hecke operator on l 2 (X) (which can be called the adjacency operator) as A(f (x)) = y∈V A(x, y)f (y) (here A(x, y) = 1 if there is an edge from x to y and 0 otherwise). As mentioned earlier in the context of rank function, another interpretation of the adjacency operator is an averaging function of information contained on the neighboring vertices that flow along the adjacencies. An iterative process, then leads to mixing and diffusion globally in the network via the adjacency or the corresponding Laplacian operators.

Remark (1)
The topics we expanded upon are the following (in order): (1) graphical representation of Networks; (2) algebraic representation of graphs; and finally, (3) algebraic properties, such as spectral analysis that provide us large number of tools and techniques to deduce various properties of the biomolecular networks. For complex and mid-size networks and models, there are important algorithmic questions related to random graphs and their evolution (Erdös-Renyi model), degree-distribution, power laws, preferential attachment models, scale free networks, random-walks and mixing, spectral distance, graph similarity, clustering, smoothing analysis, sparsification and linear solvers and applications.

Remark (2)
Algebraic representation of biomolecular networks has several advantages. In particular, the combinatorial meaning is inherent in the Adjacency matrix A(X) of a graph X. In the study of biomolecular networks, it is very important to know the number of walks of length m between any two entities i to j, w m ij , (e.g. between any two proteins in the PPI network). But that is given by the (ij) th entry in A m . From this one can deduce that the diameter of the network is the dimension of what is called the adjacency algebra A(L) associated with A. If this diameter is small, we can deduce that the networks shows small world phenomenon, as we explained in the main text. The sequence (w m ij ) can be put into coefficients of a series called the zeta function ζ(X) of the graph X, and it has a very simple form as a rational polynomial in terms of what are called spectra (or frequencies) of the network. And from ζ(X) we can find simple expressions for the set of walks.

Remark (3)
One can explain spectra (frequencies) of a biomolecular network in an intuitive manner as follows. In analogy with the physical sciences, if one considers the whole network as a space, one can think of its vibration modes, that is the frequencies (eigenvalue spectra) and their amplitudes (corresponding eigenvectors). One can indeed determine the shape of the the biomolecular network from its Spectra (in the manner of Marc Kac (Can one determine the shape of the drums? (from its spectra) Kac (1966) ), and its analogy to networks and graphs Chung (1966). The advanced matrix of the biomolecular network can be then shown to take a simple diagonal form with respect to the basis of eigenvectors, and that leads to immense simplification of explanation of many phenomena associated with the biomolecular networks, and plays important roles in the spectral algorithms and their complexity analysis. The coefficients of polynomials whose roots are the eigenvalues, carries information about the motifs at the nodes. The spectral gap λ(X) (the difference between the two largest frequencies) of the biomolecular network determines expansion in the biomolecular network, meaning how fast spreading and mixing takes places between a set of nodes. How fast partitioning and cuts can be carried out--important ingredient of many algorithms in biomolecular networks. In particular the spectral gap of the biomolecular network, λ(X), gives lower bounds in terms of the combinatorial invariant, the Cheeger constant, as defined below. We will state isoperimetric inequality results where λ(X) provides an excellent lower bound on h(X), and for explanation of combinatorial phenomenon or for spectral approximation to combinatorial algorithms, λ(X) can replace h(X) and still one can get excellent approximate results. For example, it is very easy to show fast mixing and spreading phenomenon--it is just A n f = λ(X) n f , where f is the initial distribution on the nodes.

Remark (4)
We have defined important algebraic concepts in graph theory in precise mathematical terms. Networks (molecular or otherwise) have algebraic representations as their adjacency matrix (from which one can re-derive the graphical representation if one wishes). From these one derives algebraic invariants such as eigenvalues and spectra of graphs, eigenvectors, and spectral gaps. In the main text we had defined the combinatorial invariant called the isoperimetric constant h(G). The determination of h(G) is an NPcomplete problem. In the next sections we will show that h(G) can be bounded in terms of the spectral gap λ(X). This helps immensely in the determination of spectral graph algorithms as approximation to NP-complete combinatorial algorithms.

Remark (5)
One application of large spectral gap is rapid mixing (one can envisage that it is expected to prove very important in many biological applications). Indeed, the discrete analog of Cheeger inequality has increasingly crucial utility in the study of random walks and rapid mixing on Markov chains and new powerful spectral techniques such as Heat-Kernels and Sobolov inequalities have emerged to deal with general graphs (see , ). Rapid mixing in Markov chains can be framed as: How long does it take for an irreducible finite state Markov chain to converge to equilibrium? A fundamental application is to Markov Chain Monte Carlo (MCMC) simulation algorithms that are used widely in the scientific community to simulate Gibb's measures and to derive approximate solutions to difficult combinatorial questions. as Markov Chain Monte Carlo simulations are used widely in the scientific community to simulate Gibb's measures and to derive approximate solutions to difficult combinatorial questions--that have high complexity and many of which appear in the topological analysis of biomolecular networks (such as clustering, community detection, and so on). Indeed two of the most heavily studied problems in the analysis of networks are graph clustering and graph diffusion. We already studied the importance graph clustering biomolecular networks in the main text. Graph diffusion refers to problems involving spreading or propagation along the edges of a graph. These problems are of fundamental important in algorithms such as PageRank and Hitts algorithms (see Easley and Kleinberg (2010), Levin and Peres (2017), , and ).

BIBLIOGRAPHIC NOTES
For further comprehensive treatment of applications of spectral graph theory to biomolecular networks we refer to  and the survey article Banerjee and Jost (2009). For state of the art in spectral methods in algorithmic analysis, we refer to the lecture notes of Spielman (http://www.cs.yale.edu/homes/spielman) biomolecular networks algorithms in clustering, mixing, partitioning, random walks, Schur complements, effective resistance and applications, expander graphs and applications, graph sparsification and related algorithms, testing isomorphism of networks.