Surveying the Side-Chain Network Approach to Protein Structure and Dynamics: The SARS-CoV-2 Spike Protein as an Illustrative Case

Network theory-based approaches provide valuable insights into the variations in global structural connectivity between different dynamical states of proteins. Our objective is to review network-based analyses to elucidate such variations, especially in the context of subtle conformational changes. We present technical details of the construction and analyses of protein structure networks, encompassing both the non-covalent connectivity and dynamics. We examine the selection of optimal criteria for connectivity based on the physical concept of percolation. We highlight the advantages of using side-chain-based network metrics in contrast to backbone measurements. As an illustrative example, we apply the described network approach to investigate the global conformational changes between the closed and partially open states of the SARS-CoV-2 spike protein. These conformational changes in the spike protein is crucial for coronavirus entry and fusion into human cells. Our analysis reveals global structural reorientations between the two states of the spike protein despite small changes between the two states at the backbone level. We also observe some differences at strategic locations in the structures, correlating with their functions, asserting the advantages of the side-chain network analysis. Finally, we present a view of allostery as a subtle synergistic-global change between the ligand and the receptor, the incorporation of which would enhance drug design strategies.


INTRODUCTION
The concept of allostery has evolved for more than half a century (Monod et al., 1965;Koshland et al., 1966;Cooper and Dryden, 1984;Cui and Karplus, 2008;Changeux, 2011;Motlagh et al., 2014). This word in simple terms means "action at a distance" and implies long-distance communication within and across the three-dimensional structures of proteins. Fundamental understanding of the principles guiding allostery in proteins came from two classical models, the concerted Monod-Wyman-Changeux (MWC) (Monod et al., 1965) model and the sequential Koshland-Nemethy-Filmer (KNF) model (Koshland et al., 1966), with the structural insights coming from one of the earliest crystal structures of hemoglobin (Perutz, 1970). An exponential increase in the availability of protein structures in different functional states has improved our comprehension of the phenomenon of allostery (Liu and Nussinov, 2016;Greener and Sternberg, 2018). Studies over the past decades have associated allostery in proteins with accompanying conformational variations. Such conformational changes range from dramatic alterations at the protein backbone level to subtle re-orchestrations involving protein side chains in the absence of appreciable backbone variations Motlagh et al., 2014;Tsai and Nussinov, 2014;Salamanca Viloria et al., 2017). It is this latter mode of conformational fluctuations and long-range signaling that are more challenging to capture.
Current advances in both experimental and theoretical techniques have started shedding light into these subtle conformational variations.
In particular, long-range molecular dynamics (MD) simulations, providing equilibrium conformational ensembles, have offered extensive computational characterization of the conformational dynamics in proteins/protein complexes (Lindorff-Larsen et al., 2010Karandur et al., 2020;Mysore et al., 2020). The goal of these studies ranges from understanding the fundamental biophysical principles to understanding more practical applications for drug design (Borhani and Shaw, 2012;de Vivo et al., 2016). Over the past decades, this topic has been extensively discussed in many excellent articles and reviews (Bagler and Sinha, 2005;Ghosh and Vishveshwara, 2007;Cui and Karplus, 2008;de Ruvo et al., 2012;Bhattacharyya et al., 2016;Astl et al., 2019;Zhang and Nussinov, 2019;Verkhivker et al., 2020;Zhang et al., 2020).
In essence, it has become possible to obtain a better perception of biological phenomena at the molecular level, such as allostery, evolutionary effects, and transport phenomena, mediated through macromolecules, by employing two major concepts: (1) viewing macromolecules, such as proteins, as one single connected entity, where perturbations can affect the conformations at the local or global level, and (2) considering the dynamically accessible conformations of proteins, and the interconversion of their populations under different conditions as a key to biological functions. Regarding the concept of viewing proteins as a single unit, the connections at the non-covalent level play an important role since these are pliable for minor perturbations that are encountered at normal physiological conditions, unlike the covalently stitched polymer chains.
A number of approaches available in the literature (some of them referenced above) differ in how we view the protein structure as a single unit, connected through non-covalent interactions. One can focus on backbone connectivity alone, or connectivity at the level of side chains (explicit atoms or a representation through centroids) (Greene, 2012;Bhattacharyya et al., 2016;Kayikci et al., 2018). There are a number of ways to define the connectivity criteria and assign strengths of interactions. Similarly, the dynamical conformational landscape can be obtained at the explicit atomic level or indirectly achieved through methods like ENM, which provides cooperative modes of motion Zhang et al., 2020). The atomic level description can be obtained experimentally through biophysical techniques such as X-ray crystallography, cryo-EM, and NMR, and computationally through molecular dynamics (MD) simulations.
The identification of specific regions responsible for the overall perturbation and the reorganization of interactions to yield a different conformation in the landscape has received great attention. This is a crucial step in the process of making the connection between molecular events and protein functions. The methods range from direct analysis of the structures to ones developed based on the physical, mathematical, and engineering principles. Many such concepts are integrated together in computational programs to obtain critical biological insights (Greener and Sternberg, 2018;Verkhivker et al., 2020;Zhang et al., 2020). Network theory is a widely used approach which provides explicit information on the role of constituent amino acids on the stability of structure networks at a global level (Atilgan et al., 2010;Brown et al., 2017;Gadiyaram et al., 2018). A vast range of experimental and computational studies have taken up the challenge of correlating biological cellular functions to the molecular level changes. Understanding protein connectivity and dynamics can provide molecular mechanistic insights into the various biological processes, like allostery (Atilgan et al., 2007;Verkhivker et al., 2020;Wang et al., 2020a), protein-protein or protein-nucleic acid interactions Keskin et al., 2005;Sathyapriya et al., 2008;Sethi et al., 2009), and ligand/perturbation-induced conformational variations (Csermely et al., 2013;Bandaru et al., 2017;Creixell et al., 2018). Such calculations may also aid the identification of epitopes for drug-binding and capture drug-induced conformational changes in proteins and protein complexes (Csermely et al., 2013;Krieger et al., 2020).
The focus of this review is to provide a brief account of the different network theory-based techniques targeted at (i) characterizing protein structures as a single entity connected by non-covalent interactions and (ii) integrating with conformational dynamics, for which several comprehensive reviews are available (Atilgan et al., 2012;Hu et al., 2017;Verkhivker et al., 2020). The main emphasis here is on the development and application of protein side-chain network approaches (Bhattacharyya et al., 2016;Salamanca Viloria et al., 2017;Kayikci et al., 2018), which have been shown to capture subtle conformational differences that are sometimes elusive to conventional analyses, such as the root mean square deviation (RMSD) at the backbone level. Here, we have considered the SARS-CoV-2 Spike glycoprotein (Zhu et al., 2020) as an illustrative example to demonstrate the capabilities of side-chain network studies. Our focus on analyzing SARS-CoV-2 spike protein, in particular, stems from its critical role in COVID-19 and the immediacy posed by the global pandemic caused by this highly infectious coronavirus.
In order to appreciate the relevance of side-chain network studies on the SARS-CoV-2 spike protein, here we provide an introduction to this protein in the context of its structure dynamics and function. SARS-CoV-2 belongs to the family of βcoronaviruses, and is closely related to the earlier pathogens, such as SARS-CoV and MERS-CoV, which caused severe respiratory diseases in 2004 and 2013, respectively. To develop promising therapeutic strategies, we need a clear understanding of the mechanism of action of the SARS-CoV-2 virus. A succinct summary of the structures of the SARS-CoV-2 spike protein and its interactions with the host cell membrane has been recently provided (Wang et al., 2020b;Xia et al., 2020a;Zhu et al., 2020). These studies highlight how the recognition of the human ACE2 receptor by the spike protein mediates viral entry into the host cell. A simplified version of the interaction between the human ACE2 receptor and the SARS-CoV-2 spike protein, with an emphasis on the structure of the spike protein, shows the steps that lead to viral fusion to the host cell membrane (Figure 1). Long-timescale MD simulations of the viral spike protein in different conformations have been recently made available under the Creative Commons Attribution 4.0 International Public License (D. E. Shaw Research., 2020). Further, a few computational studies on the SARS-CoV-2 spike protein, to explore putative allosteric binding sites (Di Paola et al., 2020) and the role of glycans (Casalino et al., 2020) have also been recently published.
The SARS-CoV-2 spike protein is a homotrimeric complex that is pivotal to the coronavirus entry into host cells and one of the key drug targets for COVID-19 (Hoffmann et al., 2020;Huang et al., 2020). In this article, we have selected the closed (PDB_ID: 6VXX) and partially open (PDB_ID: 6VYB) structures of the spike protein (Walls et al., 2020) as examples to explicitly elucidate the protein side-chain network concepts. Each subunit in the spike protein is organized into an S1 and S2 domain (Figures 2A,B) (Xia et al., 2020a;Zhu et al., 2020). The S1 domain hosts the receptor-binding domain (RBD) that recognizes the human ACE2 receptor (Figures 2A,B) and the Nterminal domain (NTD). In order to engage the ACE2 receptor, the RBD undergoes a conformational change much like the opening and closing of a hinge ( Figure 2C). It is either in the receptor-inaccessible state (closed state) or receptor-accessible state (open state), governing access to the factors that control ACE2 binding ( Figure 2C). The S2 domain hosts the TMPRSS2 cleavage site and the heptad repeat 1 and heptad repeat 2 (HR1/HR2) domains, which are the targets for fusion inhibitors (Xia et al., 2020a,b).
Backbone alignment of a closed structure (PDB_ID: 6VXX) and a partially open structure (PDB_ID: 6VYB) reveals small structural differences except that in the partially opened state the receptor-binding domain of one subunit swings outward as compared to the closed state ( Figure 2C). This is an ideal model system to apply protein side-chain-based network calculations, as the observed backbone changes are small, but carries important functional information. The availability of long-scale MD simulation trajectories of the closed and partially open states of the spike protein (D. E. Shaw Research., 2020) further emboldened our choice of using the spike protein as our model system. This data allows us to demonstrate the capabilities of the dynamically stable protein side-chain network, correlating structural connectivity with conformational dynamics.
In the section Protein Structure Network (PSN) Perspective Into Structural Organization, an overview of network theory in the context of protein side-chain networks, connectivity criteria for the protein backbone (PBN) and the side-chain (PScN) networks, the selection of optimal strength of interaction from percolation theory perspective, and the cluster identification from graph spectral analysis are presented. In the section Protein Structure Network (PScN) for Dynamically Accessible Conformational Ensembles, the method for integrating network analysis with dynamically stable conformational landscapes is introduced. Additionally, the differences between the closed and open trimeric states of the SARS-CoV-2 spike protein are elucidated through chosen network metrics. This is followed by a Summary and Outlook section.

PROTEIN STRUCTURE NETWORK (PSN) PERSPECTIVE INTO STRUCTURAL ORGANIZATION Network Theory-Based Representation of Protein Structures
The overall shape of protein structures at a molecular level is captured elegantly through secondary structures, such as helices, beta strands and sheets, and loops, formed by the backbone atoms of the polypeptide chain. Based on noncovalent interactions, the Ramachandran map characterized the allowed regions of the backbone torsion angles (ϕ,ψ) and demonstrated that the allowed conformational space of the polypeptide chain mainly consists of compact secondary and super-secondary structures, stabilized by backbone hydrogen bonds (Ramachandran et al., 1963). However, there are also numerous examples where despite insignificant differences at the protein backbone level, subtle conformational changes at the protein side-chain level guide a plethora of biological functions (Ghosh and Vishveshwara, 2007;Sethi et al., 2009). Such examples have motivated the development of techniques to completely map structural connectivity of proteins at both the backbone and side-chain interaction levels, enabling us to correlate even subtle structural variations that elude backbonebased alignments with biological functions.
The study based on the mathematical principles of network theory enables us to view the protein structures with noncovalent interactions as a single, global network. Numerous studies have formulated the backbone and the side-chain structural connectivity in proteins using adjacency matrices and analyzed them using various network metrics (del Sol et al., 2006;de Ruvo et al., 2012;Bhattacharyya et al., 2016;Kayikci FIGURE 1 | SARS-CoV-2 employs the spike glycoprotein to enter its host cell. The spike protein is composed of two domains, the S1 domain that hosts the receptor-binding domain (RBD), and the S2 domain. The S2 domain arbitrates the fusion of the viral and host cell membranes. Activation of the spike protein happens by cleavage at two sites (S1/S2 and S2 ′ ) by the Furin TMPRSS2 protease. The spike protein initially binds to the ACE2 receptor on the host cell through its RBD. On activation, it sheds the S1 domain, enabling S2 to fuse to the host cell membrane. This figure was adapted from Structural and functional mechanism of SARS-CoV-2 | Abcam (https://www.abcam.com/content/structural-and-functional-mechanism-of-sars-cov-2-cell-entry). et al., 2018; Astl et al., 2019;Krieger et al., 2020;Verkhivker et al., 2020). While protein backbone networks (PBN) capture the non-covalent connectivity at the level of the backbone atoms, protein side-chain networks (PScN/PSN) capture the structural connectivity at the level of non-covalent interactions between side-chain atoms. A representation of the global connectivity across the protein structure in terms of networks can capture the effect of perturbations at the local level and also across the entire protein structure network. This property is key to the understanding of how ligand or mutation-induced local conformational changes affect the global structure of a protein, and therefore its function.
An analysis of the network metrics from such a connectivity matrix allows the identification of allosteric communication pathways within a protein structure network by affording insights into the interconnected global architecture of proteins. A variety of network metrics can be used to analyze these protein structure networks (both PBN and PScN). The choice of the network metric being used depends on the question of interest. Briefly, metrics such as hubs and clustering coefficient indicate the degree of a residue and its connectivity to neighboring residues. The percolation behavior of a network can be captured in terms of clusters and cliques/communities (Palla et al., 2005;Deb et al., 2009). The molecular details of pathways responsible for allosteric signaling can be examined using the algorithms to identify shortest paths of communication (Ghosh et al., 2011;Tse and Verkhivker, 2015).

Protein Structure Network Based on Backbone (PBN) and the Side-Chain (PScN) Connectivity
Protein Structure Network refers to the representation of the three-dimensional structural connectivity in a protein in terms of a connectivity or adjacency matrix. In network theory language, individual amino acid residues are termed as nodes and the connections between them are defined as edges. In case of the protein backbone network (PBN), Cα atoms are generally considered as the representative of nodes and a distance of about 6.5 Å or less (based on the radial distribution of Cα atoms in protein structures) between any two sequentially non-adjacent residues are considered as an edge (Miyazawa and Jernigan, 1985;Patra and Vishveshwara, 2000). The construction and application of PBN have been extensively discussed in earlier reviews (Greene, 2012;Di Paola et al., 2015). Here, our focus is on the technical details of construction and the subsequent application of amino acid side-chain-based protein structure networks denoted as PScN (or PSN). There are different ways of measuring side-chain connectivity, such as the distance between the centroids of the side-chains, or all atom-atom pairwise distances. Pairwise distances between all atoms of the sidechain of residues i and j (n ij ) and values within a distance of 4.5 Å (related to the sum of atomic radii; Heringa and Argos, 1991) capture explicit atomic-level connectivity. A normalization value of the total number of interactions (n ij ) with respect to the maximum values (N i and N j ) observed from a large dataset of high-resolution crystal structures of proteins provides a uniform basis of evaluation as shown in Equation 1 (Kannan and Vishveshwara, 1999;Sathyapriya et al., 2008).
This expression allows us to weigh the strength of the interactions (edge weights) in a systematic manner, which can be uniformly applied to all protein structures. Edge weight (I ij ) can range from a value of zero to one. Values close to zero and one represent weak and strong side-chain connections, respectively. In general, strong connections can be related to nucleation centers formed by the interactions between the residue pairs such as oppositely charged, stacked aromatic residues, or polar residues involved in hydrogen bonds. The weak interactions, on the other hand, usually emerge from a smaller number of noncovalent interactions (n ij ) between pairs of hydrophobic amino acid residues. Generally, these interactions aid in bridging the strongly connected nucleation centers and in organizing the overall tertiary structure of the proteins. A PScN is constructed based on a user-defined value of I ij (termed I min ), and an edge is drawn when the calculated I ij between a pair of residues exceeds I min .

Percolation Profile for the Largest Connected Subnetwork as a Function of Edge Weight
To formalize the effect of the edge weight cutoff (I min ) on the properties of PScN for protein structures, the concept of the largest connected subnetwork (cluster or cliques/communities) transition profile was established and has been applied to a wide range of biological problems (Deb et al., 2009;Brinda et al., 2010). The PScN constructed from low values of I min results in a dense matrix with a large fraction of the residues in the protein getting connected, yielding the largest cluster of the size ∼80-90% of the amino acid residues in the protein. The PScN at higher I min values consists only of strongly connected edges, leading to a sparse matrix. The largest cluster in such a case does not cover a major fraction of the residues in the protein structure. On the other hand, in the largest cluster from a PScN of low I min , although it encompasses a large fraction of residues, the ratio of signal/noise is low in this network. It is therefore important to identify an optimal I min to construct the PScN, without losing information from a sparse graph or encountering low signal/noise from a dense graph. The identification of the optimal I min to construct the PScN has been addressed from the concepts of percolation within a system, as defined by percolation transition point. In these studies, the PScN is characterized by a macroscopically connected subnetwork obtained from I min , around the transition point. The sizes of the largest cluster (L Clu ) or the largest clique/community (L Cli ) in the protein structure network are measured as a function of network connectivity at various I min values. Plotting the values of L Clu or L Cli as a function of I min leads to a sigmoidal curve. The transition point of this sigmoidal curve is identified as the percolation transition point at which a giant connected cluster still permeates the protein structure network. Interaction strength (I min ) around this transition point balances the problems of identifying non-specific, weak interactions at smaller I min values, and discontinuous, sparse network connectivity across the structure at high I min values. From earlier studies, it is shown that generally the transition point occurs in the range of I min values 0.2 to 0.4 (Brinda et al., 2010). This transition point is noted to be a common feature in most protein structures.
In this study, we have analyzed the largest cluster percolation profile for the partially open and closed states of the SARS-CoV-2 spike protein. The plots of I min vs. L Clu are generated from the dynamically stable adjacency matrices (the generation of which is described in the section Protein Structure Network (PScN) for Dynamically Accessible Conformational Ensembles) corresponding to the closed (PDB_ID: 6VXX) and partially open structure (PDB_ID: 6VYB) of the spike protein (Figure 3). A noteworthy feature is that the profiles of the closed and partially open states show some differences in the percolation transition point. These differences are specifically located in the transition regions of I min (between 0.2 and 0.3), with the closed state exhibiting a plateau of the LClu consisting of about 2000 residues, whereas, in the partially open state, the plateau is around a 1,000-residue cluster. Structurally, this is reminiscent of the conformational variations between the two states, with more residues held together as the largest cluster in the closed state, in comparison with the partially open state in the transition region. Based on earlier studies, as well as this one, we infer that the results obtained from the interaction strengths around the percolation transition point, I min value of 0.3, are more sensitive and also provide a global view of the structural connectivity in proteins. In the analysis described in subsequent sections, we have used an I min value of 0.3 to generate the networks.

Network Parameters of the PScN
With an increase in availability of data in various fields, the advancement in the research area of large-complex network studies has moved in different directions, such as problem and data-driven approaches, development of efficient algorithms, and availability of computational packages (Newman, 2001;Newman and Girvan, 2004;Palla et al., 2005). As we have seen above, the crucial input to obtain a solution to the graph is the connectivity or adjacency matrix in which the nodes and edges are defined based on the chosen application. The connectivity matrices (PBN/PScN) can be analyzed using wellestablished algorithms and network metrics, which can be used to describe various structural and functional properties of the protein. Some of the frequently used network metrics for analyzing protein/macromolecular structures are hub, clustering coefficient, cluster, cliques, and communities, and the shortest paths of communication rely on well-established mathematical algorithms (Dijkstra, 1959;Newman, 2001Newman, , 2004Palla et al., 2005). Details of these individual parameters and their physical significance have been extensively discussed in past reviews. A brief description of the parameters and their physical significance follows.
Hubs represent highly connected residues in the protein structure network, which essentially refer to the degree of a node. In some general networks, the degree of certain nodes can be very large (Newman, 2001;Tsai et al., 2009). However, the degree of any residue in PScN generally does not exceed 10 due to the steric constraints. The hubs are key in maintaining structural stability and information flow in the protein structure network and are often termed as "hot spots" in the structure (Amitai et al., 2004;del Sol et al., 2006). Clusters, commonly identified using the depth first search (DFS) algorithm (Cormen et al., 2001), are a set of residues that are connected in a way that the number of intra-cluster connections is higher than the number of intercluster connections, involving these residues. Clusters evaluated at different I min values are used to predict the strength of connectivity within a network as well as to study interfacial interactions in protein complexes .
Cliques are defined as completely connected subgraphs in a network such that every residue is connected to every other residue in this subgraph. An assemblage of cliques that share common edges are termed communities (Palla et al., 2005). Cliques and communities are used to identify regions of rigidity and higher-order connectivity in protein structures (Ghosh and Vishveshwara, 2008). Like L Clu , described in the section Percolation Profile for the Largest Connected Subnetwork as a Function of Edge Weight, the largest identified communities (L Cli ) can also provide insights into the percolation behavior of strongly connected components within a protein as a function of I min (Deb et al., 2009). Floyd-Warshall and Dijkstra algorithms for computing the shortest paths of communication have been widely used to determine the critical residues involved in allosteric communication in proteins, and for mapping ligand-induced conformational changes (Atilgan et al., 2007;Bhattacharyya and Vishveshwara, 2011;Pandini et al., 2012). The specific choice of a network metric used for analyzing a protein structure network is therefore determined by the structural-biological insight we plan to seek. In the section Protein Structure Network (PScN) for Dynamically Accessible Conformational Ensembles, we will demonstrate the application of some of these parameters, by comparing the hubs and cliques/communities between the closed and partially open states of the SARS-CoV-2 spike protein, in their dynamical equilibrium states.

Graph Spectra of PScN
The graph spectral methods based on analyzing eigenvalues and eigenvector components of the connectivity matrices are another approach that has been extensively used to analyze protein structure networks (Hall, 1970). Graph spectral analysis on such a network is performed by studying the eigenspace of the Laplacian matrix associated with it. For a network with n nodes, the Laplacian L is an n x n matrix that satisfies equation 2.
where the summation is over every pair of nodes (u, v) connected by an edge with weight w uv for some vector X in the space of nodes. It is shown that the Laplacian may be expressed in terms of the degree matrix D and the adjacency matrix A as equation 3 (Hall, 1970;Chung, 1997).
The eigenvalues and eigenvectors of the Laplacian matrix contain information about the connected components or clusters of the network (Gadiyaram et al., 2016). The eigenvector corresponding to the lowest non-zero eigenvalue of the Laplacian, called the Fiedler vector, contains the clustering information. Sorting the Fiedler vector by value (FVC) (the components range from values −1 to +1) identifies nodes that are part of the same cluster. In this manner, all the clusters in the graph, ranging from the largest cluster with maximum number of residues to isolated edges, can be obtained as an analytical solution to the Laplacian matrix of the graph. We have considered the example of the SARS-CoV-2 spike protein receptor-binding domain (RBD) (Figures 2A,B) to demonstrate the capability of the graph spectral analysis from the Laplacian matrix. Specifically, we have extracted the clusters from the sorted Fiedler vector of the receptor-binding domain (RBD) of the spike protein (PDB_ID: 6LZG), which is complexed with the target ACE2 receptor (Wang et al., 2020b). The adjacency matrix is constructed as a binary matrix with I ij ≥ 0.3, with the elements taking values one and zero, respectively. A plot of sorted Fiedler components (FVC) as a function of the nodes (residue details given in Supplementary Table S1) is shown in Figure 4A. The slope of the FVC plot is also shown in this figure, which provides a clearer indication of cluster separation (Sistla et al., 2005). The clusters with the number of residues four and above are plotted on the structure of the RBD (Figure 4B). Thus, the graph spectral analysis is a powerful analytical method to extract the clustering information in protein structure networks.
It should be noted that there are limitations of performing graph spectral calculations, such as on large datasets (e.g., long MD simulation trajectories), as they are computationally expensive. However, the method provides unique information which is difficult to obtain directly from other methods. For instance, information can be extracted not only for clusters within a protein but also on the interfaces between domains in a single protein or across subunits in multimeric proteins Sistla et al., 2005). Graph spectral studies can also be performed on weighted networks, improving the accuracy. Further, it lends itself for quantitative comparison of networks by providing a score and allows us to identify the regions of the network which are dissimilar. A brief review of these aspects has been presented earlier (Gadiyaram et al., 2019).

PROTEIN STRUCTURE NETWORK (PScN) FOR DYNAMICALLY ACCESSIBLE CONFORMATIONAL ENSEMBLES
Biological systems exist in a dynamic equilibrium which alters under different conditions of temperature and ionic strengths, in complex with endogenous ligands/small molecules/drugs/interacting proteins, and so on. A glimpse of the accessible conformational landscape can be obtained by studying a large number of experimentally solved structures in different conditions or through long-timescale molecular dynamics (MD) simulations. The network properties that we described above for a single structure of proteins can also be extended to study the dynamically average properties of conformational ensembles. Depending on the objective, a judicious choice has to be made as to whether to get the averages from all the structural snapshots along the MD trajectory or from selected structures representing various local minima along the trajectory.

Dynamically Stable Protein Structure Network (PScN) for Conformational Ensembles
Analyses of protein conformational ensembles have been facilitated by the development of multiple open-source program packages (Eargle and Luthey-Schulten, 2012;Bhattacharyya et al., 2013;Chakrabarty and Parekh, 2016;Brown et al., 2017;Felline et al., 2020) that analyze multiple structural snapshots in dynamically accessed conformational states. The critical element in many of these open-source software is the ability to implement network theory-based calculations to analyze MD simulation trajectories. Some of these packages (PSN-Ensemble, webPSN v2.0, and NetworkView) also enable the use of residue pairwise interaction energies to weigh the connectivity matrix.
In this review, we discuss the general concepts of network theory-based analysis of protein conformational ensembles, specifically using PSN-Ensemble as an example software package. The basic principles and capabilities of PSN-Ensemble have been described before (Bhattacharyya et al., 2013). Briefly, PSN-Ensemble provides a consolidated and automated analysis platform, bridging network studies with protein conformational dynamics. Taking the coordinates of structural snapshots from conformational ensembles (MD simulations, NMR studies, or multiple crystal structures) as input, the program computes protein side-chain connectivity matrices (PScN). The individual matrices can be averaged by imposing a user-defined cutoff for dynamic stability (say X%). This "dynamically stable" PScN retains any interaction that appears in greater than X% of the structural ensemble, highlighting interactions that persist in a user-defined fraction of the ensemble.
Here we provide an example of the application of network theory to analyze MD simulation trajectories. Using PSN-Ensemble, we analyze the long-timescale MD simulation trajectories (10 µs each) of SARS-CoV-2 spike protein in the closed and partially open states (D. E. Shaw Research., 2020). Based on the fact that the interaction strength around the percolation transition point is most sensitive while providing a global view of the structural connectivity (Figure 3), an I min value of 0.3 is chosen to construct the PScN for the two states of the spike protein. Further, the dynamically stable PScN for the MD conformational ensemble is computed at a cutoff of 50%. We compared the hubs and cliques/communities from the dynamically stable PScN for the closed and partially open states of the spike protein. The results of these analyses on the spike protein are summarized in the subsequent sections as an example of network theory-based comparison of different conformational states.

Analysis of Dynamically Stable Metrics of PScN for the Closed and Partially Open States of the SARS-CoV-2 Spike Protein
In this section, we compare the side chain network properties related to rigidity/flexibility (hubs, cliques/communities) from  Supplementary Table S1). Nodes with identical FVC belong to a cluster. The red dotted line represents the slope of the FVC plot. The constant zero values of slope correspond to clusters, and spikes in the slope indicate the separation between clusters. (B) Representation of the side-chain clusters identified through FVC on the structure of RBD. The clusters are color-coded green (largest), blue (2nd largest), yellow (6 residue cluster), orange (5 residue cluster), and cyan (4 residue cluster).
the long-timescale MD trajectories on the closed and partially open states of the SARS-CoV-2 spike protein (10 µs each) (D. E. Shaw Research., 2020). In order to engage the host cell receptor (ACE2), the receptor-binding domain (RBD) of the spike protein undergoes conformational changes, much like the opening of a hinge (Walls et al., 2020). The closed state of the spike protein (PDB_ID: 6VXX) is receptor inaccessible. A partially open structure, with one of the subunits exhibiting RBD opening (PDB_ID: 6VYB), is representative of the receptor accessible states of the protein. Using PSN-Ensemble on the MD simulation trajectories, we analyzed the global conformational changes between these closed and partially open states.
The root mean square deviation resulting from a backbone alignment of the closed (PDB_ID: 6VXX) and partially open (PDB_ID:6VYB) structures of the spike protein is ∼0.5 Å. In addition, each subunit in the trimeric spike protein shows < ∼0.5 Å root mean square deviation when compared to each other, either within or between the two conformational states (closed and partially open). This indicates highly symmetric trimeric organization in the starting structures used for the long MD simulations, in terms of backbone superposition. The two states mainly differ in the conformations of the RBD in one subunit, with the rest of the domains relatively unchanged at the backbone level ( Figure 2C) (Walls et al., 2020). Subtle conformational changes that differentiate these two states elude backbone-based structural comparisons, which cannot efficiently capture re-orientations at the protein side-chain level. These factors make the SARS-CoV-2 spike protein conformational states an ideal model system to highlight the benefits of sidechain-based network (PScN) analysis.
To compute the dynamically stable PScN, we extracted conformational snapshots every 100 ns from the two MD simulation trajectories of the spike protein in its closed and partially open states (D. E. Shaw Research., 2020). About 100 snapshots from each trajectory are used as an input to the software package PSN-Ensemble to calculate network metrics that persist in at least 50% of the structural ensemble. A comparison of the dynamically stable hub and cliques/communities between the closed and partially open states of the spike protein shows how the conformational change in the RBD leads to global structural rearrangements, which percolates into the membrane-binding domains of the spike protein. Through this highly relevant example, we reaffirm the advantages of using protein side-chain network-based calculations in capturing the changes in structural connectivity and conformational dynamics in proteins under different conditions of activity, ligand binding, environmental stimulus, and allosteric communication.

Comparison of Dynamically Stable Hubs Between the Closed and Partially Open States of Spike Protein
As mentioned in the section Network Parameters of the PScN, the residues that form four or more connections with other residues are defined as hubs and these are considered as dynamically stable, if they appear as hubs in at least 50% of the MD simulation snapshots. These hubs are considered structural hotspots. A change in the number and location of the dynamically stable hub residues in the closed and partially open states of spike protein captures the differences in structural connectivity between these two states. The three subunits in the closed and partially open states of the trimeric SARS-CoV-2 spike protein show a total of 187 common dynamically stable hubs (Supplementary Table S2). The common hubs between the two states represent the structural connectivity in the PScN which remains unchanged between the two conformational states. The three subunits exhibit mostly symmetrical distribution of these common hubs, in terms of both number of hubs and the participating residues (Supplementary Table S2).
The distinctive structural features of the closed and partially open states are shown by the hubs that are unique to each conformational state. A comparison of these unique hubs reveals striking differences between the closed (64 unique hubs) and partially open states (74 unique hubs). The number and distribution of these dynamically stable unique hubs show large variations between corresponding subunits of the trimeric spike protein, in going from the closed to the partially open state (Figure 5, Supplementary Table S2). Strikingly, the number and distribution of the unique hubs among the three subunits within a particular conformational state also show significant differences. This indicates asymmetry between the three subunits of the spike protein, in both closed and partially open states. This asymmetry is exhibited at the side-chain network level, despite the highly symmetric nature of the spike protein structure in terms of backbone alignment of every subunit with every other subunit (RMSD < 0.5 Å).
Depiction of these unique hubs on each subunit of the respective conformational states reveals structural rewiring in the entire SARS-CoV-2 spike protein as the RBD goes from the closed to the open conformation. Here, we discuss the differences observed in chain A (detailed differences for all the three subunits are summarized in Figure 5). In chain A, the partially open state of the spike protein shows an increased number of hubs in the NTD as well as in the region connecting the HR1 and HR2 domains (Figure 5). The increased number of dynamically stable hubs suggests enhanced connectivity in these regions as the spike protein transitions into a partially open state. Our results also suggest that the conformational changes in RBD between the two states induce significant reorganization in the dynamically stable PScN. These global side-chain conformational changes are reflected as differences in the distribution of hubs (Figure 5), especially at sites distant from the RBD, despite minimal backbone reorganization between the two states.

Comparison of Dynamically Stable Cliques and Communities Between the Closed and Partially Open States of Spike Protein
Cliques represent a subset of residues within a protein structure network where each residue is connected to every other residue (Palla et al., 2005). Cliques represent higher-order connectivity in a network, highlighting regions of structural rigidity in the context of protein structures. An assemblage of cliques through shared edges/interactions is defined as communities (details in the section Network Parameters of the PScN). Communities capture the percolation of structural rigidity through the protein structure network. Together, comparison of cliques/communities reflects subtle conformational changes that alter regions of rigidity/flexibility in protein structural organization.
We compared the dynamically stable cliques and communities obtained from the SARS-CoV-2 trimeric spike protein in the closed and partially open states (Figure 6, Supplementary Tables S3A-C). The conformational changes that accompany the transition between the two states of the spike protein are reflected by the cliques/communities that are unique to each state. For clarity, we will only focus on the dynamically stable cliques/communities formed at the trimeric interface between the three subunits (also see Supplementary Tables S3A-C for a comparison of all cliques/communities between partially open and closed states of the spike protein).
Interfacial cliques/communities are an excellent metric to measure changes in connectivity or interaction between subunits for multimeric proteins. In the closed state of the spike protein, a large number of unique interfacial cliques are seen within the RBD of the three subunits (Figure 6, Supplementary Table S4). A total of 32 unique interfacial cliques are identified in the closed state, involving residues in the RBD, NTD, SD1, S2, and HR1 domains in the spike protein. This suggests a tightly packed trimeric interface in the closed state with rigid connections between the residues across the three subunits. In contrast, only 21 unique interfacial cliques are observed in the partially open state, with marked alterations in the domains participating in the cliques. Only eight common interfacial cliques are shared between the two conformational states, indicating significant variations in the trimeric interface packing.
Interestingly, most interfacial cliques formed by the RBD residues are lost in the partially open state. This, as expected, may be due to opening of the RBD in one of the subunits in the spike protein, which leads to a weakening of the interfacial connections involving the RBD residues across the trimer. Interestingly, this conformational change percolates to domains that are distant from the RBD, with cliques altering across the entire spike protein (Figure 6, Supplementary Table S4). A slight increase in the number of interfacial cliques is noted near the HR1/HR2 domain, especially in the region connection the cores of HR1 and HR2.
In this section, we have demonstrated the utility of side-chain network metrics like hubs, clusters, and cliques/communities by correlating the function of partial RBD opening to global conformational changes at the side-chain interaction level. We have shown that the local conformational changes at the RBD lead to extensive re-orchestration of the entire spike protein side-chain network.

SUMMARY AND OUTLOOK
The term "allostery" was coined more than half a century ago, to characterize the action of proteins away from the classically identified binding site (Monod et al., 1965;Koshland et al., 1966;Changeux, 2011). The mechanism of action was described through lock-and-key or induced fit models. Our understanding of the protein structure-function relationship has increased with advancement in structural biology. Today, there is an exponential increase of structural data from experiments such as X-ray crystallography, NMR, and Cryo Electron Microscopy (Structural Biology Shapes Up | Science | AAAS., 2016; Nitta et al., 2018). In parallel, computational biology has reached the maturity to explore the conformational space of large protein assemblies through long timescale MD simulations (Lindorff-Larsen et al., 2016;Wang et al., 2019;Mysore et al., 2020).
The data from experimental structural biology and the MD simulations have become a rich source of information to investigate macromolecular systems in atomic details. Mining such valuable data for protein conformation and dynamics, in order to unravel biological function at a molecular level and provide meaningful and reliable predictions for experimental biologists, has been a challenge. This has led to multidisciplinary approaches and adaptation of different domain expertise to investigate the importance of specific amino acids toward the stability and functions of proteins from various perspectives. Some of the computational concepts and methods that have made their way to address biological systems are network theory, accessible modes, machine learning, and percolation phenomenon, in combination with highly valuable chemical and biological inputs.
Here we have presented a focused review of the protein structures from a network perspective. Specifically, we have focused on the networks of side-chain connectivity to highlight the unique benefits of this approach. We have described the method of quantifying connectivity and identifying the optimal connectivity criteria by employing concepts from percolation theory. We have also discussed the global connectivity of the protein side-chains and clustering of interacting residues from the graph spectral perspective. We have pointed out that the highly similar backbone conformations of proteins can host a repertoire of conformational landscapes, which subtly differ in their side-chain interactions. Thus, mild perturbations to proteins can lead to side-chain reorganizations that elude backbone-based structural studies and drive allosteric communication. We have briefly touched upon a variety of approaches to investigate allostery, on which excellent recent reviews are available. Molecular dynamics simulations can yield an ensemble of protein conformations, which can capture both the backbone and the side-chain level differences in interactions. Analysis of MD simulation trajectories using side-chain network formalism provides a global view of protein structural connectivity from a dynamic perspective. We have reviewed the methodology for such integration of MD simulation with network theorybased analyses.
Due to the global pandemic caused by the highly infectious COVID-19, we have chosen the SARS-CoV-2 spike protein as an example to illustrate the dynamic PScN perspective. We have investigated the molecular dynamics trajectories of the closed and partially open states of the trimeric spike protein that have been made available by D.E. Shaw Research (D. E. Shaw Research., 2020). Backbone-based structural comparison between the closed and partially open states reveals minimal structural changes. Highlighting the importance of side-chain network analyses, a dynamic PScN-based comparison reveals key differences between the two conformational states. The present investigation highlights the differences at the sidechain interaction level, between the two states such as (1) the differences in the size of the largest connected clusters (L Clu ) in the percolation transition region, with the closed state being more stable than the open state, and (2) the differences in the network parameters such as hubs, cliques, and communities.
A comparison of the network properties of the partially open and closed forms of the SARS-CoV-2 spike protein reaffirms that different functional states of proteins can adopt very close backbone topology. While substantial side-chain network parameters like hubs, cliques, and communities are also common to both the forms, the unique ones are strategically located in various parts of this multimeric protein. For example, the local conformational changes during the RBD opening lead to extensive re-orchestration of the entire spike protein network, more pronounced in the interfacial region of the trimeric contacts. The different functional states are carefully balanced through the re-organization of side-chain connectivity to mediate interactions with the ACE2 receptor, and ultimately viral fusion to host cell membrane. A detailed study of these interactions between the SARS-CoV-2 spike protein and ACE2 receptor or relevant antibodies/drugs from a side-chain network perspective will be the subject of future investigations.
In addition to offering insights into the structure-function correlation in proteins at the side-chain connectivity level, the dynamic network-based studies also provide a new perspective of allostery. The flexibility of the protein involved in interactions with ligand/drug or other proteins is as important as their interacting partners, to have a productive signaling output. Allostery should be viewed as a synergistic-global interaction between the ligand (or the environment) and the receptor. The mechanism of long-distance communication involves specific routes and subtle changes in the communication paths, in order to signal at a distance. Analysis of PScN reveals allosteric communication paths via side-chain interactions even without substantial backbone reorganization. A stimulus at the ligandbinding pocket may be transmitted to the desired destination through subtle reorganization of the side-chain interactions that are allowed in the equilibrium dynamical state. Comparison between the two states of the SARS-CoV-2 spike protein reveals significant changes in the hubs and cliques/communities in regions distant from the RBD. The global reorganization of the side-chain connectivity between the two states of the spike protein could also influence the communication paths within and across proteins. Thus, one can consider the conformational landscape as being made up of various side-chain network paths.
Finally, in the context of treatment of infections, the antibodies and vaccines are produced in response to the global topology of the host protein or receptor. They complement the naturally evolved receptor more globally around the binding sites. The drugs, on the other hand, which are designed based mainly on the binding site information, may not be highly effective. As we have seen here, the binding site residues are held loosely or tightly by the residue clusters, firmly anchoring some of the interacting residues deep within the pocket. The drugdevelopment strategies would benefit by incorporating the sidechain network connectivity information into their design, thus providing a rationale for incorporating the effects of variations in global structural connectivity in proteins.