Elucidating Interactions Between SARS-CoV-2 Trimeric Spike Protein and ACE2 Using Homology Modeling and Molecular Dynamics Simulations

Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19). As of October 21, 2020, more than 41.4 million confirmed cases and 1.1 million deaths have been reported. Thus, it is immensely important to develop drugs and vaccines to combat COVID-19. The spike protein present on the outer surface of the virion plays a major role in viral infection by binding to receptor proteins present on the outer membrane of host cells, triggering membrane fusion and internalization, which enables release of viral ssRNA into the host cell. Understanding the interactions between the SARS-CoV-2 trimeric spike protein and its host cell receptor protein, angiotensin converting enzyme 2 (ACE2), is important for developing drugs and vaccines to prevent and treat COVID-19. Several crystal structures of partial and mutant SARS-CoV-2 spike proteins have been reported; however, an atomistic structure of the wild-type SARS-CoV-2 trimeric spike protein complexed with ACE2 is not yet available. Therefore, in our study, homology modeling was used to build the trimeric form of the spike protein complexed with human ACE2, followed by all-atom molecular dynamics simulations to elucidate interactions at the interface between the spike protein and ACE2. Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) and in silico alanine scanning were employed to characterize the interacting residues at the interface. Twenty interacting residues in the spike protein were identified that are likely to be responsible for tightly binding to ACE2, of which five residues (Val445, Thr478, Gly485, Phe490, and Ser494) were not reported in the crystal structure of the truncated spike protein receptor binding domain (RBD) complexed with ACE2. These data indicate that the interactions between ACE2 and the tertiary structure of the full-length spike protein trimer are different from those between ACE2 and the truncated monomer of the spike protein RBD. These findings could facilitate the development of drugs and vaccines to prevent SARS-CoV-2 infection and combat COVID-19.

SARS-CoV-2 is a positive-sense, single-stranded RNA virus that encodes 16 non-structural proteins (NSP1-NSP16), four structural proteins (spike, membrane, envelope, and nucleocapsid), and nine accessory proteins (Figure 1; Romano et al., 2020). Spike proteins present on the virion surface are responsible for targeting host cells and triggering fusion of viral and host cell membranes, which are critical steps in initiating infection and enabling the transfer of viral RNA into host cells. Membrane and envelope proteins are responsible for the virus shape and assembly/budding, respectively. The nucleocapsid protein enters the host cell along with the SARS-CoV-2 genetic material, which serves to facilitate RNA transcription, replication, virus assembly, and release (Kang et al., 2020;Zeng et al., 2020). Since the spike protein plays a major role in initializing viral infection through binding to ACE2, inhibiting the binding of the spike protein to ACE2 is an attractive strategy for developing drugs to block the spread of SARS-CoV-2 infection and treat COVID-19 (Das et al., 2020;Wu et al., 2020). Therefore, understanding interactions between the spike protein and ACE2 may facilitate the development of drugs that target binding of the spike protein to ACE2.
The spike protein contains 1273 amino acids and is composed of two subunits, S1 (amino acids 14-685) and S2 (amino acids 686-1273), which are responsible for receptor binding and membrane fusion with the host cell, respectively, preceded by a short signal peptide (amino acids 1-13). The S1 subunit consists of three domains: an N-terminal domain (NTD; amino acids 14-305), a receptor binding domain (RBD; amino acids 319-541), and a carboxy-terminal domain (CTD) which has two subdomains (SD1 and SD2) (Henderson et al., 2020;Huang et al., 2020;Tang et al., 2020a,b). The S2 subunit consists of a fusion peptide (FP; amino acids 788-806) composed of hydrophobic residues, heptapeptide repeat 1 (HR1; amino acids 912-984), heptapeptide repeat 2 (HR2; amino acids 1163-1213), a transmembrane domain (TM; amino acids 1213-1237), and a cytoplasmic domain (CP; amino acids 1237-1273) (Astuti and Ysrafil, 2020;Huang et al., 2020;Tang et al., 2020a,b). In its native form, the spike protein is present as a trimer on the surface of the virion, with the S1 and S2 subunits forming the extracellular stalk and bulbous "crown, " for which the Latin translation is "corona" Tang et al., 2020a,b;Walls et al., 2020). The crown of the trimeric spike protein undergoes hinge-like conformational changes between a closed/down conformation and a less-stable open/up conformation Tang et al., 2020a,b;Walls et al., 2020;Wrapp et al., 2020). In the open/up conformation, the RBD is accessible for binding to the ACE2 receptor; whereas, in the closed/down conformation, the RBD cannot interact with the ACE2 receptor (Ortega et al., 2020;Wrapp et al., 2020). Upon binding to the ACE2 receptor, the spike protein trimer undergoes a conformational change resulting in the accessibility of the S1 and S2 cleavage sites to host proteases Shang et al., 2020;Xia et al., 2020). Cleavage of the S1 and S2 subunits primes the spike protein for membrane fusion by enabling insertion of the S2 FP domain into the host cell membrane, which enables subsequent interactions between the HR1 and HR2 coiled-coil domains to form a six helical bundle (6-HB). This bundle stabilizes another S2 subunit conformational change in which viral and host membranes are close enough in proximity to trigger membrane fusion Tang et al., 2020a,b).
Recently, structures of the trimeric form of mutant or truncated spike proteins complexed with antibodies were reported. Walls et al. determined atomic models of SARS-CoV-2 spike protein in the closed/down (6VXX) trimeric conformation and a one-up (6VYB) trimeric conformation, in which a single S unit is in the open/up conformation and two S units are in the closed/down conformation (Walls et al., 2020). However, the fulllength wild-type S protein was not determined due to missing residues and loops, and spike protein trimers bound to the ACE2 receptor were not evaluated. Thus, a crystal structure of the wildtype trimeric form of spike protein bound to ACE2 has not yet been determined (Song et al., 2018).
In this study, a complex structure of the wild-type trimeric spike protein with ACE2 was constructed using homology modeling based on the atomic details of the trimeric form of the spike protein with the one-up conformation and the RBD of the spike protein with ACE2. The homology models were evaluated using a Ramachandran plot and the highest quality model was subjected to molecular dynamics simulations to identify interactions between the trimeric spike protein and ACE2.

Homology Modeling
The primary sequence of the spike protein (ID: P0DTC2) was retrieved from the UniProt database (https://www.uniprot.org) and used as a template sequence. The mutant trimeric spike protein of SARS-CoV-2 with the one-up conformation (PDB ID: 6VYB) (Walls et al., 2020) and the complex of spike protein RBD of SARS-CoV-2 with ACE2 (PDB ID: 6M0J)  were selected as the template structures. Sequence alignment between templates and targets was performed using the EBI-Clustal Omega Server (Madeira et al., 2019). Homology modeling was performed using the Modeler v9.24 (Sali and Blundell, 1993) program. Ten models were generated, and one model was selected based on the DOPE energy value. To avoid steric clashes, the selected model was energy minimized using the Schrödinger suite (www.schrodinger.com) and evaluated for its stereo-chemical quality using a Ramachandran plot (https:// swissmodel.expasy.org/assess).

Molecular Dynamics Simulations
The validated homology model of the trimeric form of the spike protein complexed with ACE2 was used as the starting structure for molecular dynamics simulations using Amber 18 (https:// ambermd.org/CiteAmber.php). The tleap from AmberTools was used to prepare topologies and coordination files for the protein using protein.ff18SB forcefield (Ponder and Case, 2003;Maier et al., 2015) by adding the force field and hydrogen atoms. The prepared system was then placed inside an octahedral box 10 Å away from the protein surface, solvated with the TIP3P (Jorgensen et al., 1983) water model, and subjected to energy minimization. The counter ions were added to neutralize the unbalanced charge of the system. The whole process was divided into 2 phases. In phase 1, the solute molecules were constrained and only the solvents were minimized and equilibrated. Subsequently, the steepest descent method (1,000 steps) followed by conjugate gradient (4,000 steps) methods were applied to minimize the whole system (solute and solvent). The minimized system was gradually heated from 0 to 310 K, and the entire system was equilibrated without any constraints.
During the simulations, the system temperature and pressure were maintained at 310.5 K and 1 atm, respectively. In Phase 2, a 100 nanosecond (ns) unrestrained production run was applied to the system. The SHAKE (Ryckaert et al., 1977) algorithm and Particle Mesh Ewald (PME) (Darden et al., 1993) were applied for the hydrogen bonds and the long-range electrostatic interactions, respectively. Coordinate files were saved for every 5 picosecond (ps). The resultant trajectory was processed using the CPPTRAJ (Roe and Cheatham, 2013) from AmberTool package, Visual Molecular Dynamics (Humphrey et al., 1996), and PyMol (https://pymol.org). The kclust algorithm from MMTSB toolset (http://www.mmtsb.org/) was used to cluster the resultant trajectory. The representative structure from the cluster analysis was used for hydrogen bond and energy analysis. The hydrogen bond between the trimeric form of the spike protein and ACE2 was determined using a distance cut-off value of 4 Å or less.

Binding Free Energy Calculation
The spike protein, ACE2, and spike protein-ACE2 complex structures were used to calculate binding free energy. The tleap from AmberTools18 (https://ambermd.org/AmberTools. php) was used to generate the gas phase, solvated complex topology (prmtop); and coordination (inpcrd) files for the spike protein, ACE2, and spike protein-ACE2 complex. The MMPBSA.py (Miller et al., 2012) script from the Amber package was used to calculate the binding free energy using the Molecular Mechanism-Generalized Born Surface Area (MMGBSA) approach. The representative structure from clustering analysis was used to calculate binding free energies using Equation (1) (1) where G BFE represents the binding free energy between the spike protein and ACE2. The terms G S−ACE2 , G S , and G ACE2 represent the free energy of the spike protein-ACE2 complex, spike protein, and ACE2, respectively. The MMPBSA.py script was used to calculate interaction and solvation free energies for the spike protein, ACE2, and spike protein-ACE2 complex. The energy values were calculated using Equation (2) where G gas , G sol , and T S represent the gas phase molecular mechanism component, solvation of binding free energies, and changes in entropy due to ACE2 binding, respectively. Because our goal was to obtain an estimated free energy rather than an absolute value, and since the computational cost was high, entropy changes in the free energy calculation were ignored. The gas phase molecular mechanism ( G gas ) and solvation free energy ( G sol ) were calculated using Equations (3-5) where G vdW and G ele correspond to van der Waals and electrostatic interactions, respectively, and G pol.sol and G nonpol.sol represent polar solvation and non-solvation terms from the MMGBSA approach. The terms SASA, γ, and β denote solvent-accessible surface area, surface tension, and regression offset of the linear relationship, respectively. Default values were used to calculate the estimated binding free energy between the spike protein and ACE2.

Interaction Free Energy of Residues
The representative structure from clustering analysis was used to analyze interaction free energy for residues at the interface of the spike protein and ACE2 using the decomp (Miller et al., 2012) module from the MMPBSA.py program. Contributions of the interacting residues were calculated based on the following energy terms: electrostatic contribution, non-polar solvation contribution, van der Waals contribution, and polar solvation, using Equation (6).

Alanine Scanning
The selected interacting residues were mutated individually in the representative structure from the cluster analysis. The free energies of the mutated trimeric spike protein and ACE2 complexes were calculated by MMPBSA.py using Equation (7).
where G BFE represents the estimated binding free energy for mutated spike protein and ACE2. The G S mut −ACE2 , G S mut , and G ACE2 terms represent the free energy components estimated for the mutated spike protein-ACE2 complex, mutated spike protein, and ACE2, respectively. The MMPBSA.py script was used to calculate the free energies for the mutated spike protein, ACE2, and mutated spike protein-ACE2 complex.

Homology Modeling of the Trimeric Form of the Spike Protein
Ten homology models of the trimeric form of the spike protein bound to ACE2 were generated and their DOPE energy values were calculated ( Table 1). Model_4 showed the lowest DOPE energy value and was selected as the best structure for the trimeric form of spike protein complexed with ACE2. Model_4 was evaluated for stereochemical chemical quality using a Ramachandran plot. According to the Ramachandran plot (Supplementary Figure 1), 93.57 and 1.53% of residues were in the favored and outlier regions, respectively. Chain A, chain B, and chain C showed 94, 93.39, and 92.95% residues in the favored regions, respectively. Superimpositions between Model_4 and the reported structure of the trimeric form of the spike protein with the one-up conformation, and between Model_4 and the crystal structure of the truncated spike protein RBD bound with ACE2 (Figure 2), showed RMSD values of 0.28Å and 0.744 Å, respectively. This finding indicates that the homology model was consistent with the empirical structures. Therefore, Model_4 was used for subsequent molecular dynamics simulations.

Molecular Dynamics Simulations
The trajectory obtained from the molecular dynamics simulations was analyzed to identify critical residues at the interface between spike protein and ACE2. Stabilities of the trimeric form of the spike protein complexed with ACE2 and fluctuation of residues in the simulations were examined using root mean squared deviation (RMSD) of Cα atoms and root mean square fluctuation (RMSF), respectively. The RMSD plot (Figure 3) indicates that the trimeric form of wild-type spike protein was stable in the last 20 ns with a converged RMSD of   (Figure 4) suggests that the residues in the secondary structure regions were quite stable, but the residues in the loop regions showed large fluctuations with RMSF values >3Å.

Clustering Analysis
Clustering analysis was used to select a representative structure from the trajectory for binding free energy analysis. The kclust algorithm from MMTSB was used to cluster the structures in the trajectory. The kclust is a fast and sensitive algorithm and is widely used to cluster large size proteins. First, structures were extracted from the trajectory file with an interval of five frames and saved as PDB files for clustering analysis. The extracted structures were then put into the kclust algorithm with the radius set to 3 Å. Four clusters were obtained from kclust that resulted in 326, 1,090, 118, and 466 structures, as shown in Figure 5. As Cluster two was the largest, the structure with the smallest RMSD value in Cluster two exhibited the shortest distances to other structures and, thus, was selected as the representative structure for subsequent binding free energy analysis via alanine scanning.

Interacting Residues
The representative structure from the cluster analysis was used to identify the residues between the trimeric spike protein and ACE2 at distances of 4 Å or less . Twenty residues of the trimeric spike protein were within 4 Å to ACE2 and were defined as interacting residues. Whereas, Lan et al. reported 17 spike protein RBD interacting residues within 4 Å to ACE2 when the truncated RBD monomer was analyzed. The superimposition of both complexes, the truncated spike protein RBD monomer complexed with ACE2 and the fulllength trimeric spike protein complexed with ACE2, is depicted  in Figure 6. Both approaches to identify interacting residues, using the full-length trimer described here and the truncated RBD monomer , resulted in identification of 14 conserved residues with distances to ACE2 of ≤4 Å. Lan et al.
reported three additional interacting residues (Lys417, Tyr453, and Ala475) that were within 4 Å to ACE2 when using the truncated spike protein RBD monomer, but were slightly >4 Å to ACE2 in assessments using the full-length trimeric spike protein. However, given that their distances to ACE2 are very close to 4 Å using the full-length trimeric spike protein, it is acceptable to consider all three as interacting residues based on both approaches. Conversely, Glu484 was slight >4 Å from ACE2 using the truncated spike protein RBD monomer with a distance 4.39 Å but was <4 Å to ACE2 using the full-length trimeric spike protein. Thus, it is also acceptable to include Glu484 as an interacting residue according to both approaches. Therefore, a total of 18 ACE2-interacting residues may be considered to be conserved between both approaches using either the fulllength trimer or the truncated RBD monomer. However, five additional interacting residues (Val445, Thr478, Gly485, Phe490, and Ser494) with distances <4 Å from ACE2 were identified using the full-length trimeric spike protein but were too far away to interact with ACE2 (5.69 Å, 7.62 Å, 5.72 Å, 5.34 Å, and 6.25 Å, respectively) using the truncated spike protein RBD monomer. Among these five new residues, four are in loop regions and the remaining one, Ser494, is at the end of a beta sheet in the spike protein RBD. Altogether, these data suggest that there are likely to be 23 spike protein residues capable of interacting with ACE2. These findings further suggest that the specific interacting residues may change based on the tertiary conformation, as indicated by the conformation difference between the full-length trimeric spike protein and its truncated RBD.

Hydrogen Bond Interactions
Hydrogen bonds are major interactions between proteins. Hence, hydrogen bonds formed between the spike protein and ACE2 in the representative structure from the molecular dynamics simulations were identified using CPPTRAJ from AmberTools. A hydrogen bond was formed if the distance between two hydrophilic atoms (O or N) near a hydrogen atom was <4 Å and the angle from the hydrogen to the two hydrophilic atoms was >135 • . Using this criterion, hydrogen bonds were formed for six residues of the spike protein and five residues of ACE2 at the interface region (Table 2, Figure 7). Among these, five spike residues and one ACE2 residue were reported to form hydrogen bonds in the crystal structure of the truncated spike protein RBD complexed with ACE2 . Two residues that were reported to form hydrogen bonds in the crystal structure of the spike protein RBD bound with ACE2 , Leu455 and Phe456, did not form hydrogen bonds in our structure of the trimeric form of spike protein complexed with ACE2. On the other hand, Phe490 involved hydrogen bonding in our structure, but was not reported to form hydrogen bonds in the crystal structure of the truncated RBD of spike protein monomer complexed with ACE2. Thus, these findings suggest that the comprehensive hydrogen bonding network between the spike protein and ACE2 is likely to depend on the trimer of full-length spike protein complexed with ACE2.

Binding Free Energy Calculation
MMPBSA.py script from AmberTools is a fast method to compute binding free energy compared to other methods (Marimuthu et al., 2020), such as Replica-Exchange Free-Energy Perturbation (Fratev and Sirimulla, 2019) and umbrella sampling (Kumar et al., 1992). MMPBSA.py script was used to calculate binding free energy values between the spike protein and ACE2 for the representative structure and its mutated structures from alanine screening. The estimated binding free energy between the spike protein and ACE2 was −60.54 kcal/mol, indicating that the spike protein tightly binds with ACE2. The contributions from different energy terms (van der Waals, electrostatic, polar and non-polar solvation, solvation, and gas phase) to the free energy of the spike protein-ACE2 complex, the spike protein, and ACE2 were calculated and are shown in Table 3. Electrostatic interactions contributed more to the total binding free energy than van der Waals. To investigate contributions of individual residues to the binding between the spike protein and ACE2, decomposition of free energies to individual residues was conducted. The energy components (van der Waals, electrostatic, and polar and non-polar solvation energy) of the 20 interacting residues with distances <4 Å from ACE2 are depicted in Table 4 and indicate that van der Waals interactions were the major force by which the interacting residues interacted with ACE2. The 20 interacting residues with distances <4 Å from ACE2 were used for alanine scanning. The current version of alanine scanning in MMPBSA script does not support energy calculation for the mutation of Gly to Ala. Hence, the four Gly residues were excluded from the alanine scanning calculation. Each of the remaining 16 interacting residues were mutated to alanine to generate a complex, then the free energy values were calculated  for the complex using MMPBSA.py script. The binding free energy values of the wild-type and 16 mutated trimeric spike protein-ACE2 complexes were calculated and are reported in Table 5. Of the 16 alanine mutations, 15 resulted in structures with higher binding free energy than the representative wildtype structure, indicating that these 15 residues are likely to be important for spike protein binding to ACE2. The remaining one residue (Glu: 484) led to a structure with a slightly lower binding free energy, indicating that this residue is less important for the interaction between the spike protein and ACE2. Overall, the alanine screening analysis confirmed that the 20 interacting residues identified using the full-length trimeric spike protein play key roles in the binding interaction between the spike protein and ACE2.
Recently, 35,000 de novo hACE2 decoys were designed and CTC-445.2 was found to tightly bind with the RDB of the trimeric spike protein in four different states (state 1: 1 RBD up, state 2: 2 RBD up, state 3: 1 RBD up and 1 RBD down, and state 4: 2 RBD up and 1 RBD down) (Linsky et al., 2020) which were deposited in the electron microscopy databank and protein data bank, but the structures are not yet available for the public. Eleven residues of the state four trimeric spike protein were determined to interface with CTC-442.2. We compared the 11 residues with the interacting residues identified in our structure and found nine that are interacting residues, Tyr489, Phe486, Gln493, Asn501, Thr500, Tyr449, Phe456, Asn487, and Gln498. The remaining two, Tyr495 and Ala475, do not meet the criterion of 4 Å for interacting resides, but are located in the interface between the trimeric spike protein and ACE2 in our structure, at distances of 4.13 Å and 4.28 Å to ACE2 residues, respectively. Therefore, all the interface residues were confirmed in our structure.

CONCLUSIONS
The structure of the trimeric form of the full-length wildtype spike protein complexed with ACE2 was generated using homology modeling. The homology model was of good quality, as determined by Ramachandran plot evaluation. Using molecular dynamics simulations of the homology model, the  Residue mutated Spike ACE2 Spike-ACE2 complex G residues in the spike protein that are key for tightly binding ACE2 were identified. Most of the interacting residues reported in the crystal structure of the monomeric truncated spike protein RBD bound with ACE2 were among the interacting residues identified in this study with interaction distances defined as ≤4 Å, and all were included when considering those just slighter >4 Å. In addition, one previously excluded residue and five new interacting residues were identified as likely to be important contributors to ACE2 binding. Of the 16 interacting residues analyzed by alanine screening, 15 were confirmed to be important for the binding between the spike protein and ACE2. The binding interactions between the spike protein and ACE2 that were identified using the trimeric form of full-length wild-type spike protein are different from those reported in the crystal structure of the monomeric form of the spike protein RBD, indicating that the binding interface between the spike protein and ACE2 receptor is likely to be dependent on the tertiary structure of the spike protein used in the analysis. Together, the constructed structure of the trimeric full-length wild-type spike protein bound with ACE2 and the key binding residues identified in this study provide new insights into understanding mechanisms of SARS-CoV-2 infection of host cells, which could facilitate the development of drugs and vaccines to prevent SARS-CoV-2 infection and to combat COVID-19.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
SS, HH, MA, JH, and TP: conceived the experiment. SS and WG: conducted the experiment. SS, BP, GY, and ZJ: analyzed the result. SS, HH, MA, JH, and TP: wrote the manuscript. All authors contributed to the article and approved the submitted version.