Structural Insights on the SARS-CoV-2 Variants of Concern Spike Glycoprotein: A Computational Study With Possible Clinical Implications

Coronavirus disease 2019 (COVID-19) pandemic has been attributed to SARS-CoV-2 (SARS2) and, consequently, SARS2 has evolved into multiple SARS2 variants driving subsequent waves of infections. In particular, variants of concern (VOC) were identified to have both increased transmissibility and virulence ascribable to mutational changes occurring within the spike protein resulting to modifications in the protein structural orientation which in-turn may affect viral pathogenesis. However, this was never fully elucidated. Here, we generated spike models of endemic HCoVs (HCoV 229E, HCoV OC43, HCoV NL63, HCoV HKU1, SARS CoV, MERS CoV), original SARS2, and VOC (alpha, beta, gamma, delta). Model quality check, structural superimposition, and structural comparison based on RMSD values, TM scores, and contact mapping were all performed. We found that: 1) structural comparison between the original SARS2 and VOC whole spike protein model have minor structural differences (TM > 0.98); 2) the whole VOC spike models putatively have higher structural similarity (TM > 0.70) to spike models from endemic HCoVs coming from the same phylogenetic cluster; 3) original SARS2 S1-CTD and S1-NTD models are structurally comparable to VOC S1-CTD (TM = 1.0) and S1-NTD (TM > 0.96); and 4) endemic HCoV S1-CTD and S1-NTD models are structurally comparable to VOC S1-CTD (TM > 0.70) and S1-NTD (TM > 0.70) models belonging to the same phylogenetic cluster. Overall, we propose that structural similarities (possibly ascribable to similar conformational epitopes) may help determine immune cross-reactivity, whereas, structural differences (possibly associated with varying conformational epitopes) may lead to viral infection (either reinfection or breakthrough infection).

Among the human-infecting CoVs, only SARS2 infection resulted to a pandemic causing the coronavirus disease 2019   (Tay et al., 2020). Moreover, multiple SARS2 variants were produced ascribable to various mutations occurring within the spike and, among the SARS2 variants produced, variants of concern (VOC) were identified to have increased transmissibility and virulence while having decreased response to available therapeutic strategies (Koyama et al., 2020). Considering VOC are a product of mutational changes occurring within the spike and structural orientation modifications are a product of amino acid alterations which in-turn may affect viral pathogenesis (Chen and Bahar, 2004), we hypothesize that the VOC spike glycoprotein may have structural modifications that may affect both immune cross-reactivity and viral pathogenesis. However, this has likewise not been fully investigated. A better understanding of the possible structural differences and similarities occurring within the VOC spike proteins may give us a better understanding of the potential of cross-reactivity to occur and, likewise, could give a possible explanation for the occurrence of both SARS2 reinfection and breakthrough infections which in-turn may lead to novel therapeutic strategies.

SARS2 VOC and HCoV Spike Modeling
Representative CoV spike amino acid sequences were collected from the National Center for Biological Information (NCBI) website. In order to obtain an accurately generated representative spike model, at least five sequence models were initially analyzed, whereby, spike models having similar Root Mean Square Deviation (RMSD) values and Template Modeling scores (TM-scores) based on superimposition done by TM-align (Zhang and Skolnick, 2005) were utilized for further downstream analyses. For generating SARS2 VOC spike models, the following representative amino acid sequences were used with Genebank accession number indicated: alpha (QTC11018), beta (QTJ24451), gamma (QRX39401), and delta (QUF59047). For generating the endemic HCoV spike models, the following representative amino acid sequences were used with Genebank accession number indicated: 229E (ABB90513), OC43 (AXX83297), NL63 (QED88040), HKU1 (ARB07617), SARS1 (AAR07625), MERS (AHX00731), and original SARS2 (YP_009724390). Similarly, representative original SARS2 spike S1 C-terminal domain (S1-CTD) and N-terminal domain (S1-NTD) models were generated based on UniProt reference number P0DTC2. All models generated were through the Phyre2 web server (Kelley and Sternberg, 2009) while Jmol applet (Herraez, 2006) was used for protein visualization.

Spike Model Quality Assessment
All CoV spike models generated throughout the study were initially assessed for quality before further downstream analyses. In this regard, protein model:crystal structure superimposition and contact mapping were performed. Representative crystal structure used for model quality comparison was the 2021 strain (PDB ID: 7BNM) which already has the D614G mutation (Tomaszewski et al., 2020). Moreover, a monomeric 7BNM crystal model (based on the 7BNM crystal structure) was generated using Phyre2 and superimposed to the 7BNM crystal structure to likewise serve as an additional model quality check. Representative CoV spike models and crystal structure were superimposed using TM align (Zhang and Skolnick, 2005). For this study, we considered spike models as suitable for further downstream analyses if TM scores between superimposed sequence model:crystal structure, crystal model:crystal structure, and crystal model:sequence model are close to 1.0. Subsequently, CMView applet (Contact type: Cα; Distance cut-off: 8.0; Needleman-Wunsch alignment) was used to determine protein common contact among the superimpositions made (Vehlow et al., 2011). Briefly, higher common contact would indicate that there is more structural similarities between the superimposed models and crystal structure (Holm and Sander, 1996) which inturn implies that the generated spike models are suitable for further downstream analyses.

CoV Spike Model Comparison
Three different sets of protein structural differentiation were performed: 1) whole protein structural comparison among VOC spike models, whereby, all generated models were compared (RMSD value, TM score, common contact) to the original SARS2 and among VOC spike models through superimposition and contact mapping; 2) whole protein structural comparison between VOC and endemic HCoVs spike models, whereby, generated VOC spike models were compared (RMSD value, TM score, common contact) to generated endemic HCoV spike models also through superimposition and contact mapping; and 3) spike domain structural comparison, whereby, generated S1-CTD and S1-NTD models derived from the VOC and endemic HCoV spike models were compared (TM score only) through original SARS2:VOC and VOC:endemic HCoV superimposition. RMSD value, Tm score, and protein common contact were established using TM align and CMView, respectively.

Generated Spike Models Are Fit for Downstream Analyses
Model quality assessment has been highly recommended before performing any downstream structural analyses using generated protein structures from either experimental (i.e. crystallized) or theoretical (i.e. computer-based) approaches (Berman et al., 2006). To determine the quality and correctness of all spike models generated, both protein structural superimpositions and contact mapping were done. Representative SARS2 crystal structure ( Figure 1A), generated SARS2 crystal model ( Figure 1B) and SARS2 sequence model ( Figure 1C) were all utilized for superimposition. We found that TM scores between crystal structure:crystal model [TM (based on the crystal structure): 0.94939] ( Figure 1D Figure 1I) have high common contact (>85%), thereby, insinuating that there is high protein contact similarity between the structures. Taken together, these results would suggest that the generated spike models are fit for further downstream structural analyses.

Original SARS2 and VOC Spike Models Putatively Have Minor Structural Differences
Both protein structure and conformation dynamics are associated to biological function (Chen and Bahar, 2004). To establish the possible spike structural variations among the VOC, spike models of each VOC (alpha, beta, gamma, delta) and the original SARS2 were superimposed and analyzed using RMSD values, TM scores, and contact map overlap (CMO) analyses. Measurements involving RMSD values focus on similarities between superimposed atomic coordinates (including amino acid residues), whereas, measurements involving TM scores focus on similarities between protein structures regardless of protein size (Zhang and Skolnick, 2005;Kufareva and Abagyan, 2012). Additionally, common contacts obtained through CMO analyses provide information related to pairwise spatial and functional relationship of residues within a protein while unifying certain features related to protein folding and structure prediction (Wang and Xu, 2013;Bittrich et al., 2019). Original SARS2 and VOC spike models used were generated by Phyre2 (Supplementary Figure S1). As seen in Figure 2A, alpha, gamma, and delta variants are possibly similar with the original SARS2 (RMSD <1.00), whereas, the beta variant has a higher structural difference compared to the original SARS2 and other VOC (RMSD >1.00). These observations are likewise generally consistent with TM scores ( Figure 2B). Moreover, CMO analyses between the original SARS2 and VOC showed similar common contact (95%) between the original and both alpha ( Figure 2C) and beta ( Figure 2D) variants while both gamma ( Figure 2E) and delta ( Figure 2F) variants had higher common contact at 100 and 99.5%, respectively. Taken together, we hypothesize that no major structural difference within the spike glycoprotein occurred among the original SARS2, alpha, gamma, and delta variants (RMSD <1.00; TM > 0.99), whereas, the beta variant putatively may have differed with regards to atomic coordinates when compared to the original SARS2 and VOC (RMSD >1.00). However, considering TM score, we likewise presume that no major structural difference occurred in the beta variant (TM > 0.98). Furthermore, similar common contact between the alpha and beta variants could suggest similar functional residues in both variants, whereas, the close to similar common contact (0.5% difference) between gamma and delta variants may likewise imply that functional residues are somewhat the same albeit with some minor difference. These results are consistent with SARS2 maintaining its genomic integrity across propagation (Mercatelli and Giorgi, 2020) and varying VOC transmissibility (Campbell et al., 2021). In this regard, we postulate that the overall spike model among VOC generally did not have a major deviation in terms of protein structural conformation from the original SARS2 spike model. Nevertheless, the minor structural deviation observed may contribute to each VOC having a unique biological characteristic especially in terms of viral transmissibility and immune evasion consistent with an earlier report (Campbell et al., 2021) showing that the effective reproduction numbers of the VOC differ among themselves, namely: alpha (4% compared to alpha), beta (4% compared to beta), gamma (10% compared to alpha; 17% compared to beta), and delta (55% compared to alpha; 60% compared to beta; 34% compared to gamma).
It is worth mentioning that the spike model of the gamma variant potentially has similar atomic coordinates (RMSD value), protein structure (TM score), and functional residues (CMO analyses) when compared to the original SARS2 spike model. Considering the gamma variant is more transmissible compared to the original SARS2 (Campbell et al., 2021), we hypothesize that the biological difference between the gamma variant and original SARS2 in terms of spike function is mainly associated with amino acid residue changes and not on protein structural variations. Additionally, it is also worth mentioning that individuals  infected with the beta variant have a higher chance of needing critical care and death occurrence compared to infections associated with alpha, gamma, and delta variants (Callaway, 2021) possibly due to high levels of immune evasion associated to the beta variant (Madhi et al., 2021). In this regard, we think that the difference in atomic coordinates of the beta variant (RMSD >1.00) compared to the other VOC (RMSD <1.00) is a contributing factor in COVID-19 infection severity.
Admittedly, additional work is needed to further explore these two points.

VOC Spike Models May Have Varying Structural Similarity to Endemic HCoVs
Among the known endemic HCoVs, both 229E and NL63 strains are classified under the alpha-CoV phylogenetic cluster while the   (Hamre and Procknow, 1966;Kapikian et al., 1969;Ksiazek et al., 2003;Chiu et al., 2005;Woo et al., 2005;Letko et al., 2020;Zhu et al., 2020). To determine the potential spike structural differences and similarities between VOC and endemic HCoVs, model superimposition and analyses (RMSD values, TM scores, and CMO analyses) were performed. All endemic HCoV spike models were generated by Phyre2 (Supplementary Figure S2). In terms of atomic coordinates (RMSD values), we found that VOC spike models differed (RMSD >2.6) from endemic HCoVs ( Figure 3A). However, in terms of protein structure (TM scores), we observed that VOC spike models ( Figure 3B . Interestingly, in terms of CMO analyses, we found that endemic HCoV spike models have the same common contact difference when compared to spike models from the alpha ( Figure 3C) and beta ( Figure 3D) variants which we suspect to be due to alpha and beta variants having putatively the same functional residues (common contact) consistent with our earlier results ( Figures 2C,D) and reported biological characteristics wherein effective reproduction numbers between the two variants are the same (Campbell et al., 2021). In contrast, both gamma ( Figure 3E) and delta ( Figure 3F) variants have varying common contact when compared to the endemic HCoV spike models which we likewise believe to be attributable to the difference in functional residues between the two variants consistent with our earlier results ( Figures 2E,F) and reported biological characteristics wherein the effective reproduction numbers of both gamma and delta variants differ between the two (Campbell et al., 2021). Noticeably, VOC spike models have high common contact (74.2-74.6%) with SARS1 which coincidentally belongs to the same lineage as that of SARS2. This would emphasize the close structural dynamics between SARS1 and VOC spike models which we attribute to high nucleotide similarity (Robson, 2020). Taken together, we postulate that the overall VOC spike models have varying atomic coordinates and functional residues while generally having the same protein structural conformation when compared to the endemic HCoV spike models.
Considering the results at this point (Figures 2, 3), we wish to highlight that data obtained from RMSD values, TM score, and CMO analyses were all based on superimposing full-length CoV spike protein models. However, since it is probable that the protein structural dynamics along a receptor binding site may be composed of different atomic coordinates (particularly, protein length and structure) while having a similar binding surface (Di Rienzo et al., 2017) [consistent with what we observed (TM > 0.98) ( Figure 2B)], further structural comparison is merited which would mainly focus on both S1-CTD and S1-NTD of the VOC spike models. VOC S1-CTD and S1-NTD Models Are Structurally Comparable to the Original SARS2 and Endemic HCoV S1 subunit of CoV spike glycoproteins is made up of the C-terminal domain (S1-CTD) and N-terminal domain (S1-NTD) which in-turn have been associated to host cell binding (Hulswit et al., 2016;Li, 2016). To elucidate the structural similarities and differences within the SARS2 S1-CTD and S1-NTD, VOC S1-CTD and S1-NTD models were superimposed with models from the original SARS2 and endemic HCoV. Structural analyses were done using TM score measurements. Surprisingly, when comparing the original SARS2 and VOC S1-CTD models, we found that they are structurally similar (TM 1.00) ( Figure 4A). Moreover, ocular inspection of the model superimposition between the original SARS2 and VOC S1-CTD models showed no difference ( Figures 4B-E). SARS2 pathogenesis and host tropism were linked to the SARS2 furin-like cleavage site (FLC) (Xing et al., 2020), however, protein structural analyses have shown that the SARS2 S1-CTD [alternatively known as the receptor binding domain (RBD)] is unaffected in the absence of the SARS2 FLC Papa et al., 2021). This emphasizes the structural importance of maintaining the structural conformation of the SARS2 S1-CTD with regards to viral pathogenesis and host tropism consistent with our results. In this regard, we postulate that regardless of successive SARS2 variants being generated, S1-CTD would most likely maintain its structural conformation. In contrast, we observed that the original SARS2 and VOC S1-NTD models had varying structural differences (TM > 0.95) ( Figure 4F) which can be further seen upon ocular inspection of the model superimposition between the original SARS2 and VOC S1-NTD models ( Figures 4G-J). Mutations along the S1-NTD have been linked to viral escape from humoral immune response Kemp et al., 2021) and S1-NTD was shown to bind to heme metabolites (in particular to biliverdin and bilirubin) which has been proposed to have a role in immune evasion (Rosa et al., 2021). This could putatively mean that structural alterations within the S1-NTD may contribute to immune evasion. Admittedly, additional experimentation is needed to further prove this point.
Subsequently, when comparing VOC and endemic HCoV S1-CTD models, we noted a consistent structural difference ( Figure 4K) which we ascribe to VOC S1-CTD models being structurally similar ( Figure 4A). On the other hand, when VOC and endemic HCoV S1-NTD models were structurally compared ( Figure 4L), we likewise observed varying structural differences consistent with our earlier results ( Figure 4F). Noticeably, both S1-CTD and S1-NTD models belonging to the same phylogenetic cluster (SARS1, OC43, HKU1, MERS) possibly have the same structural conformation (TM > 0.50) (Yang et al., 2015) with the VOC S1-CTD and S1-NTD models, respectively. These results are consistent with our earlier work and further emphasizes the possibility of the receptor binding structural conformation (S1-CTD and S1-NTD) being somewhat conserved in the same phylogenetic cluster and lineage (Cueno and Imai, 2021).
It is worth mentioning that gamma and delta S1-NTD models have similar TM scores ( Figure 4F) when compared to the original SARS2 S1-NTD insinuating that both variants have similar S1-NTD structural conformation. Considering both S1-CTD and S1-NTD models are structurally similar between the gamma and delta variants while having varying viral transmissibility (Campbell et al., 2021), we hypothesize that amino acid residue changes unique in each variant play a significant role in contributing to viral pathogenesis (Harvey et al., 2021). In a possible future work, it would be interesting to test this hypothesis.

DISCUSSION
SARS2 genome has mutated consistently with genetic changes occurring almost every week (Day et al., 2020;Mercatelli and Giorgi, 2020). Similarly, nonsynonymous nucleotide changes occurred which in-turn causes amino acid changes (Day et al., 2020). Additionally, these mutations are either high-effect (contribute to viral adaptation and fitness) or low-effect mutations (deleterious and rapidly purged) (Frost et al., 2018;Harvey et al., 2021). Moreover, heavily mutated SARS2 lineages have emerged since the original SARS2 was detected in December 2019 giving rise to VOC (Harvey et al., 2021;Sanyaolu et al., 2021). Throughout this study, we attempted to show that VOC spike models have structural similarities and differences with the original SARS2 and endemic HCoV spike models. Spike protein binding is the initial step in all CoV infections which is why it is the first CoV antigen targeted by the immune system (Hulswit et al., 2016;Li, 2016;Salvatori et al., 2020). In general, epitopes found along antigen regions are classified as either sequential (continuous or linear amino acid stretch) or conformational (discontinuous amino acid stretch) epitopes (Jerne, 1960;Benjamin et al., 1984;Gershoni et al., 2007). Moreover, antigen:antibody complexes formed are mainly composed of conformational epitopes (∼90%) (Haste Andersen et al., 2006). Additionally, antibody paratopes found in the antibody variable region primarily identify and interact with antigen epitopes thereby forming epitope: paratope complementarity which goes beyond amino acid sequence recognition but instead protein structure dynamics (Vojtek et al., 2019). Furthermore, every antibody paratope could interact with multiple antigen epitopes which in-turn could induce a polyclonal immune response resulting to crossreactivity (Sewell, 2012;Vojtek et al., 2019). These would highlight the potential significance of protein structure formation (particularly conformational epitopes) when considering SARS2 immune response induction. In fact, it was found that viral epitopes (such as Influenza and CMV) that lack sequence identity with SARS2 are able to stimulate an immune response (Mahajan et al., 2021) which we believe is attributable to similar protein structural formation. In this regard, we postulate that high VOC S1-CTD and S1-NTD structural similarity (TM > 0.70) with either the original SARS2 or endemic HCoV could putatively have crossreactivity with the original SARS2 and endemic HCoV spike models (Ladner et al., 2021) possibly ascribable to having multiple similar conformational epitopes that are considered valuable in neutralizing viral pathogenesis (Khare et al., 2021). This is consistent with previous work showing that T cell frequencies against the original SARS2 have likewise been correlated to VOC (Stankov et al., 2021) which we suspect to be due to structural similarity (particularly S1-CTD). Moreover, VOC have been shown to partially escape humoral immune response, however, VOC are found to be unable to escape cellular immune response among convalescent donors and vaccinees (Geers et al., 2021). This would highlight the putative significance of cellular immune response [particularly Th1 and Tfh cells (Poland et al., 2020) ] in providing lasting protection against VOC and, more importantly, the T cell-recognizing conformational epitopes that can counteract viral infectivity (Khare et al., 2021).
It is worth mentioning that VOC emergence is distinguished by having reduced susceptibility to polyclonal antibody responses which can potentially lead to increased reinfections or breakthrough infections (Geers et al., 2021). In this regard, we speculate that both reinfections and breakthrough infections are ascribable to T cell-recognizing conformational changes along the VOC spike glycoprotein [particularly S1-NTD Kemp et al., 2021)]. Admittedly, these speculations would need both laboratory and clinically-derived data to prove.
In summary, we putatively showed that: 1) minor structural differences occur in the whole original SARS2 and VOC spike protein model; 2) the whole VOC spike models possibly have differing structural similarity to spike models from endemic HCoVs, wherein, those belonging in the same phylogenetic cluster have high structural similarities while those belonging in a different phylogenetic cluster have low structural similarities; 3) original SARS2 S1-CTD and S1-NTD models are structurally similar to VOC S1-CTD and S1-NTD models; and 4) endemic HCoV S1-CTD and S1-NTD models are structurally similar to VOC S1-CTD and S1-NTD models belonging to the same phylogenetic cluster. Overall, we propose that structural similarities (possibly ascribable to similar conformational epitopes) may help determine immune cross-reactivity, whereas, structural differences (possibly associated with varying conformational epitopes) may lead to viral infection (either reinfection or breakthrough infection)

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.