Skip to main content

ORIGINAL RESEARCH article

Front. Genet., 29 November 2021
Sec. Computational Genomics
https://doi.org/10.3389/fgene.2021.753440

Hotspot Mutations in SARS-CoV-2

www.frontiersin.orgIndrajit Saha1*, www.frontiersin.orgNimisha Ghosh2, www.frontiersin.orgNikhil Sharma 3 and www.frontiersin.orgSuman Nandi1
  • 1Department of Computer Science and Engineering, National Institute of Technical Teachers’ Training and Research, Kolkata, India
  • 2Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, India
  • 3Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology, Noida, India

Since its emergence in Wuhan, China, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) has spread very rapidly around the world, resulting in a global pandemic. Though the vaccination process has started, the number of COVID-affected patients is still quite large. Hence, an analysis of hotspot mutations of the different evolving virus strains needs to be carried out. In this regard, multiple sequence alignment of 71,038 SARS-CoV-2 genomes of 98 countries over the period from January 2020 to June 2021 is performed using MAFFT followed by phylogenetic analysis in order to visualize the virus evolution. These steps resulted in the identification of hotspot mutations as deletions and substitutions in the coding regions based on entropy greater than or equal to 0.3, leading to a total of 45 unique hotspot mutations. Moreover, 10,286 Indian sequences are considered from 71,038 global SARS-CoV-2 sequences as a demonstrative example that gives 52 unique hotspot mutations. Furthermore, the evolution of the hotspot mutations along with the mutations in variants of concern is visualized, and their characteristics are discussed as well. Also, for all the non-synonymous substitutions (missense mutations), the functional consequences of amino acid changes in the respective protein structures are calculated using PolyPhen-2 and I-Mutant 2.0. In addition to this, SSIPe is used to report the binding affinity between the receptor-binding domain of Spike protein and human ACE2 protein by considering L452R, T478K, E484Q, and N501Y hotspot mutations in that region.

1 Introduction

COVID-19 caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) was first identified in late December 2019 and has a high transmission rate (Zhu et al., 2020). The WHO declared this outbreak as a pandemic on March 11, 2020 (Cucinotta and Vanelli, 2020). Like other coronaviruses, SARS-CoV-2 is also an enveloped single-stranded RNA virus containing nearly 30 K nucleotide sequences (Alexandersen et al., 2020). SARS-CoV-2 encompasses 11 codding regions, which include ORF1ab, Spike (S), ORF3a, Envelope (E), Membrane (M), ORF6, ORF7a, ORF7b, ORF8, Nucleocapsid (N), and ORF10.

Though the vaccination process has started, the virus is evolving and spreading all across the world, causing fresh waves every few months. Since the virus is mutating frequently, it creates new variant of the original virus. Among several variants, B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma), and B.1.617.2 (Delta) are declared as variants of concern (Singh et al., 2021). In this regard, the variant B.1.1.7 was first identified in the United Kingdom, which contains E484K, N501Y, D614G, and P681H mutations in Spike glycoprotein (Tang et al., 2020). In December 2020, the variant B.1.351 was first detected in South Africa, with mutations such as K417N, E484K, N501Y, D614G, and A701V (Tang et al., 2021). The Brazilian variant P.1 also has almost the same mutations as the B.1.351 variant, but instead of A701V, the P.1 variant has H555Y mutation (Faria et al., 2021). On the other hand, the variant B.1.617.2 was first identified in India with L452R, T478K, D614G, and P681R mutations in Spike glycoprotein (Bernal et al., 2021).

To understand the new variants of SARS-CoV-2, Tiwari and Mishra (2021) have performed phylogenetic analysis of 591 SARS-CoV-2 genomes where they have found 43 synonymous and 57 non-synonymous mutations in 12 protein regions. They found the most prevalent mutations in the Spike protein, followed by NSP2, NSP3, and ORF9. They have also highlighted several distinct SARS-CoV-2 features as compared with other human-infecting viruses. Yuan et al. (2020) have analyzed 11,183 global sequences where they have identified 119 single-nucleotide polymorphisms (SNPs) with 74 non-synonymous and 43 synonymous mutations. The mutational profiling shows that the highest mutation has occurred in Nucleocapsid, followed by NSP2, NSP3, and Spike. From China, India, the United States, and Europe, 570 SARS-CoV-2 genomes are analyzed by Weber et al. (2020), where they have identified 10 individual mutations where most of the mutations altered the amino acids in the replication-relevant proteins. Sarkar et al. (2021) have performed a genome-wide analysis of 837 Indian SARS-CoV-2 genomes, where 33 unique mutations were observed, among which 18 mutations were identified in India in five protein regions (six in Spike, five in NSP3, four in RdRp, two in NSP2, and one in Nucleocapsid). The isolated Indian sequences were classified into 22 groups based on their coexisting mutations. This study highlights several mutations identified in various protein regions, which also help to identify the evolution of virus genome across various geographic locations of India. Saha et al. (2020) have performed phylogenetic analysis of 566 Indian SARS-CoV-2 genomes to identify several mutations. As a result, 933 substitutions, 2,449 deletions, and two insertions have been identified from the aligned sequences. In another study, Saha et al. (2021) have performed genomic analysis of 10,664 SARS-CoV-2 genomes, resulting in 7,209 substitutions, 11,700 deletions, 119 insertions, and 53 SNPs.

Motivated by the aforementioned analysis, in this work, we have performed multiple sequence alignment (MSA) of 71,038 SARS-CoV-2 genomes using MAFFT (Katoh et al., 2002) followed by their phylogenetic analysis using Nextstrain (Hadfield et al., 2018) to visualize the virus evolution. This led to the identification of hotspot mutations as deletions and substitutions in the coding regions based on entropy greater than or equal to 0.3. Furthermore, as a demonstrative example, 10,286 Indian sequences are considered from 71,038 global SARS-CoV-2 sequences. For all the non-synonymous substitutions (missense mutations), the functional consequences of amino acid changes in the respective protein structures are calculated using PolyPhen-2 and I-Mutant 2.0. Finally, SSIPe is used to report the binding affinity between the receptor-binding domain (RBD) of Spike protein and human ACE2 protein by considering the hotspot mutations in that region.

2 Methods

In this section, the dataset collection for the SARS-CoV-2 genomes is discussed along with the proposed pipeline.

2.1 Data Preparation

For MSA and phylogenetic analysis, 71,038 global SARS-CoV-2 genomes are collected from Global Initiative on Sharing All Influenza Data (GISAID)1, and the Reference Genome (NC 045512.2)2 is collected from the National Center for Biotechnology Information (NCBI). The SARS-CoV-2 sequences are mostly distributed from January 2020 to June 2021 globally. Moreover, to map the protein sequences and changes in the amino acid, Protein Data Bank (PDB) is collected from Zhang Lab3 (Zhang et al., 2020; Wu et al., 2021), and it is then used to show the structural changes. All these analyses are performed on the High Performance Computing facility of NITTTR, Kolkata; and for checking the amino acid changes, MATLAB R2019b is used.

2.2 Pipeline of the Work

The pipeline of this work is provided in Figure 1A. Initially, MSA of 71,038 global SARS-CoV-2 genomes is performed using MAFFT, which is followed by their phylogenetic analysis using Nextstrain. The corresponding phylogenetic tree is shown in Figure 1B. MAFFT merges local and global algorithms for MSA, and it uses two different heuristic methods such as progressive (FFT-NS-2) and iterative refinement (FFT-NS-i). To create a provisional MSA, FFT-NS-2 calculates all-pairwise distances from which refined distances are calculated. Thereafter, FFT-NS-i is performed to get the final MSA. As MAFFT uses fast Fourier transform, it scores over other alignment techniques. On the other hand, Nextstrain is a collection of open-source tools, which is useful for understanding the evolution and spread of pathogen, particularly during an outbreak. By taking advantage of this tool, in this work, the evolution and geographic distribution of SARS-CoV-2 genomes are visualized by creating the metadata in our High Performance Computing environment.

FIGURE 1
www.frontiersin.org

FIGURE 1. Pipeline of the workflow.

Once the alignment and the phylogenetic analysis are completed, hotspot mutations as deletions and substitutions are identified in the coding regions based on entropy greater than or equal to 0.3. Furthermore, 10,286 Indian sequences are considered as an example to identify such mutations as well. The corresponding phylogenetic tree for Indian sequences is shown in Figure 1C. Moreover, using the codon table, amino acid changes in the SARS-CoV-2 proteins for the corresponding mutations are highlighted as well. The hotspot mutations are identified considering their entropy values, which are calculated as:

E=ln5+λγδ[ln(λγδ)](1)

where λγδ represents the frequency of each residue γ occurring at position δ and 5 represents the four possible residues as nucleotides plus gap. Thereafter, the amino acid changes in the SARS-CoV-2 proteins for the non-synonymous deletions and substitutions for both global and Indian sequences are graphically visualized as shown in Figure 1D. Finally, these changes are also used for the evaluation of their functional characteristics and are visualized in the respective protein structure as well.

3 Results

The experiments in this work are carried out according to the pipeline as given in Figure 1A. Initially, MSA of 71,038 global SARS-CoV-2 genomes across 98 countries is carried out using MAFFT followed by their phylogenetic analysis using Nextstrain, which revealed five clades: 19A, 19B, 20A, 20B, and 20C. The number of sequences for each country is reported in Supplementary Table S1. This resulted in the identification of hotspot mutation points as deletions and substitutions in the coding regions based on entropy. In this regard, only those hotspot mutations are considered whose entropy values are greater than or equal to 0.3. The entropy values for each of the genomic coordinates for both global and Indian sequences are provided in Supplementary Table S2. The mutation statistics by considering different threshold values of entropy for each category are reported in Table 1. Based on the results in this table, the entropy value of 0.3 is considered as the threshold for choosing the hotspot mutations. It is to be noted that choosing a threshold value as either 0.2 or 0.1 will lead to a huge amount of hotspot mutations, which is not desired. As a consequence of choosing entropy threshold of 0.3, 45 unique hotspot mutations are identified, which resulted in 39 non-synonymous deletions and substitutions with nine unique deletions and 22 unique amino acid changes. Also, out of the 98 countries that are considered for global analysis, India with 10,286 sequences is taken as an example to demonstrate the mutations for a particular country as well. In this regard, 52 unique hotspot mutations provide 45 non-synonymous deletions and substitutions with five unique amino acid changes for deletions and 36 unique amino acid changes for substitutions. The analysis on other countries with the most number of sequences is provided in the Supplementary Material. The phylogenetic trees in radial and rectangular views considering global analysis are shown in Figures 2A,B, respectively, while for Indian sequences, such views are provided in Figures 2D,E, respectively. These phylogenetic trees respectively show the evolution of the global and Indian SARS-CoV-2 genomes over the months. For the benefit of the readers, it is important to mention that the number of sequences does not have any direct relationship with the number of hotspot mutations. The number of hotspots is based on the entropy value, which in turn depends on the frequency of mutations at a given genomic coordinate. So even with smaller number of sequences, if the frequency of mutations is higher than that with larger number of sequences, it will produce more hotspot mutations. Thus, with 71,038 global sequences, 45 unique hotspot mutations are identified, while for 10,286 Indian sequences, 52 such mutations are identified.

TABLE 1
www.frontiersin.org

TABLE 1. Mutation statistics of 71,038 global and 10,286 Indian SARS-CoV-2 genomes by considering different threshold values.

FIGURE 2
www.frontiersin.org

FIGURE 2. Phylogenetic analysis of (A, B, C) global and (D, E, F) Indian SARS-CoV-2 genomes.

The list of hotspot mutations for the global and Indian SARS-CoV-2 genomes along with their associated details is respectively provided in Tables 2 and 3. For example, in Table 2, genomic coordinate 28,881 in Nucleocapsid with nucleotide changes G > A and G > T has the highest entropy value of 0.773655. India also shows the same mutation but with an entropy value of 1.14807 as shown in Table 3. Please note that mutations like G28881A and G28883C may have an impact on antigenicity of Nucleocapsid protein (Yuan et al., 2020). The entropy values for the corresponding nucleotide changes for global analysis are shown in Figure 2C, while for India, the same is shown in Figure 2F. It is to be noted that the total number of unique amino acid changes for deletions and substitutions is less than the number of non-synonymous deletions and substitutions. One of the reasons for this can be that if there are deletions at consecutive genomic coordinates, the corresponding amino acid changes are the same. For example, as can be seen from Table 2, at the three consecutive genomic coordinates 11,288, 11,289, and 11,290, deletion has occurred with the amino acid change as S106-. Thus, though the number of non-synonymous deletions is 3, the number of unique amino acid change is 1. This is true for other such changes as well.

TABLE 2
www.frontiersin.org

TABLE 2. List of hotspot mutations for 71,038 global SARS-CoV-2 genomes along with the protein change.

TABLE 3
www.frontiersin.org

TABLE 3. List of hotspot mutations for 10,286 Indian SARS-CoV-2 genomes along with the protein change.

The amino acid changes in protein for the non-synonymous deletions and substitutions as reported in Tables 2 and 3 are visualized in Figure 1D; Supplementary Figure S1. All the amino acid changes in the protein for the non-synonymous substitutions or missense mutations for the global sequences are shown in Figure 3, while the same for the Indian sequences are depicted in Figure 4. The month-wise virus evolution in terms of entropy for both global and Indian genomic sequences is visualized respectively in Figures 5 and 6, while the corresponding entropy values are reported in Supplementary Tables S3 and S4. For example, it can be seen from both the figures that both P681H and P681R, which are part of the variant of concerns Alpha or B.1.1.7 and Delta or B.1.617.2, have evolved over time globally and for India as well. It is to be noted that due to the lack of appropriate number of sequences, the data of January and February 2020 have been merged for the global analysis, while for India, such merging is for the months January to March 2020. Also, please note that since the calculation of entropy is performed on aligned sequences, only coding regions are considered for the identification of hotspot mutations, as the non-coding regions exhibit high entropy values and can be misleading while selecting such mutation points as hotspot mutations. Furthermore, the evolution of the mutation points for global SARS-CoV-2 genomes pertaining to the different variants of concern like Alpha, Beta, Gamma, and Delta as declared by the WHO is also reported respectively in Figures 7A,B,C,D. It can be observed from the figures that the popular mutation D614G, which is common in all the variants though predominant in the earlier months of the pandemic, has waned over time. Also, the mutation T478K, which is unique to the Delta variant, is known to facilitate antibody escape (Planas et al., 2021). Some important hotspot mutations like H69-, V70-, Y144-, A222V, N501Y, A570D, P681H, and P681R identified in this study are associated with the different SARS-CoV-2 variants of concern like Alpha, Beta, Gamma, and Delta.

FIGURE 3
www.frontiersin.org

FIGURE 3. Highlighted amino acid changes in the protein structures for the non-synonymous substitutions or missense hotspot mutations for global SARS-CoV-2 genomes in (A) NSP3, (B) ORF3a, (C) Spike, (D) ORF8, and (E) Nucleocapsid.

FIGURE 4
www.frontiersin.org

FIGURE 4. Highlighted amino acid changes in the protein structures for the non-synonymous substitutions or missense hotspot mutations for Indian SARS-CoV-2 genomes in (A) NSP2, (B) NSP3, (C) NSP4, (D) NSP6, (E) RdRp, (F) helicase, (G) endoRNAse, (H) ORF3a, (I) Membrane, (J) Spike, (K) ORF7a, and (L) Nucleocapsid.

FIGURE 5
www.frontiersin.org

FIGURE 5. Month-wise evolution of global SARS-CoV-2 genomes based on entropy.

FIGURE 6
www.frontiersin.org

FIGURE 6. Month-wise evolution of Indian SARS-CoV-2 genomes based on entropy.

FIGURE 7
www.frontiersin.org

FIGURE 7. Month-wise evolution of (A) Alpha (B.1.1.7), (B) Beta (B.1.351), (C) Gamma (501.V3), and (D) Delta (B.1.617.2) variants in global SARS-CoV-2 genomes.

The unique and common hotspot mutations between global and Indian sequences are represented in the form of Venn diagram in Figures 8A,B, which shows the unique and common non-synonymous hotspot mutations, while the unique and common amino acid changes are shown in Figure 8C. As shown in Figure 8A, there are 37 and 44 unique mutations in global and Indian sequences, while eight are common in both. For non-synonymous hotspot deletions and substitutions, there are 32 and 38 unique mutations in each category, while the common number of such mutations is seven as reported in Figure 8B. For amino acid changes, as shown in Figure 8C, these statistics are 22, 32, and nine. The Venn diagram showing the common and unique hotspot mutations for global and Indian sequences with Alpha, Beta, Gamma, and Delta variants of SARS-CoV-2 is reported in Supplementary Figure S2. For example, in Supplementary Figure S2A, there are four unique mutations in both global sequences and Alpha variant, while there are nine mutations that are common to both.

FIGURE 8
www.frontiersin.org

FIGURE 8. Venn diagrams of global and Indian SARS-CoV-2 genomes to represent common hotspot mutations.

4 Discussion

There are spurts of new waves in almost every country around the globe. India has already gone through the massively catastrophic second wave, and according to the experts, a third wave is imminent. This can be attributed to the fact that the virus is evolving and new strains are getting identified, thereby making the study of this ever-evolving virus all the more important. The functional characteristics of some important mutations in the global and Indian SARS-CoV-2 genomic sequences are reported in Table 4.

TABLE 4
www.frontiersin.org

TABLE 4. Functional characteristics of some important mutations.

Structural changes in amino acid residues may sometimes lead to functional instability in proteins due to change in protein translations. To judge their characteristics, these changes are demonstrated through sequence and structural homology-based prediction for the hotspot deletions and missense mutations for global and Indian sequences in Table 5. The tools used for these predictions are PolyPhen-2 (Polymorphism Phenotyping) (Adzhubei et al., 2010) and I-Mutant 2.0 (Capriotti et al., 2005). PolyPhen-24 works with sequence, structural, and phylogenetic information of missense mutations, while I-Mutant 2.05 uses support vector machine (SVM) for the automatic prediction of protein stability changes upon missense mutations. PolyPhen-2 is used to find the damaging hotspot mutations, and I-Mutant 2.0 determines protein stability. To determine if a mutation is damaging using PolyPhen-2, its score is considered, which lies between 0 and 1. If the score is close to 1, then a mutation is considered to be damaging. It can be concluded from Table 5 that out of the 22 unique amino acid changes for substitutions in global sequences, 14 are damaging, while for Indian sequences, 24 are damaging out of 36 changes. It is important to note that in case of protein, damaging mostly defines instability. Generally, this is used for human proteins. As a consequence, if the human protein is damaging in nature because of mutations, then the human protein–protein interactions may occur with high or low binding affinity. Now in case of virus, similar consequences may happen, which means that if the virus protein is damaged because of mutations, it may interact with human proteins with similar binding affinity. As a result, the virus may acquire characteristics like transmissibility and escaping antibodies (Alenquer et al., 2021; Harvey et al., 2021).

TABLE 5
www.frontiersin.org

TABLE 5. Sequence and structural homology-based prediction of non-synonymous substitution as hotspot mutations along with their protein structural stability for 71,038 global SARS-CoV-2 genomes.

Another important parameter to judge the functional and structural activities of a protein is protein stability, which dictates the conformational structure of a protein. Any change in protein stability may cause misfolding, degradation, or aberrant conglomeration of proteins. I-Mutant 2.0 uses free energy change values (DDG (kcal/mol)) to predict the changes in the protein stability wherein a negative value of DDG indicates that the protein has a decreasing stability, while a positive value indicates an increase in stability. For example, the very low DDG value of G25563T shows that there is a decreased protein stability, thereby resulting in a reduction of virus virulence (Cheng et al., 2021). The results from I-mutant 2.0 show that out of the 14 and 24 unique damaging changes for global and Indian sequences, 10 and 18 changes respectively decrease the stability of the protein structures. Figure 9 shows the binding affinity between the RBD of Spike protein and human ACE2 protein performed using SSIPe6 (Huang et al., 2019) for the four mutations of SARS-CoV-2, viz., L452R, T478K, E484Q, and N501Y, taking place in such domain. The region marked in red shows the exact positions (471–492) where the binding takes place. To report the binding affinity using SSIPe, initially the RBD region of Spike protein (Woo et al., 2020) is docked with human ACE2 protein7 using PatchDock8. The best docked structure is then provided as an input to SSIPe. Table 6 further reports the binding affinity values for the four mutations. A strongly favorable mutation is usually defined as the one that has DDG value ≤ −1.5 kcal/mol, while a strongly unfavorable mutation is the one that has DDG value ≥1.5 kcal/mol. The DDG value of −0.769 kcal/mol for E484Q indicates that this is a favorable mutation, while DDG values of 1.083, 1.248, and 0.236 kcal/mol for L452R, T478K, and N501Y indicate that these mutations are somewhat unfavorable. These results corroborate our earlier explanation that because of mutation, virus–human protein–protein interactions may occur with high or low binding affinity.

FIGURE 9
www.frontiersin.org

FIGURE 9. Binding between RBD region of Spike protein (specifically 471–492 the region marked in red)) and human ACE2 protein. RBD, receptor-binding domain.

TABLE 6
www.frontiersin.org

TABLE 6. Binding affinity of the mutations in RBD region of Spike protein and human ACE2 protein.

Supplementary Figure S3 shows the percentage of nucleotide change and frequency of nucleotide change for hotspot mutations for global and Indian sequences. For example, in Supplementary Figure S3A, the occurrence of nucleotide change G > A in 71,038 global sequences is almost 45%, while the number of times it occurs in 45 hotspot mutations is two, as is also evident from Table 2. It can also be seen from Supplementary Figures S3B, S3D that 10 and 16 out of 39 and 45 non-synonymous mutations are from C to T, thereby representing abundant transition. This transition increases the frequency of codons for hydrophobic amino acids and provides evidence of potential antiviral editing mechanisms driven by host (Yuan et al., 2020). Also, more C-to-T transition means less CpG abundance, indicating rapid adaptation of virus in host. This CpG deficiency, which leads to evasion of host antiviral defense mechanisms, is exhibited the most in SARS-CoV-2 virus (Xia, 2020).

5 Conclusion

With the imminent third wave, it is very crucial to understand the evolution of SARS-CoV-2. In this regard, MSA of 71,038 SARS-CoV-2 genomes of 98 countries over the period from January 2020 to June 2021 is performed using MAFFT followed by phylogenetic analysis to visualize the evolution of SARS-CoV-2. This resulted in the identification of hotspot mutations as deletions and substitutions in the coding regions based on entropy, which should be greater than or equal to 0.3. Consequently, a total of 45 unique hotspot mutations out of which 39 non-synonymous deletions and substitutions are identified with nine unique amino acid changes for deletions and 22 unique amino acid changes for substitutions. Moreover, 10,286 Indian sequences are considered from 71,038 global SARS-CoV-2 sequences as a demonstrative example, which gives 52 unique hotspot mutations, resulting in 45 non-synonymous deletions and substitutions with five unique amino acid changes for deletions and 36 unique amino acid changes for substitutions. Some important mutations in such sequences pertaining to the Delta variant of SARS-CoV-2 are T19R, G142D, E156-, F157-, L452R, T478K, and P681R. Furthermore, the evolution of the hotspot mutations along with the mutations in variants of concern is visualized, and their characteristics are also discussed. Moreover, for all the missense mutations, the functional consequences of amino acid changes in the respective protein structures are calculated using PolyPhen-2 and I-Mutant 2.0. Finally, SSIPe is used to report the binding affinity between the RBD of Spike protein and human ACE2 protein by considering L452R, T478K, E484Q, and N501Y hotspot mutations in that region.

Data Availability Statement

The aligned 71038 Global SARS-CoV-2 genomes with the reference sequence and the final results of this work are available at http://www.nitttrkol.ac.in/indrajit/projects/COVID-Hotspot-Mutation-Global-71K/. Further inquiries can be directed to the corresponding author.

Author Contributions

IS and NG designed the research. IS, NG, NS, and SN analyzed the data and wrote the article. All the authors reviewed and approved the final version of the article.

Funding

This work has been partially supported by CRG short-term research grant on COVID-19 (CVD/2020/000,991) from Science and Engineering Research Board (SERB), Department of Science and Technology, Govt. of India. However, it does not provide any publication fees.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

We thank all those who have contributed sequences to GISAID and NCBI databases.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2021.753440/full#supplementary-material

Footnotes

1https://www.gisaid.org/

2https://www.ncbi.nlm.nih.gov/nuccore/1798174254

3https://zhanglab.ccmb.med.umich.edu/COVID-19/

4http://genetics.bwh.harvard.edu/pph2/

5https://folding.biofold.org/i-mutant/i-mutant2.0.html

6https://zhanggroup.org/SSIPe/

7https://www.rcsb.org/structure/1R42

8http://bioinfo3d.cs.tau.ac.il/PatchDock/patchdock.html

References

Adzhubei, I. A., Schmidt, S., Peshkin, L., Ramensky, V. E., Gerasimova, A., Bork, P., et al. (2010). A Method and Server for Predicting Damaging Missense Mutations. Nat. Methods 7, 248–249. doi:10.1038/nmeth0410-248

PubMed Abstract | CrossRef Full Text | Google Scholar

Alenquer, M., Ferreira, F., Lousa, D., Valério, M., Medina-Lopes, M., Bergman, M.-L., et al. (2021). Signatures in SARS-CoV-2 Spike Protein Conferring Escape to Neutralizing Antibodies. Plos Pathog. 17, e1009772. doi:10.1371/journal.ppat.1009772

PubMed Abstract | CrossRef Full Text | Google Scholar

Alexandersen, S., Chamings, A., and Bhatta, T. R. (2020). SARS-CoV-2 Genomic and Subgenomic RNAs in Diagnostic Samples Are Not an Indicator of Active Replication. Nat. Commun. 11, 1–13. doi:10.1038/s41467-020-19883-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Bernal, J. L., Andrews, N., Gower, C., Gallagher, E., Simmons, R., Thelwall, S., et al. (2021). Effectiveness of Covid-19 Vaccines against the B.1.617.2 (Delta) Variant. N. Engl. J. Med. 385 (7), 585–594. doi:10.1056/NEJMoa2108891

PubMed Abstract | CrossRef Full Text | Google Scholar

Boehm, E., Kronig, I., Neher, R. A., Eckerle, I., Vetter, P., and Kaiser, L. (2021). Novel SARS-CoV-2 Variants: the Pandemics within the Pandemic. Clin. Microbiol. Infect. 27, 1109–1117. doi:10.1016/j.cmi.2021.05.022

PubMed Abstract | CrossRef Full Text | Google Scholar

Capriotti, E., Fariselli, P., and Casadio, R. (2005). I-Mutant2.0: Predicting Stability Changes upon Mutation from the Protein Sequence or Structure. Nucleic Acids Res. 33, W306–W310. doi:10.1093/nar/gki375

PubMed Abstract | CrossRef Full Text | Google Scholar

Cheng, L., Han, X., Zhu, Z., Qi, C., Wang, P., and Zhang, X. (2021). Functional Alterations Caused by Mutations Reflect Evolutionary Trends of SARS-CoV-2. Brief. Bioinform. 22, 1442–1450. doi:10.1093/bib/bbab042

PubMed Abstract | CrossRef Full Text | Google Scholar

Cucinotta, D., and Vanelli, M. (2020). WHO Declares COVID-19 a Pandemic. Acta Biomed. 91, 157–160. doi:10.23750/abm.v91i1.9397

PubMed Abstract | CrossRef Full Text | Google Scholar

Faria, N. R., Mellan, T. A., Whittaker, C., Claro, I. M., Candido, D. D. S., Mishra, S., et al. (2021). Genomics and Epidemiology of the P.1 SARS-CoV-2 Lineage in Manaus, Brazil. Science 372, 815–821. doi:10.1126/science.abh2644

PubMed Abstract | CrossRef Full Text | Google Scholar

Garcia-Beltran, W. F., Lam, E. C., St. Denis, K., Nitido, A. D., Garcia, Z. H., Hauser, B. M., et al. (2021). Multiple SARS-CoV-2 Variants Escape Neutralization by Vaccine-Induced Humoral Immunity. Cell 184, 2372–2383.e9. doi:10.1016/j.cell.2021.03.013

PubMed Abstract | CrossRef Full Text | Google Scholar

Greaney, A. J., Starr, T. N., Gilchuk, P., Zost, S. J., Binshtein, E., Loes, A. N., et al. (2021). Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition. Cell Host & Microbe 29, 44–57.e9. doi:10.1016/j.chom.2020.11.007

CrossRef Full Text | Google Scholar

Hadfield, J., Megill, C., Bell, S. M., Huddleston, J., Potter, B., Callender, C., et al. (2018). Nextstrain: Real-Time Tracking of Pathogen Evolution. Bioinformatics 34, 4121–4123. doi:10.1093/bioinformatics/bty407

PubMed Abstract | CrossRef Full Text | Google Scholar

Harvey, W. T., Carabelli, A. M., Jackson, B., Gupta, R. K., Thomson, E. C., Harrison, E. M., et al. (2021). SARS-CoV-2 Variants, Spike Mutations and Immune Escape. Nat. Rev. Microbiol. 19, 409–424. doi:10.1038/s41579-021-00573-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, X., Zheng, W., Pearce, R., and Zhang, Y. (2019). SSIPe: Accurately Estimating Protein-Protein Binding Affinity Change upon Mutations Using Evolutionary Profiles in Combination with an Optimized Physical Energy Function. Bioinformatics 36, 2429–2437. doi:10.1093/bioinformatics/btz926

CrossRef Full Text | Google Scholar

Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002). MAFFT: A Novel Method for Rapid Multiple Sequence Alignment Based on Fast Fourier Transform. Nucleic Acids Res. 30, 3059–3066. doi:10.1093/nar/gkf436

PubMed Abstract | CrossRef Full Text | Google Scholar

Luan, B., Wang, H., and Huynh, T. (2021). Enhanced Binding of the N501Y‐mutated SARS‐CoV‐2 Spike Protein to the Human ACE2 Receptor: Insights from Molecular Dynamics Simulations. FEBS Lett. 595, 1454–1461. doi:10.1002/1873-3468.14076

PubMed Abstract | CrossRef Full Text | Google Scholar

McCarthy, K. R., Rennick, L. J., Nambulli, S., Robinson-McCarthy, L. R., Bain, W. G., Haidar, G., et al. (2021). Recurrent Deletions in the SARS-CoV-2 Spike Glycoprotein Drive Antibody Escape. Science 371, 1139–1142. doi:10.1126/science.abf6950

PubMed Abstract | CrossRef Full Text | Google Scholar

Meng, B., Kemp, S. A., Papa, G., Datir, R., Ferreira, I. A. T. M., Marelli, S., et al. (2021). Recurrent Emergence of SARS-CoV-2 Spike Deletion H69/V70 and Its Role in the Alpha Variant B.1.1.7. Cell Rep. 35 (13), 109292. doi:10.1016/j.celrep.2021.109292

PubMed Abstract | CrossRef Full Text | Google Scholar

Planas, D., Veyer, D., Baidaliuk, A., Staropoli, I., Guivel-Benhassine, F., Rajah, M. M., et al. (2021). Reduced Sensitivity of SARS-CoV-2 Variant Delta to Antibody Neutralization. Nature 596, 276–280. doi:10.1038/s41586-021-03777-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Saha, I., Ghosh, N., Maity, D., Sharma, N., Sarkar, J. P., and Mitra, K. (2020). Genome-wide Analysis of Indian SARS-CoV-2 Genomes for the Identification of Genetic Mutation and SNP. Infect. Genet. Evol. 85, 104457. doi:10.1016/j.meegid.2020.104457

PubMed Abstract | CrossRef Full Text | Google Scholar

Saha, I., Ghosh, N., Pradhan, A., Sharma, N., Maity, D., and Mitra, K. (2021). Whole Genome Analysis of More Than 10 000 SARS-CoV-2 Virus Unveils Global Genetic Diversity and Target Region of NSP6. Brief. Bioinform. 22, 1106–1121. doi:10.1093/bib/bbab025

PubMed Abstract | CrossRef Full Text | Google Scholar

Sarkar, R., Mitra, S., Chandra, P., Saha, P., Banerjee, A., Dutta, S., et al. (2021). Comprehensive Analysis of Genomic Diversity of SARS-CoV-2 in Different Geographic Regions of India: an Endeavour to Classify Indian SARS-CoV-2 Strains on the Basis of Co-existing Mutations. Arch. Virol. 166, 801–812. doi:10.1007/s00705-020-04911-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Singh, J., Rahman, S. A., Ehtesham, N. Z., Hira, S., and Hasnain, S. E. (2021). SARS-CoV-2 Variants of Concern Are Emerging in India. Nat. Med. 27, 1131. doi:10.1038/s41591-021-01397-4

PubMed Abstract | CrossRef Full Text | Google Scholar

Tang, J., Toovey, O., Harvey, K., and Huic, D. (2021). Introduction of the South African SARS-CoV-2 Variant 501Y.V2 into the UK. J. Infect. 82, e8. doi:10.1016/j.jinf.2021.01.007

CrossRef Full Text | Google Scholar

Tang, J. W., Tambyah, P. A., and Hui, D. S. (2021). Emergence of a New SARS-CoV-2 Variant in the UK. J. Infect. 82, e27–e28. doi:10.1016/j.jinf.2020.12.024

CrossRef Full Text | Google Scholar

Tiwari, M., and Mishra, D. (2021). Investigating the Genomic Landscape of Novel Coronavirus (2019-nCoV) to Identify Non-synonymous Mutations for Use in Diagnosis and Drug Design. J. Clin. Virol. 128, 104441. doi:10.1016/j.jcv.2020.104441

CrossRef Full Text | Google Scholar

Weber, S., Ramirez, C., and Doerfler, W. (2020). Signal Hotspot Mutations in SARS-CoV-2 Genomes Evolve as the Virus Spreads and Actively Replicates in Different Parts of the World. Virus. Res. 289, 198170. doi:10.1016/j.virusres.2020.198170

PubMed Abstract | CrossRef Full Text | Google Scholar

Woo, H., Park, S.-J., Choi, Y. K., Park, T., Tanveer, M., Cao, Y., et al. (2020). Developing a Fully Glycosylated Full-Length SARS-CoV-2 Spike Protein Model in a Viral Membrane. J. Phys. Chem. B 124, 7128–7137. doi:10.1021/acs.jpcb.0c04553

PubMed Abstract | CrossRef Full Text | Google Scholar

Wu, S., Tian, C., Liu, P., Guo, D., Zheng, W., Huang, X., et al. (2021). Effects of SARS‐CoV‐2 Mutations on Protein Structures and Intraviral Protein-Protein Interactions. J. Med. Virol. 93, 2132–2140. doi:10.1002/jmv.26597

CrossRef Full Text | Google Scholar

Xia, X. (2020). Extreme Genomic CpG Deficiency in SARS-CoV-2 and Evasion of Host Antiviral Defense. Mol. Biol. Evol. 37, 2699–2705. doi:10.1093/molbev/msaa094

PubMed Abstract | CrossRef Full Text | Google Scholar

Yuan, F., Wang, L., Fang, Y., and Wang, L. (2020). Global SNP Analysis of 11,183 SARS‐CoV‐2 Strains Reveals High Genetic Diversity. Transbound. Emerg. Dis. doi:10.1111/tbed.13931

CrossRef Full Text | Google Scholar

Zhang, C., Zheng, W., Huang, X., Bell, E. W., Zhou, X., and Zhang, Y. (2020). Protein Structure and Sequence Reanalysis of 2019-nCoV Genome Refutes Snakes as its Intermediate Host and the Unique Similarity between its Spike Protein Insertions and HIV-1. J. Proteome Res. 19, 1351–1360. doi:10.1021/acs.jproteome.0c00129

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhu, N., Zhang, D., Wang, W., Li, X., Yang, B., Song, J., et al. (2020). A Novel Coronavirus from Patients with Pneumonia in China, 2019. N. Engl. J. Med. 382, 727–733. doi:10.1056/NEJMoa2001017

CrossRef Full Text | Google Scholar

Keywords: COVID-19, deletions, entropy, hotspot mutations, SARS-CoV-2 genomes, substitution

Citation: Saha I, Ghosh N, Sharma  N and Nandi S (2021) Hotspot Mutations in SARS-CoV-2. Front. Genet. 12:753440. doi: 10.3389/fgene.2021.753440

Received: 04 August 2021; Accepted: 07 October 2021;
Published: 29 November 2021.

Edited by:

Yang Zhang, University of Michigan, United States

Reviewed by:

Xiaoqiang Huang, University of Michigan, United States
Yavuz Oktay, Dokuz Eylul University, Turkey

Copyright © 2021 Saha, Ghosh, Sharma  and Nandi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Indrajit Saha, indrajit@nitttrkol.ac.in

These authors have contributed equally to this work

Download