SARS-CoV-2 Spike Glycoprotein and ACE2 Interaction Reveals Modulation of Viral Entry in Wild and Domestic Animals

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a viral pathogen causing life-threatening diseases in humans. Interaction between the spike protein of SARS-CoV-2 and angiotensin-converting enzyme 2 (ACE2) is a potential factor in the infectivity of a host. In this study, the interaction of SARS-CoV-2 spike protein with its receptor, ACE2, in different hosts was evaluated to predict the probability of viral entry. Phylogeny and alignment comparison of the ACE2 sequences did not lead to any meaningful conclusion on viral entry in different hosts. The binding ability between ACE2 and the spike protein was assessed to delineate several spike binding parameters of ACE2. A significant difference between the known infected and uninfected species was observed for six parameters. However, these parameters did not specifically categorize the Orders into infected or uninfected. Finally, a logistic regression model constructed using spike binding parameters of ACE2, revealed that in the mammalian class, most of the species of Carnivores, Artiodactyls, Perissodactyls, Pholidota, and Primates had a high probability of viral entry. However, among the Proboscidea, African elephants had a low probability of viral entry. Among rodents, hamsters were highly probable for viral entry with rats and mice having a medium to low probability. Rabbits have a high probability of viral entry. In Birds, ducks have a very low probability, while chickens seemed to have medium probability and turkey showed the highest probability of viral entry. The findings prompt us to closely follow certain species of animals for determining pathogenic insult by SARS-CoV-2 and for determining their ability to act as a carrier and/or disseminator.

866 (2,519 MERS cases) human lives, respectively (1), while SADS devastated livestock production by causing fatal disease in pigs in 2017. The SARS and MERS viruses had several common factors in having originated from bats in China and being pathogenic to humans or livestock (2)(3)(4). Seventeen years after the first highly pathogenic human CoVs, SARS-COV-2 is devastating the world with 87,808,867 cases and 1,894,632 deaths (as on January 07, 2021) (5). This outbreak was first identified in Wuhan City, Hubei Province, China, in December 2019 and notified by WHO on January 5, 2020. The disease has since been named as COVID-19 by WHO.
Coronaviruses are an enveloped, crown-like viral particles belonging to the subfamily Orthocoronavirinae in the family Coronaviridae and the Order Nidovirales. They harbor a positivesense, single-strand RNA (+ssRNA) genome of 27-32 kb in size. Two large overlapping polyproteins, ORF1a and ORF1b, that are processed into the viral polymerase (RdRp) and other non-structural proteins involved in RNA synthesis or host response modulation, cover two-thirds of the genome. The rest one-third of the genome encodes for four structural proteins [spike (S), envelope (E), membrane (M), and nucleocapsid (N)] and other accessory proteins. The four structural proteins and the ORF1a/ORF1b are relatively consistent among the CoVs; however, the number and size of accessory proteins govern the length of the CoV genome (4). This genome expansion is said to have facilitated the acquisition of genes that encode accessory proteins, which are beneficial for CoVs to adapt to a specific host (6,7). Next-generation sequencing has increased the detection and identification of new CoV species resulting in the expansion of the CoV subfamily. Currently, there are four genera (α-, β-, δ-, and γ-) with 38 unique species in CoV subfamily [International Committee on Taxonomy of Viruses (ICTV) classification] including the three highly pathogenic CoVs, viz., SARS-CoV-1, MERS-CoV, and SARS-CoV-2 are β-CoVs (8).
Coronaviruses are notoriously promiscuous. Bats host thousands of these types, without succumbing to illness. The CoVs are known to infect mammals and birds, including dogs, chickens, cattle, pigs, cats, pangolins, and bats. These viruses have the potential to leap to new species and in this process mutate along the way to adapt to their new host(s). coronavirus disease 2019 (COVID- 19), a global crisis, likely started with CoV-infected horseshoe bat in China. The SARS-CoV-2 is spreading around the world in the hunt of entirely new reservoir hosts for reinfecting people in the future (9). Recent reports of COVID-19 in a Pomeranian dog and a German shepherd in Hong Kong (10); in a domestic cat in Belgium (11); in five Malayan tigers and three lions at the Bronx Zoo in New York City (12) and in minks (13) make it all more necessary to predict species that could be the most likely potential reservoir hosts in times to come.
Angiotensin-converting enzyme 2 (ACE2) is an enzyme that physiologically counters renin-angiotensin-aldosteronealdosterone system (RAAS) activation that functions as a receptor for both the SARS viruses (SARS-CoV-1 and SARS-CoV-2) (14)(15)(16). The ACE2 human RefSeqGene is 48,037 bp in length with 18 exons and is located on chromosome X. ACE2 is found attached to the outer surface of cells in the lungs, arteries, heart, kidney, and intestines (17,18). The potential factor in the infectivity of a cell is the interaction between SARS viruses and the ACE2 receptor (19,20). By comparing the ACE2 sequence, several species that might be infected with SARS-CoV2 have been identified (21). Recent studies, exposing cells/animals to the SARS-CoV2, revealed humans, horseshoe bats, civets, ferrets, cats, and pigs could be infected with the virus and mice, dogs, pigs, chickens, and ducks could not be or poorly infected (16,22). Pigs, chickens, fruit bats, and ferrets are being exposed to SARS-CoV2 at the Friedrich-Loeffler Institute and initial results suggest that Egyptian fruit bats and ferrets are susceptible, whereas pigs and chickens are not susceptible (23). In this cause of predicting potential hosts, no studies on ACE2 sequence comparison among species along with homology modeling and prediction, to define its interaction with the spike protein of SARS-CoV-2, are available. Therefore, this study is taken to identify the viral entry in potential hosts through sequence comparison, homology modeling, and prediction.

Sequence Comparison of ACE2
The protein and DNA sequence lengths of ACE2 varied in different hosts (Supplementary Table 1). Among the sequences that were compared, the longest coding sequence (CDS) was found in the order-Chiroptera (Myotis brandtii-811 aa) and the smallest in the order-Proboscidea (Loxodonta africana-800 aa). The within-group mean distance, the parameter indicative of variability of nucleotide sequences within the group, was found to be minimum in Perrisodactyla followed by Primates and was maximum among the Galliformes followed by Chiroptera (Supplementary Table 2). To establish the probability of SARS-CoV-2 entry into species of other orders, the distance of all the orders from Primates was assessed (Supplementary Table 3). This distance was found minimum for Perissodactyls followed by Carnivores and maximum for Galliformes followed by Anseriformes. Further, to decide a cutoff distance that can establish whether the species can be infected or not, the individual distance of each species from Homo sapiens was evaluated (Supplementary Table 3). Meleagris gallopavo (Turkey) is the species, which had the greatest distance from Homo sapiens. The minimum distance that corresponded to the species that was already established to be uninfected with the SARS-CoV-2, i.e., Sus scrofa, was 0.194. The codonbased test of neutrality to understand the selection pressure on the ACE2 sequence in the process of evolution was done. The analysis showed that there was a significant negative selection between and within orders for the ACE2 sequence. On sequence comparison of the spike-interacting domain of the alignments, both the protein and nucleotide (Data Sheets 1, 2) showed that the sequences were well conserved within the orders, suggesting that the structure defined by the sequence was conserved within the orders. The maximum variability with the Homo sapiens sequence within these regions was observed for Galliformes, followed by Accipitriformes, Testudines, Crocodilia, and Chiroptera. The protein sequence alignment at 30-41, 82-84, and 353-357 aa also showed similar sequence conservation and variability.

Phylogenetic Analysis
The protein sequences aligned were further subjected to find the best substitution model for phylogenetic analysis. The best model on the basis of BIC was found to be JTT + G. The phylogenetic analysis clearly classified the sequences of the species into their orders. All the sequences were clearly grouped into two clusters. The first cluster represented the Mammalian class and the second cluster was represented by two subclusters of Avian and Reptilian classes with high bootstrap values (Figure 1). Within the mammalian cluster, the artiodactyls were subclustered farthest to the primates and the rodents, lagomorphs, and carnivores were found clustered close to the primates with reliable bootstrap values. The Chiroptera subcluster had a subnode constituting horseshoe bat (Rhinolophus ferrumequinum) and the fruit bats (Pteropus Alecto and Rousettus aegyptiacus) (Figure 1).

Homology Modeling, Docking, and Evaluation of Spike-Binding Parameters of ACE2
Homology modeling was done for all the ACE2 sequences based on the X-ray diffraction structures defined in Protein Data Bank (PDB) database-6LZG, 6VW1, and 6M0J. After homology modeling using SWISS-MODEL, the models (144 = 48 x 3) were validated using SAVES. The homology modeled structures used in this study showed no "error" in PROVE. Most of the homology modeled structures had >90% score in PROCHECK and >95% score in ERRAT2 showing the models were good enough for further analysis. All the models were assigned "PASS" by Verify 3D (Supplementary Table 4).
These models constructed were then studied for their interaction with the spike ACE2-binding domains defined in the same IDs using GRAMM-X (Supplementary Table 5). Out of the five docked complexes tested for each X-crystallography structure, the best three docked complexes were selected based on the delta G and the number of hydrogen bonds. Several spikebinding parameters for these selected complexes-432 were generated in FoldX (Supplementary Table 6). Initially, to classify the infected from the uninfected irrespective of the order(s), unpaired t-test was done. The spike-binding parameters-rootmean-square deviation (RMSD), delta G, intraclashesGroup1, Van der Waals and solvation hydrophobic, and entropy sidechain were found to be significantly different in the infected from the uninfected (Supplementary Tables 7, 8). These parameters were further used to classify an order as infected or uninfected (Supplementary Table 9). None of the parameters could clearly classify the orders to be infected or uninfected, i.e., for RMSD, the orders-Artiodactyla and Testudines, were significantly different from the infected and uninfected; however, the order-Chiroptera was significantly different only from the infected (Figures 2-4). Similar findings were observed with the rest of the significant parameters that were evaluated. This suggested that the use of a single parameter would not help in identifying a species with probable viral entry.

Logistic Regression and Prediction of Viral Entry Probability
The seven different combinations of data used for finding the best combination of X-crystallography models for predicting the viral entry can be accessed through Supplementary Table 7 (for details please refer to materials and methods). On analyzing the data against a single X-crystallography model, i.e., either 6M0J or 6LZG or 6VW1, the number of significant parameters at a 5% level of significance were found to be highest for 6M0J and lowest for 6VW1 ( Table 1). Among these single model combinations, the highest reduction in null deviance and the greatest R square was observed for 6VW1. However, the akaike information criterion (AIC) value was the lowest for 6LZG. On considering the data against two models, the number of significant parameters were found to be highest for both the combinations-6LZG & 6M0J and 6LZG & 6VW1. These two combinations were better than the other combination vis-a-vis most of the evaluation parameters. Between, 6LZG & 6M0J and 6LZG & 6VW1, the former was having the lowest AIC value, the greatest reduction in null deviance, and the lowest p-value that determines a significant reduction in null deviance than the latter. However, the R square was higher in the later than the former. The analysis of data against the three-model combination, 6M0J & 6VW1 & 6LZG, also proved to have good estimates of evaluation parameters ( Table 1). Among all the seven data combinations considered, based on the evaluation parameters, the best three combinations, 6LZG & 6M0J, 6LZG & 6VW1 and 6M0J & 6VW1 & 6LZG, were considered for evaluating the probability of viral entry by partitioning the data as training and test data. The predicted probability of all the infected species was closer to being infected with the data combinations-6M0J & 6LZG followed by 6LZG & 6VW1 and 6M0J & 6VW1 & 6LZG. Similar was the probability for the uninfected species except for a minor difference in S. scrofa. Considering these findings, the prediction equation obtained from the combination of 6M0J & 6LZG was selected for predicting the probability of the rest of the species in this study. The probabilities were predicted using the following equation:    The Hosmer and Lemeshow goodness of fit (GOF) test showed no significant difference between the logistic model and the observed data (p > 0.05) indicating that the logistic model constructed is a good fit ( Table 1). The predicted probabilities are given in Table 2. Within the Order Artiodactyla, all species except Bison bison bison (American bison), Ovis aries (Sheep), and S. scrofa (Pig) had more than 80% probability of viral (SARS-CoV-2) entry using ACE2 as a receptor. In American bison, Sheep and Pig, the probability of virus entry was 0.0036, 24.3, and 18.6%, respectively. In Perrisodactyla, the probability of viral entry was 48% in horses and 79.1% in donkeys. All the carnivores in this study had a high probability of viral entry. In bats, the probability of viral entry was high in all the species. Among the rodents, except for Hamster, mouse and rat had a low probability of virus entry. The lagomorphs-rabbits and American pika had more than 90% probability of viral entry. All the primates had close to a 100% probability of viral entry. The reptiles, Testudines and Crocodilia, showed medium to high probability of viral entry. However, in the viral entry varied of the bird probability, with chicken, golden eagle, and duck having a low probability and white-tailed eagles and turkey having a probability of 73.8 and 81%, respectively. Further, pangolins had a very high probability and African elephants had a very low probability.

DISCUSSION
Recognition of the receptor is an important determinant in identifying the host range and cross-species infection of viruses (24). It has been established that ACE2 is the cellular receptor of SARS-CoV-2 (16). This study is targeted to predict viral entry in a host, i.e., hosts that can be reservoir hosts (Artiodactyla, Perrisodactyla, Chiroptera, Carnivora, Lagomorpha, Primates, Pholidota, Proboscidea, Testudines, Crocodilia, Accipitriformes, and Galliformes) and hosts that can be appropriate small animal laboratory models (Rodentia) of SARS-CoV-2, through sequence comparison, homology modeling of ACE2, docking the modeled homology structures with the spike-ACE2-binding domain and prediction of viral entry. Initially for prediction of probability of viral entry, sequence comparison of ACE2 was done vis-a-vis, within-group distance; the distance of an order from the order primates, the distance of each individual taxa from humans; variability in the ACE2 spike-interacting domain at protein and nucleotide level; and phylogeny. Considering the pandemic nature of the disease in humans, the low within-group distance in primates indicated that all the species considered within the order primates are prone to be equally infected with SARS-CoV-2 as humans. On comparing the orders, Galliformes were most distant from the primates and carnivora was found proximal. This confirms the recent reports of chicken (Galliformes) and ducks (Anseriformes) not being infected with SARS-CoV-2 (22), and tigers and lions being infected (12). On comparing individual hosts, the pig was found to be the established taxa that is uninfected with SARS-CoV-2 (22). Considering the distance of pig from Homo sapiens as a cutoff, it would include all the carnivores, perissodactyls, and few artiodactyls, viz., goat, buffalo, bison, and sheep, to be infected, but, excludes cattle (Artiodactyla), all the bats (Chiroptera), and birds (Galliformes, Anseriformes, and Accipitriformes). Further, the negative selection observed on a codon-based test of neutrality, indicates that the variation at the nucleotide level, is translated synonymously, indicating that the structure of ACE2 is conserved through the process of evolution. The comparison of the spike-binding domains across all the orders also did not lead to meaningful conclusions on viral entry in different species.
On phylogeny, subclustering of the rodents, lagomorphs, and carnivores close to primates with reliable bootstrap values partially corroborates with the occurrence of SARS-CoV-2 infection in carnivores (22), as mice were found not to be infected with SARS-CoV-2 (16). Further, subclustering of fruit bat with horseshoe bat suggests possible entry of the virus in fruit bat, as COVID-19 outbreak in Wuhan in December 2019 was traced back to have a probable origin from horseshoe bat (16). The virus strain RaTG13 isolated from this bat was found to have 96.2% sequence similarity with the human SARS-CoV-2. These results again led to no concrete conclusions on viral entry in various hosts. Therefore, to assess the probability of viral entry in various hosts, after homology modeling of ACE2 and docking the modeled homology structures with the spike-ACE2-binding domain, 32 spike-binding parameters were evaluated.
A total of 9 data for each host for each spike binding parameter as described in the materials and methods are available to select the parameters that would clearly classify the Orders into infected/uninfected. However, none of the 6 parameters-RMSD, delta G, intraclashesgroup1, van der Waals, solvation hydrophobic, and entropy sidechain-that were significantly different in the infected from the uninfected could classify the orders into infected or uninfected. This suggests that a single parameter at a time, as has been considered in recent reports (21), may not be considered and evaluated for estimating the probability of virus entry. Therefore, logistic regression with all the estimated parameters was done with seven different combinations of data to predict the probability of viral entry. The best combination of X-ray crystallography  models was identified based on evaluation parameters-number of parameters significant in the model at 1% LS, number of parameters significant in the model at 5% LS, McFadden R 2 , null deviance, residual deviance, AIC, p-value of the chi-square statistic associated with the null deviance model, p-value of the chi-square statistic associated with the residual deviance model, p-value to determine whether there is a significant reduction in deviance from null to residual, and Hosmer and Lemeshow GOF test.
McFadden R 2 is a measure of fit in statistical modeling (25). However, this can be used only to compare models with the same number of covariates, i.e., this increase with an additional covariate. AIC is used to compare models fitted over the same datasets. Lower the AIC better is the model and better is the fit (26). Significant reduction in the null deviance is assessed by the change in the p-value of the chi-square statistic associated with the null deviance model to the p-value of the chi-square statistic associated with the residual deviance model. This can be further determined by the p-value that determines whether there is a significant reduction in deviance from null to residual. A non-significant p-value on Hosmer and Lemeshow GOF test indicates that there is no evidence that the model is not fitting well with the data considered. All these parameters were relatively better for the data against the combinations-6LZG & 6M0J, 6LZG & 6VW1, and 6M0J & 6VW1 & 6LZG than the other four combinations. The number of significant parameters at 1 and 5% levels of significance was greater in these combinations than the other four. The reduction in null deviance was found to be highly significant in 6M0J & 6VW1 & 6LZG followed by 6LZG & 6M0J and 6LZG & 6VW1. Considering several criteria as mentioned, the data against these models were finally considered to predict the probability of viral entry on the test data and the prediction accuracy was found to be higher for the data against 6LZG & 6M0J.
Root-mean-square deviation was the most significant parameter among the 32 spike-binding parameters of ACE2 in all the logistic models considered (Data Sheet 4). RMSD measures the degree of similarity between two optimally superposed protein three-dimensional (3D) structures (27). The smaller the RMSD between two structures, more similar they are. Docking predictions within an RMSD of 2 Å are considered successful, whereas values higher than 3 Å indicate docking failures (28). The average RMSD in the infected and uninfected known hosts was 0.068 and 0.113, respectively. In all the logistic models, the coefficient (i.e., the log of odds ratio) of RMSD was negative, indicating that RMSD is negatively connected with infection. This means that the increase in RMSD would lead to higher odds of not getting infected. In the combination that is finalized (i.e., combination of 6LZG & 6M0J) for predicting the probability of viral entry, the coefficient of RMSD was −5.575e+01. Further, the deviance residuals for this logistic model from this combination were symmetric as indicated by the median (0.01172), which is close to zero. The AIC for this selected combination is 64.348. Further, there was also a significant reduction in null deviance with an R-square of 0.652. The prediction equation on analysis of these data against the combination 6LZG & 6M0J was used to predict the probability of viral entry in various hosts.
As observed in this study, it has been predicted that Bos indicus (Indian cattle) and Bos taurus (Exotic cattle) can act as intermediate hosts of SARS-CoV-2 (29) and that pigs are not susceptible (22). Also, Camels, which are reported to be infected with SARS-CoV (30), are equally capable of SARS-CoV-2 infection. Among the rodents, hamsters had the highest probability of viral entry. It has been established that SARS-CoV-2 effectively infects hamster (31), and rats and mice were found less probable (32). All the Carnivores in the study had a high probability of viral entry. Reports of SARS-CoV2 infection in cats (22), tigers, and lions (12) substantiate our estimates obtained in the study. Rabbits also had a high probability of viral entry showing concordance to the recent evidence of SARS-CoV-2 replication in rabbit cell lines (33). All the primates close to the human species were identified to be highly probable. The variability within the Order(s) must be the reason for not being able to classify them as a group, to either being infected or uninfected using an unpaired t-test.

CONCLUSION
Most of the species considered under different orders, in this study, showed a high probability of viral entry. The findings hint toward the probable hosts that can act as laboratory models or as reservoir hosts and allow us to take a cue about the probable pathogenic insult that can be caused by SARS-CoV-2 to different species. This, however, warrants further research. Also, viral entry is not the only factor that determines infection in COVID-19 as viral loads were found to be high in patients with asymptomatic (34,35). The important factors that determine disease/infection  in host(s) are-host defense potential, underlying health conditions, host behavior and number of contacts, age, atmospheric temperature, population density, airflow and ventilation, and humidity (36).

MATERIALS AND METHODS
Sequence analysis, phylogenetic analysis, homology modeling of ACE2, docking the modeled homology structures with the (F) Entropy side chain-no significant difference on comparison of Chiroptera with the infected and uninfected groups. **Significance at p < 0.01; *Significance at p < 0.05 after unpaired t-test on comparing two groups at a time. spike-ACE2-binding domain, and prediction of viral entry were done in this study (Figure 5).

Sequence Analysis
In this study, 48 (mammalian, reptilian, and avian species) ACE2 complete/partial protein and nucleotide sequences available on the National Center for Biotechnology Information (NCBI) were analyzed (Supplementary Table 1) to understand the possible difference(s) in the ACE2 sequences that may correlate with SARS-CoV-2 viral entry into the cell. The partial sequences are considered in the study after ensuring that these sequences completely cover the spike interacting domain of ACE2. Within the mammalian class, orders-Artiodactyla, Perrisodactyla, Chiroptera, Rodentia, Carnivora, Lagomorpha, Primates, Pholidota, and Proboscidea; within the Reptilian class, orders-Testudines and Crocodilia; and within the Avian class, orders-Accipitriformes, Anseriformes, and Galliformes, were considered in this study. These orders were considered keeping in view all the possible reservoir hosts/laboratory animal models that can possibly be infected with the SARS-CoV-2. The withinand between-group distances were calculated in Mega 6.0 (37). The ACE2 sequences in this study are compared as a group (average of the order) with the average of all the species in the order Primates or individually with the Homo sapiens ACE2 sequence. The codon-based Z test of selection [strict-neutrality (dN = dS)] to evaluate synonymous and non-synonymous substitutions across the ACE2 sequences among the orders was done. Further, for comparing the sequence of the spikeinteracting domain, this was identified to be defined in the

Phylogenetic Analysis
Phylogenetic analysis of the protein sequences was done using MEGA 6.0 (37). Initially, the sequence alignment was done using Clustal W (38). The aligned sequences were then analyzed for the best nucleotide substitution model on the basis of Bayesian information criterion scores using the JModelTest software version 2.1.7 (39). The tree was constructed by the Neighborjoining method with the best model obtained using 1,000 bootstrap replicates. It is important to note that the missing data or gaps are treated in this analysis by using pair-wise deletion.

Homology Modeling
The structures of novel CoV spike receptor-binding domain complexed with its receptor, ACE2, that were determined through X-ray diffraction, are available at PDB database with IDs 6LZG (40), 6M0J (41), and 6VW1 (42). These available ACE2 models from PDB database were used for homology modeling using SWISS-MODEL (43), which was accessed through the ExPASy web server. The models (144 = 48 × 3) were validated through SAVES. SAVES is a conglomerate of different validating algorithms like PROCHECK, VERIFY 3D, ERRAT2, and PROVE (44). The models are assigned "PASS" by Verify 3D when more  than 80% of the amino acids have scored ≥ 0.2 in 3D/1D profile.
In the case of ERRAT2, models that scored more than 95% are considered to have good resolution. PROVE gives: error (>5%), Warning (1-5%), or Pass (<1%) based on % of buried atoms. From PROCHECK, Ramachandran plot with over 90% of the residues in core regions is considered to be a good model.

Protein-Protein Docking
The spike ACE2-binding domains of 6LZG, 6M0J, and 6VW1 were used in docking along with the respective homology modeled structures of ACE2 protein of all the hosts, i.e., ACE2 of 48 hosts as a receptor and spike ACE2-binding domain of SARS-CoV-2 as a ligand for protein-protein docking. GRAMM-X docking server was used for protein-protein docking, which generated a docked complex (45). Five docked complexes were generated from GRAMM-X for each X-ray crystallography model in each species and postdocking analyses were carried out using Chimera software (46) and PRODIGY (47). A total of 720 models (48 hosts × 3 X-ray crystallography models × 5 docking complexes) were analyzed. Chimera is an extensible program for interactive visualization and analysis of molecular structures for use in structural biology. Chimera provides the user with highquality 3D images, density maps, trajectories of small molecules, and biological macromolecules, such as proteins. The number of hydrogen bonds in each docking structure was estimated using Chimera and the delta G of the docked models was estimated using PRODIGY. Out of the five docked complexes generated through GRAMM-X, three best complexes for each host under each X-crystallography structure were selected (432 model = 48 × 3 × 3) for further analysis based on delta G and number of hydrogen bonds (Data Sheet 3; Supplementary Table 6). The docked models are expected to differ from the real structure and the differences are quantified by RMSD. To estimate RMSD, the three best-docked complexes of each X-ray crystallography model in each species were compared with the respective models-6LZG/6M0J/6VW1 using Chimera. Further, in addition to delta G and RMSD, in FoldX software (48) several parameters were estimated for all these selected docked structures for 432 models (48 hosts × 3 X-ray crystallography models × 3 selected docking complexes) were analyzed. These parameters include-IntraclashesGroup1, IntraclashesGroup2, Interaction Energy, Backbone Hbond, Sidechain Hbond, Van

Statistical Analysis for Prediction
Till date, clear-cut information of 15 species that are either infected or uninfected with SARS-CoV2 is available (Supplementary Table 7). For each of these species, a total of nine models with their parameters were taken for the analysis, i.e., for each species, the three selected docked structures for each of the X-ray crystallography structures were selected (Data Sheet 3). A total of 135 data per parameter (15 hosts × 3 X-ray crystallography models × 3 selected docking complexes) were analyzed. Initially, for each parameter (spike-binding parameters of ACE2), the difference between the infected and uninfected is evaluated using unpaired t-test in GraphPad Prism 7.00 (GraphPad Software, La Jolla, California, USA). Welch correction was applied wherever necessary. For those parameters that were significant, the difference between order(s) means and the infected/uninfected groups was also further evaluated using unpaired t-test (Note: if a species is included in the infected/uninfected group, the same is not included in its Order on comparing the order(s) with the infected/uninfected group) (Supplementary Table 9 for more information).
Later, the backward stepwise logistic regression model was constructed on all the 32 parameters (29 from FoldX, RMSD, H bonds, and delta G) estimated above in the 15 known species of infected (11) and uninfected (4) (Supplementary Table 7). A total of 135 data per parameter were available across the three Xray crystallography structures considered. These data were used in seven different combinations based on the combination of Xray crystallography structures. The seven combinations include data against single model-6LZG, 6M0J, and 6VW1 (i.e., 45 data); data against two models-6LZG and 6M0J/6LZG and 6VW1/6M0J and 6VW1 (i.e., 90 data); and data against all the three models-6LZG and 6M0J and 6VW1 (i.e., 135 data). These seven combinations were evaluated based on the estimates of number of parameters significant in the logistic model at 1% LS, number of parameters significant in the logistic model at 5% LS, McFadden's R2, null deviance, residual deviance, AIC, p-value of the Chi-sq statistic associated with the null deviance model, p-value of the chi-square statistic associated with the residual deviance model, p-value to determine whether there is a significant reduction in deviance from null to residual, Hosmer and Lemeshow GOF test. After selecting the best combination(s), the best model (prediction equation) was selected after evaluation of the training and test data sets for each of the combinations. This prediction equation from the best combination of data was used to predict the probability of viral entry in the rest of the species using the average values of the top three models for all the parameters in the equation.
Further, with 32 parameters, the minimum sample size required to derive statistics that represent each parameter, is 1,700 (49) [n =100 + xi, i.e., here:-n = 100 + (50 × 32) = 1,700, with a minimum of 50 events per parameter]. The data were needed to be extrapolated to at least 1,700 to predict the CIs. This was based on the assumption that the ACE2 structure and sequence are conserved within a species. For the species-Homo sapiens, we compared several ACE2 sequences and found that all the compared sequences were identical. With this assumption that the spike-binding Parameters of ACE2 within a species are conserved and due to the pandemic nature of the disease, the data were extrapolated.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
MP performed sequence alignment and phylogeny of nucleotide and amino acid and drafted the manuscript. PG, SS, VK, and NT performed protein modeling and docking and estimated the different parameters from FoldX. RK retrieved the amino acid and nucleotide sequences and edited the manuscripts. MP, GK, and BM edited and proofread the manuscript. RG did complete statistical analysis and manuscript development. TM, SM, RS, RG, and BPM conceptualized and planned the entire study. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We are grateful to Director NIAB and Director IVRI for their support.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed. 2021.775572/full#supplementary-material Data Sheet 1 | Nucleotide sequence alignment of the coding sequence (CDS) region of angiotensin-converting enzyme 2 (ACE2). The shaded regions show the spike interacting domains.
Data Sheet 2 | Protein sequence alignment of ACE2. The shaded regions show the spike interacting domains.
Data Sheet 3 | Depiction of numbers of models considered in this study showing the number of values per parameter. For each species, the ACE2 sequence is homology modeled against the three X-crystallography structures-6M0J, 6LZG, and 6VW1. The spike ACE2-binding domain of each of the X-crystallography structures is docked with its homology modeled ACE2 and 5 docked complexes were evaluated to select the top three models. This leaves us with 9 values for all the spike binding parameters for further analysis.
Data Sheet 4 | Details about the commands used and results obtained after testing different combinations of models.