Structural and Evolutionary Insights Into the Binding of Host Receptors by the Rabies Virus Glycoprotein

Rabies represents a typical model for spillover of zoonotic viral diseases among multiple hosts. Understanding the success of rabies virus (RV) in switching hosts requires the analysis of viral evolution and host interactions. In this study, we have investigated the structural and sequence analysis of host receptors among different RV susceptible host species. Our extensive bioinformatic analysis revealed the absence of the integrin plexin domain in the integrin β1 (ITGB1) receptor of the black fruit bats in the current annotation of the genome. Interestingly, the nicotinic acetyl choline receptor (nAChR) interaction site with the glycoprotein (G) of RV was conserved among different species. To study the interaction dynamics between RV-G protein and the RV receptors, we constructed and analyzed structures of RV receptors and G proteins using homology modeling. The molecular docking of protein-protein interaction between RV-G protein and different host receptors highlighted the variability of interacting residues between RV receptors of different species. These in silico structural analysis and interaction mapping of viral protein and host receptors establish the foundation to understand complex entry mechanisms of RV entry, which may facilitate the understanding of receptor mediated spillover events in RV infections and guide the development of novel vaccines to contain the infection.


INTRODUCTION
Rabies is a lethal zoonotic viral disease which causes serious behavioral changes and neurological disorders in a wide range of hosts with a high fatality rate of up to 100% (Hueffer et al., 2017). Rabies virus (RV) is an enveloped negative-stranded RNA virus and belongs to the family Rhabdoviridae with bullet-shaped virion particles with a size of~200 nm. The viral genome encodes five transcriptional units for nucleocapsid protein (N), phosphoprotein (P), matrix protein (M), glycoprotein (G), and RNA-dependent RNA polymerase or large protein (L) (Jackson, 2013). Viral RNA is encapsulated by N protein which forms the ribonucleoprotein (RNP) and acts as a template for viral replication and transcription. The RNP together with P and L form the viral replication complex, which is surrounded by a lipid bilayer containing the viral G protein protruding as spikes from the viral surface. The M protein has been proposed to bridge the RNP and the cytoplasmic domain (CD) of G protein to form the bullet-shaped virion (Pulmanausahakul et al., 2008).
Owing to the location of G protein on the viral surface, it is considered the major determinant of tissue tropism. The RV exhibits a broad host spectrum, highlighting the importance of G protein in interacting with multiple host receptors (Jackson, 2013). The RV-G protein is a type I membrane glycoprotein which is translated on membrane-bound ribosomes and inserted cotranslationally into the endoplasmic reticulum (ER) in an unfolded form. The folding of transmembrane proteins occurs in three topologically and biochemically distinct environments: the ER lumen, the ER membrane, and the cytosol. The structural organization of RV-G protein indicated the constitution of three domains; ectodomain, transmembrane domain (TMD), and the cytoplasmic domain which can fold independently of each other (Maillard and Gaudin, 2002). Three different states have been demonstrated for the G protein. The native state (referred as "n") is detected at the virus surface and is known to be responsible for receptor binding. The activated hydrophobic state (A) interacts with the target membrane as the primary step in the fusion process, and the fusion-inactive conformation state (I) (Gaudin et al., 1999). These distinct states are governed by pH equilibrium, where the I state is triggered by low pH, forming a more elongated conformation than that in the n state, rendering them antigenically different (Gaudin et al., 1999). Cleavage of the signal peptide results in the formation of the mature protein. The G protein undergoes cellular modification processes, whereby carbohydrates (glycans) are attached to specific amino acid side chains in protein. Appropriate folding of G glycoprotein is mainly dependent on the N-glycosylation sites, which increase the solubility of folding intermediates and facilitate the interaction with chaperones (Wojczyk et al., 2005).
Cellular receptors are regarded as the primary pathway through which viruses can gain access to the host. The ways by which viruses can unlock the host cells using the viral attachment proteins is considered the most fundamental aspect in viral invasion of the host cells. The RV interacts with a wide range of receptors by which it enters cells of different host species. Successful infection occurs only when a receptor can initiate the full viral life cycle from cell recognition to release the genome into the cell for replication and protein synthesis. Elucidating the preferences of RV in binding and attaching to certain types of host receptors is a point of interest which may provide insights into how entry mechanisms may be targeted therapeutically (Maginnis, 2018). Following attachment of the RV-G protein to host receptors, the virus is internalized when it reaches the endocytic zones. RV is transported into clathrin-coated pits by the filopodia. The filopodia are actin-enriched cell surface protrusions with which the cell probes the extracellular environment (Xu et al., 2015).
Nicotinic acetyl choline receptor (nAChR), a well-known host receptor for RV, is a pentameric ligand-gated ion channel which is present in neuromuscular junctions, mediating intraneuronal communication in central and peripheral nervous systems (Lafon, 2005). Recently, it has been observed that cells lacking nAChR are still susceptible to RV infection. Additionally, RNA interference (RNAi)-mediated depletion of different receptors was carried out to explore the role of the various RV receptors. The most recently discovered RV receptor is integrin beta 1 (ITGB1) (Shuai et al., 2019). ITGB1 is a transmembrane cell surface receptor and consists of one a and one b subunits. The expression of ITGB1 is predominantly observed in the skeletal muscle. The cell lines expressing ITGB1 are human embryonic kidney cells (HEK293) and mouse neuroblastoma cells (N2a) (Shuai et al., 2019). Another novel RV receptor, metabotropic glutamate receptor subtype 2 (mGluR2), has recently been identified. It belongs to the G protein-coupled receptor family that is abundant in the central nervous system. Several human and murine cell lines (i.e., HEK293, N2a and neuroblastoma cells SK-N-SH, SK cells) express mGluR2 receptors and can be successfully infected with RV (Wang et al., 2018). Neural cell adhesion molecule (NCAM) is another well-known RV receptor which is a cell adhesion glycoprotein of the immunoglobulin superfamily, and it is mainly concentrated in synaptic regions and at the neuromuscular junction (NMJs). Its main role is mobilization and cycling of synaptic vesicles in addition to synaptogenesis (Lafon, 2005). Additionally, other components of the cell membrane, such as gangliosides and heparin sulfate were recognized to play a role in the entry of RV to hosts (Lafon, 2005).
Despite the availability of substantial information on the RV infectivity, there is a gap in understanding the mechanism by which the G protein can interact with different receptors for initiation of the infection. In the current study, we aim to underpin the evolutionary differences among different RV receptors in humans, dogs, and bats with disclosure of the possible in silico-predicted mechanisms by which the G protein can interact with the different receptors in different host species. The presented data provide insights into these interactions to open avenues for the prevention and control of RV infection in future.

Construction of Data Sets
To investigate the differences among RV receptors in different species, protein sequences were retrieved from NCBI (Agarwala et al., 2018) by BLAST search. All sequences for each of the proteins were downloaded and collated in a FASTA format.

3D Structure Model Building and Quality Assessment
The 3D structure models for RV receptors (ITGB1, mGluR2, nAChR, and NCAM proteins), RV-G protein of Egyptian strain (QEU57979.1), and RV-G protein of bat-related group (BAE95290.2) were generated using I-TASSER (Yang et al., 2010). The 3D structures generated by I-TASSER were based upon threading, fragment assembly, and iteration. The best model was selected according to the confidence score (C-score) which represented the quality of predicted models by I-TASSER. The Cscore range was between (−5 and −2), where the higher the Cscore, the higher the confidence of a model and vice versa. After predicting the protein model, structure and stereochemical analyses were performed and the predicted 3D structures were visualized and annotated using PyMOL software (DeLano, 2002).

Molecular Docking Simulations
The predicted structures were used for protein-protein docking studies using GRAMM-X software (Tovchigrechko and Vakser, 2006). Docking studies were performed for the Egyptian RV-G protein against each of RV receptors from human, dog, and black fruit bat. In addition, in silico interactions between RV-G protein related to bat strain were mapped with different receptors.

Analysis of the Docking Complex
The docking complexes obtained from GRAMM-X were uploaded to PDBsum (Laskowski et al., 2018) and PDBePISA (Battle, 2016) servers for analysis of the protein-protein interactions. Identification of hydrogen bonds, interacting interfaces, nonbonded contacts, salt bridges, Gibb' free energy of binding (DG int , kcal/mol), pores, and tunnels in protein complexes were carried out. Mapping of the docking complexes was performed using PYMOL software (DeLano, 2002).

Computational Analysis of Primary Structure of RV Receptors
The physicochemical properties of ITGB1, mGluR2, nAChR, and NCAM were compared among different species (human, dog, and black fruit bat) based on their amino acid (a.a.) composition as summarized in Table 1. Investigation of the hydrophobic nature of the RV receptors was assessed by GRAVY values. The range of GRAVY values for ITGB1 and NCAM were negative, revealing their overall hydrophilic nature. In contrast, positive GRAVY values identified in mGluR2 and nAChR proteins indicated their hydrophobic nature. To test the stability of different receptors, the instability index values were estimated. Instability indices for mGluR2, nAChR, and NCAM were all below 40, indicating the stability of these proteins in all species examined. However, the ITGB1 in human and dogs were considered relatively unstable, compared with ITGB1 from black fruit bat which had an instability index below 40. The number of amino acid residues, signal peptide, transmembrane, and low complexity regions were predicted and are summarized in Table 1. Interestingly, the signal peptide in black fruit bat ITGB1 was missing according to SMART and SignalP databases in the sequence that is available in the current genomic annotation.

Domain Organization of Different Receptors
A schematic representation of the domain organization of human ITGB1, mGluR2, nAChR, and NCAM receptors are shown in Figures 1A-D, respectively. The 3D structures of the receptors highlighting each domain are also shown (Supplementary Figures S1A-D). Analysis of the human ITGB1 a.a. sequence ( Figure 1A) revealed six distinct domains. The integrin plexin domain, a short disulfide-rich domain (a.a. from 25 to 76) located at the N-terminus of integrin beta chains, was also present in dog ITGB1, but interestingly was the only domain absent in black fruit bat (Supplementary Figure S1A).  Figure 1B). The large extracellular region is the ligand binding domain of the group II metabotropic glutamate receptor (a.a. 6-458). The cysteine-rich domain (CRD) (a.a. 469-546) is characterized by highly conserved cysteine residues forming disulfide bridges. Linked to the CRD domain is a transmembrane domain composed of seven transmembrane helices (7 TMD) (a.a. 567-833 a.a.). All domains were similarly present in dogs and black fruit bat. The domain organization analysis of the nicotinic acetyl-choline receptor nAChR ( Figure 1C) demonstrated a large conserved extracellular domain (a.a. 22-256). Four transmembrane regions named as neurotransmitter gated ion channel (a.a. 263-468) were followed by a cytoplasmic loop (Albuquerque et al., 2009). A similar domain arrangement was also present in dog and black fruit bat within the nAChR. Analysis of the neural cell adhesion molecule (NCAM) domain architecture revealed an extracellular portion of NCAM which is composed of five N-terminal Ig-like domains and two fibronectin type III domains which form a dimeric glycoprotein composed of disulfide-linked subunits. NCAM domain arrangement was the same in dog and black fruit bat ( Figure 1D).

Protein Sequence Alignment
Analysis of sequence similarities of receptors among different species was crucial for mapping the differences within the interaction site of the receptors with the RV-G protein. For each of the four RV receptors, protein sequences from human, dog, and black fruit bat were aligned (Figures 2A-D). Analysis of  the ITGB1 protein sequences from human, dog, and black fruit bat ( Figure 2A) revealed that the VWA domain represented the most variable region among species, where dog showed 15 a.a. differences in comparison with human and 18 a.a were variable between black fruit bat and human. The area of greatest homology among the ITGB1 protein was the EGF-1 domain where only one a.a. varied between species. There were five different a.a. residues in the integrin plexin domain between human and dog. Similarly, within the EGF-2 domain, four and five a.a. residues showed differences in black fruit bat and dog, compared with human. Both dog and black fruit bat showed nine a.a. residue difference in the tail domain relative to human. The differences in these residues (1-728 a.a) may be crucial, since the ITGB1 ectodomain constitutes the interaction site with the RV-G protein, as was pointed out in a previous study (Shuai et al., 2019). Our results highlight the sequence homology within the mGluR2 protein alignment among the different species ( Figure 2B). The ligand-binding domain showed the least sequence similarity between species where eight and 10 a.a. residues were different in black fruit bat and dog, respectively, in comparison with human. The CRD domain only showed three a.a. differences between black fruit bat, dog, and human. Only four amino acid residues were different in the transmembrane domain between human, dog, and black fruit bat. Considering that the binding site of RV-G with nAChR was previously mapped to be on the ax subunit between residues 173 and 204 (Lafon, 2005), one major finding in our study ( Figure 2C) was the high conservation of this region among human, dog, and black fruit bat. The interaction site of NCAM with RV-G protein has not been previously determined. Our results showed the a.a. differences of NCAM domains among the different species. As shown in Figure 2D, the first immunoglobulin domain and second and third immunoglobulin I-set domains were the most conserved regions among all species. On the other hand, the first immunoglobulin I-set domain showed the most variable region among the three species where eight and 10 a.a. residues differ in black fruit bat and dog, respectively, compared with human. Black fruit bat NCAM displayed the highest variability in its fibronectin type III domain where 26 residues were variable, while only one residue differed in dog with respect to human.

Analysis of the Predicted 3D Structures of Receptors
The 3D structures of different receptors were predicted using the ITASSER online server. The best predicted model structures were chosen according to the maximum confidence score which was calculated according to threading templates significance (Supplementary Figures S1A-D). The C-score ranged from −5 to −2 (the higher the value, the higher the confidence and vice versa). For human ITGB1, a model with a C-score of 0.40 was selected; for dog ITGB1, a model with a C-score of 0.37 was chosen; and for bat, a model with C-score of 0.23 was selected. For mGluR2, 3D structures of the dog (C-score, −0.13) human (C-score, −0.13), and black fruit bat (C-score, −0.19) were chosen.  For nAChR, the best model in human had a C-score of −0.37, dog with a C-score of 0.05, and black fruit bat with a C-score of 0.35. NCAM C-scores of 3D structures were as follows: human, −0.69; dog, −0.79; and black fruit bat, 0.10.

Protein-Protein Interaction Prediction
To elucidate the mechanism by which the RV-G protein interacts with different receptors, protein-protein docking was performed using GRAMMX. Analysis of the docking complexes was resolved through PDBsum and PDBePISA servers. Additionally, mapping the interacting hydrogen bonds, salt bridges, and the DG int was also performed ( Table 2). The DG int value which expresses the solvation free energy gain upon assembly formation (total solvation energies of assembled structures-solvation energies of isolated structures) was determined as well (Pantsar and Poso, 2018). The RV-modeled 3D structure of Egyptian strain G protein was utilized to undertake the docking against different receptors. The docking complex of human ITGB1 and Egyptian RV-G protein indicated five interactions mediated by hydrogen bonds between residues of ITGB1 VWA and EGF-1-like domain with the RV-G protein in addition to formation of one salt bridge between Glu 340 of human ITGB1 and His 438 of the RV-G protein ( Figure 3A). A relatively more stable docking complex between dog ITGB1 VWA domain and RV-G protein of DG int −20.8 kcal/mol was mapped ( Figure 3B). The stability of the docking complex might be due to formation of four hydrogen bonds between VWA domain of dog ITGB1. Four salt bridges between Lus 156 , Asp 287 , Glu 340 , and Glu 347 in dog ITGB1 and Asp 429 , His 105 , His 438 , and Arg 103 in the RV-G protein were identified. Predicted interactions of the integrin beta tail domain from black fruit bat ITGB1 showed bonding with RV-G protein through three hydrogen bonds ( Figure 3C). Our results showed that the G protein ectodomain is responsible for binding to ITGB1 in different hosts (Figures 3A-C). Our modeling of the interaction between mGluR2 in human and dog with RV-G protein showed that the interactions were only mediated by the hydrogen bonds in the seven-transmembrane domain of mGluR2 ( Figures 4A, B). Intriguingly, 10 hydrogen bonds were at the interface between the ligand-binding domain of mGluR2 from black fruit bat and the RV-G protein ( Figure 4C), along with formation of four salt bridges between Lys 24 , Arg 107 , and His 129 in black fruit bat mGluR2 and Glu 430 , Asp 427 , and Asp 420 in the RV-G protein. Interestingly, the G protein ectodomain, transmembrane, and cytoplasmic domains all appear to play a role in interactions of G protein with mGluR2 in different hosts (Figures 4A-C). The nAChR extracellular domain of human and black fruit bat interacted with RV-G protein through three hydrogen bonds ( Figures 5A, B). Surprisingly, the docking complex with dog nAChR showed neither hydrogen bonds nor any salt bridges with the RV-G protein which needs further investigation. It was observed that the G protein ectodomain was responsible for interactions with nAChR in human and bat ( Figures 5A, B). Two hydrogen bonds mediate the interaction between human NCAM and RV-G in the docking complex ( Figure 6A). A salt bridge was noticed between residues Arg 177 in human NCAM and Asp 429 within the RV-G protein.
Modeling of the dog NCAM-RV-G docking complex demonstrated nine hydrogen bonds ( Figure 6B). A total of five hydrogen bonds were mapped in the docking complex between black fruit bat NCAM and RV-G protein ( Figure 6C). The RV-G ectodomain was the interacting part with human and dog NCAM, while both the RV-G ectodomain and cytoplasmic domains may interact with bat NCAM. Interestingly, we have performed further docking analysis and compared the interaction of bat ITGB1 with the G protein of the bat RV group as shown in Figure 7. Our results show the formation of four hydrogen bonds between bat ITGB1 and batrelated RV-G protein ( Figure 7A), while only two hydrogen bonds were mapped between bat ITGB1 and the G protein of the dog-related group ( Figure 3C). To elucidate if the interaction of human ITGB1 will be stronger with bat-or dog-related RV-G proteins, we modeled the docking complex of human ITGB1 with bat RV-G protein ( Figure 7B). The best model predicted an interaction mediated by only one hydrogen bond between human ITGB1 and bat RV-G protein, in contrast to the five hydrogen bonds created upon interaction of human ITGB1 with dog RV-G protein ( Figure 3A). For checking the accuracy of the docking results, we have tested if nonsusceptible host to RV (chicken) will show any interacting residues with G protein. Two docking complexes were modeled as follows: chicken nAChRdog G protein and chicken ITGB1-dog G protein. Unexpectedly,  chicken nAChR-dog G protein complexes showed seven hydrogen bonds ( Figure 7C). While in chicken ITGB1-dog G protein, four hydrogen bonds were mapped ( Figure 7D).

DISCUSSION
Little is known about the mechanisms by which RV crosses species barriers. Studying the differences among RV receptors in different species through which the RV is capable to jump among different host species will provide novel insights into controlling spillover events. The capability of RV-G protein to bind wide range of receptors from different protein families and expressed by diverse cell types remain elusive and puzzling (Lafon, 2005;Wang et al., 2018;Shuai et al., 2019). In order to unravel some of these mechanisms, we performed structural and protein-protein interaction analysis of all known RV receptors and investigated the possible mechanisms by which the G protein may enter into diverse cell types in different host species. Interestingly, some differences were observed especially for the ITGB1 domain organization of black fruit bat which showed absence of integrin plexin domain and signal peptide ( Table 1). Signal peptides have an important role in protein sorting and localization and its absence is a matter of interest and warrant future investigation. The absence of an N-terminal signal peptide among orthologous proteins might be linked to the absence of the integrin plexin domain (N-terminal domain) in ITGB1 from black fruit bat in comparison with ITGB1 in other species (Hönigschmid et al., 2018). On the molecular level, proteinprotein interactions are the basis of life. Mapping and modeling the protein interactions through computational approaches (docking) can improve our understanding of the interactions occurring in vivo, though with less accuracy (Tovchigrechko et al., 2002). In the current study, we analyzed the proteinprotein interactions between the RV-G protein and ITGB1 in different species (Figures 3A-C). Results obtained from ITGB1 and RV-G protein docking are consistent with previous studies which identified that the interaction site between ITGB1 and RV-G protein within residues 1-728 a.a on the ITGB1 ectodomain (Shuai et al., 2019). Also, our analysis showed that the G protein ectodomain was the interacting site with ITGB1 in human, dog, and bat. Regarding mGluR2, none of the previous studies have identified the interaction site with RV-G protein. Our analysis for the docking results indicated the importance of the transmembrane domain specifically in humans and dogs to interact with RV-G protein. In contrast, in black fruit bat mGluR2, the hydrogen bond with RV-G protein was within the ligand binding domain. The diversity in orthologous protein domains interacting with RV-G protein is plausible. Despite the sequence similarity shared by proteins, divergent functions and interactions are commonly observed (Hönigschmid et al., 2018). Although, the interaction site for nAChR with RV-G protein has been proposed to be within 173-204 a.a region (Lafon, 2005), our results predict different interacting a.a. residues, however within the same domain. These findings highlight the importance of future in vitro and in vivo studies to gain further molecular mechanistic insights. The interaction site of NCAM with the RV-G protein has not been determined in previous studies. Our results demonstrated that hydrogen bonds bonded mainly within immunoglobulin-like domains and fibronectin III-like domain which may define the interaction between the virus and cell receptor. Our focus in hydrogen bond mapping within interaction complexes was primarily due to their known roles in improving the stability of the interacting protein complexes (Nilofer et al., 2017). Differences of the binding a.a. residues between the two docking complexes: (bat ITGB1-bat RV-G) and (human ITGB1-bat RV-G) may suggest the involvement of the residues near the conserved a.a. in bat-related RV-G mentioned above (Figure 3) in binding to bat receptors. Since virus attachment to cellular receptors is considered only the preliminary step for viral infection, this might explain our prediction that the RV-G protein can bind to chicken receptors, even though chickens are not hosts of RV infection. Therefore, we can hypothesize that penetration or other steps involved in viral infection might be the barrier to infect chickens and other non-host species. In the current study, predicted protein-protein interactions were performed with Gramm-X software which is based on rigid body docking utilizing the Fast Fourier transform (FFT) algorithm (Tovchigrechko and Vakser, 2006). The FFT algorithm allows the determination of the best surface match between molecules based on shape complementarity (Tovchigrechko and Vakser, 2006). This method has clear limitations represented in reduced accuracy due to large conformational changes formed upon binding of protein complexes (Desta et al., 2020). In addition to the possibility of large movements during binding which ultimately may result in transient or weak binding (Pons et al., 2010). Moreover, the reliability of docking on structural models of proteins generated by computational analysis render them more prone to errors (Szilagyi and Zhang, 2014). Besides the mentioned limitations, Gramm-X software does not allow for the selection of specific glycosylation sites in modulating the interactions between RV-G and receptors. Thus, the stability of the generated docking complex might have been affected. Since, anticipating the glycosylated sites might have resulted in higher free energy which is ultimately known to affect the stability of docking complex (Shental-Bechor and Levy, 2008). Taken together, our in silico analysis, unraveled some of the most crucial receptors utilized by RV for entry purposes. These analyses establish the foundations for future research to understand the preference and mutual importance of each of these receptors for the entry mechanisms of RV. Additionally, these structure-guided insights will establish a foundation on the host-specific differences that may help to understand the spillover of the RV among different hosts.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material. Residues involved in hydrogen bonds within the docking complex chicken ITGB1 with RV-G Egyptian strain (QEU57979.1). ITGB1 colored in cyan, a.a residues colored in green, RV-G protein colored in violet, a.a residues colored in yellow, nAChR colored in magenta, a.a residues colored in green, RV-G protein colored in salmon, a.a residues colored in yellow.

AUTHOR CONTRIBUTIONS
contributed to review and editing. MM and LU contributed to supervision. All authors contributed to the article and approved the submitted version.