Structural Models for Roseolovirus U20 And U21: Non-Classical MHC-I Like Proteins From HHV-6A, HHV-6B, and HHV-7

Human roseolovirus U20 and U21 are type I membrane glycoproteins that have been implicated in immune evasion by interfering with recognition of classical and non-classical MHC proteins. U20 and U21 are predicted to be type I glycoproteins with extracytosolic immunoglobulin-like domains, but detailed structural information is lacking. AlphaFold and RoseTTAfold are next generation machine-learning-based prediction engines that recently have revolutionized the field of computational three-dimensional protein structure prediction. Here, we review the structural biology of viral immunoevasins and the current status of computational structure prediction algorithms. We use these computational tools to generate structural models for U20 and U21 proteins, which are predicted to adopt MHC-Ia-like folds with closed MHC platforms and immunoglobulin-like domains. We evaluate these structural models and place them within current understanding of the structural basis for viral immune evasion of T cell and natural killer cell recognition.


INTRODUCTION
The roseolovirus genus of the β-herpesvirus subfamily includes Human Herpesviruses 6A, 6B, and 7 (HHV-6A, HHV-6B, and HHV-7) (Figure 1). These viruses have extremely high prevalence and infect over 90% of the world's population before the age of 6 (2-4). While all three viruses primarily target activated T cells, there are some key differences in the range of additional cell types in which each virus can be found, with HHV-6A having a broader cell tropism compared to HHV-6B and HHV-7. Primary infection with these viruses in infancy causes exanthem subitum, commonly known as roseola or sixth disease, with fever, rash, and occasionally febrile seizures (5). While primary infection rarely causes severe symptoms in immunocompetent individuals (6), like other herpesviruses the roseoloviruses establish lifelong latency with periodic reactivation. In immunocompromised individuals, roseoloviruses can cause severe complications, such as encephalitis and pneumonitis. Roseoloviruses are particularly problematic in solid organ transplant and hematopoietic stem cell transplant patients, where infection/reactivation can result in organ rejection and graft vs. host disease (4, 7, 8).
Roseolovirus genomes are composed of linear, double-stranded DNA that contains a unique (U) region flanked by a set of identical direct repeat regions (DR), which facilitate integration into the telomeric regions of chromosomes. For HHV-6A and -6B this can occur in the germ line, resulting in inherited chromosomally integrated HHV-6 in approximately 3% of the population (9, 10). The HHV-6A/B genomes are 160-170 kb while the HHV-7 genome is slightly smaller at 145-153 kb, although both contain between 100 and 120 ORFs. Overall, the amino acid identity between HHV-6A and HHV-6B is approximately 90%, while the identity between HHV-7 and HHV-6A/6B is closer to 50%. In order to facilitate lifelong latency, many of these genes encode products that allow the virus to modulate the host immune response. Much previous work with human and mouse cytomegaloviruses (HCMV and MCMV) and certain poxviruses including cowpox virus (CPXV) and Yaba-like disease virus (YLDV) has revealed some of the broad strategies that dsDNA viruses use for immune evasion. HCMV encodes as many as 40 gene products that can antagonize the immune system [reviewed in (11)], but many of the basic strategies are shared with other, similar viruses. Despite their dissimilarity at a genetic level, betaherpesviruses and poxviruses have evolved gene products that contain MHC-like and/or immunoglobulin-like domains that utilize similar strategies to attack the host immune response (Figure 1).
FIGURE 1 | Phylogenetic relationships of viruses and viral immunoevasins discussed in this work. The complete genomic DNA sequences from HHV-6A (strain U1102), HHV-6B (strain Z29), HHV-7 (strain JI), HCMV (strain HAN-SCT17), and MCMV (strain N1) and poxviruses Cowpox (strain Brighton Red) and Yaba-like Disease Virus (strain Amano) were retrieved from the VIPR database (Table 1) and aligned using the MEGA11 software package. MEGA was then used to generate a maximum likelihood phylogeny reconstruction (1). Scale bar represents the probability of a nucleotide substitution at a given site, calculated for select betaherpesviruses. Viral MHC-like proteins that contribute to evasion of host T cell and NK cell responses and are discussed in this work are indicated next to the virus labels.
A common immune evasion strategy utilized by herpesviruses and some poxviruses is downregulation of classical MHC-Ia proteins to prevent presentation of viral antigens to CD8+ T cells. Since this activity can trigger NK cell "missing self" recognition, these same viruses often antagonize NK cell responses by interacting with activating and inhibitory NK receptors or their ligands, which in many cases are non-classical MHC-Ib stress-induced proteins. Viral homologues of nonclassical class I MHC molecules (vMHCs) play key roles in this process. Although these vMHCs have a diverse range of functions they all share the characteristic class I MHC fold [reviewed in (12-14)]. Roseoloviruses HHV-6A, HHV-6B, and HHV-7 all have been shown to downregulate host-derived cellular MHC-Ia (cMHC-Ia) (15, 16) and cellular MHC-Ib (cMHC-Ib) molecules (17-20), with the roseolovirus U21 glycoprotein responsible for this activity by intercepting cMHC-Ia in the ER and directing it to lysosomes for degradation (21, 22), and the U20 protein recently implicated in MHC-Ib downregulation (18). However, many questions remain surrounding the structures and molecular mechanisms of these two proteins.
In this article, we evaluate structural models for roseolovirus U20 and U21 proteins produced by next-generation computational machine learning approaches AlphaFold (23) and RoseTTAFold (24). These models provide insight into possible roles of U20 and U21 in roseolovirus immune evasion as non-classical MHC-Ib proteins.

VIRAL EVASION OF T CELL AND NK CELL RECOGNITION
The game of evolutionary cat and mouse between virus and host has established a complicated back and forth evolution of viral and host gene products. Cytotoxic CD8+ T cells recognize viral peptides presented by host cell classical MHC-I molecules (cMHC-Ia) and respond by killing the infected cell ( Figure 2A). To combat this, viruses evolved the ability to redirect MHC-I from the ER or cell surface to proteasomes or lysosomes for degradation, preventing T cell recognition and killing ( Figure 2B). However, NK cells have evolved mechanisms that allow detection of the absence or reduction of MHC-I on the cell surface. In this pathway, inhibitory NK receptors, such as those in the LIR/KIR family, engage cMHC-Ia to generate suppressive signals that block NK activation ( Figure 2C).
To circumvent this, some viruses express homologs of classical MHC-I (vMHC-Ia) that mimic cMHC-Ia by engaging inhibitory NK receptors (Figure 2C). Another host defense mechanism involves the surface expression of stress-induced ligands for activating NK receptors, such as those in the NKG2D family (Figure 2D). In general, ligands for these activating NK receptors are non-classical MHC-I proteins (cMHC-Ib), which do not present peptide or other antigens. Structurally, cMHC-Ib proteins are similar to classical cMHC-Ia proteins, although without an antigen binding groove, and in many cases structural elements such as the small β2-microglobulin subunit or lower immunoglobulin-like domain are missing. To evade this mechanism, viruses have evolved ways to bind these activating NK ligands in the ER or at the cell surface and sequester them and/or divert them to the lysosome or proteasome for degradation (Figure 2E). In many cases these mechanisms involve vMHC-Ib molecules. Other vMHC molecules inhibit activating NK receptors by alternative mechanisms. One mechanism involves secretion of a soluble vMHC-Ib protein that acts as a decoy, competitively inhibiting activating NK receptor signaling (Figure 2F). Another involves a cell-surface vMHC-Ib that paradoxically binds an activating NK receptor apparently without triggering signaling, the molecular basis of which remains unknown (Figure 2G). Finally, many viruses have evolved decoy receptors or ligands that interfere with cytokine signaling; in one case this involves a vMHC-Ib molecule capable of binding and sequestering TNFα, attenuating host immune responses to viral infection (Figure 2H).

STRUCTURAL ASPECTS OF CLASSICAL AND NON-CLASSICAL MHC IMMUNOEVASION
In some cases the structural basis for vMHC-I-based immunoevasion is well understood. Certain vMHC-I directly engage inhibitory NK receptors similarly to the normal host ligands. The human LIR-1A inhibitory NK receptor binds to the canonical host cMHC-Ia HLA-A2 underneath the MHC platform domain (Figure 3A, PDB: 1P7Q) (25). UL18 is a classical class I MHC homolog expressed by HCMV that mimics cMHC-Ia by binding LIR-1 in the same manner (Figure 3B, PDB: 3D2U) (26). Like cMHC-Ia, UL18 binds both peptides and soluble β2-microglobulin (β2m), making it a vMHC-Ia, but it is the only viral MHC homolog known to do so (26). Despite its ability to bind inhibitory NK receptors, the precise role of UL18 remains mysterious, as it has been shown to inhibit LIR-1 positive NK cell activation while promoting activation of LIR-1 negative NK cells (27, 28). Other frequent targets of viral immunoevasins are activating NK receptors such as NKG2D. NKG2D binds to a variety of host cMHC-Ib ligands upregulated in cases of cellular stress or damage, including human ULBP and MIC family members, and the mouse RAE-1 family of ULBP homologs. NKG2D engages these ligands by forming a dimer that associates with the top of the MHC fold, as illustrated by the ULBP3-NKG2D complex (Figure 3C, PDB: 1KCG) (29). Cowpox virus has evolved an interesting evasion mechanism where the vMHC-Ib protein CPXV018 (OMCP) binds the activating NK receptor NKG2D in a configuration very similar to native cMHC-Ib ligands (Figure 3D, PDB: 4PDC). Because the viral version is soluble instead of membrane-bound, it acts as a competitive inhibitor, with a 14-fold higher affinity for mouse NKG2D as compared to one of its typical ligands, RAE-1ε. This marked increase in affinity was attributed to a single loop that reaches up into the NKG2D binding pocket (30).
Another, less direct means of inhibiting NKG2D and other NK activating receptors is to downregulate the activating cMHC-Ib ligands in the host cell. The MCMV m152 glycoprotein is a vMHC-Ib with several important immunoregulatory functions that has been shown to bind and downregulate multiple members of the NK activating RAE-1 family. It contains an MHC platform domain and immunoglobulin-like domain, but not β2-microglobulin or peptide, and binds its targets in a claw-like manner, reaching over the top of the NK ligand's MHC domain (Figure 3E, PDB: 4G59) (31). When this occurs in the ER, the target cMHC-Ib is retained and subsequently degraded. Recently, m152 has also been shown to reach the cell surface and mask its RAE-family binding partners from being recognized by NKG2D, as well as to affect IRF signaling through STING (32, 33).
Other viral immunoevasins that lack MHC platform domains down-regulate cMHC-Ia and cMHC-Ib through interactions with their immunoglobulin-like domains. The HCMV protein UL16 resides primarily in the ER and cis-Golgi, and downregulates NK ligands MICB, ULBP1, and ULBP2 by binding and sequestering them within the secretory system (34, 35). UL16 is a single immunoglobulin-like domain protein that engages MICB by binding perpendicularly to the α1 and α2 helices of the MHC platform domain but parallel to the sheets of the MHC domain underneath (Figure 3F, PDB: 2WY3 and 1JE6) (36, 37). As with m152, the result of this interaction is that NKG2D activating ligands are sequestered in the ER and degraded. US2 is another single immunoglobulin-like domain protein that binds to host classical class I MHC proteins as they are synthesized in the ER, inducing them to be degraded (38). US2 binds underneath the cMHC-Ia platform to the C-terminus of the α2 helix and the α3 domain (Figure 3G, PDB: 1IM3). Interestingly, this interaction is similar to that of the inhibitory NK receptor Ly49A binding to cMHC-Ia (Figure 3A). Cowpox virus CPXV203, with similar structure and function to US2, binds on the opposite side of the host MHC-I in a manner similar to that of LIR-1 and other NK receptors, making contact with the MHC platform and β2m (Figure 3H, PDB: 4HKJ) (39).
In some cases, the interplay of host response and viral evasion protein evolution can be complex. This is exemplified by the Ly49 family of murine NK receptors. The Ly49 receptors can be either activating or inhibitory, and use C-type lectin domains to interact with MHC-Ia molecules [reviewed in (40,41)]. Structures of inhibitory Ly49C and Ly49A family members bound to their normal MHC ligands reveal significant diversity in binding modes ( Figure 3I, PDB 38CK and Figure 3J 1QO3) (42,43). No vMHC proteins have been identified that bind inhibitory NK receptors using these particular modes of interaction, but these structures could provide models for viral immunoevasins yet to be characterized. The m157 glycoprotein of MCMV engages inhibitory NK receptors such as Ly49I to prevent missing-self recognition (44). Although detailed structural information is not available, mutagenesis suggests a different binding mode than for other Ly49 family inhibitory NKR engaging their host cMHC-Ia ligands (43,44). There is a high degree of m157 sequence variability across MCMV strains, and m157 exhibits varying effects depending on the combination of mouse and virus strains. In fact, m157 from the Smith strain of MCMV binds the inhibitory NK receptor Ly49I in 129/J mice, but in C57BL/6 mice, m157 binds the activating NK receptor Ly49H (45). Although only a small part of Ly49H was visualized in the m157 complex structure, binding would appear to involve the top of the MHC platform ( Figure 3K, PDB: 4JO8) (46). This interaction contributes to NK activation and viral clearance, and it has been hypothesized that Ly49H has evolved to recognize m157 to thwart NK evasion by MCMV (46).
Finally, the Yaba-like Disease Virus 2L protein has an MHC-like fold that binds TNFα in a manner different from other viral TNFR mimics to achieve picomolar binding affinities that allow it to compete quite effectively with the host receptor (47) (Figure 3L, PDB: 3IT8).

U20 AND U21 FROM HHV-6A, HHV-6B, HHV-7
The complex interplay between host and viral immunomodulators has been best characterized in MCMV and HCMV and is just now beginning to come into focus for the roseoloviruses. The first pieces of the puzzle fell into place when it was shown that U21 can downregulate non-classical cMHC-Ib stress-response proteins in addition to classical cMHC-Ia proteins. In HHV-7, U21 was shown to bind and downregulate the nonclassical MHCs HLA-E and HLA-G, as well as the NK activating ligands MICA and MICB (19, 20). Also, U21 expressed by HHV-6A is able to downregulate ULBP3 (18). In HHV-6B, the precise role of U21 has yet to be defined, but it is clear that the NK ligands ULBP1, ULBP3, and MICB are downregulated during infection (17). Very recently, HHV-6A U20 was shown to downregulate ULBP1 from the cell surface, decreasing NK activation in degranulation assays (18). In addition, U20 from HHV-6B has been shown to affect TNFR signaling, inhibiting PARP cleavage, caspase 3 and 8 activation, and IκBα and NF-κB transcriptional activity (48). Although roles for HHV-6 U20 and U21 in down-regulating cMHC-Ia and cMHC-Ib have been identified, several questions remain. For example, based on the high degree of sequence conservation (91.2%) between the amino acid sequences of U21 from HHV-6A and -6B (Figure 4), one would expect U21 to play a similar role in both viruses. However, HHV-6A U21 seems to be more efficient at downregulating MHC-I than HHV-6B U21 (22). Additionally, U21 proteins from the various roseoloviruses bind different targets from one virus to another despite a high degree of conservation (15, 19, 20). Finally, while it has been proposed that U21 might adopt an MHC-like fold (50), there is not yet structural evidence to support this, and even less is known about the structure of U20. U21 redirects and sequesters its targets in the ER, from which they are eventually redirected to the lysosome (21). However, U20 accomplishes ULBP1 downregulation by an alternative, lysosome-independent mechanism that remains to be elucidated (18). To better understand how these roseolovirus immunoevasins fit into the picture of T cell and NK cell evasion presented above, we utilized advanced structure prediction tools to generate structural models and draw parallels to the better-characterized viral immunoevasins just discussed.
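The identity values quoted for U20 and U21 (Figure 4) come from Clustal Omega alignments. As a rough illustration of what such a number means, the sketch below computes percent identity from sequences that are already aligned. The function names and the toy sequences are hypothetical, and ignoring gapped columns in the denominator is an assumption; Clustal Omega's own identity matrix may follow a different convention.

```python
from typing import Dict, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def identity_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """All-against-all percent identity for a set of aligned sequences."""
    names = list(aligned)
    return {(m, n): percent_identity(aligned[m], aligned[n])
            for m in names for n in names}


# Toy aligned fragments invented for illustration; these are NOT real U20/U21 sequences.
toy_alignment = {
    "seqA": "MKT-LLVAG",
    "seqB": "MKTALLIAG",
    "seqC": "MRT-LFVSG",
}
toy_matrix = identity_matrix(toy_alignment)
```

With real data, the aligned strings would be read from the Clustal Omega output rather than typed in by hand.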

FIGURE 4 | Sequence homology of U20 and U21 from HHV-6A, HHV-6B, and HHV-7. The amino acid sequences for U20 and U21 from HHV-6A, -6B, and -7 were retrieved from the Uniprot database (Table 3). They were then aligned using Clustal Omega (49). Top: Sequence identity matrix for U20 and U21 from HHV-6A, HHV-6B, and HHV-7. Bottom Left: Alignment of U20 sequences from HHV-6A (strain U1102), HHV-6B (Z29) and HHV-7 (JI). Amino acids are colored by type. Identities are indicated by an asterisk and similarities are indicated with a colon. Bottom Right: Alignment of U21 sequences from these same strains, colored and annotated as for the U20 alignment.
Weaver et al. | Structural Models of Roseolovirus Immunoevasins | Frontiers in Immunology | www.frontiersin.org | April 2022 | Volume 13 | Article 864898

EVOLUTION OF HHV-6B U20 STRUCTURE PREDICTION
With no experimental characterization of U20 or U21 structures, computational tools can be used to gain insight into their structures and potential binding partners. We first review the evolution of computational structure prediction for these proteins, using U20 from HHV-6B as an example. The presence of immunoglobulin-like domains can be robustly predicted from sequence data alone, using the pattern of hydrophobic, hydrophilic, and turn-inducing amino acid residues along with the presence of the characteristic disulfide bond linking the two component beta sheets (51). An immunoglobulin-like fold was predicted for HHV-6B U20 along with the initial genomic sequence (52). This prediction was further supported by analysis of predicted secondary structure elements, which also can be predicted from raw sequence data using JPRED or similar algorithms (53). The C-terminal half of the extracellular region shows the typical pattern of disulfide-linked 3-stranded and 4-stranded beta sheets (Figure 5B), as expected for a membrane-proximal immunoglobulin-like domain and reported by Kofod-Olsen et al. (48). Interestingly, the N-terminal half of the extracellular region shows a pattern of predicted sequential beta strands followed by alpha-helical regions. This pattern is characteristic of the MHC-I fold, which is composed of an eight-stranded beta sheet topped by two alpha helices, sitting above an immunoglobulin C-type domain. This is illustrated for the classical MHC-Ia protein HLA-A2 (Figure 5A), represented as a ribbon diagram and colored for correspondence with the JPRED secondary structure diagram. The pattern of two sequential copies of a strand-strand-strand-strand-helix motif characteristic of the MHC platform fold was used in the original discovery of non-classical MHC-Ib proteins from MCMV as ligands for NK receptors (54). The computational structure prediction engine used in that work was 3D-PSSM, which matches predicted secondary structure patterns in the target sequence with similar patterns present in proteins with known three-dimensional structures. The use of protein-specific scoring matrices and hidden Markov models in 3D-PSSM was an early application of machine learning to protein structure prediction (55). In 2009 we used Phyre (56), a successor to 3D-PSSM, to extend that approach to HHV-6B U20; the immunoglobulin domain was robustly detected, but an MHC platform domain was not identified. An improved version of Phyre, Phyre2, considers sequence-based predictions of disordered regions as well as secondary structures, and aligns homologous sequences prior to secondary structure prediction for more robust pattern detection (57). The Phyre2 prediction for HHV-6B U20 shows a four-stranded beta sheet and alpha helix atop a canonical immunoglobulin-like domain (Figure 5C). A second beta sheet with associated alpha helix, N-terminal to the one just mentioned, is present but not assembled into the canonical fold (Figure 5C) and is predicted with much lower confidence. The Phyre2 modeling quality score is shown as a tube model, with the magenta intensity and tube radius increasing in proportion to confidence in structure prediction (Figure 5C, bottom). We also predicted a three-dimensional structure for HHV-6B U20 using I-TASSER, a more recent structure prediction engine (Figure 5D).
I-TASSER uses a threading alignment refinement procedure, which aligns target sequences onto known three-dimensional structure templates ("threading"), and then evaluates energetics and steric clashes for local regions of three-dimensional space. Low-energy local regions are clustered and assembled into domains, and the process is iterated (58, 59). Models are scored using the root mean square deviation (RMSD) of structures used in the clustering and a TM-score representing the differences between pairwise distances for all atoms in a local region between models in the cluster, which provides an improved per-residue confidence score. The I-TASSER model for HHV-6B U20 shows both halves of an assembled canonical MHC-I like domain, with similar confidence throughout (Figure 5D). U21 has also been previously predicted to adopt an MHC-like fold based on a similar analysis using Phyre2 and I-TASSER, with similar results for the HHV-6A, HHV-6B, and HHV-7 orthologs of U20 and U21 (50). The protein structure prediction community was rocked in 2018 by the performance of AlphaFold (60), a machine learning algorithm from Google's DeepMind project, which also produced the chess- and Go-playing AlphaZero and AlphaGo algorithms (61). Since 1994, protein modelers have participated in the biennial Critical Assessment of Protein Structure Prediction (CASP), a competition to predict recently determined but unreleased protein structures (62). Sustained progress has been made in each cycle (Phyre, Phyre2 and I-TASSER were prominent among the top scoring computational algorithms from CASP-8 in 2008 to CASP-12 in 2016). CASP-13 saw an unprecedented increase in prediction accuracy, with DeepMind's AlphaFold demonstrating more than 20% increased accuracy of backbone prediction for the most challenging structure prediction category, proteins with marginal similarity to any known structure (63).
In 2020 at CASP-14 an improved AlphaFold algorithm continued this trend, dramatically outperforming other strictly computational as well as human-assisted approaches, with precision in some cases approaching expected experimental error (23). AlphaFold employs parallel tracks of one-dimensional multiple sequence alignments, two-dimensional patterns of pairwise co-evolution of residues in homologous sequences, and three-dimensional structural representations that minimize distance between co-evolving pairs of residues, with machine-learning optimizations in each track and iterative propagation of information between tracks. A novel aspect is that the three-dimensional structure is modeled initially as an "atomic gas" of interacting residues without conventional protein structure constraints such as bond lengths and angles, torsions, electrostatics, and conformational constraints. These are applied only subsequently, during conventional gradient-descent refinement with the Amber force field, after the end-to-end ab initio structure determination is completed. AlphaFold also introduced an improved confidence measure, the predicted local Cα distance difference test (pLDDT). Similar approaches were incorporated into RoseTTAfold, an effort by prominent academic protein structure groups to make a similar but somewhat simplified algorithm available to the larger community, on an open-source platform suitable for use on conventional computer hardware (24).
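RMSD and TM-score, the two global measures mentioned above for I-TASSER, can be illustrated with a short sketch. It assumes the two structures are already superimposed and that residues are paired one-to-one, so it omits the search over superpositions that real TM-score programs perform. The TM-score expression and its length-dependent scale d0 follow the commonly used definition; clamping d0 to a minimum of 0.5 Å for short chains is an assumption borrowed from common practice, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def pair_distances(model: Sequence[Coord], reference: Sequence[Coord]) -> List[float]:
    """Distances between paired C-alpha positions of two superimposed structures."""
    if len(model) != len(reference) or not model:
        raise ValueError("structures must be non-empty and have equal length")
    return [math.dist(p, q) for p, q in zip(model, reference)]


def rmsd(distances: Sequence[float]) -> float:
    """Root mean square deviation over the paired residues."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale used by the TM-score."""
    if target_length <= 15:
        return 0.5  # assumption: clamp for very short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for a fixed superposition (no optimization over alignments)."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Toy example: a straight 100-residue C-alpha trace and a copy shifted by 1 angstrom.
reference_trace = [(3.8 * i, 0.0, 0.0) for i in range(100)]
shifted_trace = [(x + 1.0, y, z) for x, y, z in reference_trace]
toy_distances = pair_distances(shifted_trace, reference_trace)
toy_rmsd = rmsd(toy_distances)
toy_tm = tm_score(toy_distances, target_length=100)
```

Unlike RMSD, the TM-score down-weights large deviations and is normalized by chain length, so a few badly placed residues do not dominate the score.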
The AlphaFold prediction of HHV-6B U20 structure includes the components of the conventional MHC-I fold, but with the domains reoriented (Figure 5E). The MHC platform α1 and α2 domains are displaced from each other and from the immunoglobulin-like α3 lower domain, and the α1 helix is largely missing. Modeling confidence is higher for the immunoglobulin domain than for the platform domains, with the α2 confidence score low but still somewhat higher than for α1. Notably, the platform domains are oriented "upside-down" relative to a conventional MHC-Ia structure, somewhat similarly to the arrangement in certain MHC-Ib proteins like MICA and MICB, but with an even more extreme displacement, and the α1 domain strand threading pattern is different (DABC versus ABCD). In contrast, RoseTTAfold predicts a more canonical MHC-like fold, with conventionally threaded and oriented α1 and α2 domains, which however still are considerably displaced from the α3 immunoglobulin-like domain (Figure 5F). Modeling confidence is similar for the α2 and α3 domains and only slightly lower for α1, although RoseTTAfold reports an RMSD-based score which is believed to be less accurate than and not directly comparable to pLDDT (64).

U20-U26 GENE CLUSTER
U20-U26 comprise a gene cluster specific to the Roseolovirus genus, sandwiched between clusters of genes shared within the betaherpesvirus subfamily or by the entire herpesvirus family (Figure 6A) (52, 65). Generally, these genes are dispensable for viral growth (66) and involved in immune evasion: in addition to U20 and U21 described above, U24 has been implicated in endocytic recycling and protein degradation (67-69) and U26 has been shown to inhibit the RLR/MAVS signaling pathway (70). U20, U21, U23, and U24 are single-pass type I membrane glycoproteins, U25 is a soluble tegument protein, and U26 is a polytopic 8-pass integral membrane protein. To help evaluate the specificity of MHC-like fold prediction for membrane glycoproteins, and to identify possible additional MHC-like or immunoglobulin-like proteins in this gene cluster, we examined AlphaFold structures for the entire HHV-6B U20-U26 gene cluster, after removal of any predicted signal sequence and C-terminal membrane sequences. HHV-6B U21 was predicted to have an N-terminal MHC-platform domain and a C-terminal immunoglobulin-like domain, similarly to HHV-6B U20 but with the domains oriented somewhat differently. This will be discussed in detail in the next section. None of the predicted structures for U22-U26 contain immunoglobulin-like or MHC-like domains, although beta sheets and helices are apparent, indicating substantial specificity in prediction of MHC-like folds (Figure 6B).
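The removal of signal and membrane sequences described above can be pictured with a minimal sketch for a type I membrane protein. It assumes the end of the signal peptide and the start of the transmembrane helix are already known as 1-based residue positions (for example, read by hand from signal peptide and transmembrane predictors); the function name, the argument names, and the toy sequence are hypothetical and do not correspond to any real U20 or U21 sequence.

```python
def extract_ectodomain(sequence: str, signal_peptide_end: int, tm_start: int) -> str:
    """Return the residues between the signal peptide and the transmembrane helix.

    signal_peptide_end: 1-based position of the last signal-peptide residue
                        (0 if no signal peptide is predicted).
    tm_start:           1-based position of the first transmembrane residue
                        (len(sequence) + 1 if no helix is predicted).
    """
    if not 0 <= signal_peptide_end < tm_start - 1 <= len(sequence):
        raise ValueError("positions are inconsistent with the sequence length")
    # Python slices are 0-based and end-exclusive, so this keeps residues
    # signal_peptide_end + 1 through tm_start - 1 in 1-based numbering.
    return sequence[signal_peptide_end:tm_start - 1]


# Toy type I membrane protein, invented for illustration:
# 20-residue signal peptide, 50-residue ectodomain, 21-residue helix, 10-residue tail.
toy_protein = "M" + "L" * 19 + "A" * 50 + "I" * 21 + "K" * 10
toy_ectodomain = extract_ectodomain(toy_protein, signal_peptide_end=20, tm_start=71)
```

The truncated sequence, rather than the full open reading frame, is what would then be submitted to a structure prediction server.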

ALPHAFOLD AND ROSETTAFOLD STRUCTURAL MODELS FOR U20 AND U21
We used AlphaFold and RoseTTAfold to predict three-dimensional structures for U20 and U21 from each of the three human roseoloviruses HHV-6A, HHV-6B, and HHV-7. DeepMind and the European Bioinformatics Institute are developing an extensive, openly accessible database intended to eventually include AlphaFold predictions for most or all annotated genomes, but at the current time viral sequences are not included (71). We used AlphaFold2 Advanced CoLab, a Google-based computational environment, and Robetta, a distributed computing project hosted by the Baker laboratory at University of Washington, HHMI, and Rosetta@home, to provide access to RoseTTAfold (72). The HHV-6A and HHV-6B versions of U20 and U21 have high sequence homology (91% and 90%, respectively) (Figure 4), likely indicating very similar three-dimensional structures. We predicted structures for each separately because in some cases they have been shown to have different functional effects and binding specificities, as noted above, and as a test of the sensitivity of the folding algorithms to sequence variation. Predicted structures are shown as ribbon diagrams colored according to the linear sequence from N-terminus (blue) to C-terminus (red), as in earlier figures. Confidence scores along the sequences are shown in tube diagrams in the lower portion of each panel (magenta). For the pLDDT score provided by AlphaFold (Figures 7A-F), values of 90-100 indicate regions modeled with high accuracy including side-chain conformations, values of 70-90 indicate regions of lower confidence for which the backbone is expected to be modeled well, and values of 50-70 indicate low confidence. Values below 50 indicate regions of possible disorder, for which there is no confidence in the structural prediction (76).
RoseTTAfold provides a conventional RMSD score to characterize the Cα variation among predicted versions of the same structural regions; we converted this to a linear score running from 15 Å (no confidence) to 0 Å (high confidence) (Figures 7G-L). Both AlphaFold and RoseTTAfold provide a set of models that represent the structure prediction. In Figure 7 we show the top scoring model for each prediction run, but in each case, we examined the top five models for structural consistency. In general, the top five scoring models were similar, with differences restricted to the low-confidence and likely unfolded regions at the proteins' extreme N- and C-termini, and to differences between the relative orientation of the MHC-like and immunoglobulin-like domains. In a few cases, for the lower-scoring models, pairs of strands were separated from the main beta sheets, but as these were not consistently observed in multiple models we did not consider them further.
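The two per-residue confidence measures described above can be put on a comparable footing for display with a small sketch. The pLDDT bands are treated here as contiguous (at least 90, 70 up to 90, 50 up to 70, below 50), which is an assumption about how to handle values that fall between the quoted ranges. The RMSD conversion assumes a simple linear rescaling to the interval 0 to 1 with clipping at 15 Å; the text states only the endpoints of the scale, so the normalization and the function names are hypothetical.

```python
def plddt_band(plddt: float) -> str:
    """Classify an AlphaFold pLDDT value (0-100) into a qualitative band."""
    if not 0.0 <= plddt <= 100.0:
        raise ValueError("pLDDT must lie between 0 and 100")
    if plddt >= 90.0:
        return "high accuracy, including side chains"
    if plddt >= 70.0:
        return "backbone expected to be modeled well"
    if plddt >= 50.0:
        return "low confidence"
    return "possible disorder"


def rmsd_to_linear_confidence(rmsd_angstrom: float, worst: float = 15.0) -> float:
    """Map an RMSD estimate onto a linear scale: 0 A -> 1.0, 15 A or more -> 0.0."""
    if rmsd_angstrom < 0.0:
        raise ValueError("RMSD cannot be negative")
    clipped = min(rmsd_angstrom, worst)  # assumption: values above 15 A are clipped
    return 1.0 - clipped / worst


# Toy per-residue values, invented for illustration only.
toy_plddt = [96.2, 83.0, 61.5, 34.8]
toy_rmsd_estimates = [0.0, 3.0, 7.5, 22.0]
toy_bands = [plddt_band(v) for v in toy_plddt]
toy_confidence = [rmsd_to_linear_confidence(v) for v in toy_rmsd_estimates]
```

As noted earlier in the text, the RMSD-based score is not directly comparable to pLDDT, so a rescaling like this is useful for plotting both on a common axis rather than for ranking one method against the other.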
Each of the U20 and U21 proteins was predicted to adopt an MHC-Ib-like fold, with an immunoglobulin-like domain close to the C-terminus, and a characteristic MHC platform domain consisting of two α-helices atop an eight-stranded beta-sheet platform (Figures 7A-L). In all cases the immunoglobulin-like domains were predicted with higher confidence, and the MHC platform domains with lower confidence, generally with α-helical regions predicted with higher confidence than the beta sheet, and at least some of the β-strands with low or minimal confidence. There were several alterations in the MHC platform conformations relative to canonical structures, as occasionally observed for other MHC-Ib proteins. In the AlphaFold predictions of HHV-6A U20 (Figure 7A) and HHV-6B U20 (Figure 7B), the MHC platforms were the most distorted relative to a canonical MHC fold, with the α1 and α2 components of the MHC platform separated from each other for HHV-6A U20 and the α1 helical domain unfolded for HHV-6B U20. The RoseTTAfold predictions for these proteins (Figures 7G, H) had more standard MHC platform domains, although with some α1 strands missing for HHV-6B U20. All but one of the predicted structures for the U20 and U21 proteins had α-helices close together and aligned to form closed "binding sites," as generally found in MHC-Ib proteins that do not present peptides or other ligands. The one exception was the RoseTTAfold structure for HHV-7 U20 (Figure 7I), for which the helices were separated at the end of the site where peptides conventionally bind in MHC-Ia proteins, and in fact the extreme N-terminal 15 residues formed a helix that docked in this region (asterisk in Figure 7I). However, a comparable structure was not seen in the AlphaFold version.
For three of the predictions, RoseTTAfold HHV-6A U20 (Figure 7G), AlphaFold HHV-7 U20 (Figure 7C) and AlphaFold HHV-7 U21 (Figure 7F), some of the β-strands were missing from the α1-domain platform that usually comprises a 4-stranded beta sheet, and for one of the predictions, AlphaFold HHV-6B U21 (Figure 7E), the immunoglobulin domain had nine instead of the canonical seven β-strands.
FIGURE 7 | AlphaFold and RoseTTAfold structures for U20 and U21. AlphaFold structure predictions for U20 from HHV-6A (A), HHV-6B (B), and HHV-7 (C), and for U21 from HHV-6A (D), HHV-6B (E), and HHV-7 (F). RoseTTAfold structure predictions for U20 from HHV-6A (G), HHV-6B (H), and HHV-7 (I), and for U21 from HHV-6A (J), HHV-6B (K), and HHV-7 (L). The same sequences used in the alignments were processed through SignalP to identify signal sequences and TMHMM to identify transmembrane domains (73,74). The sequences were then truncated to reflect the extracellular portion of the protein before use for structure prediction. For Phyre2, sequences were submitted for analysis via the Phyre2 server utilizing the "Intensive" modeling mode (57). For I-TASSER, we provided a protein sequence and specified no constraints or template exclusions (59). For AlphaFold we used the AlphaFold2 Advanced Script hosted by Google Colab (23). We used the default settings, specifically utilizing de novo generation of multiple sequence alignments with MMseqs2. We generated 5 models for each prediction with 1 ensemble, 3 recycles, a tolerance of 0, and 1 random seed. For RoseTTAfold we used the Robetta server and predicted on a single sequence using the "RoseTTAFold" method with no additional constraints (24). Both AlphaFold2 and RoseTTAfold generate several structural models from each modeling run. Figures were generated using the top-scoring model. For AlphaFold and RoseTTAfold, the top five scoring models from each run were examined for consistency with the model presented. Scale bars representing the confidence intervals are shown on a linear scale: RMSD for Phyre2 and RoseTTAfold, TM-score for I-TASSER, and pLDDT for AlphaFold. (A-L) Predicted structural models and previously determined crystal structures were visualized and figures were prepared using the PyMOL molecular graphics program (75). Ribbon and tube diagrams colored as in Figure 5. Asterisks indicate features highlighted in text.

In general, the relative orientation of the MHC platform and immunoglobulin-like domain varies widely between different proteins that adopt the MHC-like fold, ranging from almost perpendicular for classical MHC-Ia proteins (such as H-2Kb in Figure 3A) to almost co-linear for some MHC-Ib proteins such as MICB (Figure 3E). We examined the angle between the MHC platform β-sheet and the immunoglobulin-like domain in each of our predicted U20 and U21 models and observed a range of interdomain orientations (Figure 7). The most extreme was for the AlphaFold U20 structures from both HHV-6A and HHV-6B (Figures 7A, B), for which the MHC platforms were flipped upside-down relative to the canonical MHC-Ia orientation. The RoseTTAfold structures for these proteins were somewhat more conventional, with the MHC platforms oriented roughly perpendicular to the immunoglobulin-like domain, but with the platforms swung out to the side with an extended linker between the domains and essentially no interdomain contacts (Figures 7G, H). These features might be indicative of substantial interdomain flexibility for U20 from HHV-6A and -6B. For U20 from HHV-7 (Figures 7C, I), the domains were roughly perpendicular, with a small area of contact between the top of the immunoglobulin-like domain and loops between the strands in the MHC platform, apparent in both AlphaFold (Figure 7C) and RoseTTAfold (Figure 7I) models. The U21 models in general had more acute angles between MHC platform and immunoglobulin domain than for U20, in most cases with extensive contacts between the domains (Figures 7D-F, J-L). Each of the U21 models has a kinked C-terminal extension in the α2-domain α-helix, which orients the C-terminal end of the helix down towards the immunoglobulin-like domain. This same feature is also seen for the MCMV protein m152 (Figure 3G) as well as m153, m144, and m157 (31). For both AlphaFold and RoseTTAfold models of HHV-6A U21 (Figures 7D, J), the immunoglobulin-like domain is tucked underneath the MHC platform, with additional contacts from the extended N-terminal tails (asterisks in Figures 7D, J).
However, the AlphaFold model (Figure 7D) has the immunoglobulin domain to the side of the MHC platform, still making extensive interdomain contacts, rather than underneath, as for the RoseTTAfold model (Figure 7J). For HHV-6B U21, both AlphaFold (Figure 7E) and RoseTTAfold (Figure 7K) models are quite similar to the corresponding models for HHV-6A U21 (Figures 7D, J). Finally, both models for HHV-7 U21 orient the immunoglobulin domain so that its edge makes extensive contacts with the underside of the MHC platform (Figures 7F, L).
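One way to put a number on the interdomain orientation is sketched below. This is an illustrative definition, not necessarily the procedure used for the comparisons above: the platform β-sheet is approximated by a best-fit plane through its Cα atoms, the immunoglobulin-like domain by its longest principal axis, and the reported value is the angle between that axis and the plane (about 90° when the domain stands perpendicular to the sheet, about 0° when it lies parallel to it). All function names are hypothetical, and the caller is assumed to supply Cα coordinates for each domain as (x, y, z) tuples.

```python
import math


def _scatter(points):
    """3x3 scatter matrix of the points about their centroid."""
    n = len(points)
    c = [sum(p[i] for p in points) / n for i in range(3)]
    q = [[p[i] - c[i] for i in range(3)] for p in points]
    return [[sum(r[i] * r[j] for r in q) for j in range(3)] for i in range(3)]


def _dominant_axis(m, iterations=500):
    """Unit eigenvector of the largest eigenvalue of m, by power iteration."""
    v = [1.0, 0.7, 0.3]  # arbitrary start vector with no zero component
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def long_axis(points):
    """Direction of greatest spread of the points."""
    return _dominant_axis(_scatter(points))


def plane_normal(points):
    """Normal of the best-fit plane: direction of least spread.

    Uses trace*I - S, whose largest eigenvalue corresponds to the
    smallest eigenvalue of the scatter matrix S.
    """
    s = _scatter(points)
    t = s[0][0] + s[1][1] + s[2][2]
    shifted = [[(t if i == j else 0.0) - s[i][j] for j in range(3)]
               for i in range(3)]
    return _dominant_axis(shifted)


def interdomain_angle(platform_ca, ig_ca):
    """Angle in degrees between the Ig-domain long axis and the platform plane."""
    n = plane_normal(platform_ca)
    a = long_axis(ig_ca)
    cos_na = min(1.0, abs(sum(x * y for x, y in zip(n, a))))
    return 90.0 - math.degrees(math.acos(cos_na))
```

Because axis directions have no sign, this measure cannot distinguish a platform in the canonical orientation from one flipped upside-down; detecting that would need an additional signed reference, such as which face of the sheet carries the helices.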
Despite these variations from canonical structures, we considered the predictions of an MHC-Ib-like fold for each of the U20 and U21 proteins to be robust for two reasons. First, proteins with MHC-like folds generally contain two disulfide bonds, one between a cysteine residue in the α2-domain α-helix and another cysteine residue in a β-strand near the center of the MHC platform β-sheet as well as a canonical intersheet disulfide bond in the center of the immunoglobulin-like domain. The U20 extracellular domains have four (HHV-6A and -6B) or five (HHV-7) cysteines. In all the predicted U20 structures the cysteines are in position and modeled to form the expected disulfide bonds in the MHC platform and the immunoglobulin domain. The U21 extracellular domains contain more cysteine residues, six for HHV-6A, nine for HHV-6B, and 10 for HHV-7. In all the predicted U21 structures, the expected disulfide bonds in the MHC platform and immunoglobulin-like domains are formed, with an additional disulfide bond in the immunoglobulin domain in all U21 structures, and up to two additional disulfide bonds in the MHC platform for HHV-6B and HHV-7 U21. The expected disulfide bonding pattern is not known to the folding algorithms, and so formation of the typical MHC-fold disulfides represents an independent confirmation of the MHC-Ib-like fold. Second, U20 extracellular domains have a large number of potential N-linked glycosylation sites: nine for HHV-6A and HHV-6B at identical positions, and seven for HHV-7. For HHV-6B U20 we have verified that each of the sites is modified in recombinant soluble protein expressed in HEK-293 GnTI cells (GW and LJS, unpublished results). For each of the predicted U20 structures, the N-linked Asn residues are on the surface of the protein. U21 proteins have fewer N-linked glycosylation sites: one for HHV-6A, two for HHV-6B, and four for HHV-7 (77).
For each of the predicted U21 structures, these glycosylation sites also are located on the protein surface. Surface accessibility of each of the asparagine residues involved in N-linked glycan formation provides some additional confidence in the overall model conformations.
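The two consistency checks above lend themselves to simple scripting. The sketch below is illustrative only: it lists potential N-linked glycosylation sites using the standard N-X-S/T sequon (X not proline), and it pairs cysteines whose Sγ atoms lie within bonding distance. The 2.5 Å cutoff is an assumed tolerance around the typical S-S bond length of about 2.05 Å, and the function names and input layout are hypothetical.

```python
import math
import re

# N-X-S/T sequon with X != Pro; the lookahead allows overlapping matches.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence)]


def disulfide_pairs(sg_coords, max_dist=2.5):
    """Return residue-number pairs whose Cys SG atoms are within max_dist (in Å).

    sg_coords maps residue number -> (x, y, z) of the SG atom.
    The default cutoff is an assumption, not a value from the text.
    """
    residues = sorted(sg_coords)
    pairs = []
    for i, r1 in enumerate(residues):
        for r2 in residues[i + 1:]:
            if math.dist(sg_coords[r1], sg_coords[r2]) <= max_dist:
                pairs.append((r1, r2))
    return pairs
```

A sequon match only flags a potential site; whether it is occupied, and whether the Asn side chain is solvent exposed in a given model, still has to be checked separately.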
We examined the predicted structures for compatibility with previously identified NK immunoevasin mechanisms involving MHC-Ia and MHC-Ib proteins for which structural information is available. U20 from HHV-6A recently has been reported to down-regulate the stress-induced MHC-platform-only MHC-Ib protein ULBP-1 (18). We investigated whether the predicted U20 structures and structural mechanisms from previous work would be consistent with this activity. The structure of ULBP-1 is not known, but ULBP-3 and ULBP-6 structures are highly similar to each other (29,78) and to mouse RAE-1γ (31), a member of the murine RAE-1 family, which is orthologous to the human ULBP family. Previous structural work has shown that the m152 protein from MCMV binds to RAE-1γ in a pincer-like mechanism that uses both the underside of its MHC platform and the edge of its immunoglobulin domain to surround the top of the RAE-1γ MHC platform (31), as shown in Figure 3E. The structural models for U20 from HHV-6A, HHV-6B, and HHV-7 are all consistent with this mechanism, with no apparent steric interference for ULBP-1 underneath the platform for any of the models, and with immunoglobulin-like domains oriented appropriately for contacting ULBP-1 in the RoseTTAfold models. For MCMV m152, the interaction with RAE-1 results in down-regulation of RAE-1 surface expression, primarily via retention within the early secretory pathway (79). A similar interaction between U20 and ULBP-1 could explain the U20-mediated ULBP-1 down-regulation activity recently reported for HHV-6A. We note that this interaction would also be consistent with an MHC-Ib surface masking mechanism as recently proposed for m152 (32). Thus, the m152-RAE-1γ interaction potentially provides a model for U20 down-regulation of ULBP-1. U21 from HHV-7 has been reported to down-regulate both non-classical (MICA, MICB, HLA-E, and HLA-G) (18)(19)(20) and classical MHC-Ia proteins (15).
We investigated whether the predicted U21 structures and previous structural characterization of other viral immunoevasins would be consistent with this activity. For MICA and MICB, the m152-RAE-1γ model just described for U20 would be consistent with the U21 structural predictions, although interactions with both the MHC platform and immunoglobulin-like domains would require some domain reorientation. Because classical MHC-Ia proteins have immunoglobulin-like domains and β2-microglobulin domains in addition to the MHC platform, the m152-RAE-1γ model is not directly applicable to modeling classical MHC-Ia down-regulation, but could be relevant if the MHC-Ia protein adopted a supine conformation on the membrane (80). It is also possible that U21 uses distinct mechanisms to interact with MHC-Ia and MHC-Ib proteins. We considered whether previously reported structures of Ig-only viral immunoevasins US2 and CPXV203 (Figures 3G, H) could provide a model for the observed U21-mediated MHC-Ia down-regulation. The structural models for HHV-7 U21 would be consistent with the US2 mechanism without substantial steric interference, but for the CPXV203 mechanism the U21 MHC platform domains would interfere with the immunoglobulin-like domain docking on the MHC-Ia target. Finally, U20 from HHV-6B has been reported to interfere with TNFα signaling (48). The MHC-Ib protein 2L from poxvirus YDV provides a potential model for this activity (47,81). The regions of the U20 structural models corresponding to those from YDL 2L that bind TNFα (Figure 3J) are surface exposed and potentially available for cytokine interaction, but we did not consider the predicted structures to be sufficiently accurate for docking or other structural modeling to evaluate this possibility in detail.

DISCUSSION
No experimental three-dimensional structure is available for any roseolovirus U20 or U21 protein. Cellular studies have revealed functional roles for these proteins in evasion of NK responses by interference with surface expression of classical MHC-Ia and non-classical MHC-Ib proteins. Structural information on these proteins would help to define mechanisms for the interference, suggest potential binding partners, and contribute to understanding the basis for observed differences in activities for the HHV-6A, HHV-6B and HHV-7 orthologs of U20 and U21. Previous sequence analysis had suggested the presence of immunoglobulin-like domains and in some cases MHC-like domains for some of these proteins. We used the next-generation structure prediction algorithms encoded in the machine-learning programs AlphaFold and RoseTTAfold to predict three-dimensional structures for the extracellular domains of the U20 and U21 glycoproteins from HHV-6A, HHV-6B, and HHV-7. All proteins were predicted to adopt MHC-like folds characteristic of non-classical MHC-Ib proteins. Structural models for U20 from all three viruses had MHC platform domains displaced from the immunoglobulin-like domains, with missing or altered structural elements relative to canonical structures, particularly in the case of the α1 domains. Structural models for U21 from all three viruses had MHC platform domains closely apposed to the immunoglobulin-like domains. The U21 MHC platform domains had conventionally oriented α-helices atop an eight-stranded β-sheet, in each case with a kinked and extended α2-domain α-helix as previously observed in structures of m152, m153, and m157 from MCMV (31,44,82). We present the U20 and U21 models as guides for hypothesis generation about potential mechanisms of viral interference in MHC-Ia and MHC-Ib pathways, and to help understand observed differences between the HHV-6A, HHV-6B, and HHV-7 variants.
We look forward to comparison of the models presented here with experimentally determined structures when they become available. It has been noted that while in some cases the accuracy of AlphaFold-derived models appears to surpass that of experimental methods (23), this may not be the case for new protein folds, where high-resolution experimentally determined structures of close structural homologs are not available (83). In these cases, the expected accuracy is much lower, and has been estimated to correspond roughly to a very low resolution (~4 Å) structure determined by X-ray crystallography (83).
For the U20 structural models, the interdomain orientation appeared to be less well-defined than for U21, and in some models parts of the MHC platform were missing, disordered, or displaced. This could reflect bona fide aspects of U20 structure or conformational lability, but it is also possible that the U20 MHC platform and/or interdomain interaction might be stabilized by a binding partner. In conventional class Ia MHC proteins, peptide binding stabilizes the MHC platform and interdomain orientation, with synergistic stabilization by β2-microglobulin (84,85). However, we do not expect that U20 or U21 require β2-microglobulin or peptide to complete folding, as recombinant proteins that do not contain peptide or β2-microglobulin retain full binding activity, at least for HHV-6B U20 binding to ULBP-1 (GW and LJS, unpublished results) and HHV-7 U21 binding to the MHC-Ia molecule HLA-A2 (50). It is also possible that oligomer formation could stabilize U20 and U21 folding, and in solution recombinant HHV-6B U20 and HHV-7 U21 form dimers (GW and LJS, unpublished results) and tetramers (50), respectively. To evaluate the possibility that dimer formation might stabilize U20 folding, we used AlphaFold-multimer (86) to predict structures for the HHV-6B U20 dimer. However, the resultant structural models did not reveal more well-ordered interactions within the MHC platform or between domains. Finally, U20 is heavily glycosylated, with the nine N-linked glycans representing ~35-40% of the apparent molecular weight of the extracellular portion as assessed by SDS-PAGE (CS and AWH, unpublished results). For some glycoproteins, N-linked glycans are required for proper protein folding (87)(88)(89), but the influence of these bound glycans is not included in the AlphaFold and RoseTTAfold algorithms (90).
However, fully deglycosylated recombinant HHV-6B U20 exhibits no tendency to aggregate, with thermal denaturation at 60-62°C (GW and LJS, unpublished observations), similar to recombinant cMHC-I proteins (91), and we do not expect that the U20 glycans are required for adoption or stabilization of the folded structure. One consistent feature of the prediction efforts reported here is that the structural models produced by RoseTTAfold are more compact than those produced by AlphaFold. RoseTTAfold models had fewer unstructured regions, fewer broken helices and sheets, fewer displaced secondary structure elements, and more interdomain contacts than did the corresponding AlphaFold models. The lack of more complete correspondence between the algorithms is puzzling. RoseTTAfold was designed as a simplified version of AlphaFold suitable for use with limited computational power, and is generally thought to be slightly less accurate (24,92). Both AlphaFold and RoseTTAfold rely on aligned sets of evolutionarily related sequence variants for co-variation analysis, but U20 and U21 do not have easily identified orthologs outside of the roseolovirus family and even there sequence coverage is thin. It is possible that AlphaFold is more reliant than RoseTTAfold on these alignments and unable to fold portions of the structures because of the limited sequence coverage, or that the full DeepMind prediction engine would fold these regions more completely than the slightly limited Google Colab implementation that we used. However, it is also possible that these regions are in fact more structurally labile, and that RoseTTAfold is overzealous in packing and overoptimistic in its confidence calculations. It will be interesting to compare the predicted structural models presented here with experimental models for U20 and U21 to evaluate these possibilities.
There are several limitations to our study. Structural modeling approaches based on machine learning multi-track algorithms are very new, and confidence estimates derived from predictions of newly determined crystal structures and from cross-validation of PDB entries might overestimate the prediction accuracy, especially for proteins with novel folds, folds not well-represented in the database, or containing unstructured regions. We did not consider chaperones or binding partners that might be necessary to complete folding, nor attempt to model immunoevasion mechanisms for which there is no current structural model. Finally, we focused on U20 and U21 because these proteins have been implicated in immunoevasion mechanisms that in other viruses involve MHC-Ia proteins, but there might be additional non-classical MHC or other proteins expressed by roseoloviruses required to understand the full picture of roseolovirus MHC-Ia immunoevasion.
In conclusion, we evaluated structures for the extracellular domains of the U20 and U21 immunoevasin proteins from human roseoloviruses HHV-6A, HHV-6B and HHV-7, produced by recently described state-of-the-art machine-learning prediction engines. The expected relatively low accuracy of the structural models limited detailed interpretation, and we considered only backbone conformations. Despite this restriction, each of the proteins was confidently predicted to adopt an MHC-like fold with a closed MHC platform domain above a canonical immunoglobulin-like domain. Predicted conformational differences between U20 and U21 included missing or unstructured elements in the MHC platform α1 domain for the U20 proteins, and more substantial interaction between MHC platform and immunoglobulin domains for the U21 proteins.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.