Structure-Based Modeling of SARS-CoV-2 Peptide/HLA-A02 Antigens

SARS-CoV-2-specific CD4 and CD8 T cells have been shown to be present in individuals with acute, mild, and asymptomatic Coronavirus disease (COVID-19). Toward the development of diagnostic and therapeutic tools to fight COVID-19, it is important to predict and characterize T cell epitopes expressed by SARS-CoV-2. Here, we use RosettaMHC, a comparative modeling approach which leverages existing structures of peptide/MHC complexes available in the Protein Data Bank, to derive accurate 3D models for putative SARS-CoV-2 CD8 epitopes. We outline an application of our method to model 8–10 residue epitopic peptides predicted to bind to the common allele HLA-A*02:01, and we make our models publicly available through an online database (https://rosettamhc.chemistry.ucsc.edu). We further compare electrostatic surfaces with models of homologous peptide/HLA-A*02:01 complexes from human common cold coronavirus strains to identify epitopes which may be recognized by a shared pool of cross-reactive TCRs. As more detailed studies on antigen-specific T cell recognition become available, RosettaMHC models can be used to understand the link between peptide/HLA complex structure and surface chemistry with immunogenicity, in the context of SARS-CoV-2 infection.


INTRODUCTION
An ongoing pandemic caused by the novel SARS coronavirus (SARS-CoV-2) has become the focus of extensive efforts to develop vaccines and antiviral therapies (1). Immune modulatory interferons, which promote a widespread antiviral reaction in infected cells, and inhibition of pro-inflammatory cytokine function through anti-IL-6/IL-6R antibodies, have been proposed as possible COVID-19 therapies (2,3). However, stimulating a targeted T cell response against specific viral antigens is hampered by a lack of detailed knowledge of the immunodominant epitopes displayed by Human Leukocyte Antigen (HLA) alleles across individuals. The molecules of the class I major histocompatibility complex (MHC-I, or HLA in humans) display on the cell surface a diverse pool of 8-15 amino acid peptides derived from the endogenous processing of proteins expressed inside the cell (4). This MHC-I restriction of peptide antigens provides jawed vertebrates with an essential mechanism for adaptive immunity: surveillance of the displayed peptide/MHC-I (pMHC-I) molecules by CD8 cytotoxic T-lymphocytes allows detection of aberrant protein expression patterns, which signify viral infection and can trigger an adaptive immune response (5). A recent study has shown important changes in T cell compartments during the acute phase of SARS-CoV-2 infection (6), suggesting that a more detailed analysis of antigenspecific T cells would provide new avenues for understanding the expansion and contraction of TCR repertoires in different clinical settings. Given the reduction in breadth and functionality of the naïve T cell repertoire during aging (7), identifying a minimal set of viral antigens that can elicit a protective response will enable the design of diagnostic tools to monitor critical gaps in the T cell repertoire of high-risk cohorts, which can then be addressed using peptide or epitope string DNA vaccines (8).
Human MHC-I molecules are highly polymorphic, with thousands of known alleles in the classical HLA-A, -B, and -C loci. Specific amino acid polymorphisms along the peptidebinding groove (termed A-F pockets) define a repertoire of 10 4 -10 6 peptide antigens that can be recognized by each HLA allotype (9,10). Several machine-learning methods have been developed to predict the likelihood that a target peptide will bind to a given allele [reviewed in (11)]. Generally these methods make use of available data sets in the Immune Epitope Database (12) to train artificial neural networks that predict peptide processing, binding and display, and their performance varies depending on peptide length and HLA allele representation in the database. Structure-based approaches have also been proposed to model the bound peptide conformation de novo [reviewed in (13)]. These approaches utilize various algorithms to optimize the backbone and side chain degrees of freedom of the peptide/MHC structure according to a scoring function, derived from physical principles (14)(15)(16), that can be further enhanced using modified scoring terms (17) or mean field theory (18). While these methods do not rely on large training data sets, their performance is affected by bottlenecks in sampling of different backbone conformations, and any possible structural adaptations of the HLA peptide-binding groove.
Predicting the bound peptide conformation whose N-and C-termini are anchored within a fixed-length groove is a tractable modeling problem that can be addressed using standard comparative modeling approaches (19). For HLA-A * 02:01, the most common HLA allele (Supplementary Figure 1) among disease-relevant population cohorts (20), there is a large number of high-resolution X-ray structures available in the PDB (21), suggesting that such methods can be applied to produce models of candidate epitopes identified in the proteome of a pathogen of interest. Here, we apply RosettaMHC, a comparative modeling and side chain optimization approach to model all HLA-A * 02:01 epitopes predicted directly from the ∼30 kbp SARS-CoV-2 genome, and make our models publicly available through an online database. Previous studies have shown evidence for T cell cross-reactivity (22,23) for SARS-CoV-2 viral peptides in healthy individuals (24,25). Analysis of electrostatic surfaces of our models, relative to models of homologous peptide/HLA-A * 02:01 complexes derived from four strains of human common cold coronavirus (229E, HKU1, NL63, OC43) allows us to determine epitopes that can elicit SARS-CoV-2 specific and cross-reactive T cell responses. As more data from highthroughput tetramer staining (26)(27)(28) and T cell functional screens (29) become available, the models provided here can serve as a toehold for understanding the structural basis of immunogenicity, with actionable value for the development of tetramer-based diagnostics and peptide vaccines to monitor and combat the disease.

Identification of SARS-CoV-2 Peptide Epitopes
The SARS-CoV-2 protein sequences (https://www.ncbi.nlm.nih. gov/nuccore/NC_045512.2) were obtained from NCBI and used to generate all possible peptides of lengths 8, 9, and 10 (9,631 8, 9,621 9, and 9,611 10 mer peptides). We used NetMHCpan-4.0 (30) to derive binding scores to HLA-A * 02:01, and retained only peptides classified as strong or weak binders [selected using the default percentile (%) rank cut-off values]. The binding classification was performed using eluted ligand likelihood predictions. While in this study we use NetMHCpan-4.0 predictions to select candidate epitopes for structure modeling, our workflow is fully compatible with any alternative epitope prediction method.

Selection of PDB Templates
To model SARS-CoV-2/HLA-A * 02:01 antigens, we identified 3D structures from the PDB that can be used as templates for comparative modeling. First, we selected all HLA-A02 X-ray structures that are below 3.5 Å resolution and retained only those that have 100% identity to the HLA-A * 02:01 heavy chain sequence (residues 1-180). We obtained 236 template structures bound to epitopes of lengths from 8 to 15 residues (of which 1 is an 8 mer, 165 are 9 mers, and 61 are 10 mers). For each SARS-CoV-2 target peptide of (i) length 8, we selected a set of candidate templates of lengths 8-9 by matching the target peptide anchor positions (P1 and P8 in the 8 mer, P2, and P9 in the 9 mer templates), and (ii) lengths 9 and 10, we selected candidate templates of the same peptide length, by matching the target peptide anchor positions (P2 and P9/P10) to each peptide in the template structures. Then, we used the BLOSUM62 (31) substitution matrix to score all remaining positions in the pairwise alignment of the target/template peptide sequences, and the structure with the top score was selected as a template for modeling. For target peptides where we found no templates which matched both peptide anchors, we scored all positions in the pairwise alignment and selected the top scoring template for modeling.

RosettaMHC Modeling Framework and Database
A detailed description and commands to execute our workflow is available in Supplementary Methods. RosettaMHC (manuscript in preparation) is a comparative modeling protocol developed using PyRosetta (32) to model pMHC-I complexes. The program accepts as input a list of peptide sequences, an HLA allele definition and a template PDB file (selected as described in the previous step). To minimize "noise" in the simulation from regions of the MHC-I structure that do not contribute to peptide binding, only the α 1 and α 2 domains are considered in all steps. For each peptide, a full alignment between the target and template peptide/MHC sequences is performed using Clustal Omega (33). The alignment is used as input to Rosetta's threading protocol. From the threaded model, all residues in the MHC-I groove that are within a heavy-atom distance of 3.5 Å from the peptide are subjected to 10 independent all-atom refinement simulations using the FastRelax method (34) and a custom movemap file. Binding energies (dG_separated score terms) are extracted from the refined structures using the interface analyzer protocol. The top three models are selected based on binding energies, and used to compute an average energy for each peptide in the input list. RosettaMHC models of SARS-CoV-2/HLA-A * 02:01 epitopes are made available through an online database (see Data Availability). The website that hosts our database was constructed using the Django web framework.
Electrostatic Classification of SARS-CoV-2 Peptide/HLA-A * 02:01 Complexes To perform a structure-based classification of SARS-CoV-2 peptide/HLA-A * 02:01 complexes according to their TCR interaction features and compare their surfaces to homologous peptides from common cold coronavirus strains, we (i) aligned respective protein sequences (specifically, orf1ab, membrane, spike, envelope, and nucleocapsid proteins) from all strains using Clustal Omega, (ii) extracted 395 (out of 439) epitopes of length 9 from common cold coronavirus strains based on sequence homology with SARS-CoV-2 binders predicted by NetMHCpan-4.0 using default %rank cut-off values (44/439 SARS-CoV-2 epitope sequences are from proteins not considered), (iii) filtered out 141 epitopes containing insertions and deletions in the sequence alignment and those that do not have homologous sequences across all strains of common cold coronaviruses, (iv) modeled structures of the remaining 254 peptide/HLA-A * 02:01 complexes from each strain using RosettaMHC, and (v) performed a comparison of surface electrostatic potentials between each SARS-CoV-2 pMHC structure and its corresponding common cold coronavirus homologs using multipipsa4.0.2 (35). The multipipsa4.0.2 software applies the Adaptive Poisson-Boltzmann Solver (or APBS) (36) method to first compute electrostatic potentials, and then compares the potentials using the Protein Interaction Property Similarity Analysis (or PIPSA) protocol (37). The side chains of the modeled complexes are protonated using PROPKA (38), followed by assignment of atom charges and radii using the Amber force field (39) at a pH of 7.2. The electrostatic potentials of the structures are calculated by solving a linear Poisson-Boltzmann equation for 129 points on a cubic grid using 150 mM ionic strength at 298. 15 Kelvin with protein dielectric of 1.0, and solvent dielectric of 78 using a probe radius of 1.4 Å. Next, the PIPSA protocol compares the electrostatic potentials quantitatively using grid points on the superimposed regions (regions are at a distance of σ from the van der Waals surface and are of thickness δ) around the pMHC complexes. The similarity between any two electrostatic surfaces is captured by the Hodgkin similarity index (HSI, ranges from −1 to 1, where −1, 0, and 1 indicate electrostatic anticorrelation, no correlation, and electrostatic identity respectively) (40), which is converted into a distance measure, D (D = √ (2 − 2HSI)), that assigns values between 0 and 2 (0: identity, 1: no correlation and 2: anticorrelation). For our study, we have used 4 Å thickness (δ) and a distance of 3 Å from the molecular surface (σ) (41).

Template Identification and Structure Modeling Using RosettaMHC
Our full workflow for template identification and structure modeling is outlined in Figure 1A, with a flowchart shown in Figure 1B. To predict all possible peptides expressed by SARS-CoV-2 that can bind to HLA-A * 02:01, we used a recently annotated version of all open reading frames (ORFs) in the viral genome from NCBI (42), made available through the UCSC genome browser (43). We used 8-, 9-, and 10-residue sliding windows to scan all protein sequences, since these are the optimum peptide lengths for binding to the HLA-A * 02:01 groove (44). The limited availability of templates for peptides of lengths >10 (9 total in the PDB) suggests that such peptides are likely to represent a small fraction of the displayed peptide repertoire, and were not considered here. While spliced peptide epitopes (45) were not considered in the current study, this set can be added to our workflow in future studies. NetMHCpan-4.0 (30) predicted 54 8, 439 9, and 256 10 mer epitopes that can bind to HLA-A * 02:01 (classified as both weak and strong binders), with the majority of peptides originating from the nsp3 protein encoded by orf1ab (NCBI Reference YP_009724389.1) (Supplementary Figure 2). A sequence analysis of all 9 mer peptides predicted by NetMHCpan-4.0 to peptides bound to HLA-A02 structures in PDB showed similar motifs (Supplementary Figure 3). In general, binders predicted by NetMHCpan-4.0 exhibit higher sequence similarity to peptides present in the PDB HLA-A02 structures, relative to non-binders (Supplementary Figure 4).
To further validate the NetMHCpan-4.0 predictions and to derive plausible 3D models of the peptide/HLA-A * 02:01 complexes, we used a structure-guided approach, RosettaMHC, which aims to derive an accurate fitness score for each peptide in the HLA-A * 02:01 binding groove using an annotated database of high-resolution structures and Rosetta's all-atom energy function (46). RosettaMHC leverages a database of 236 HLA-A * 02:01 Xray structures, to find the closest match to each target epitope predicted from the SARS-CoV-2 proteome. Here, the range of available structures in the PDB provides a natural sampling of different possible backbone conformations within the highly restrictive environment of the peptide-binding groove, as shown by a structural alignment of all 9 mer templates (Figure 2A). To identify the best template for modeling of each target peptide, we use sequence matching criteria which first consider the peptide anchors (positions P1/P2/P2 and P8/P9/P10 for 8/9/10 mer epitopes), followed by similarity of the full alignment between the template and target peptide sequences. To demonstrate the accuracy of RosettaMHC, we performed benchmark calculations using a non-redundant set of 90 9-mer peptide/HLA-A02 complex structures. Each epitope was modeled from the closest template with identical anchor residues present in the benchmark set, while homologous peptide sequences were excluded from Protein sequences from the annotated SARS-CoV-2 proteome are used to generate peptide epitopes with a sliding window covering all frames of a fixed length (9,631 8, 9,621 9, and 9,611 10 mer possible peptides). Candidate peptides are first filtered by NetMHCpan-4.0 (30) to identify all predicted strong and weak binders (54 8, 439 9, and 256 10 mer epitopes). For rapid template matching and structure modeling, we use a local database of 236 HLA-A*02:01 X-ray structures with resolution below 3.5 Å from the Protein Data Bank (21). Each candidate peptide is scanned against all peptide sequences of the same length in the database, and the top-scoring template is used to guide the RosettaMHC comparative modeling protocol and to compute a binding energy. template selection. From these results, we find that (i) the binding energies of RosettaMHC models fall within the distribution of the native PDB templates (Supplementary Figure 5), and (ii) models generated for 75 and 98% of peptides are within 1.5 and 2 Å backbone heavy-atom RMSD from their native X-ray structures, respectively (Supplementary Figure 5). These results suggest that RosettaMHC can provide accurate models of peptide/HLA-A02 complexes for a range of peptide sequences using a simple threading approach which takes into account the peptide anchor positions as the main criterion for identifying the closest template in the database.
The template assignment statistics for the six different classes of SARS-CoV-2 epitopes in our set are shown in Figure 2B. We find that we can cover the entire set of 749 predicted 8, 9, and 10residue binders using a subset of 123 HLA-A * 02:01 templates in our annotated database of PDB-derived structures (Figure 2C). Each target peptide sequence is then threaded onto the backbone of its best identified template, followed by all-atom refinement of the side chain and backbone degrees of freedom using Rosetta's Ref2015 energy function (46), and binding energy calculation.

RosettaMHC Models Recapitulate Features of High-Resolution X-Ray Structures
The sequence logos derived from 9 and 10 mer peptides with good structural complementarity to the HLA-A * 02:01 groove according to Rosetta's binding energy (see below) adhere to the canonical binding motif, with a preference for hydrophobic, methyl-bearing side chains at the peptide anchor residues P2 and P9/P10 (Figure 3A, Supplementary Figure 6A). In addition, the sequences of high-affinity binders, show preferences for specific amino acids at positions P1, P3, P6/P7, P7/P8 for 9 and 10 mer peptides, respectively ( Figure 3A, Supplementary Figure 6A). These preferences are recapitulated in representative 9 and 10 mer models of the two top binders in our set as ranked by Rosetta's energy ( Figure 3B, Supplementary Figure 6B), corresponding to epitopes TMADLVYAL and FLFVAAIFYL derived from the RNA polymerase and nsp4 proteins, respectively, both encoded by orf1ab. In accordance with features seen in high-resolution structures of HLA-A * 02:01restricted epitopes, the peptides adopt an extended, bulged backbone conformation. The free N-terminus of both peptides is stabilized by a network of polar contacts with Tyr 7, Tyr 159, Tyr 171, and Glu 63 in the A-and B-pockets of the HLA-A * 02:01 groove. The Met (9 mer) or Leu (10 mer) side chain of P2 is buried in a B-pocket hydrophobic cleft formed by Met 45 and Val 67. Equivalently, the C-terminus is coordinated through polar contacts with Asp 77 and Lys 145 from opposite sides of the groove, with the Leu P9/P10 anchor nestled in the F-pocket defined by the side chains of Leu 81, Tyr 116, Tyr 123, and Trp 147. Residues P3-P8 form a series of backbone and side chain contacts with pockets C, D, and E, while most backbone amide and carbonyl groups form hydrogen bonds with the side chains of residues lining the MHC-I groove. These high-resolution structural features are consistent across low-energy models of unrelated target peptides in our input set, suggesting that, when provided with a large set of input templates, a combined threading and side chain optimization protocol can derive accurate models (within 2 Å RMSD), as also shown in our benchmark calculations.

The Rosetta Energy Function Generally Distinguishes High-Affinity Peptides
To evaluate the accuracy of our models and fitness of each peptide within the HLA-A * 02:01 binding groove, we computed Rosetta all-atom binding energies across all complexes modeled for different peptide sets. High binding energies can be used as an additional metric to filter low-affinity peptides in the NetMHCpan-4.0 predictions, with the caveat that high energies can be also due to incomplete optimization of the Rosetta energy function as a result of significant deviations between the target and template backbone conformations, not captured by our protocol. We performed 10 independent calculations for each peptide which may allow Rosetta's optimization protocol to sample slight changes in peptide backbone (up to 1 Å from starting structure), and the 3 lower-energy models were selected as the final ensemble and used to compute an average binding energy. The results for all 9 and 10 mer peptides are summarized in Figures 3C,D, Supplementary Figures 6C,D while additional results for 8 mers are provided through our web-interface and outlined in Supplementary Table 1. As a positive binder reference set, we used the binding energies of the idealized and relaxed PDB templates, which are at a local minimum of the Rosetta scoring function. As a reference set for sub-optimal binders, we modeled structures of peptides from SARS-CoV-2 proteome that are classified as poor binders according to NetMHCpan-4.0 (highest %rank values). , and 100 SARS-CoV-2 peptides classified as poor binders by NetMHCpan-4.0 modeled using the PDB templates and used as a reference set for sub-optimal binders (Poor binders; pink). The binding energies of models generated for 28 confirmed SARS T cell epitopes from the IEDB and ViPR (47)(48)(49) are indicated by circles at the bottom of the plot. Red circles indicate epitopes that lie within the distribution of refined PDB templates. (D) Box plots showing distribution of average binding energies for 93 PDB templates, 100 sub-optimal SARS-CoV-2 peptides, 28 confirmed epitopes (47)(48)(49) and RosettaMHC models for 164 strong (SB) and 275 weak (WB) binder 9 mer epitopes predicted from the SARS-CoV-2 proteome using NetMHCpan-4.0 (30). An unpaired Mann-Whitney U-test was performed for relevant pairs of distributions and their statistical significance described by the p-values (where, p < 0.1 is considered statistically significant) are (i) PDB templates and strong binders: p < 0.15 (ii) PDB templates and confirmed binders: p < 0.39 (iii) PDB templates and weak binders: p < 0.00002 (iv) confirmed epitopes and strong binders: p < 0.39, and (v) confirmed epitopes and weak binders: p < 0.02, and (vi) strong and weak binders: p < 0.00004, are shown inside the plot. The sequence logo was generated using Seq2Logo (50).
We observe a significant, favorable (∼−15 kcal/mol) energy gap between the average binding energies computed from the refined PDB templates relative to models obtained for poor binder peptides. The binding energies for all predicted 9 mer and 10 mer binders show a significant overlap with the refined PDB template energies (Figure 3C,   Supplementary Figure 6C). Comparison of energy distributions of epitopes that are classified as strong vs. weak binders by NetMHCpan-4.0 shows a moderate bias toward lower binding energies for the strong binders and a larger spread in energies for weak binders, likely due to suboptimal residues at the P2 and P9/P10 anchor positions (Figure 3D,  Supplementary Figure 6D, with a significance level, p < 0.1 between strong and weak binders for both 9 and 10mers). As an intendent positive set, we also modeled 28 9 and 5 10 mer peptides that are homologous to peptides in the SARS viral genome and have been previously reported to bind HLA-A * 02:01 in the IEDB and ViPR (12,(47)(48)(49) databases (Supplementary Table 2). Inspection of Rosetta binding energies derived from models in this set shows a  (58). similar distribution to the epitopes predicted by NetMHCpan-4.0, with the energies of all the peptides falling well within the distribution of the refined PDB templates (red dots in Figure 3C, Supplementary Figure 6C). Finally, to enrich our set of potential binder peptides, we used a higher NetMHCpan-4.0 cut-off value and modeled structures for 627 additional SARS-CoV-2 epitopes (Supplementary Table 3,  Supplementary Figure 7). The analysis of these models is discussed in the Supplementary Results section.
Comparison of Surface Features of Peptide/HLA-A * 02:01 Models Relative to Homologous Peptides From Common Cold Coronavirus Strains Visualization of our models through an interactive online interface provides direct information on SARS-CoV-2 peptide residues that are bulging out of the MHC-I groove, and are therefore accessible to interactions with complementaritydetermining regions (CDRs) of T cell receptors (TCRs). Given that αβ TCRs generally employ a diagonal binding mode to engage pMHC-I antigens where the CDR3α and CDR3β TCR loops form direct contacts with key peptide residues (51,52), knowledge of the surface features for different epitopes allows us to interpret sequence variability between different viral strains. For other important antigens with known structures in the PDB, such features can be derived from an annotated database connecting pMHC-I/TCR co-crystal structures with biophysical binding data (53), and were recently employed in an artificial neural network approach to predict the immunogenicity of different HLA-A * 02:01 bound peptides in the context of tumor neoantigen display (54). The electrostatic compatibility between self vs. foreign HLA surfaces has been shown to define antibody alloimmune responses (41). Given that antibodies and TCRs use a common fold and similar binding mode to engage pMHC-I molecules (51), surface electrostatic features also play an important role in recognition of peptide/HLA surfaces by their cognate TCRs in the context of SARS-CoV-2 infection.
T-cell responses to megapools of viral peptides have been observed in individuals not exposed to SARS-CoV-2, thus providing evidence for cross-reactivity of T cells with similar epitopes expressed by homologous coronavirus strains (55,56). To characterize SARS-CoV-2 specific and cross-reactive epitopes (according to this definition), we obtained homologous peptide sequences from four human common cold coronavirus strains with annotated genomes (229E, HKU1, NL63, and OC43), for all 395 predicted SARS-CoV-2 strong binders of length 9. From this set, 141 peptides are exclusive to SARS-CoV-2, since there are no homologous sequences present in the four common coronavirus strains considered here, or the corresponding sequences in the other four genomes have insertions or deletions (Supplementary Table 5). To identify cross-reactivity according to molecular surface features, we first used RosettaMHC to model peptide/HLA-A * 02:01 complex structures for all homologous peptides, in addition to our previously described models for SARS-CoV-2 ( Figure 4A).
We then computed surface electrostatic potentials for each model using APBS (36), followed by a pairwise comparison of potentials computed for the four homologous structures relative to each SARS-CoV-2 peptide using PIPSA (37), which provides four distance scores for each peptide (Figure 4B). From the examination of similarity scores of models, we found that (i) peptide SLAIDAYPL from orf1ab has conserved sequence and surface features across all coronavirus strains (distance score = 0), and therefore T cells specific for this epitope should be highly cross-reactive across different strains (ii) epitopes AIMTRCLAV, YLGGMSYYC, FVDGVPFVV, RIIPARARV, RILGAGCFV, RLANECAQV, SVFNICQAV, IFVDGVPFV, and GVAPGTAVL from orf1ab are conserved with one or more common strains, and are putatively cross-reactive (distance score ≤ 0.3) (Figure 4B, Supplementary Table 4), and (iii) there is no apparent correlation between SARS-CoV-2 and common cold coronavirus pMHC surface features for ALLSDLQDL (orf1ab), QLNRALTGI (spike), MLAKALRKV (orf1ab), and KIYSKHTPI (spike) epitopes (distance score > 0.8) (Supplementary Table 4). In particular, peptide KIYSKHTPI shows the most dissimilar electrostatic surface to the homologous strains among all high-affinity binders, suggesting that this epitope can be used to detect exclusive TCRs to SARS-CoV-2 ( Figure 4C). The six epitopes in our set that are known to induce CD8 T cell responses in COVID-19 patients and healthy donors have distance scores ranging from 0.5 to 0.9 (Supplementary Table 6) (24,25), suggesting that their TCRs can cross-react with homologous epitopes from common cold coronaviruses.
Electrostatic potentials calculated from our models further allow us to compare distinct surfaces for TCR recognition between different high-affinity epitopes, as demonstrated for the four top-scoring models by Rosetta binding energy (Supplementary Figure 8A). Here, PIPSA analyses of electrostatic potentials of these models allowed us to cluster them into two groups, (i) TMADLVYAL and NLIDSYFVV, and (ii) KLWAQVCQL and FLAFVVFLL, where the surface exposed residues at P2-P8 positions of the (i) and (ii) groups exhibit moderately negative and positive charges, respectively (Supplementary Figure 8). Full classification and ranking of all binders in our set on the basis of their molecular surface features would further enable the selection of a diverse panel of peptides for high-throughput pMHC tetramer library generation which can be used to identify immunodominant epitopes (28). Tetramer analysis of T cells from COVID-19 patients, recovered individuals, and healthy donors can help identify critical gaps in the T cell repertoire of highrisk groups, toward the design of epitope DNA strings for vaccine development.

DATA AVAILABILITY STATEMENT
An online web-interface for visualization and download of all the models is available at: https://rosettamhc.chemistry.ucsc.edu. The RosettaMHC source code is available at https://github.com/ snerligit/mhc-pep-threader. Rosetta binding energies for all 749 HLA A * 02:01-restricted peptides in our set are provided in Supplementary Table 1. AUTHOR CONTRIBUTIONS SN and NS conceptualized and designed the research. SN performed Rosetta comparative modeling simulations and binding energy calculations, analyzed, and interpreted data. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported through Grants from NIAID (5R01AI143997) and NIGMS (5R35GM125034).