A bioinformatic analysis of T-cell epitope diversity in SARS-CoV-2 variants: association with COVID-19 clinical severity in the United States population

Long-term immunity against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) requires the identification of T-cell epitopes affecting host immunogenicity. In this computational study, we explored the CD8+ epitope diversity estimated in 27 of the most common HLA-A and HLA-B alleles, representing most of the United States population. Analysis of 16 SARS-CoV-2 variants [B.1, Alpha (B.1.1.7), five Delta (AY.100, AY.25, AY.3, AY.3.1, AY.44), and nine Omicron (BA.1, BA.1.1, BA.2, BA.4, BA.5, BQ.1, BQ.1.1, XBB.1, XBB.1.5)] in analyzed MHC class I alleles revealed that SARS-CoV-2 CD8+ epitope conservation was estimated at 87.6%–96.5% in spike (S), 92.5%–99.6% in membrane (M), and 94.6%–99% in nucleocapsid (N). As the virus mutated, an increasing proportion of S epitopes experienced reduced predicted binding affinity: 70% of Omicron BQ.1-XBB.1.5 S epitopes experienced decreased predicted binding, as compared with ~3% and ~15% in the earlier strains Delta AY.100–AY.44 and Omicron BA.1–BA.5, respectively. Additionally, we identified several novel candidate HLA alleles that may be more susceptible to severe disease, notably HLA-A*32:01, HLA-A*26:01, and HLA-B*53:01, and relatively protected from disease, such as HLA-A*31:01, HLA-B*40:01, HLA-B*44:03, and HLA-B*57:01. Our findings support the hypothesis that viral genetic variation affecting CD8 T-cell epitope immunogenicity contributes to determining the clinical severity of acute COVID-19. Achieving long-term COVID-19 immunity will require an understanding of the relationship between T cells, SARS-CoV-2 variants, and host MHC class I genetics. This project is one of the first to explore the SARS-CoV-2 CD8+ epitope diversity that putatively impacts much of the United States population.


GRAPHICAL ABSTRACT 1 Introduction
Since the emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in late 2019, the scientific community rapidly developed several therapeutic monoclonal antibodies and mRNA vaccines.Current vaccines elicit a shortlived humoral response against the SARS-CoV-2 spike protein, lasting an average of 3-4 months and requiring periodic boosters (1)(2)(3).Intriguingly, coronavirus disease 2019 (COVID-19) patients lacking humoral immune response due to treatment of hematological malignancies did not exhibit increased disease severity or mortality, suggesting that B-cell-mediated immunity may not be sufficient to confer long-term immunity against SARS-CoV-2 (4-6).In contrast, convalescent macaque models depleted of CD8 + T cells exhibited loss of host protection following reinfection, highlighting the importance of T-cell immunity in COVID-19 clinical presentation (7).
Cytotoxic CD8 + T cells are essential for the clearance of intracellular viral pathogens, such as SARS-CoV-2 (8)(9)(10).T-cell activation occurs through T-cell receptors binding to T-cell epitopes, described as peptide antigens bound by a human heterodimeric glycoprotein, known as a major histocompatibility complex (MHC).CD8 + T-cell antigen recognition is determined by MHC class I genes, which control antigenic peptide presentation on MHC class I molecules (11).Unlike the invariant b 2 -microglobulin subunit, the a subunit of MHC class I proteins is highly polymorphic, with the most polymorphic genes being human leukocyte antigens (HLA) HLA-A, HLA-B, and HLA-C; these subunits have an estimated 1,939, 2,577, and 1,595 allotypes, respectively (11,12).Therefore, the considerable individual diversity generated from HLA polymorphism is a proposed explanation for the differential clinical severity of COVID-19 variants seen between individuals, since the epitope repertoire from one patient is likely to be substantially different from the next (13)(14)(15).Select studies have sequenced the HLA alleles and SARS-CoV-2 T-cell epitopes of convalescent patients (4,(15)(16)(17)(18).However, current research on T-cell response to COVID-19, especially analysis exploring the relationship between HLA molecules and viral CD8 + epitopes on a population/ epidemiological level, remains limited.Previously published research has already identified several HLA alleles associated with increased (HLA-A*25:01, HLA-B*46:01, and HLA-B*27:07) or decreased (HLA-B*07:02, HLA-B*15:03, and HLA-B*51:01) clinical severity in convalescent patients (Table 1) (13,(19)(20)(21), but none have explored the entire epitope repertoire of variants of concern (VOC) gene products in the most common HLA allotypes.
SARS-CoV-2 VOC and subvariants accumulate mutations in the genes for their protein products, such as spike, membrane, and nucleocapsid proteins, potentially affecting the binding affinity and immunogenicity of T-cell epitopes.These mutations and the resulting alterations to MHC-I binding affinity may influence COVID-19 clinical characteristics, such as viral transmissibility, protection against neutralizing antibodies, risks of reinfection, and disease severity (15,22,23), as well as the risk of post-acute sequelae of COVID-19 infection (PASC or long COVID) (24)(25)(26).Given that an estimated 3% of CD8 + T-cell epitopes are affected by mutations conferred in various VOCs, certain HLA alleles may have more (or less) propensity to be strongly affected by mutations in specific VOCs (15).This manuscript distinguishes SARS-CoV-2 variant-specific CD8 + T-cell epitopes of spike, membrane, and nucleocapsid gene products for 27 of the most frequent HLA-A and HLA-B alleles.The purpose of this computational study was to model the immunogenic effects and clinical severity of SARS-CoV-2 variants in the most common MHC class I alleles in the United States population.Our bioinformatics approach integrates the use of Ensembl's COVID-19 genome browser, Immune Epitope Database and Analysis Resource tool TepiTool, and ExPASy translate tool (27-29).

SARS-CoV-2 viral genome sequencing
Specimens were received by the LSUHSC Precision Medicine Laboratory from various collection sites representing the Louisiana patient population for public health screening purposes.RNA extraction was performed using the Zymo Quick DNA/RNA Viral MagBead kit automated on a Tecan Fluent liquid handling workstation.The resulting viral RNA was used for library generation and next-generation sequencing using the Illumina COVID-Seq workflow as per the manufacturer's instructions.Libraries were pooled (up to 192 samples/run) and loaded on an Illumina NextSeq550Dx in RUO mode, with 74 cycles of paired-end sequencing using a 150-cycle mid output reagent cartridge and flow cell.Initial data processing and QC was performed using the DRAGEN COVID-Seq Test (EUA) v.1.2.2 application on the cloud-based BaseSpace sequence analysis hub hosted by Illumina.BaseSpace project share links were provided to BioInfoExperts, LLC for sequence processing and analysis in FoxSeq software (www.foxseqllc.com).Briefly, sequences were quality-filtered using Trimmomatic (30) and mapped to the reference using Bowtie2 (31).Variant calling and consensus sequence generation were performed using bcftools (32).Nucleotides at any position were only assigned if the sequencing depth was >200 and the allele frequency was 80%.Lineages were assigned using pangolin (https:// cov-lineages.org).Consensus sequences were uploaded to GISAID and NCBI SARS-CoV-2 viral genome data repositories.

TepiTool IEDB analysis of coronavirus T-cell epitopes
The prediction of MHC-I epitope binding to variant-specific S, M, and N was generated through the Immune Epitope Database and Analysis Resource (IEDB) (RRID: SCR_006604), via TepiTool utilizing the IEDB-recommended default prediction (33).Spike, membrane, and nucleocapsid were selected because, for the most part, spontaneous CD8 + responses against SARS-CoV-2 T-cell epitopes target the proteins they encode (16).A panel of 27 most frequent A and B alleles was used for MHC-I epitope binding analysis.The specific alleles  IEDB's default prediction method reflects consensus across ANN, SMM, and CombLib predictors and was used to select peptides with predicted consensus percentile ranks ≤1 (28).Low scores correspond to high predicted affinities.

Epitope differences between 16 variants of spike, membrane, and nucleocapsid when compared against the ancestral Wuhan strain
We generated predictive estimates of MHC-I epitopes to variant-specific S, M, and N using the IEDB Resource TepiTool, utilizing the IEDB-recommended default prediction.A panel of 27 most frequent A and B alleles were used for MHC-I epitope binding analysis, which encompassed 16 37), the 16 HLA-A alleles make up 92.4% of the population in Caucasians, 69.2% in African Americans, 74% in Asian, and 83% in Hispanics.Similarly, the 11 HLA-B alleles represent 67.7% of Caucasians, 44.8% of African Americans, 39.2% of Asians, and 39.8% of Hispanics.CD8 + epitope repertoires, comprising the 27 most common HLA-A and HLA-B alleles, were generated for 16 SARS-CoV-2 variants and the ancestral Wuhan strain.The original S, M, and N protein products resulted in a repertoire of 1,081, 237, and 289 predicted CD8 + epitopes, respectively.From the 16 SARS-CoV-2 variant spike proteins, we identified a range of 1,077-1,115 CD8 + T-cell epitopes.Variant-specific membrane epitopes ranged between 236 and 241, with nucleocapsid CD8 + repertoires comprising 289-298 epitopes for the 27 HLA alleles analyzed.
Wuhan S, M, and N repertoires were compared against 16 variants (B.1, Alpha, five Delta, and nine Omicron) to identify epitopes that were lost, gained, or altered in estimated HLA binding affinity between variants (Figure 2).In general, a balanced number of epitopes were lost and gained for all variants; however, BA.1.1M, BA.4 N, B.1.1.7 S, and BQ.1.1 S repertoires sustained greater epitope loss than gain (Figures 2A-C, 3, bottom), which may contribute to explaining the increased transmission and breakthrough cases seen in these subvariants (39,40).Additionally, spike epitopes in the early variants (B.1 and B.1.1.7)and Omicron VOCs experienced a greater number of epitopes predicted to have reduced binding affinity than increased affinity.

Spike epitopes were the least conserved, compared with membrane and nucleocapsid
Among the three viral proteins we examined, spike epitopes were least conserved, with S, M, and N epitopes experiencing 87.6%-99.8%,92.5%-100%, and 94.6%-100% conservation, respectively.Across all the variants studied, Omicron BQ.1.1 S epitopes experienced the most loss, with 138/1,115 = 12.4% affected, while strain B.1 only lost 2 epitopes out of 1,081 total (0.0019%) compared with the original Wuhan strain.Among the 14 Delta and Omicron spike proteins, the largest area of conservation, defined as a region experiencing no epitope loss or gain, was found between amino acids (AA) 987-1,205 within the 1273 AA protein (Figure 4).As seen in the two other protein products, S epitopes that were lost were generally replaced by alternate epitopes that were gained in the same regions.However, all Delta variants lost two epitopes (VSSQCNLR and SQCVNLRTR), affecting HLA-A*31:01 and HLA-A*68:01, without experiencing epitope gains (Supplementary Figure 2).These epitopes spanned the AA 11-21 region, affecting the tail end of the hydrophobic signal peptide and the S1 subunit in the S protein.
As shown in Figure 2A    Spike epitopes gained (top) and lost (bottom) when compared against the ancestral Wuhan strain.Colored regions and numbers refer to amino acid locations of predicted epitope alterations, with red indicating changes seen in Omicron, blue for Delta, and yellow for epitopes affected in all 16 variants.Protein characteristics were generated using UniProt's Feature Viewer (38). of epitopes were predicted to have reduced HLA binding, with 70% of Omicron BQ.1-XBB.1.5S epitope repertoires experiencing decreased predicted binding affinity (as compared with the roughly 3% and 15% affected in Delta AY.100-AY.44 and Omicron BA.1-BA.5 variants, respectively) (Figures 1, 2A).When compared with the ancestral Wuhan spike, XBB.1 S epitopes experienced the greatest decrease in predicted immunogenicity, with 64.9% (720/1,109 epitopes; Figure 2A) of its CD8-T-cell repertoire demonstrating a reduction in estimated binding affinity, while only 37 epitopes (3.3%) were estimated to have increased HLA binding.Additionally, all 27 HLA-A and HLA-B alleles had decreased predicted binding affinity for B.1.1.7 and BA.1-XBB.1.5spike epitopes.

Membrane epitopes were most conserved with balanced gain and loss maintained in all variants
Membrane epitopes sustained minimal alterations, with BA.1.1 losing the most (18/241 = 7.5%) and AY.100-AY.44 losing the least (1/237 = 0.04%) epitopes between Delta and Omicron variants (Figures 1, 2B).Alpha membrane epitopes were conserved unaltered from the original Wuhan variant sequenced.In general, M epitope loss was accompanied by balanced epitope gains across all VOCs, with similar patterns seen between epitopes with altered predicted binding affinity (Figure 2A).For all Delta variants, HLA-A*68:02 lost the ability to bind epitope TAMACLVGL, while HLA-B*51:01 gained IAIAMCLV between AA 80 and 90.Likewise, for the nine Omicron variants, two membrane segments (AA 12-27 and AA 55-71) experienced balanced epitope loss and gain (Figure 3).

Unbalanced nucleocapsid epitope gain/loss and alterations in predicted binding, with more epitopes experiencing decreased predicted binding
Like M, N epitopes were highly conserved, with the greatest loss seen in BA.4 (16/298 = 5.4%) and conservation in AY.100/AY.25/AY.44 (3/290 = 1%) among Omicron and Delta variants.N epitopes experienced no loss or gain between AA 66-194 and AA 237-401 in all VOCs.Although nucleocapsid epitopes experienced numerically balanced gain and loss across VOCs (Figure 2C), further analysis revealed AA 192-209 to be the only region where epitopes were both gained and lost, including the Alpha variant (Figure 5).Within this region, more HLAs sustained gain/loss in Omicron (6/27 HLA , with zero epitopes gaining predicted binding (Figure 2C).BA.4 had 40 unique peptides a ff e c t i n g 1 6 / 1 6 H L A -A a n d 8 / 1 1 H L A -B a l l e l e s (Supplementary Figure 1).

Discussion
Select studies have previously sequenced the HLA allele and viral epitopes of convalescent patients (16,51,52), but to our knowledge, none have explored the entire epitope repertoire of multiple SARS-CoV-2 variants with respect to the most common HLA allotypes.Although epitope screening has been conducted in cell lines (53, 54), no analysis of the COVID-19 peptidome exists on a population/epidemiological level.Therefore, our team utilized a computational approach aimed to model the immunogenic effects and clinical severity of SARS-CoV-2 variants in the most common MHC class I alleles comprising the United States population.Our bioinformatics analysis is consistent with the percentages of CD8 + epitope conservation (S: 87.6%-96.5%,M: 92.5%-99.6%,N: 94.6%-99%) found by Tarke et al. (15) (97%).As the virus mutated, an increasing proportion of spike epitopes experienced reduced predicted HLA binding, with 70% of Omicron S epitope repertoires experiencing decreased predicted HLA binding affinity (as compared with the roughly 3% and 15% affected in Delta AY.100-AY.44 and Omicron BA.1-BA.5 variants, respectively) (Figures 1, 2A).The changes experienced by spike CD8 + epitopes highlight both the remarkable structural plasticity of the S protein and the selective pressures experienced by its gene, particularly following the widespread availability of vaccines in mid-2021 (Figure 1).Our findings suggest that viral genetic variation affecting CD8 T-cell epitope immunogenicity contributes to determining the clinical severity of acute COVID-19.
Our findings support the hypothesis that long-lasting immunity against SARS-CoV-2 variants will be difficult to achieve through vaccines based solely on the spike protein and using neutralizing antibodies as an efficacy endpoint.One strategy to achieve longterm immunity against COVID-19 is the development of T-cell vaccines (9,55).When designing such vaccines, it is important that the epitopes selected are as invariant as possible and cover the maximum number of HLA haplotypes with even affinity distribution between HLA alleles (56).Our research identified several predicted epitopes that were gained and conserved between variants (Figures 6, 7), including highly conserved nucleocapsid (n = 2) and membrane (n = 3) peptides predicted to elicit immune response through multiple HLA alleles (Figure 7).Additionally, the CD8+ T cell epitopes in this manuscript have been evidenced in previously published datasets (Table 5).To develop a pan-coronavirus vaccine, epitopes affecting conserved protein product regions should also be considered, such as AA 987-1205 in spike, AA 132-222 in membrane, and the AA 66-194 and 210-401 regions in nucleocapsid described in our findings.Lastly, considering that several HLA haplotypes, including HLA-A*11:01, HLA-A*24:02, and HLA-B*08:01, are associated with COVIDinduced autoimmune disease (44), epitopes affecting these alleles must be carefully considered to minimize the risk of autoimmune adverse effects.In-silico and in-vitro experiments will be needed to confirm the bioinformatically predicted epitope gains and remove promiscuous peptides.
An alternative path to prevent or treat severe COVID-19 immunity is the development of personalized vaccines and/or treatment strategies.This requires the identification of haplotypes at risk of or protected from severe illness, which can be added to nongenetic risk factors to estimate the overall risk of severe outcomes.Our findings are significant because this study is one of the first to explore SARS-CoV-2 CD8 + epitope diversity in the context of HLA alleles found in most of the United States population.Our predicted clinical severity, X total (Equation 2), is consistent with previously published findings (Tables 2-4 2,  3).All referenced clinical associations were consistent with our predicted estimates, except HLA-A*11:01, which was reported to have severe disease and COVID-induced autoimmune effects despite a low X total (−19), and HLA-A*01:01, which was reported to have severe infection in Russia despite a low X total (−17).The inconsistency of predicted/reported severity seen in HLA-A*11:01 may be explained through a combination of factors, including an association with COVID-induced autoimmune disease (42)(43)(44) and limited availability of CD8 + hepatitis B epitopes, with some reports (66) suggesting that chronic hepatitis B patients with this allele had less than 10% of known HBV epitopes.Therefore, with these findings  being considered (66)(67)(68), HLA-A*11:01 patients with chronic, untreated, or poorly managed hepatitis B co-infection may be at greater risk of experiencing severe COVID-19 infection, even if the allele alone may not confer an increased risk of clinical severity.It is also important to be mindful of the considerable diversity generated from HLA polymorphism.A patient heterozygous for both HLA-A and HLA-B loci would have to account for the predicted clinical severity, X total , of all four haplotypes to determine a true net predicted  effect (not including the other MHC class I loci, -C).Therefore, clinical studies will be needed to confirm these findings.We hope that our computation study will encourage groups with access to large numbers of peripheral blood mononuclear cells from COVID-19 patients, such as the RECOVER cohorts, to analyze SARS-CoV-2 peptidomes in association with HLA haplotypes.

2
FIGURE 2Spike (A), membrane (B), and nucleocapsid (C) epitope differences between variants of interest (B.1 labeled black, Alpha in orange, Delta in blue, and Omicron in red) when compared against the ancestral Wuhan strain.Predicted binding of SARS-CoV-2 S, M, and N epitopes was generated using the IEDB database TepiTool for the 27 most common HLA-A and HLA-B alleles.

FIGURE 1
FIGURE 1Nucleotide (NT) variations of spike, membrane, and nucleocapsid over time between B.1 (colored black), Alpha (orange), five Delta (blue), and nine Omicron (red) variants when compared against the original Wuhan strain.

FIGURE 3
FIGURE 3Membrane epitopes lost (regions colored red) and gained (colored blue) in Delta (top) and Omicron (bottom) when compared against the ancestral Wuhan strain.Protein characteristics were generated using UniProt's Feature Viewer (38).

TABLE 1 Estimated
DG for SARS-CoV-2 CD8+ peptides docked with HLA-B*15:01 by FOLDX.binding values reflect predicted consensus percentile ranks generated from IEDB's Tepitools, as described in the methods.Low scores correspond to high predicted binding affinities.Highlighted letters indicate amino acid alterations from the original Wuhan sequence to the respective SARS-CoV-2 strain. Predicted SSRGTSPAR was gained in 6/16 HLA-A alleles, comprising of 27.2% EUR, 23.5% AFA, 27.8% API, and 26.1% HIS population in the United States (Figure
Predicted binding values reflect predicted consensus percentile ranks generated from IEDB's Tepitools, as described in the methods.Low scores correspond to high predicted binding affinities.

TABLE 3
Summary of HLA haplotype United States population frequencies and clinical associations.

TABLE 3 Continued
(37)ary of allelic frequencies and clinical associations for the 27 HLA-A and HLA-B analyzed.X total describes HLA-predicted clinical severity with more negative values indicating greater predicted clinical severity (Equation2).Allelic frequencies were adapted from Gragert et al.(37).*Autoimmunity reflects new-onset autoimmune symptoms following COVID-19 infection.Bolded numbers indicate estimated population coverage of the HLA-A (top) and HLA-B (bottom) alleles analyzed in this study.

TABLE 4
Summary of HLA CD8+ T cell epitope diversity and clinical associations.
Epitopes lost, gained, and altered in predicted binding affinity in 27 HLA class I alleles.Epitope values reflect the number of unique epitopes affected.X total describes HLA predicted clinical severity for all SARS-CoV-2 VOCs and protein product, spike (S), membrane (M), and nucleocapsid (N) and was generated using Equation2, with more negative values indicating greater predicted clinical severity.*Survival noted only in HLA-B*15 alleles generally.

TABLE 5 HLA
-I peptides confirmed in other peptidomic datasets.

TABLE 6
Global summary of HLA Class I allele associated with severe COVID-19 infection.

TABLE 7
(65)al summary of HLA Class I allele associated with low risk of or protection from COVID-19 infection.Tables6 and 7heavily referenced Table1from Hoeseinnezhat et al.(65).CI, confidence interval.