The HLA Ligandome Comprises a Limited Repertoire of O-GlcNAcylated Antigens Preferentially Associated With HLA-B*07:02

Mass-spectrometry based immunopeptidomics has provided unprecedented insights into antigen presentation, charting not only an enormous ligandome of self-antigens, but also cancer neoantigens and peptide antigens harbouring post-translational modifications (PTMs). Here we concentrate on the latter, focusing on the small subset of HLA Class I peptides (less than 1%) that has been observed to be modified by an O-linked N-acetylglucosamine (O-GlcNAc). Just like neoantigens, these modified antigens may have specific immunomodulatory functions. We compiled, from the literature and from a new dataset originating from the JY B cell lymphoblastoid cell line, a concise albeit comprehensive list of O-GlcNAcylated HLA class I peptides. The peptides in this cumulative list were derived from normal and cancerous cells, as well as from tissue specimens. Remarkably, the overlap in detected O-GlcNAcylated HLA peptides, as well as in their source proteins, is high. Most of the O-GlcNAcylated HLA peptides originate from nuclear proteins, notably transcription factors. From this list, we infer that O-GlcNAcylated HLA Class I peptides are preferentially presented by the HLA-B*07:02 allele. This allele loads peptides with a proline anchor residue at position 2 and features a binding groove that can readily accommodate the recently proposed consensus sequence for O-GlcNAcylation, P(V/A/T/S)g(S/T), which likely explains why HLA-B*07:02 is a favoured binding allele. The observations drawn from the compiled list may assist in the prediction of novel O-GlcNAcylated HLA antigens, which will be best presented in patients harbouring HLA-B*07:02 or related alleles that use proline as an anchoring residue.


INTRODUCTION
Peptide antigen presentation by the human leukocyte antigen (HLA) to T-cells forms an integral part of human immune surveillance (1). Characterizing the identity of these HLA peptide antigens is a critical first step towards understanding their immunomodulatory functions, and may guide important decisions in the development of vaccines used for immunotherapy (2,3). A major avenue for therapeutic intervention in cancer patients comes from neoantigens in the HLA Class I ligandome, introduced by cancer-induced mutations (4,5). Mass spectrometry (MS) based immunopeptidomics provides the means to identify disease- or patient-specific HLA antigen presentation, and to chart the HLA-bound peptide ligandome (6,7). Next to the peptide sequence and protein of origin, mass spectrometry can also be used to identify post-translational modifications (PTMs) on the antigens. These may constitute another source of "neoantigens", as several PTMs are recognized as hallmarks of specific diseases, including cancer and autoimmune diseases (8,9). Peptide antigens carrying such PTMs have been shown to affect or even regulate immune system recognition of the HLA Class I peptides (10)(11)(12).
Of all the HLA peptides presented by cells or in a tissue, only a small fraction seems to harbor PTMs (13)(14)(15)(16)(17)(18). Several studies have reported on phosphorylated HLA peptides, and have even shown that these could be considered Tumor Associated Antigens (TAAs) (19). Since phosphorylation signalling can be drastically dysregulated in cancer, immunity to such modified epitopes can also be lost or gained under disease conditions (19,20). Modification of serine and threonine residues with β-N-acetylglucosamine (O-GlcNAc) is known to engage in reciprocal crosstalk with phosphorylation (21), and such a dynamic balance between phosphorylation and O-GlcNAcylation is particularly relevant in the nucleus and cytosol, where the enzymes coordinating the addition or removal of O-GlcNAc reside. When O-GlcNAcylated proteins are degraded via the proteasomal route and loaded onto the HLA Class I molecules, the presented peptides can retain the O-GlcNAc moiety from the source protein. Notably, aberrant O-GlcNAcylation has been shown to correlate with augmented cancer cell proliferation, survival, invasion, and metastasis (22). It has also been reported that HLA peptides can elicit glycopeptide-specific T-cell responses (13,14). Hence, it is important to characterize the HLA ligandome for O-GlcNAcylated class I peptides.
HLA class I peptides are generally more difficult to sequence and identify by mass spectrometry than tryptic peptides, as they are relatively small, do not carry a charged C-terminus (R/K in tryptic peptides), and fragment less efficiently into complementary series of b- and y-ions. Recent advances in immunopeptidomics have substantially improved the identification of HLA peptides; notably, sensitivity and sequence specificity have been improved, making detection of neoantigens more feasible (23,24). Using hybrid tandem MS fragmentation methods, Marino et al. and Malaker et al. independently identified some of the largest sets of O-GlcNAc modified HLA Class I peptides to date (25,26). Both studies reported a few dozen unique O-GlcNAc modified HLA Class I peptides, but also showed that some of these carried additional glycans, extended by Gal, Gal-NeuAc and even other monosaccharides. Moreover, Malaker et al. reported potent multifunctional T-cell responses to some of these O-GlcNAc modified HLA Class I peptides, but not to their unmodified counterparts, and apparently also found more O-GlcNAc modified HLA Class I peptides in cancer cells and tissue than in non-cancerous cells. To further improve our understanding of the immunological role that O-GlcNAc modified HLA Class I peptides might play, it is important to examine the properties of these peptides, their source proteins, the HLA loading specificity, and whether these peptides are functionally activating in immune surveillance.
In this report, using hybrid fragmentation MS strategies, we identified a new set of 23 O-GlcNAc HLA Class I peptides presented by the non-cancerous JY B-lymphoblastoid cell line. We compared this dataset with the O-GlcNAc HLA Class I peptides described earlier in the literature to collate a total list of 55 O-GlcNAcylated HLA Class I peptides. Our compilation includes the glycopeptide sequence, the likely subcellular localisation of the proteins of origin, and the predicted HLA allele with the respective binding affinity. We observed substantial congruence between the glycopeptide sequences we report here and the existing datasets of Malaker et al. and Marino et al. While comparing these datasets we noticed that O-GlcNAc HLA Class I peptides were presented with a marked preference by HLA-B*07:02, which harbors a Pro anchoring site for the P2 position.

Cell Culture and Isolation of HLA Class I Associated Peptides
The B-lymphoblastoid cell line JY, homozygous for its class I alleles (HLA-A*02:01, HLA-B*07:02, HLA-C*07:02), was cultured in RPMI 1640 supplemented with 10% fetal bovine serum, 50 U/mL penicillin, and 50 µg/mL streptomycin. HLA class I peptides were retrieved via immunoaffinity purification, as described previously (24,27). Specifically, HLA class I complexes were immunoprecipitated using the pan-HLA class I mouse monoclonal IgG2a antibody W6/32 (28), and antigen peptides were separated from HLA molecules by elution with 10% (v/v) acetic acid and filtration over a 10-kDa molecular weight cutoff membrane (Merck Millipore). The HLA class I peptide ligands were freeze-dried and reconstituted in 0.1% formic acid for further cleanup by C18 STAGE tips (Thermo Fisher Scientific) before LC-MS/MS analysis.

LC-MS/MS Analysis
The HLA Class I peptides were analyzed using an Ultimate 3000 UHPLC (Thermo Fisher Scientific) coupled to an Orbitrap Fusion Lumos (Thermo Fisher Scientific). The peptides were trapped on a µ-Precolumn (Thermo Fisher Scientific, 300 µm i.d. × 5 mm, C18 PepMap100, 5 µm, 100 Å) for 5 min in solvent A (0.1% formic acid in water) before being separated on an analytical column (Agilent Poroshell, EC-C18, 2.7 µm, 50 cm × 75 µm). Solvent B consisted of 80% acetonitrile in 0.1% formic acid. The gradient was as follows: first 5 min of trapping, followed by a 100 min gradient from 5% to 40% solvent B. Subsequently, 10 min of washing with 99% solvent B and 10 min re-equilibration with 9% solvent A. The mass spectrometer was operated in data-dependent mode. Full scan MS spectra from m/z 350 to 1400 were acquired at a resolution of 120,000 in the Orbitrap after accumulation to a target value of 4 × 10^5 or a maximum injection time of 50 ms.
For global acquisition of the JY immunopeptidome, higher-energy collisional dissociation (HCD) MS/MS spectra were acquired at a resolution of 60,000. Charge states of 2 up to 4 starting at m/z 120 were chosen for fragmentation using the MIPS algorithm with a 1.2 Da isolation window. The fragmentation was performed using 30% normalized collision energy (NCE) on selected precursors with 30 s dynamic exclusion after accumulation of 5 × 10^4 ions. Focused characterisation of the O-GlcNAcylated immunopeptidome was performed by electron transfer dissociation (ETD) and energy-stepped HCD, triggered when two of the six O-GlcNAc signature fragment ions (m/z 204, 186, 168, 144, 138, and 126) were detected at > 5% relative abundance, as described previously (29,30).
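The triggering rule described above can be sketched in a few lines of Python. This is only an illustration, not the instrument method itself: the accurate m/z values are commonly quoted monoisotopic masses for HexNAc oxonium ions (the text gives nominal values only), the 0.02 Da matching tolerance is an assumption, and "relative abundance" is assumed to mean intensity relative to the base peak of the MS2 spectrum. The function name `should_trigger` is hypothetical.

```python
# Commonly quoted monoisotopic m/z values for HexNAc oxonium ions.
# Assumption: the text lists only nominal values (204, 186, 168, 144, 138, 126).
OXONIUM_MZ = (204.0867, 186.0761, 168.0655, 144.0655, 138.0550, 126.0550)


def should_trigger(peaks, tol_da=0.02, min_rel=0.05, min_ions=2):
    """Decide whether an MS2 spectrum would trigger ETD / stepped HCD.

    peaks    : list of (mz, intensity) tuples for one MS2 spectrum
    tol_da   : m/z matching tolerance in Da (assumed, not from the text)
    min_rel  : minimum intensity relative to the base peak (> 5% in the text)
    min_ions : number of signature ions required (two of six in the text)
    """
    if not peaks:
        return False
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return False
    hits = 0
    for target in OXONIUM_MZ:
        # Most intense peak within tolerance of this signature ion, if any.
        best = max(
            (i for mz, i in peaks if abs(mz - target) <= tol_da),
            default=0.0,
        )
        if best / base > min_rel:
            hits += 1
    return hits >= min_ions


if __name__ == "__main__":
    spectrum = [(204.087, 80.0), (138.055, 30.0), (500.30, 100.0)]
    print(should_trigger(spectrum))
```

Two signature ions above the threshold are enough to trigger; a spectrum in which only m/z 204 exceeds 5% of the base peak would not.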

Data Analysis
All RAW files were searched using PEAKS Studio 10.5 against the Swiss-Prot human database (20,258 entries, downloaded in February 2018) supplemented with the JY-specific HLA proteins and the 20 most abundant FBS contaminants, with no enzyme specificity set in the search engine, to identify the unmodified HLA ligandome. Precursor ion and MS/MS tolerances were set to 10 ppm and 0.03 Da. Methionine oxidation and cysteinylation were set as variable modifications. Peptides were filtered by precursor tolerance 5 ppm, < 1% FDR, XCorr > 1.7, and peptide rank 1. Only peptides between 8 and 12 amino acids long were selected for further analysis. Glycopeptide assignments were made by searching the same RAW files using Byonic 4.1. Precursor and MS/MS tolerances were set to 10 ppm and 0.03 Da. Peptide loading affinity was predicted with the NetMHC 4.1 algorithm for the identified HLA peptide sequences against the HLA-A*02:01, HLA-B*07:02 and HLA-C*07:02 alleles, without considering potential changes in loading affinity due to O-GlcNAcylation. Peptides were considered binders when % Rank < 2. We classified the binding predictions for the O-GlcNAc HLA peptides as strong (IC50 ≤ 50 nM), regular (50 nM < IC50 ≤ 500 nM) or weak (500 nM < IC50 ≤ 5000 nM) binders, based on the peptide sequence alone. Gene ontology (GO) analysis of the source proteins of the O-GlcNAcylated HLA class I peptides was performed with the functional annotation tool DAVID (31). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (32) partner repository with the dataset identifier PXD028874.
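The length filter and the IC50-based binder classes stated above translate directly into code. The following Python sketch uses the thresholds from the text; the function names and the "non-binder" label for IC50 values above 5000 nM are assumptions introduced for illustration.

```python
def has_valid_length(sequence, min_len=8, max_len=12):
    """Keep only peptides of 8 to 12 amino acids, as in the text.

    `sequence` is the bare amino acid sequence (no modification markers).
    """
    return min_len <= len(sequence) <= max_len


def classify_binder(ic50_nm):
    """Map a predicted IC50 (in nM) to the binder class used in the text.

    strong : IC50 <= 50 nM
    regular: 50 nM < IC50 <= 500 nM
    weak   : 500 nM < IC50 <= 5000 nM
    The "non-binder" label above 5000 nM is an assumption; the text
    does not name this class.
    """
    if ic50_nm <= 50:
        return "strong"
    if ic50_nm <= 500:
        return "regular"
    if ic50_nm <= 5000:
        return "weak"
    return "non-binder"


if __name__ == "__main__":
    for ic50 in (12.0, 50.0, 320.0, 4800.0, 20000.0):
        print(ic50, classify_binder(ic50))
```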

Triggered MS Detection of O-GlcNAcylated HLA Peptide Ligands
Here, we first revisited the ligandome of the JY B-lymphoblastoid cell line, a model cell line used by several groups active in immunopeptidomics, including ours (27)(33)(34)(35), as it has relatively high expression of HLA complexes and is homozygous for its HLA class I alleles, harbouring just the HLA-A*02:01, HLA-B*07:02, and HLA-C*07:02 alleles. We have previously shown that several thousand HLA peptides can be identified when analyzing JY cells. Here we attempted to characterize the glycosylated subset and report the detection of 23 unique O-GlcNAcylated HLA class I peptides. In our calculation we count each unique peptide just once, even when it is identified with multiple different PTMs. To boost the identifications we examined the occurrence of diagnostic oxonium ions (m/z 204, 186, 168, 144, 138) in the MS2 spectra, and used these features to trigger targeted analysis by ETD and stepped HCD, which increases the generation of fragment ions for identification of the glycopeptide backbone (17,26). As described earlier (36), the relative abundance of the oxonium fragment ions was used to confirm that all 23 identified HLA peptides were O-GlcNAcylated rather than O-GalNAcylated. A representative EThcD spectrum from the O-GlcNAcylated TPASgSRAQTL peptide is shown in Figure 1, exposing both backbone fragments and glycan ions.

Distribution of O-GlcNAcylated HLA Class I Peptides
In the present ligandome analysis of the JY cells, we detected around 8800 unique HLA class I peptides, and thus these 23 O-GlcNAcylated HLA peptides constitute just about 0.26% of the total ligandome. By examining the binding affinity to HLA-A*02:01, HLA-B*07:02 and HLA-C*07:02, we observed that 21 out of the 23 (91%) were predicted to be strong binders of the HLA-B*07:02 allele (Figure 2A and Table 1). This prevalence for HLA-B*07:02 was in striking contrast to the distribution of the unmodified HLA antigens, where 41% of all HLA peptides were predicted to be strong binders to HLA-B*07:02 (Figure 2A). The O-GlcNAcylated peptides in our JY dataset originated from just 21 non-redundant source proteins. We found this low number of detected O-GlcNAcylated peptides and source proteins quite surprising, as a recent compilation, termed the human 'O-GlcNAc-ome' database (37), described that at least 7000 different O-GlcNAcylation sites on cumulatively 5000 proteins could be present in the human proteome. Evidently, not many of these lead to an abundant presentation of O-GlcNAcylated HLA class I peptides. Around three quarters of the source proteins were of nuclear origin. We also observed that quite a few of the source proteins have been functionally annotated as binding to either DNA or RNA. This observation, although based on a small number of O-GlcNAcylated peptides detected in our dataset, seems to agree with the suggestion that O-GlcNAc modifications are critical in the regulation of transcriptional events (38).
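The two fractions quoted in this paragraph follow directly from the reported counts. A minimal Python check, using only numbers stated in the text (the helper name `percent` is hypothetical, and 8800 is the approximate ligandome size given above):

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Counts as stated in the text.
n_glyco = 23          # O-GlcNAcylated HLA class I peptides in JY
n_total = 8800        # approximate number of unique HLA class I peptides
n_b07_strong = 21     # O-GlcNAcylated peptides predicted as strong HLA-B*07:02 binders

share_of_ligandome = percent(n_glyco, n_total)   # about 0.26%
share_b07 = percent(n_b07_strong, n_glyco)       # about 91%

if __name__ == "__main__":
    print(f"{share_of_ligandome:.2f}% of the ligandome")
    print(f"{share_b07:.0f}% predicted strong HLA-B*07:02 binders")
```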
Earlier     (17,25). Therefore, also in their data O-GlcNAcylated peptides constituted just around 0.1% of all HLA peptides. They observed that approximately 87% of the detected O-GlcNAcylated peptides were predicted to bind to HLA-B*07:02, whereas for the unmodified peptides this was just ~39% (Figure 2B). Another key feature of these O-GlcNAc HLA peptides was that most of them were derived from nuclear source proteins. All these findings are very much in line with our observations, although here we used the JY cell line. Malaker et al. analyzed the HLA class I immunopeptidomes from blood cells of primary leukemia (chronic myeloid leukemia, acute myeloid leukemia and acute lymphoblastic leukemia) patients, relying on an HLA-B*07-specific antibody (ME1) to pull down HLA Class I complexes (26). Healthy T and B cells were isolated from normal spleen and tonsil biopsies from healthy donors. Additionally, they included the JY cell line in their analysis as a reference system. To optimize the detection of O-GlcNAcylated peptides they combined several different approaches, including HCD-MS/MS analysis of the HLA ligandome making use of glycan-specific neutral losses, an HCD-triggered ETD approach like that of Marino et al., and a selective enrichment of esterified glycopeptides using an aminophenylboronic acid-derivatized affinity matrix to capture the glycopeptides. As in the other mentioned studies and ours, the relative glycan fragment ion intensities were used to confirm that all peptides identified were indeed O-GlcNAcylated.
Given the small number of O-GlcNAcylated HLA class I peptides detected in each of these studies, as well as in the current study, we had expected a priori that the overlap between these studies would be close to zero. This was not the case; instead, there was a substantial overlap in detected O-GlcNAcylated HLA class I peptides (Figure 2C) and their source proteins (Figure 2D). This was even more surprising given that the source material in these three studies was completely different. Cumulatively, from the datasets described above we compiled a list of 55 unique O-GlcNAcylated HLA class I peptides (Table 1).
The potential role of O-GlcNAcylated peptides as neoantigens was investigated in depth by Malaker et al. Using a subset of both unmodified and O-GlcNAcylated HLA Class I peptides, they assessed seven of the O-GlcNAcylated peptides detected in the leukemia cells. Five out of these seven HLA-B*07:02 glycopeptides were immunogenic. All healthy donors had immunity to at least one of the glycopeptides, with strong responses similar to those against chronic viral antigens. They further assessed cytotoxicity responses in healthy donors towards a specific peptide harboring both methylation and O-GlcNAcylation. These latter HLA peptides evoked significant T-cell activation. In contrast, no T-cell response was observed for the unmodified HLA peptide counterpart. Notably, two of the seven O-GlcNAcylated HLA-I peptides that were tested for immunogenicity by Malaker et al. were also identified in our study on non-cancerous JY cells. These two peptides are APVgSSKSSL and IPVgSSHNSL. In summary, these findings suggest that the specific T-cell response towards O-GlcNAcylated HLA-B*07:02 peptides may represent an autologous immunoprotective mechanism against leukemia.
One of the most striking features of the identified O-GlcNAc HLA class I peptides is that approximately 90% (50/55) of them are predicted to be strong binders to the HLA-B*07:02 allele. This finding was expected for the dataset of Malaker et al., as they used an antibody specific for HLA-B*07, but in Marino et al. and in the current study a pan-HLA antibody was used that has no specific preference for the HLA-B*07 allele. Thus, the compiled data hint at an overall preference for O-GlcNAcylated HLA class I peptides to be presented by HLA-B*07:02, regardless of the immunopeptidome isolation strategy.
Table 1 footnotes: The O-GlcNAcylated Thr or Ser is tagged with a "g" when unambiguously assigned by tandem MS data. † Each of these two pairs of overlapping peptides was considered as a unique sequence. Peptides described in the current study are assigned as CS. a These peptides were tested by Malaker et al. for immunogenicity. b A methylated and O-GlcNAcylated peptidoform of this peptide was also tested for immunogenicity by Malaker et al.
Preferential Loading of O-GlcNAcylated HLA Class I Peptides on HLA-B*07:02
To further investigate a potential basis for the observed preferential presentation of O-GlcNAcylated peptides by the HLA-B*07:02 allele, we evaluated the amino acid sequences of the 55 O-GlcNAcylated HLA peptides and compared them against the sequences of the unmodified peptides observed to bind the HLA-B*07:02 allele in the current dataset. Considering the site of modification, most O-GlcNAc sites were found between position 3 and position 7 (P3-P7), with the highest incidence at position 4 (P4). Peptides presented by HLA-B*07:02 have a high preference for a proline residue at the P2 position (Figure 3A). The sequences of the 9-mer O-GlcNAcylated peptides (32 peptides in the compiled list) predominantly confirmed the presence of a proline residue at P2, along with a preference for a valine residue at P3, a serine at P4 (72%, p-value < 0.05) and a serine or threonine residue at P5 (Ser 36%, p-value < 0.05), as illustrated in Figure 3B. Similarly, the sequences of the 10-mer O-GlcNAcylated peptides (12 peptides) were also rather consistent, with a proline residue at P2 or P3, valine in P4 and serine/threonine residues at P4 to P6 (Table 1).
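The positional analysis above relies on the annotation used throughout this text, in which a lowercase letter marks a modification on the residue that follows it ("g" for O-GlcNAc, "p" for phosphorylation). A minimal Python sketch of how the modified position and the P2 anchor could be read from such strings is given below. The helper names are hypothetical, and the example peptides are ones quoted in the text; this is not the authors' analysis code.

```python
from collections import Counter


def parse_annotated(peptide):
    """Split an annotated peptide into its bare sequence and modification map.

    A lowercase marker (e.g. "g", "p") is assumed to apply to the next
    residue. Returns (sequence, {1-based position: marker}).
    """
    residues = []
    mods = {}
    pending = None
    for ch in peptide:
        if ch.islower():
            pending = ch
        else:
            residues.append(ch)
            if pending is not None:
                mods[len(residues)] = pending
                pending = None
    return "".join(residues), mods


def glcnac_positions(peptide):
    """1-based positions of residues tagged with "g"."""
    _, mods = parse_annotated(peptide)
    return [pos for pos, marker in sorted(mods.items()) if marker == "g"]


def has_p2_proline(peptide):
    """True if the second residue of the bare sequence is proline."""
    sequence, _ = parse_annotated(peptide)
    return len(sequence) >= 2 and sequence[1] == "P"


if __name__ == "__main__":
    examples = ["APVgSSKSSL", "IPVgSSHNSL", "TPASgSRAQTL", "RPPgTQSSL"]
    tally = Counter(pos for pep in examples for pos in glcnac_positions(pep))
    print(tally)
    print([has_p2_proline(pep) for pep in examples])
```

Tallying positions over the full compiled list in this way would give the per-position site distribution described in the paragraph.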
Although protein O-GlcNAcylation has been studied for decades, for a long time it was not clear that there could be a conserved substrate sequence motif. However, more recent structural (21,39) and bioinformatic studies (37) have revealed a semi-consensus substrate motif for protein O-GlcNAcylation, namely P-P-(V/T)-g(S/T)-(S/T)-A ( Figures 3C, D) (37). This motif matches well with the sequence motif compiled from our compendium of O-GlcNAcylated HLA class I peptides, where a dominant proline is observed (Figures 3B, D). It is quite conceivable that HLA-B*07 peptide antigens also naturally over-represent the O-GlcNAcylation motif, thereby resulting in preferential presentation of O-GlcNAcylated HLA class I peptides by the HLA-B*07 allele.
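As an illustration of how such a motif could be used to screen annotated peptides, the Python sketch below encodes the shorter consensus quoted in the abstract, P(V/A/T/S)g(S/T), as a regular expression. It assumes the same notation as the peptide strings in this text, with "g" placed immediately before the modified Ser/Thr. The pattern and function names are hypothetical, and the longer semi-consensus motif described above is deliberately not encoded.

```python
import re

# Consensus from the abstract: P, then one of V/A/T/S, then a glycosylated S or T.
# The lowercase "g" precedes the modified residue in the annotated strings.
CONSENSUS = re.compile(r"P[VATS]g[ST]")


def matches_consensus(annotated_peptide):
    """True if the annotated peptide contains the P(V/A/T/S)g(S/T) core."""
    return CONSENSUS.search(annotated_peptide) is not None


if __name__ == "__main__":
    for pep in ("APVgSSKSSL", "IPVgSSHNSL", "RPPVgTKASSF", "TPASgSRAQTL", "RPPgTQSSL"):
        print(pep, matches_consensus(pep))
```

Peptides such as TPASgSRAQTL, where the modified residue sits two positions after the proline, do not match this strict core, which is one reason to treat the motif as a semi-consensus rather than a hard rule.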
Compiling the combined dataset of O-GlcNAcylated HLA class I peptides from three recent studies, the substantial overlap in peptides detected in all or at least two of these studies was striking. Moreover, some other peculiar features were also repeatedly observed in all studies, for instance the frequent co-occurrence of other PTMs, and the observation of glycan extensions next to the O-GlcNAcylation.   Figure 4. They reported five different forms: non-O-GlcNAc, one GlcNAc (Thr5), hexose-GlcNAc and even more extended glycans with mass shifts of 730 and 1021 Da. Moreover, arginine (Arg3) was observed to be mono-methylated or asymmetrically di-methylated, both with and without the glycan modifications. Thus, although we define RPPgTQSSL and IPRPPIgTQSSL as one unique peptide in Table 1, these are presented as HLA ligands in a dozen different "peptidoforms". In general, not much is known about the RNA-binding protein 27 or the specific sequence motif presented as HLA peptides. Moreover, whether there is any crosstalk between these PTMs is also not known.

Co-Occurring
Next, the peptides RVKpTPTgSQSY and RVKpTPTgSQSYR of the zinc-finger protein ZNF218 were also detected with both a phosphorylated Thr886 and an (extended) O-GlcNAc on Ser891. These peptides were also detected with mass extensions of 0, 203, 365, 656, 730, and 1021 Da, all indicative of a peptide backbone with extensions from 0 up to 6 glycan moieties. Both PTM sites have been reported previously, although it has not been reported whether they co-occur in the source protein (40,41). The proximity of O-GlcNAcylation and phosphorylation in RVKTPTSQSY may also hint at potential PTM crosstalk.
The origin of these additional glycan moieties is still puzzling. Most of the proteins observed to be decorated with these glycan extensions find their origin in the nucleus. Computational modelling of O-GlcNAcylated RPPVgTKASSF in complex with an HLA molecule by Marino et al. revealed that the glycan moiety was solvent exposed and not directly involved in the binding to the HLA Class I molecule groove. It is therefore plausible, as suggested earlier by Marino et al., that these peptides are first formed by proteasomal degradation as O-GlcNAcylated peptides, then loaded onto an HLA class I molecule and extended by glycosyltransferases in the Golgi, starting with β1,4-galactosyltransferase, but this needs further validation. In general, the co-occurrence of so many peptidoforms of O-GlcNAcylated HLA Class I peptides raises the question whether all these forms elicit a specific T-cell response and whether they are differentially immunogenic.
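One way to reason about the observed mass extensions is to ask which monosaccharide combinations add up to each nominal shift. The Python sketch below does this by brute force. It is an illustration under explicit assumptions: only HexNAc, hexose and NeuAc are considered (the residues mentioned in this text), nominal residue masses of 203, 162 and 291 Da are used, and at most six residues are allowed. Other residues or compositions are not excluded by this calculation, and the function name is hypothetical.

```python
from itertools import product

# Nominal residue masses in Da (assumed building blocks; the text mentions
# GlcNAc, Gal/hexose and NeuAc, and "even other monosaccharides").
HEXNAC, HEX, NEUAC = 203, 162, 291


def glycan_compositions(shift_da, max_residues=6):
    """List (n_HexNAc, n_Hex, n_NeuAc) tuples whose nominal mass equals shift_da."""
    found = []
    for n_hexnac, n_hex, n_neuac in product(range(max_residues + 1), repeat=3):
        if n_hexnac + n_hex + n_neuac > max_residues:
            continue
        mass = n_hexnac * HEXNAC + n_hex * HEX + n_neuac * NEUAC
        if mass == shift_da:
            found.append((n_hexnac, n_hex, n_neuac))
    return found


if __name__ == "__main__":
    for shift in (0, 203, 365, 656, 730, 1021):
        print(shift, glycan_compositions(shift))
```

Under these assumptions each of the six reported shifts has exactly one solution, for example one HexNAc plus one hexose for 365 Da.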

SUMMARY
Combining our new dataset of O-GlcNAcylated HLA peptides from JY cells with two recent related studies, we were able to compile a concise list of bona fide O-GlcNAcylated HLA peptides, presented by a whole array of different cancerous and non-cancerous cells. Our comprehensive analysis of the sequences of all these peptides revealed preferential presentation of O-GlcNAcylated HLA peptides by the HLA-B*07 allele. The preference of this allele for a proline at the P2 position allows the semi-consensus sequence motif of O-GlcNAcylation to be presented prominently. This preferred presentation by HLA-B*07 and HLA-B*07-like alleles may also be useful in predicting putative O-GlcNAcylated HLA peptide antigens. Moreover, although the number of O-GlcNAcylated HLA peptides detected is small in each of the three studies evaluated, the overlap between these diverse studies was substantial. Several of the reported O-GlcNAcylated HLA peptides are presented in multiple peptidoforms, carrying either additional phosphorylation or arginine-(di)methylation, along with glycan extensions of between 0 and 6 more carbohydrate moieties. Whether such distinct peptidoforms have differential functionality in immune surveillance needs to be further addressed.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: www.ebi.ac.uk/pride, PXD028874.

AUTHOR CONTRIBUTIONS
AH conceived the idea. AS-B performed the MS experiments. SM and AS-B analyzed the data and wrote the initial draft of the manuscript along with AH. SM and AS-B contributed equally to this manuscript. All the authors contributed to the article and approved the submitted version.