Deciphering Human Leukocyte Antigen Susceptibility Maps From Immunopeptidomics Characterization in Oncology and Infections

Genetic variability across the three major histocompatibility complex (MHC) class I genes (human leukocyte antigen [HLA] A, B, and C) may affect susceptibility to many diseases, such as cancer and autoimmune or infectious diseases. Individual genetic variation may help to explain different immune responses to microorganisms across a population. HLA typing can be fast and inexpensive; however, deciphering the peptides that are loaded on MHC-I and MHC-II and presented to T cells requires the design and development of high-sensitivity methodological approaches and, subsequently, databases. These novel strategies and databases could help in the generation of vaccines based on potentially immunogenic peptides and in identifying high-risk HLA types to be prioritized in vaccination programs. Herein, recent developments and approaches in this field, focusing on the identification of immunogenic peptides, are reviewed, and the next steps to promote their translation into biomedical and clinical practice are discussed.


INTRODUCTION
The immunopeptidome is the set of peptides (regardless of peptide length or list size) presented on the cell surface by class I and class II human leukocyte antigen (HLA) molecules, which activate the immune response through selective and specific recognition by T cells. Immunopeptidomics is currently gaining significance in basic and translational biomedical science. In basic research, an exhaustive analysis of the immunopeptidome could be key to understanding the immune system, its specific responses to tolerogenic and non-tolerogenic antigenic stimuli, and the mechanisms that would allow specific immune responses to be manipulated. In translational biomedical research, accurate knowledge of the immunopeptidome could improve immunotherapies and support the development of next-generation vaccines against cancer, autoimmune or infectious diseases (Human Immuno-Peptidome Project, 2015; Mahdi, 2019). With these concepts in mind, deciphering the immunopeptidome is attracting great interest.
The HLA system is a group of proteins encoded by the major histocompatibility complex (MHC) genes that present peptides (antigens) to T lymphocytes. They are expressed on the cell membrane, and class I molecules are displayed on all human cells except red blood cells. The principal function of these cell surface proteins is the regulation of adaptive immune responses by engaging the cognate T cell receptor. They are also important for self vs. non-self discrimination, allowing the immune system to distinguish between the body's own proteins and foreign proteins from invaders (such as viruses, bacteria, or any other type of pathogen). Moreover, the HLA system is involved in the immunopathogenesis of many diseases, such as cancer and autoimmune pathologies, among others (Human Immuno-Peptidome Project, 2015; Mahdi, 2019).
The HLA system is composed of genes (all of them encoded on chromosome 6) that are co-dominantly expressed and highly polymorphic. The HLA molecules are classified into two main classes: (i) the MHC class I complex, composed of major genes (HLA-A, HLA-B, HLA-C) and non-classical genes (HLA-E, HLA-F, HLA-G), whose function is the presentation of intracellular peptides to CD8+ cytotoxic T lymphocytes; and (ii) the MHC class II complex, composed of major genes (HLA-DP, HLA-DQ, HLA-DR) and non-classical genes (HLA-DM, HLA-DO), whose function is to present processed extracellular antigenic peptides to CD4+ helper T cells (Human Immuno-Peptidome Project, 2015; Mahdi, 2019).
Because the source of the peptide differs between these two classes, MHC class I and II molecules follow different intracellular pathways for antigen processing. The differences range from HLA complex formation to peptide loading and HLA migration to the cell membrane. As a result, the structural properties of the epitopes presented by the two classes differ. In HLA class I antigen processing, the first step takes place in the cytosol, where intracellular antigens are degraded, mainly by proteolysis. The antigen precursor peptides are then transported to the endoplasmic reticulum, where they are further trimmed into short linear peptides of 8 to 11 amino acids (aa), which are assembled with HLA class I. The class I HLA-peptide complex migrates to the cell surface, where it is scrutinized by CD8+ T cells for foreign antigens that differ from healthy or normal ones (Human Immuno-Peptidome Project, 2015; Mahdi, 2019; Purcell et al., 2019). When CD8+ T cells recognize the presented antigens through their T cell receptors, they ultimately eliminate the affected cells through their cytotoxic arsenal. Exogenous antigens, on the other hand, are recognized and captured by professional antigen-presenting cells (APCs). They are then degraded in the endosomal compartment of the APC, and the resulting peptides, which are longer (10 to 24 aa), can be assembled with HLA class II molecules and presented to CD4+ T cells (Figure 1) (Human Immuno-Peptidome Project, 2015; Mahdi, 2019; Purcell et al., 2019).
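As a rough illustration of the length ranges quoted above, the following sketch checks which HLA class a peptide could plausibly belong to by length alone. The function name and the example sequences are hypothetical, and length is only a first-pass criterion: real ligands can fall outside these windows, and binding also depends on allele-specific anchor residues.

```python
from typing import List

# Length windows quoted in the text; real ligands can fall outside them.
CLASS_I_LENGTHS = range(8, 12)    # 8 to 11 aa
CLASS_II_LENGTHS = range(10, 25)  # 10 to 24 aa


def compatible_classes(peptide: str) -> List[str]:
    """Return the HLA classes whose typical length window contains the peptide."""
    n = len(peptide)
    classes = []
    if n in CLASS_I_LENGTHS:
        classes.append("HLA-I")
    if n in CLASS_II_LENGTHS:
        classes.append("HLA-II")
    return classes


# Hypothetical sequences, used only to exercise the length check.
for seq in ["ACDEFGHIK", "ACDEFGHIKL", "ACDEFGHIKLMNPQR"]:
    print(len(seq), compatible_classes(seq))
```

Note that peptides of 10 or 11 aa fall inside both windows, so length alone cannot assign them to a class.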
Antigen processing and subsequent presentation strongly shape the process known as immune surveillance, which consists of the interaction of the immune system with expressed intracellular and extracellular proteins. In this process, dendritic cells (DCs), among the most important APCs, scan the surrounding tissues for antigens. After an antigen is recognized and internalized, the DCs are activated and migrate to the draining lymph nodes, where they can induce an adaptive immune response (Reis e Sousa, 2004). The antigen presentation pathway is completed when APCs process the internalized antigens and load the derived peptides onto MHC molecules (Embgenbroich and Burgdorf, 2018). Endogenously expressed proteins (CD8+ T cells, HLA class I) and exogenous antigens (CD8+ T cells and HLA class I for cross-presented antigens; CD4+ T cells and HLA class II) must be checked continuously, because constant changes in the cell proteome may trigger an immune response (Grabowska et al., 2018; Purcell et al., 2019).
Differences in the HLA subtypes of each class, as well as differences in antigen processing between the two classes, lead to different peptides being presented on these molecules. The nature of a peptide also depends on the source protein from which it is derived. In recent years, HLA-binding peptides have been analyzed to build databases (McHeyzer-Williams et al., 1996; Rodenko et al., 2006; Mommen et al., 2014; Giam et al., 2015; Schittenhelm et al., 2015; Ternette et al., 2015; Bassani-Sternberg et al., 2016; Liepe et al., 2016; Mommen et al., 2016; Ternette et al., 2016; Khodadoust et al., 2017; Shao et al., 2018), and the accuracy of the process, as well as the role of antigen abundance, peptide length and posttranslational modification, has been investigated in order to generate the complete immunopeptidome. Once the peptides in the immunopeptidome are determined and characterized, they can provide useful information for the design and development of treatments such as peptide-based vaccines, and they have the potential to be used as biomarkers in functional assays or in the enumeration of antigen-specific T cells (McHeyzer-Williams et al., 1996; Rodenko et al., 2006). Peptide-based vaccines have been gaining attention in recent years, especially as general vaccines against conserved regions of infectious pathogens (such as HIV, influenza and Plasmodium, among others) (Nardin et al., 2001; Parra-López et al., 2006; Purcell et al., 2007; Assarsson et al., 2008; Clemens et al., 2016; Sheikh et al., 2016). This type of vaccine also has great therapeutic potential in cancer, given the discovery of many related novel epitopes that are targeted by tumor-specific T cells after checkpoint suppression (Brennick et al., 2017; Verdegaal and van der Burg, 2017).
To this end, the Human Immunopeptidome Project (HIPP) was recently launched (https://www.hupo.org/Human-Immuno-Peptidome-Project) to provide a complete map of the human immunopeptidome and to make the technology easily accessible, more robust and reproducible, so that information becomes available faster for translational clinical research (Vizcaíno et al., 2020).
In this mini-review, the most relevant aspects of HLA-bound immunogenic peptides are described in detail: the novel therapeutic approaches, the methodological strategies to identify MHC-I and MHC-II loaded peptides, and the open-access bioinformatic databases that support in silico prediction of potential target peptides of therapeutic interest.

IMMUNOPEPTIDOMICS: CONCEPT, APPLICATIONS, AND METHODOLOGICAL STRATEGIES
Immunopeptidomics is the large-scale study of peptides presented on HLA molecules. Through the complete study of the endogenous peptides contained within a biological sample under defined conditions, the multitude of native peptides in a biological compartment can be exhaustively described with detailed features (i.e., amino acid sequence, posttranslational modifications [PTMs], peptide length, proteolysis processes) (Sirois et al., 2020).
In recent years, immunopeptidomics has been of great interest to researchers in many scientific areas, from infectious and autoimmune diseases to oncology (Caron et al., 2017). In fact, mass spectrometry-based immunopeptidomics is currently helping in the discovery of T cell targets against tumors, against autoimmune diseases and, more recently, against infectious pathogens, with applications in responding to the pandemic (i.e., accelerating vaccine design and development, immune monitoring, and engineering T cells). Herein, the main features (together with the advantages and disadvantages) of several currently developed approaches to decipher the peptides assembled in HLA molecules are critically discussed, as well as their applications in different biomedical research areas, noting that immunological knowledge and clinical translation might be highly similar regardless of the area of study, such as cancer and infectious diseases.

Infectious Diseases and Cancer Immunopeptidomics
Nowadays, it is well known that many human health challenges are accompanied by disruptions in the immune system and the immune response. An altered immune response is the key factor in the origin of many diseases (i.e., autoimmune diseases, infectious diseases, chronic inflammation), and innate and specific immune responses are also involved in other pathologies with different ontogeny (i.e., cancer and neurodegenerative diseases, among others). It is therefore expected that many methodological approaches in immunopeptidomics will be similar and commonly applicable for identifying antigenic peptides across these pathological situations (Marko-Varga and LaBaer, 2017). It is thus of great interest to bring together an overview of complementary immunopeptidomics research in order to advance the development of novel therapeutic approaches (e.g., peptide vaccines) or to provide novel knowledge about the disease (Vance et al., 2017).
In infectious diseases, the immune response is activated by the presence of a pathogen. It has also recently been well described that 20% of cancers could be caused by infectious agents, such as Helicobacter pylori, hepatitis C virus (HCV), Rous sarcoma virus (RSV), and Kaposi's sarcoma-associated herpesvirus (KSHV), among others; this applies to solid tumors and also to onco-hematological pathologies such as chronic lymphocytic leukemia (CLL) (Mantovani et al., 2008; Kowalewski et al., 2015; Vance et al., 2017).
In addition, because immune tolerance mechanisms require a continuous steady state between self and non-self components, the characterization of cell-cell communication, the cellular microenvironment, cell migration, immune evasion and suppression, endothelial activation, the initiation and evolution of inflammation, phagocytosis, and cell death mechanisms, among others, is also expected to be critical for understanding immune tolerance (Vance et al., 2017). Thus, infectious disease studies influence cancer studies, and vice versa. Moreover, given the enormous success of cancer immunotherapy, understanding the immunopeptidome is becoming essential because it opens novel therapeutic opportunities to treat and prevent both infectious diseases and cancer; in addition, infectious disease might be a risk factor to be considered in the efficiency of multiple onco-immunotherapies (Acebes-Fernández et al., 2020).
In the last decade, many studies have focused on the relationship between infectious diseases and cancers. One of these studies described the direct relationship between Helicobacter pylori infection and gastric cancer by evaluating the humoral immune response. Song et al. (2020) characterized the humoral response to 1,527 proteins (almost the entire immunoproteome of Helicobacter pylori) in 50 cases of gastric cancer and highlighted a decreased immune response to several proteins in gastric cancer, which may reflect mucosal damage and low bacterial load. In addition, there is evidence that 8% to 10% of gastric cancers are related to Epstein-Barr virus (EBV). Another study, by Song et al. (2021), screened the humoral response to this virus in gastric cancer patients and reported that EBV-positive cancer can be detected by specific antibodies, which could also be used for the diagnosis and treatment of the disease (Song et al., 2021). Furthermore, the relationship between cancer and infections has also been explored for HLA-I and HLA-II molecules. Several studies have found that protein fragments from bacteria invading tumor cells can be presented by HLA molecules on the surface of tumor cells and that these peptides are consequently recognized by T cells (Kalaora et al., 2021). Immunopeptidomics has also shown that both antigen-presenting cells and tumor cells display bacterial and/or viral peptides on their cell surface via HLA molecules, and that these peptides are specifically recognized by CD4+ and/or CD8+ T cells. These results could help in the selection of suitable bacterial or viral targets for cancer immunotherapy (Riemer, 2021).
More recently, immunopeptidomics has also been successfully used to identify SARS-CoV-2 peptides. The relationship between the binding capacity of viral peptides to 52 common MHC-I alleles and the mortality rate has been assessed, revealing a strong inverse relationship between the mortality rate and the viral peptides identified with a custom workflow called Ensemble-MHC (a consensus algorithm for the prediction of MHC-I peptides) (Wilson et al., 2021).
There is also a close relationship between onco-immunotherapies and infectious disease. Immune checkpoint inhibitors (ICI), one of the successful therapies in immuno-oncology, are under continuous investigation for use against T cell dysfunction in chronic viral infections (Barber et al., 2006). Similarly, CAR-T cell therapies used in cancer are being repurposed for infectious diseases (Parida et al., 2015), mainly through the design and development of CAR-T cells that target pathogens. The immunopeptidome could provide information about the peptides presented by the HLA system, which could be used to design tailored chimeric antigen receptors and vaccines that induce CD8+ T cells in order to control infectious pathogens or fight tumors. An example of these approaches is the study of peptides presented in human immunodeficiency virus type 1 (HIV-1) infection (Partridge et al., 2018), which found viral peptides that were specifically bound to HLA I and II molecules but did not elicit CD8+ T-cell responses. Similarly, another immunopeptidomics study in infection aims to block, by means of a vaccine, the progression of pre-erythrocytic malaria, the asymptomatic stage of the disease. So far, the use of live sporozoite-based vaccines has not been feasible because of the great challenges involved. The identification of Plasmodium falciparum antigens expressed during this stage may therefore provide useful vaccine candidates and improve the current state of treatment (Bettencourt, 2020).
Overall, immunopeptidome characterization for cancer and for infectious diseases (and sometimes for both together) helps to address challenges in vaccine development, to overcome drawbacks of and resistance to treatments, and to increase the efficiency and efficacy of already implemented onco-immunotherapies.

Personalized Vaccines
The principle of producing personalized vaccines seems simple, but the accurate and selective prediction of disease-specific peptide antigens for each patient remains one of the major obstacles (Creech et al., 2018). It is estimated that each HLA heterodimer binds thousands of peptides with allele-specific binding preferences (Hunt et al., 1992; Rammensee et al., 1995; Vita et al., 2015). Understanding the binding preference of each HLA heterodimer is the key to successfully predicting which antigens may elicit specific T cell responses. For this reason, many efforts have been made during the last decade to generate robust and reproducible methodological approaches for the specific identification of HLA-loaded peptides that could be candidates for personalized peptide vaccines.
One of the promising therapeutic strategies in biomedicine is vaccination based on immunotherapy, especially active immunotherapy, which aims to activate the immune system in vivo and induce a highly specific response against exogenous and endogenous antigens. Therapeutic vaccines are divided into three types according to their content: cell vaccines, protein or peptide vaccines, and genetic vaccines (made with DNA, RNA and viruses) (Acebes-Fernández et al., 2020). Here, the principal characteristics of each vaccine type are reviewed from the perspective of the role of HLA-loaded peptides in the development of personalized peptide vaccines.
Among cell vaccines, dendritic cell (DC) vaccines stand out; they are based on the intrinsic features of DCs, commonly called professional APCs. DCs work in the surrounding tissues, where they take up, process and present pathogen and/or host antigenic peptides through HLA to naive T lymphocytes in lymphoid organs. Although DCs play a fundamental role in connecting innate and adaptive immunity, functional characterization of DCs shows that three signals are required to achieve full activation. First, MHC-peptide complexes must be properly loaded for the priming of T cells. Second, costimulatory molecules (for example, CD40, CD80, and CD86) must be upregulated. Third, the immune response is polarized through the production of cytokines (Guo et al., 2013; Acebes-Fernández et al., 2020). There are numerous examples of these vaccines in difficult-to-treat diseases such as cancer, where DCs produced in vitro are used as tumor vaccines. Human DCs can be produced in culture from CD34+ hematopoietic progenitor cells or from peripheral blood monocytes (Banchereau and Palucka, 2005). A DC vaccine is thus obtained by loading tumor-associated antigens (TAAs) onto the patient's own DCs and then treating them with an adjuvant. For example, granulocyte-macrophage colony-stimulating factor (GM-CSF) is essential for the in vitro production of monocyte-derived DCs (Banchereau and Palucka, 2005). These cells require a maturation process, which involves changes in the morphology and function of the DCs. These procedures can improve the expression of MHC class I and II and of co-stimulatory molecules, as well as increase the production of cytokines (Inaba et al., 1992). The DCs are then administered to patients to induce anti-tumor immunity.
The first therapeutic cancer vaccine approved by the FDA was the DC vaccine Sipuleucel-T (Provenge™). In prostate cancer it improved patient survival with a favorable toxicity profile, opening a new paradigm for cancer treatment (Murphy et al., 1996; Small et al., 2006; Kantoff et al., 2010; Guo et al., 2013). Although other vaccines have been used in clinical trials to treat other types of cancer, such as melanoma, renal cell carcinoma and glioma (Nestle et al., 1998; Höltl et al., 1999; Thurner et al., 1999; John et al., 2004; Small et al., 2006; Kantoff et al., 2010; Romano et al., 2011), further research is needed to demonstrate their clinical efficacy and their effect on the survival of patients with these cancers.
Another group comprises the protein or peptide-based vaccines. Although these vaccines were initially based on tumor-associated antigens (TAA), cancer germline antigens (CGA), or tumor-specific antigens (TSA), together with adjuvants, they could also be useful against infectious disease antigens. Protein or peptide-based vaccines include synthetic peptides of 20 to 30 amino acids from specific epitopes of tumor or infectious antigens. In these vaccines, the antigen can be modified to bind to immunogenic peptides, cytokines or antibodies (Pan et al., 2018). This type of vaccine is stable and not very expensive, but it has a major limitation: the peptide epitopes to be used must first be deciphered. The immunosuppression present in the disease setting and the weak immunogenicity of these antigens are further disadvantages (Mocellin et al., 2009).
The last type is the genetic vaccines: gene-based vaccines that use DNA (such as plasmids) or RNA (such as mRNA) (Mocellin et al., 2009). Viral DNA vectors can be used to deliver the cargo to infiltrating somatic cells or DCs (Guo et al., 2013). APCs take up the genetic material and translate it into cancer-specific antigens, thereby stimulating the immune system (Mocellin et al., 2009). Peptide or protein expression and antigen presentation might be limited by the transfection efficiency and targeting of the DNA/RNA delivery method (Mocellin et al., 2009). These vaccines can be administered in two ways: with viral vectors or by electroporation. Despite its effectiveness, this approach remains difficult to apply in routine clinical studies (Osada et al., 2012; Lee et al., 2015). It should also be noted that the injection of live viruses can cause side effects and can reduce the success of vaccination because of clearance by antiviral antibodies in patients (Osada et al., 2012).
Among the peptides and proteins used in cancer vaccines, the NY-ESO-1 protein has been identified in recent years as a potential cancer vaccine antigen because of its high capacity to induce both humoral and cellular immune responses. Pavlick et al. completed a phase I/II adjuvant clinical trial in resected high-risk melanoma to improve the delivery of poly-ICLC as a constituent of the vaccine. Poly-ICLC is a synthetic, stabilized, double-stranded RNA viral mimic capable of activating multiple innate immune receptors and of activating CD4 and CD8 T cells, which makes it an optimal adjuvant for inducing de novo immune responses against tumor neoantigens (Pavlick et al., 2020).
The success of these vaccines depends on the right target selection; therefore, understanding the peptide profiles presented on HLA molecules is essential for advancing immunology, active immunotherapy and vaccine research, and for developing further treatments for any disease.

Deciphering MHC-I and MHC-II Loaded Peptides
Investigating how the HLA system affects susceptibility to infections, the response to immunotherapy in oncology, and autoimmune diseases has recently emerged as highly clinically relevant. For example, individual genetic variation leads to different immune responses to a particular antigen, such as a microorganism, within a given population. With this in mind, the latest advances in this area have led to the specific identification of the peptides assembled in HLA molecules; several different strategies have been designed and developed for this purpose and are briefly described in this review.

Systematic Isolation of HLA Molecules
In proteomics approaches, the identification of HLA-loaded peptides requires multiple sequential steps because of the relatively low abundance of HLA molecules (except in APCs), the low abundance of loaded peptides, and the particular features (such as size, fixed positions of hydrophobic/hydrophilic moieties, and PTMs) of the loaded peptides, which derive from protein degradation (e.g., by the proteasome) and are assembled in the HLA groove. Hence, the separation and enrichment of HLA molecules is critical, and the selective elution of loaded peptides is also a key point. Several methodological strategies have been developed, and a few of them are discussed here, although this area is still in continuous change, progress and evolution (Figure 2).
Most of the approaches are based on the specific and selective enrichment of HLA complexes by immunochromatography. The HLA complex is immunopurified (IP) from the cell lysate (in the presence of mild detergents), and the peptides are then eluted from the captured HLA complexes and analyzed at high resolution by liquid chromatography-tandem mass spectrometry (LC-MS/MS). This process requires optimal antibodies; the most widely used are pan-HLA-I (anti-human HLA A, B, C: clone W6/32) and pan-HLA-II (anti-human HLA DR, DP, DQ: clone Tü39) (Bassani-Sternberg et al., 2010; Chong et al., 2018).
Regarding immunopurification, Chong et al. (2018) described a method based on sequential HLA purification through a combination of chromatographic steps: pro-A beads first capture endogenous antibodies, and anti-HLA-I and/or anti-HLA-II antibodies coupled to pro-A beads then capture HLA class I and class II complexes, respectively. The HLA-I and HLA-II complexes are then eluted and collected on a hydrophobic resin (C18) in order to selectively enrich the loaded HLA peptides. The principal advantage of this strategy is that there are no intermediate steps, so endogenous antibodies are depleted and class I and class II HLA complexes are immunoaffinity-purified in one continuous process. Following this strategy, Chong et al. (2018) used human B and T cell lines and identified a total of 42,556 unique HLA class I peptides mapping to 8,975 proteins and 43,702 unique HLA class II peptides from 4,501 proteins at a 1% false discovery rate (FDR). Across the cell lines, the number of unique peptides ranged from 3,293 to 13,696 for HLA-I and from 7,210 to 10,060 for HLA-II.
Another method is direct immunoprecipitation using magnetic microspheres conjugated with anti-pan-HLA-I and anti-pan-HLA-II antibodies, respectively (Bassani-Sternberg et al., 2010; Chong et al., 2018). This procedure has the advantage of concentrating the peptides held in the HLA cleft, thanks to the super-paramagnetic microspheres conjugated to these antibodies. The samples are eluted with an acidic buffer to separate the peptides from the HLA molecules, and the peptides are then sequenced by mass spectrometry. If the whole exome or genome of the sample of interest has previously been analyzed to identify somatic mutations, the results can be compared with the complete proteome (Chong et al., 2018; Kalaora and Samuels, 2019). These studies compared several cancer cell lines with healthy ones and identified thousands of soluble HLA peptides, including some cancer-specific peptides shared among several cell lines.
Recently, a novel procedure has been developed based on mild acidic elution (MAE) of the HLA-loaded peptides directly from the cells in a single step, without prior cell lysis or selective HLA enrichment. The elution buffer is applied to the cells, and the peptides released from the cells of interest are then detected by LC-MS/MS analysis. The MAE strategy may be a cheap alternative to immunoaffinity methods, but it is hindered by a large number of contaminating peptides unrelated to HLA molecules. Using this strategy, Sturm et al. found that 50% of peptides were common to both approaches (MAE and immunoaffinity chromatography), whereas 22% of peptides were identified only with the MAE strategy (Sturm et al., 2020).
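Comparisons of this kind between MAE and immunoaffinity purification come down to set operations on the peptide lists identified by each method. A minimal sketch follows, assuming each method yields a collection of peptide sequences and that percentages are expressed relative to the union of both lists; the denominator is an assumption made here, not a detail taken from Sturm et al. (2020). The function name and the toy peptide labels are hypothetical.

```python
from typing import Dict, Iterable


def overlap_summary(mae: Iterable[str], ip: Iterable[str]) -> Dict[str, float]:
    """Fractions of peptides shared by, or unique to, two identification methods.

    All fractions are relative to the union of both peptide lists
    (an assumed convention for this sketch).
    """
    mae_set, ip_set = set(mae), set(ip)
    union = mae_set | ip_set
    if not union:
        raise ValueError("both peptide lists are empty")
    total = len(union)
    return {
        "shared": len(mae_set & ip_set) / total,
        "mae_only": len(mae_set - ip_set) / total,
        "ip_only": len(ip_set - mae_set) / total,
    }


# Toy labels standing in for peptide sequences (not real data).
summary = overlap_summary(["P1", "P2", "P3"], ["P2", "P3", "P4"])
print(summary)
```

Because the three categories partition the union, the three fractions always sum to 1.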

In Silico Prediction of HLA-I and HLA-II Loaded Peptides
Given that all human cells present HLA complexes, and given the critical role of HLA complexes in the response to pathogens and in pathologies, the precise and deep characterization of the immunopeptidome is highly important; hence, a synergistic combination with multi-omics analysis (i.e., metabolomics, genomics, proteomics, transcriptomics, epigenomics, among others) seems to be a powerful strategy for the systematic determination of immunopeptidomes. In addition, in silico prediction is a fundamental initial step in identifying potential target neoantigens. For this purpose, it is crucial to carry out bioinformatic analyses that correlate the different databases and repositories of interest and integrate information from the immunopeptidome characterization (Figure 3).
Currently, several open-access bioinformatics tools are available for the prediction and selection of neoantigens through personalized proteogenomic workflows. One of them, ProGeo-Neo (https://github.com/kbvstmd/ProGeo-neo), allows neoantigen prediction and selection based on a customized proteogenomic pipeline. ProGeo-Neo integrates three modules: (i) RNA-seq data analysis, which generates variant peptides; (ii) inference of HLA alleles from RNA-seq data, so that it is possible to work only on selected HLA alleles; and (iii) neoantigen prediction based on genomic and proteomic information, which allows new antigens to be screened by LC-MS/MS and neoantigens to be filtered by RNA expression and T cell receptor recognition (epitope). This pipeline is already in use, as in the study by Xiaoxiu Tan, where a platform was developed to facilitate the screening and confirmation of potential neoantigens in cancer immunotherapy.
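The general shape of such a proteogenomic filter can be sketched as below. This is not ProGeo-Neo's code or interface: the `Candidate` fields, the function name, and the thresholds (`min_tpm`, `max_rank`) are all hypothetical and chosen only to illustrate how allele selection, LC-MS/MS evidence and RNA expression can be combined. The T cell receptor recognition step is left out of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Candidate:
    peptide: str          # variant peptide derived from RNA-seq
    hla_allele: str       # allele the peptide is predicted to bind
    tpm: float            # expression of the source transcript
    binding_rank: float   # predicted percentile rank; lower means stronger
    seen_in_ms: bool      # supported by LC-MS/MS evidence


def filter_candidates(
    candidates: Iterable[Candidate],
    patient_alleles: Set[str],
    min_tpm: float = 1.0,    # hypothetical expression cutoff
    max_rank: float = 2.0,   # hypothetical binding-rank cutoff
) -> List[Candidate]:
    """Keep candidates that pass every filter, strongest predicted binders first."""
    kept = [
        c for c in candidates
        if c.hla_allele in patient_alleles
        and c.seen_in_ms
        and c.tpm >= min_tpm
        and c.binding_rank <= max_rank
    ]
    return sorted(kept, key=lambda c: c.binding_rank)


# Made-up candidates, used only to exercise the filter.
pool = [
    Candidate("PEPTIDEAA", "HLA-A*02:01", tpm=12.0, binding_rank=0.4, seen_in_ms=True),
    Candidate("PEPTIDEBB", "HLA-A*02:01", tpm=0.2, binding_rank=0.1, seen_in_ms=True),
    Candidate("PEPTIDECC", "HLA-B*07:02", tpm=30.0, binding_rank=0.3, seen_in_ms=True),
    Candidate("PEPTIDEDD", "HLA-A*02:01", tpm=8.0, binding_rank=1.5, seen_in_ms=False),
]
selected = filter_candidates(pool, patient_alleles={"HLA-A*02:01"})
print([c.peptide for c in selected])
```

Applying the filters conjunctively keeps the candidate list short, at the cost of discarding peptides that lack evidence in any single layer.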
In a similar manner, several databases have recently been created for identified peptides assembled in HLA complexes, organized mainly by the methodology (commonly LC-MS/MS) used to identify the peptides. Some of those most relevant to oncology and infections are briefly described here (Table 1).
TRON Cell Line Portal, or TCLP (http://celllines.tron-mainz.de/) (Scholtalbers et al., 2015), is a database that integrates the public RNA-Seq datasets of different cell lines available in two repositories: the first dataset was collected by Klijn et al. (2015) and the second comes from The Cancer Cell Line Encyclopedia (CCLE). For this database, the accessible raw RNA-Seq datasets were re-analyzed to determine the abundance and type of HLA molecules, to detect viruses, and to quantify gene expression in 1,082 human cancer cell lines. Using the available data on established HLA types and cell line mutations together with HLA prediction algorithms, the TRON Cell Line Portal makes it possible to predict the antigenic mutations in each analyzed human cell line. Several studies have performed HLA typing by high-throughput DNA and RNA sequencing using TCLP; they include an overview of approaches that use high-throughput sequencing for HLA typing and provide supplementary wet-lab protocols and in silico screening tools (Bukur, 2017; Boegel and Castle, 2019).
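HLA types inferred from sequencing data are written in the standard HLA nomenclature, in which colon-separated fields after the gene name give increasing resolution: the first field corresponds to a two-digit call and the first two fields to a four-digit call. A minimal sketch of reducing an allele name to a chosen resolution is shown below. The function name is hypothetical, and expression suffixes (such as a trailing N or L on the last field) are not handled, a simplification made for this example.

```python
def truncate_hla(allele: str, fields: int) -> str:
    """Reduce an HLA allele name to its first `fields` colon-separated fields.

    fields=1 gives a two-digit call (e.g. HLA-A*02),
    fields=2 gives a four-digit call (e.g. HLA-A*02:01).
    """
    gene, star, rest = allele.partition("*")
    if not gene or not star or not rest:
        raise ValueError(f"not a recognizable HLA allele name: {allele!r}")
    if fields < 1:
        raise ValueError("fields must be at least 1")
    parts = rest.split(":")
    if len(parts) < fields:
        raise ValueError(f"{allele!r} has fewer than {fields} fields")
    return gene + "*" + ":".join(parts[:fields])


full = "HLA-A*02:01:01:01"
print(truncate_hla(full, 1))  # two-digit resolution
print(truncate_hla(full, 2))  # four-digit resolution
```

Raising an error when the input has fewer fields than requested avoids silently reporting a low-resolution call as if it were a high-resolution one.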
In this database, seq2HLA v2.2 is used to identify the HLA types (Boegel et al., 2014); it derives two- and four-digit HLA calls from the RNA-Seq reads with high precision (Boegel et al., 2013; Boegel et al., 2014). The public data also include the HLA typing results of Adams et al. (2005), where a sequence-based typing (SBT) method was used to determine the HLA class I and class II genotypes of the 60 cell lines used by the National Cancer Institute (NCI-60). Combining the established HLA class I and II types with systematic mutation data makes it possible to compile a register of candidate HLA class I and class II neoepitopes, respectively.
FIGURE 3 | Essential steps in deciphering immunopeptides that elicit a T-cell response and therefore serve to design personalized peptide vaccines.
HLAthena (http://hlathena.tools/) is an open-access portal based on the identification of HLA-I loaded peptides by exhaustive and systematic LC-MS/MS characterization. Here, more than 185,000 eluted peptides were analyzed from HLA-A, -B, -C and -G of 95 monoallelic human cell lines. The typical peptide motifs of each HLA allele were determined, as well as the unique and shared binding submotifs across alleles and those associated with different peptide lengths. In addition, this resource was developed by combining these datasets with transcript abundance and knowledge of peptide processing, which yields allele- and length-specific prediction models for endogenous peptide presentation. These models predict HLA class I ligands more accurately than existing ligand prediction tools and identify more than 75% of the HLA-I loaded peptides observed in tumor cells from 11 patients (melanoma, glioblastoma and clear cell renal cell carcinoma).
In summary, the HLAthena database makes it possible to systematically decipher the rules of endogenous antigen presentation in tumor cells (Sarkizova et al., 2020). Other studies have used this database to predict, and thereby reduce, the number of peptides to be studied in response to the urgent need for a SARS-CoV-2 vaccine, because the stimulation of an adequate protective immune response is highly dependent on the presentation of epitopes to circulating T cells via the HLA complex. In one such study, 174 SARS-CoV-2 epitopes with high binding prediction scores were identified and validated to bind stably to 11 HLA allotypes (Prachar et al., 2020).
NetMHC 4.1 (http://www.cbs.dtu.dk/services/NetMHC/) is a method for predicting peptides bound to MHC class I using gapped sequence alignment. The method uses artificial neural networks to align the amino acid sequences of peptides assembled in HLA complexes, allowing insertions and deletions in the alignment. Alignment-based prediction methods that include deletions and insertions show higher predictive performance than strategies trained on single-length peptides (Nielsen et al., 2003; Andreatta and Nielsen, 2016). The authors also illustrate how the position of a deletion can help to explain the peptide-MHC binding mode, such as when a long peptide bulges out of the HLA groove or protrudes at either end. They further demonstrated that the method can predict the peptide length distribution of different HLA molecules, and used the prediction algorithm to quantify the reduction in the experimental workload required to identify potential epitopes. Several studies have applied this tool to very diverse pathologies (Conley et al., 2018; Khanna and Rana, 2019). As an example, one study used it to predict T cell epitopes of Mycobacterium tuberculosis. As the identification of T cell or B cell epitopes on the target antigen is the main goal in epitope-based vaccine design, immunological diagnostic tests and antibody development, it is essential to provide a robust and reproducible system that can assist in the diagnosis of M. tuberculosis (Khanna and Rana, 2019).
SYFPEITHI (http://www.syfpeithi.de/) contains a constantly updated collection of MHC class I and II ligands and peptide motifs from humans and other species (such as apes, cattle, chickens and mice). Users can search by HLA allele and motif, and retrieve natural ligands, T cell epitopes, source proteins and organisms, and the corresponding references. It includes links to the European Molecular Biology Laboratory (EMBL) and PubMed databases.
In addition, ligand prediction is available for various HLA allele products. The content of the database is limited by the availability of published datasets; however, it is highly useful for correlating predicted T cell epitopes with HLA-loaded peptides.
The prediction is based on published motifs (derived from natural ligands and pool sequencing) and considers the amino acids at anchor positions. The score is calculated by assigning each amino acid of a given peptide a specific value depending on whether it is an anchor, an auxiliary anchor or a preferred residue. Ideal anchors receive 10 points, unconventional anchors 6 to 8 points, auxiliary anchors 4 to 6 points, and preferred residues 1 to 4 points. In addition, amino acids considered to have a negative effect on binding capacity are assigned values between −1 and −3. This database has already been used in the characterization (previously discussed) of the HLA ligandome in chronic lymphocytic leukemia, among others.
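The additive scoring scheme described above can be sketched in a few lines. The matrix below is purely illustrative: the positions, residues and values are hypothetical and only chosen to fall within the point ranges quoted in the text; they are not the SYFPEITHI matrix for any real allele. The names `ILLUSTRATIVE_MATRIX`, `score_peptide` and `best_nonamers` are likewise assumptions.

```python
# Hypothetical position-specific values for a 9-mer motif (positions are 1-based).
# Residues not listed at a position contribute 0 points.
ILLUSTRATIVE_MATRIX = {
    1: {"K": 2, "D": -1},                    # preferred residue / unfavourable residue
    2: {"L": 10, "M": 10, "I": 8, "V": 6},   # anchor: ideal (10) or unconventional (6 to 8)
    3: {"P": -3},                            # unfavourable residue
    4: {"E": 3},                             # preferred residue
    6: {"V": 6, "I": 4},                     # auxiliary anchor
    9: {"V": 10, "L": 10, "I": 8},           # anchor: ideal (10) or unconventional (8)
}


def score_peptide(peptide, matrix=ILLUSTRATIVE_MATRIX):
    """Sum the position-specific values of a 9-mer peptide."""
    if len(peptide) != 9:
        raise ValueError("this sketch only scores 9-mer peptides")
    return sum(
        matrix.get(position, {}).get(residue, 0)
        for position, residue in enumerate(peptide, start=1)
    )


def best_nonamers(protein, top_n=3, matrix=ILLUSTRATIVE_MATRIX):
    """Score every 9-mer window of a protein and return the top (score, peptide) pairs."""
    windows = [protein[i:i + 9] for i in range(len(protein) - 8)]
    scored = [(score_peptide(w, matrix), w) for w in windows]
    return sorted(scored, reverse=True)[:top_n]
```

With this toy matrix, a peptide carrying both anchors, the auxiliary anchor and two preferred residues accumulates the highest score, whereas a peptide with unfavourable residues at positions 1 and 3 ends up with a negative score.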
IPD-IMGT/HLA (https://www.ebi.ac.uk/ipd/imgt/hla/) includes 25,000 allelic sequences of more than 40 genes encoded by the MHC of the human genome. IPD-IMGT/HLA is a stable, highly accessible and easy-to-use database that gives the medical and scientific communities access to the many alternative sequences of this genetic system, which is essential, for example, for successful transplant outcomes. The challenge for this database is to continue providing a highly curated sequence variation database while keeping pace with the increasing number of submissions and the complexity of the sequences (Robinson et al., 2020). The database offers multiple tools, either its own or incorporated from the existing tools available at the European Bioinformatics Institute (EMBL-EBI) (https://www.ebi.ac.uk/) (Valentin et al., 2010). Among them are: i.- sequence alignment, ii.- allele query, iii.- sequence search tools (FASTA and BLAST) and iv.- cell query (original materials) (Labarga et al., 2007; Valentin et al., 2010; Robinson et al., 2014; Albrecht et al., 2017; Madeira et al., 2019).
The Immune Epitope Database (IEDB) (https://www.iedb.org/home_v3.php) is an open-access database supported by the National Institute of Allergy and Infectious Diseases (NIAID). It catalogues experimental studies on antibody and T cell epitopes examined in humans, non-human primates and several other animal species, in the context of infectious disease, allergy, autoimmunity and transplantation (Vita et al., 2019). The IEDB also hosts tools to help in the prediction and analysis of epitopes. In one study, the IEDB was used to build a model for vaccine design that predicts B cell epitopes under perturbations in the sequence of a multitude of peptides from various source or host organisms (González-Díaz et al., 2014).

Analysis of LC-MS/MS Data Sets of MHC-I and MHC-II Loaded Peptides
In addition to the information repositories about the peptides assembled in HLA molecules, tools for analyzing data from LC-MS/MS characterization (across the different MS/MS instrumentation) and the associated databases are increasingly common. They include the following.
DeepRescore (https://github.com/bzhanglab/DeepRescore) is an immunopeptidomics data analysis tool that uses peptide features derived from deep learning to rescore peptide-spectrum matches (PSMs). It takes MS/MS data (in MGF format, for example) and the identification results from search engines as input. The latest version supports four well-established search engines: MS-GF+, Comet, X! Tandem and MaxQuant. DeepRescore combines peptide features derived from deep learning predictions (notably accurate retention time and MS/MS spectrum prediction) with previously used features to rescore peptide-spectrum matches. Using two public immunopeptidomics datasets, it was demonstrated that, compared with existing methods, rescoring with DeepRescore increases the accuracy and sensitivity of the identification of MHC-binding peptides and neoantigens. It was also shown that the performance improvements are driven, to a large extent, by the features derived from deep learning. This post-processing tool is freely available to the scientific community and can be used for the sensitive and reproducible identification of HLA-binding peptides and neoantigens from immunopeptidomic datasets.
PEAKS X PRO (https://www.bioinfor.com/; Bioinformatics Solutions Inc., Waterloo, Ontario, Canada) is a commercially available software platform with several tools for the detection of peptide features. Its three main components are described here. The first one is DeepNovo, a deep neural network model for de novo peptide sequencing. The DeepNovo design incorporates recent developments in convolutional neural networks (CNN) and recurrent neural networks (RNN) to learn features of tandem mass spectra, fragment ions and peptide sequence patterns. These networks are also integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. DeepNovo improves accuracy at the amino acid level by 7.7% to 22.9% and at the peptide level by 38.1% to 64.0%. DeepNovo was used to automatically reconstruct the entire sequences of mouse antibody light and heavy chains, without the need for auxiliary databases, achieving 97.5% to 100% coverage and 97.2% to 99.5% accuracy. In addition, it can be retrained to adapt to any data source and provides a complete end-to-end training and prediction solution for de novo sequencing problems (Tran et al., 2017). The second one is DeepIso, which combines recent developments in CNN and RNN to detect peptide features of different charge states and estimate their intensity. DeepIso consists of two deep learning-based modules that learn multiple levels of representation of the high-dimensional data through multiple layers of neurons and can be adapted to new situations. The list of peptide features detected with this model matches 97.43% of the high-quality MS/MS identifications in a standard dataset (Zohora et al., 2019). The third one is DeepNovo-DIA, a de novo peptide sequencing method for data-independent acquisition (DIA) mass spectrometry data. It also uses neural networks to capture information across the m/z, retention time and intensity dimensions. In addition, this information is combined with peptide sequence patterns to address the problem of highly multiplexed spectra, allowing the identification of novel peptides in human antigens and antibodies.
MaxQuant (https://www.maxquant.org/) is a quantitative proteomics software package developed for analyzing large mass-spectrometric datasets. It is among the most widely used software for this type of raw data because of its versatility and its options for quantitative analysis. It comprises a set of algorithms that extract information from raw MS data with high efficiency and robustness, achieving high peptide identification rates as well as high-precision protein quantification for thousands of proteins in complex proteomes (Cox and Mann, 2008). In recent years, the demand for MaxQuant has increased owing to the rapid development of methodological strategies to identify peptides of interest in different pathologies (Schaab et al., 2012; Tyanova et al., 2015; Tyanova et al., 2016).

Relevance of Immunopeptidomics Characterization in Oncology and Infections
For personalized medicine, a huge effort has been made to explore the endogenous and exogenous ligands processed and presented by diverse HLA heterodimers. This is important from multiple points of view; for example, knowing the specific binding preferences of each HLA molecule could help to predict safe and, preferably, immunogenic epitopes for individual patients with different pathologies. Technological and methodological advances (LC-MS/MS instrumentation, data search algorithms, acquisition methods, …) have increased the number and size of MS data repositories and their correlation with other multi-omics information (Vita et al., 2015; Fleri et al., 2017; Creech et al., 2018).
Another advance has been the creation of international consortia dedicated to establishing standardization strategies for peptides isolated from HLA molecules. A few years ago, The Human Immuno-Peptidome Project (HIPP) (https://www.hupo.org/Human-Immuno-Peptidome-Project) was launched under the umbrella of the Human Proteome Organization (HUPO). Its main goal is to use mass spectrometry technology to map the entire spectrum of peptides presented by HLA molecules and to allow any immunologist, clinician or other biomedical scientist to perform reliable analyses (Admon and Bassani-Sternberg, 2011; Caron et al., 2017). Within this framework, the immunogenicity of predicted antigens will be confirmed through in vitro validation studies to evaluate the accuracy and efficiency of the current prediction algorithms developed by numerous industrial and academic laboratories.

Limitations of Immunopeptidomics Characterization
Bearing in mind all the methodological processes that must be carried out for the identification of immunogenic peptides assembled on HLA molecules, there are several limitations to be taken into account for their discovery.
The first limitation is the need for an optimal lysis buffer that maximizes the recovery of membrane proteins, where the HLA molecules are located, and consequently of the peptides assembled on them, so that their relative abundance is greatly increased. If lysis buffers designed for other subcellular compartments or for total protein extraction are used instead, the sample may not be enriched in HLA molecules, with a subsequent decrease in peptide identifications. Similarly, the elution buffer is also critical because it can affect the yield of peptide isolation and identification (Chong et al., 2018).
Depending on the main goal of each immunopeptidome characterization, the immunoprecipitation strategy must also be considered carefully. Another limitation is directly related to the use of pan-antibodies: they are optimal for maximizing coverage, but they can also lead to the eluted peptides being attributed to the wrong HLA restriction (Bassani-Sternberg et al., 2010; Chong et al., 2018).
Finally, it must be taken into account that some peptides are weakly or poorly immunogenic and may be involved in immunosuppression, so mass spectrometry alone may not be sufficient to identify all the peptides bound to an HLA molecule (Purcell et al., 2019).

FUTURE PERSPECTIVES AND CHALLENGES
Despite the great advances in the knowledge and prediction of HLA-bound ligands, challenges remain because the antigens of interest currently represent a very small fraction of the HLA ligandome. Therefore, this knowledge must be integrated with other multi-omics technologies, such as antigen identification based on whole exome sequencing followed by algorithms that predict HLA ligand binding. Nevertheless, these investigations highlight the clinical potential of naturally processed HLA ligands identified by MS sequencing, especially when combined with other "omics" strategies such as RNA-seq or next-generation sequencing (NGS) (Dutoit et al., 2012; Nelde et al., 2019).
Another challenge to be overcome is the large amount of patient material and the time currently required for the analysis of patient-specific HLA ligands and antigens, which hinder and delay implementation in clinical routine (Ott et al., 2017; Sahin et al., 2017; Creech et al., 2018). For example, tumor biopsy, HLA typing or whole exome sequencing and mutation detection may take about two weeks. Once the new antigen target is determined, HLA ligand enrichment, MS data collection and analysis may take several days to several weeks, while the design and manufacture of personalized vaccines may take several more weeks. The total time for these procedures could be several months, which at present is too long for routine clinical use (Creech et al., 2018).
In addition, it is important to consider that the response to personalized therapies will depend on the state of each patient's immune system (Strønen et al., 2016). To understand how to optimize and tailor adaptive immune responses, and to improve the prediction and prioritization of antigenic determinants, further research is needed on antigen-specific T cell responses using TCR sequencing and on the prediction of epitopes from TCR segments. Bearing this in mind, more studies are required that combine HLA ligandome analysis and TCR sequencing to match peptide-loaded HLA molecules with the TCRs that recognize them.
Many recent studies have predicted candidate antigens predominantly for HLA class I epitopes, because far more experimental data are available for class I prediction algorithms than for class II. However, when CD4+ T cell responses were studied in preclinical and clinical vaccination investigations (Dutoit et al., 2012; Kreiter et al., 2015; Sahin et al., 2017), it was shown that the processing and presentation of HLA class II epitopes may also play a critical role in the treatment of many diseases. Although prediction algorithms exist for both classes, those for class II are less accurate because the peptide-binding groove allows longer peptides to bind, increasing the heterogeneity and complexity of epitope presentation (Depontieu et al., 2009; Mommen et al., 2016; Khodadoust et al., 2017). Therefore, more extensive analysis is required to better understand the characteristics of HLA class II bound peptides and the cellular processes required for the processing and presentation of class II epitopes (Kim et al., 2017).
Overall, improving prediction algorithms and combining MS-based HLA ligand profiles with other "omics" approaches is essential to create opportunities for customized peptide vaccines that target antigens of interest (self and non-self) in various pathologies and to enable personalized immunotherapies (in oncology or infectious diseases) with large-scale clinical applications (Creech et al., 2018). For this reason, LC-MS/MS technology should continue to drive improvements in epitope prediction and in our knowledge of epitope processing and presentation for personalized immunotherapies, which would transform the way patients with infectious diseases, autoimmune diseases or even cancer are treated today.