A Review on T Cell Epitopes Identified Using Prediction and Cell-Mediated Immune Models for Mycobacterium tuberculosis and Bordetella pertussis

In the present review, we summarize work from our own and other groups on the characterization of bacterial T cell epitopes, with a specific focus on two important pathogens: Mycobacterium tuberculosis (Mtb), the bacterium that causes tuberculosis (TB), and Bordetella pertussis (BP), the bacterium that causes whooping cough. Both bacteria and their associated diseases are of great societal significance. Although vaccines exist for both pathogens, their efficacy is incomplete, and it is widely thought that defects and/or alterations in T cell compartments are associated with this limited vaccine effectiveness. As discussed below, a full genome-wide screen has been performed for Mtb. For BP, our focus has thus far been on the antigens contained in the acellular vaccine; a full genome-wide screen is in the planning stage. Nevertheless, taken together, the results in these two bacterial systems allow us to exemplify approaches and techniques that we believe are generally applicable to mapping and characterizing human immune responses to bacterial pathogens. Finally, we note as a disclaimer that this review is by design focused on work produced by our laboratory, as an illustration of approaches to the study of T cell responses to Mtb and BP; it is not meant to be comprehensive, nor to detract from the excellent work performed by many other groups.

populations from five continents (9) and intragenus conservation (10). We presented an analysis of the complexity of Mtb-specific epitopes in Mtb-infected South Africans (11) and provided evidence that bi-allelic RORC mutations are detrimental to host immunity against Mtb (12). We further used transcriptomic analysis to reveal novel immune signatures associated with TB (13)(14)(15), and showed that the differentiation and function of T cells are influenced by the availability of antigens (16).
In particular, previous studies (5) demonstrated the feasibility of using a genome-wide screen to identify human leukocyte antigen (HLA) class II epitopes derived from Mtb, based on combined bioinformatic predictions and high-throughput ex vivo ELISPOT assays. The feasibility of this approach had previously been demonstrated for viral targets; however, tackling a bacterial genome expressing over 4,000 open reading frames (ORFs) had not been attempted. Genome-wide screens have also been conducted to identify Mtb CD8 T cell epitopes (17)(18)(19)(20). Notably, immunodominant CD8 T cell epitopes are enriched in cell wall and secreted proteins (18,19). Future studies will use the same approach to study BP.

EPITOPE IDENTIFICATION FOR OTHER BACTERIA, ESPECIALLY BP
While our initial focus was mostly directed toward the study of Mtb, other microbes have also been studied. Maybeno et al. (21) described Salmonella epitopes, and Cannella et al. reported studies in Brucella (22). In the context of BP, we showed that initial whole-cell pertussis (wP) vaccination results in long-term Th1/Th17 polarization even with subsequent acellular boosters (23,24).
It is hypothesized that the recent reemergence of BP infection is linked to the adoption of acellular pertussis (aP) vaccines based on specific BP antigens (FHA, Fim2/3, PRN, and PT). It is possible that the previous whole cell inactivated (wP) vaccine elicited a broader reactivity and targeted additional antigens, some of which might be of particular relevance and linked to superior vaccine performance. The extent and targets of T cell immunity in the context of natural infection and clinical disease are likewise not yet defined in a comprehensive fashion.
These considerations argue for performing broad epitope identification and characterization studies in BP as well. In the following sections we describe the techniques we have developed for the purpose of epitope identification and characterization, and then we describe specific applications to the TB and BP systems.

MEASURING HLA EPITOPE AFFINITY
Activation of classical alpha/beta T cells generally requires recognition of a specific peptide epitope bound to a specific major histocompatibility complex (MHC) molecule, a phenomenon classically termed "HLA restriction." The methods used to establish restriction are described in a separate section below. Here we focus on the fact that, since HLA binding is a prerequisite for a peptide to be recognized as an epitope, measuring its HLA binding affinity is a powerful method to select epitope candidates. The relevant quantitative binding thresholds have been defined for both class I (25) and class II (26)(27)(28).
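As a minimal illustration of how such thresholds are applied, the Python sketch below labels a measured IC50 as binder or non-binder. The cutoffs (500 nM for class I, 1,000 nM for class II) are commonly used values assumed here for illustration rather than taken from the cited studies, and the function name is hypothetical.

```python
# Assumed IC50 cutoffs in nM; illustrative values, not taken from this review.
BINDING_CUTOFFS_NM = {"I": 500.0, "II": 1000.0}


def is_binder(ic50_nm: float, hla_class: str) -> bool:
    """Return True if the IC50 is at or below the assumed cutoff for the HLA class."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return ic50_nm <= BINDING_CUTOFFS_NM[hla_class]


print(is_binder(120.0, "I"))   # True
print(is_binder(750.0, "I"))   # False
print(is_binder(750.0, "II"))  # True
```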
Our group has been a pioneer in the development of techniques to measure the binding of peptides to MHC molecules, termed HLA molecules in humans. Over the course of the last 30 years we have measured almost half a million MHC peptide binding constants for over 100,000 peptide/MHC combinations, and our group contributed a chapter describing our assay platform in detail to the laboratory compendium Current Protocols in Immunology (29).
The results obtained with this assay have been published in several hundred peer-reviewed journal articles. Our current assay panel allows measurement of binding to over 40 different HLA class I molecules and 35 HLA class II molecules. MHC binding is evaluated using a classical competition assay in which the peptides of interest compete with a radiolabeled probe peptide for MHC binding (Figure 1). Establishing and running an MHC-peptide binding assay requires an ample supply of purified MHC molecules as well as labeled and unlabeled peptides. Our immunochemistry group has therefore established an ongoing operation in which cell lines expressing different alleles are expanded to allow large-scale HLA purification by affinity chromatography.
Each cell line is rigorously characterized by HLA typing to ensure the identity of the HLA allelic variant, and expression is monitored by flow cytometry. Standardized MHC purification protocols using affinity chromatography are in place. Furthermore, the purity and quantity of each preparation are assessed by gel filtration and bicinchoninic acid (BCA) assays. A high-affinity probe has been identified for each allelic variant and used in a classic quantitative receptor-ligand inhibition assay, in which IC50 values are used to approximate true KD values. After a 2-day incubation, bound and unbound radiolabeled peptides are separated and their relative abundance is quantified. The specificity of each assay was rigorously established by demonstrating high-affinity binding of known, independently defined eluted peptides or T cell epitopes restricted by the allelic variant in question.
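The inhibition data described above can be reduced to an IC50 in several ways. The sketch below assumes simple linear interpolation on log-concentration between the two tested doses that bracket 50% inhibition (curve fitting is more usual in practice), and applies the Cheng-Prusoff relation, which shows why IC50 approximates the true dissociation constant when the labeled probe is used at a concentration well below its own KD. All function names and input values are hypothetical.

```python
import math


def ic50_loglinear(concs_nm, pct_inhibition):
    """Estimate IC50 by linear interpolation on log10(concentration).

    concs_nm must be ascending; pct_inhibition is the % inhibition of probe
    binding measured at each concentration of unlabeled competitor peptide.
    """
    pairs = list(zip(concs_nm, pct_inhibition))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(pairs, pairs[1:]):
        if y_lo < 50.0 <= y_hi:
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    raise ValueError("50% inhibition is not bracketed by the tested doses")


def cheng_prusoff(ic50_nm, probe_nm, probe_kd_nm):
    """Cheng-Prusoff: K = IC50 / (1 + [probe] / KD_probe); K ~ IC50 if [probe] << KD_probe."""
    return ic50_nm / (1.0 + probe_nm / probe_kd_nm)


ic50 = ic50_loglinear([1, 10, 100, 1000], [10, 30, 70, 95])
print(round(ic50, 1))                            # 31.6
print(round(cheng_prusoff(ic50, 0.1, 10.0), 1))  # 31.3
```

With inhibition of 10%, 30%, 70%, and 95% at 1, 10, 100, and 1,000 nM, the interpolated IC50 is 10^1.5 nM; with the probe at 0.1 nM and a probe KD of 10 nM, the correction factor is only 1.01.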

HLA POLYMORPHISM AND BINDING PROMISCUITY
Thousands of different class I and class II HLA types exist in human populations. Most of the polymorphic residues line specific pockets in the HLA molecule that are involved in the peptide-HLA binding interaction. Accordingly, different HLA molecules generally have different binding specificities, or "motifs" (30,31). In fact, the need for populations to bind a plethora of different sequences, and thereby counter the potential of pathogens to escape immune recognition by avoiding presentation of their peptides, is the evolutionary force driving HLA polymorphism.
This extensive polymorphism poses a fundamental challenge for epitope identification and validation, since a comprehensive effort targeting thousands of variants would be impractical. A simple, pragmatic solution is to target the HLA alleles present at the highest frequencies. However, this approach must account for the fact that HLA frequencies can vary dramatically across ethnicities. Comprehensive coverage of ethnically diverse populations therefore requires careful analysis.
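Coverage at a single locus can be estimated from allele frequencies. The sketch below assumes Hardy-Weinberg equilibrium, so the fraction of individuals carrying at least one selected allele is 1 - (1 - p)^2, where p is the summed frequency of the selected alleles. The two populations and their frequencies are invented to show how the same allele set can cover different ethnicities very differently.

```python
def locus_coverage(allele_freqs):
    """Fraction of people carrying >= 1 selected allele at one locus, assuming
    Hardy-Weinberg equilibrium: 1 - (1 - p)^2, with p the summed allele frequency."""
    p = sum(allele_freqs)
    if not 0.0 <= p <= 1.0:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** 2


# Hypothetical frequencies of the same three alleles in two populations.
panel = {
    "population_A": [0.15, 0.10, 0.05],
    "population_B": [0.02, 0.03, 0.05],
}
for name, freqs in panel.items():
    print(name, round(locus_coverage(freqs), 2))  # 0.51 and 0.19
```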
A subsequent study used HLA DR, DQ, and DP binding data to quantitatively assess the extent of promiscuity among HLA class II molecules. We found that these HLAs could be divided into seven major supertypes and, rather surprisingly, that the repertoire overlap among class II supertypes was 5- to 10-fold higher than among class I supertypes (51). These results indicated that, if promiscuous binding translated into promiscuous T cell recognition, promiscuous epitopes might constitute a significant proportion of the total response.
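The metric used to quantify repertoire overlap is not specified here; as one simple possibility, the sketch below assumes a Jaccard index between the sets of peptides bound by two HLA molecules. Peptide identifiers and binder sets are hypothetical.

```python
def repertoire_overlap(binders_a: set, binders_b: set) -> float:
    """Jaccard index: peptides bound by both molecules / peptides bound by either."""
    union = binders_a | binders_b
    if not union:
        return 0.0
    return len(binders_a & binders_b) / len(union)


# Hypothetical peptides bound (below an IC50 cutoff) by two HLA class II molecules.
hla_x = {"p1", "p2", "p3", "p4"}
hla_y = {"p2", "p3", "p4", "p5"}
print(repertoire_overlap(hla_x, hla_y))  # 3 shared / 5 total = 0.6
```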

PROMISCUOUS HLA CLASS II EPITOPES
In several independent studies and antigen systems, we tested the hypotheses that promiscuous epitopes account for a significant fraction of the specific immune response and that they can be identified by bioinformatic approaches. The notion that promiscuous HLA class II epitopes correspond to dominant epitopes accounting for a large fraction of the antigen-specific response was initially evaluated with a set of overlapping 15-mer peptides spanning the erythropoietin (EPO) protein (53). A large volume of subsequent data has demonstrated that these findings generalize to other systems, including proteins derived from infectious agents and allergens.
One series of experiments analyzed in detail the reactivity of allergic donors to the common allergen timothy grass (Phleum pratense), a cause of hay fever (54). Over 40 different epitope regions were recognized, but closer inspection showed that only nine of them were required to cover 51% of the total response. These dominant regions corresponded to promiscuously recognized epitopes and could be predicted by bioinformatic algorithms targeting the most common DR, DP, and DQ variants.
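The statement that nine regions cover 51% of the response reflects a simple ranking calculation. A minimal sketch, assuming a response magnitude (for example, summed SFC) is available for each region, is shown below; the region names and values are hypothetical.

```python
def regions_needed(responses: dict, target_fraction: float):
    """Add regions in decreasing order of response magnitude until their summed
    response reaches target_fraction of the total; return them and the fraction covered."""
    total = sum(responses.values())
    selected, covered = [], 0.0
    for region, magnitude in sorted(responses.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(region)
        covered += magnitude
        if covered >= target_fraction * total:
            break
    return selected, covered / total


# Hypothetical summed responses (e.g., SFC/10^6) for six epitope regions.
responses = {"R1": 400, "R2": 250, "R3": 150, "R4": 100, "R5": 60, "R6": 40}
print(regions_needed(responses, 0.5))  # (['R1', 'R2'], 0.65)
```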
This result was not limited to the timothy grass system. Similar results were obtained in several other allergen systems, including the Blattella germanica (Bla g) antigens associated with cockroach allergy (55). In a broader study, a panel of 133 allergens derived from 28 different sources, including fungi, trees, grasses, weeds, and indoor allergens, was surveyed using predicted promiscuous HLA class II-binding peptides and ELISPOT assays with PBMC from allergic donors, resulting in the identification of 257 T cell epitopes (56).
In conclusion, a number of studies have shown that peptides with highly promiscuous binding capacity are frequently recognized by immune individuals, and that promiscuous recognition in the context of multiple HLA class II molecules may be a mechanism that contributes significantly to epitope immunodominance (33,53,57-60). This might be because promiscuous epitopes tend to bind HLA with high affinity, or simply because binding to multiple HLAs gives an epitope multiple contexts in which it can be immunogenic. Several studies by our group have demonstrated that bioinformatic predictions directed toward selecting the most promiscuous binding peptides can identify a significant fraction of the pathogen- or allergen-specific response (54-56,61).
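Promiscuity can be summarized as the number of HLA class II molecules a peptide binds below a cutoff. The sketch assumes a 1,000 nM cutoff and uses invented IC50 values and placeholder allele labels; it is not a description of any specific published scoring scheme.

```python
def promiscuity(ic50_by_allele: dict, cutoff_nm: float = 1000.0) -> int:
    """Count HLA class II molecules bound at or below the assumed IC50 cutoff."""
    return sum(1 for ic50 in ic50_by_allele.values() if ic50 <= cutoff_nm)


# Hypothetical IC50 values (nM); allele labels are placeholders.
peptides = {
    "pep_A": {"DR_1": 20, "DR_2": 150, "DQ_1": 800, "DP_1": 5000},
    "pep_B": {"DR_1": 9000, "DR_2": 300, "DQ_1": 20000, "DP_1": 15000},
}
ranked = sorted(peptides, key=lambda p: promiscuity(peptides[p]), reverse=True)
print(ranked)  # ['pep_A', 'pep_B']
```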

DEVELOPMENT OF TOOLS TO PREDICT PROMISCUOUS EPITOPES
In the next series of investigations, we sought to derive and optimize a universal prediction scheme based on data in which sets of 15-mers overlapping by 10 amino acids, spanning the entire sequences of over 30 different allergens and bacterial proteins, had been tested for T cell reactivity in human patients (62) (Figure 2). We specifically wanted to determine how many predictions needed to be combined for maximal efficacy in real human patient populations, and which specific alleles should be included in such an optimal prediction tool. We defined optimal prediction parameters, and the resulting strategy was validated using a blind set of immunogenicity data that had not been used to derive the prediction scheme. We found that a 20th percentile IEDB consensus rank cutoff, combining predictions for a particular set of seven HLA class II alleles, can predict about half of the total response. Because it has been validated in a broad set of antigenic systems and in genetically diverse human patient populations, this approach can be used as a universal prediction scheme.
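The scheme above can be outlined in two steps: tiling a protein into 15-mers overlapping by 10 residues, then keeping peptides whose combined rank passes the 20th percentile cutoff. How the seven allele-specific ranks are combined is not stated in this section; the sketch assumes their median. The percentile values would come from IEDB consensus predictions and are invented here, and the function names are hypothetical.

```python
from statistics import median


def overlapping_peptides(sequence: str, length: int = 15, overlap: int = 10):
    """Tile a protein into peptides of `length` residues overlapping by `overlap`."""
    step = length - overlap
    peptides = [sequence[i:i + length] for i in range(0, len(sequence) - length + 1, step)]
    # Add a C-terminal peptide if the regular tiling stops short of the end.
    if len(sequence) >= length and (len(sequence) - length) % step:
        peptides.append(sequence[-length:])
    return peptides


def select_candidates(ranks_by_peptide: dict, cutoff: float = 20.0):
    """Keep peptides whose median percentile rank over the allele panel is <= cutoff."""
    return [pep for pep, ranks in ranks_by_peptide.items() if median(ranks) <= cutoff]


protein = "ACDEFGHIKLMNPQRSTVWYACDEFGH"  # hypothetical 27-residue sequence
print(overlapping_peptides(protein))    # 4 peptides; the last covers the C-terminus

# Hypothetical percentile ranks for seven alleles (lower = better predicted binding).
ranks = {
    "pep_1": [5, 12, 18, 25, 30, 8, 15],    # median 15 -> kept
    "pep_2": [40, 55, 22, 60, 35, 70, 45],  # median 45 -> dropped
}
print(select_candidates(ranks))  # ['pep_1']
```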
In the study referenced above, using actual binding data instead of predictions did not improve the efficacy of the scheme, nor did performing allele-specific predictions based on the particular HLA types expressed by each individual patient. This indicated that, as expected, the limited efficacy of the prediction was not due to limitations of the algorithms. Rather, HLA binding predictions are associated with a rather high false positive rate, which fits well with the understanding that HLA binding is necessary but not sufficient for immunogenicity. Thus, other factors related to the T cell repertoire and antigen processing also play a prominent role.
To address this issue, we used matched sets of dominant epitopes and negative peptides curated from the literature to train neural networks (63). The resulting "immunogenicity score" was further validated on 57 additional datasets (Figure 2). In all, data derived from more than 1,500 human donors and 2,000 peptides were considered in this training and validation effort. The results demonstrated that this HLA-agnostic "immunogenicity score" was effective in predicting dominant epitopes and human immunogenicity data. Surprisingly, combining the immunogenicity score with HLA promiscuity predictions yielded only limited overall improvement in prediction, suggesting, as previously noted, that antigen processing/T cell repertoire selection and HLA binding capacity might be shaped by coordinated evolution (64). Taken together, these results highlight that the bioinformatic tools needed to identify promiscuous epitopes are available and have been validated in several independent studies, in different antigen systems and different ethnicities.
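Validation of a score such as the immunogenicity score is commonly summarized by the area under the ROC curve (AUC). The review does not name the metric, so the sketch below is an assumption: it computes AUC in its Mann-Whitney form, the probability that a randomly chosen epitope scores higher than a randomly chosen non-epitope, using invented scores.

```python
def roc_auc(epitope_scores, non_epitope_scores):
    """AUC in Mann-Whitney form: the probability that a random epitope outscores a
    random non-epitope, counting ties as one half."""
    wins = 0.0
    for e in epitope_scores:
        for n in non_epitope_scores:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(epitope_scores) * len(non_epitope_scores))


# Hypothetical immunogenicity scores for one validation dataset.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8 of 9 pairs ordered correctly
```

For epitope scores of 0.9, 0.8, and 0.4 against non-epitope scores of 0.7, 0.3, and 0.2, only the pair (0.4, 0.7) is misordered, so the AUC is 8/9.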

DETERMINING HLA RESTRICTION OF T CELL RESPONSES
Determination of HLA restriction is a key element of epitope characterization, and precise knowledge of HLA restriction is also necessary to derive tetrameric staining reagents. HLA restriction was originally determined using antibodies specific for different HLAs in conjunction with antigen presentation assays; the essence of this strategy is to identify an antibody that blocks presentation of a given peptide to a defined source of responding T cells (60). While straightforward in principle, this assay is often challenging, since antibodies with suitable specificity and selectivity are frequently unavailable, and T cells might promiscuously recognize the same peptide presented by multiple HLAs, yielding results that are difficult to interpret. Furthermore, antibody and epitope concentrations in the assay must be carefully controlled, as excessive amounts of antibody will non-specifically inactivate the antigen-presenting cells (APCs), and high epitope concentrations will lead to self-presentation by the responding T cells.
An alternative is to determine whether the peptide can be presented by panels of partially HLA-matched/mismatched cell lines or PBMCs. This is a powerful and simple approach, but it can be limited by the availability of suitable cell lines and complicated, again, by promiscuous presentation and by the fact that certain HLA combinations are in tight linkage disequilibrium. To overcome these limitations, we described an approach to define HLA class II restriction covering the DP, DQ, and DR allelic variants most commonly represented in the general population (65). We specifically selected 46 DP, DQ, and DR HLAs, which were projected to cover ∼90% of these loci and constitute >66% of all the genes at each of these loci. Using HLA data from actual populations in different geographical locations in the USA and Africa, we verified that these projections were accurate. A panel of single HLA-transfected cell lines was developed and validated in a series of experiments assessing HLA expression, identity, peptide binding, and epitope presentation (65).
The utility of this panel was further demonstrated by a quantitative study of HLA restriction and antigen-specific responses in a cohort of Mtb-immune individuals (11). Using APCs transfected with the panel of HLA class II molecules described above, we mapped HLA restrictions for nearly 300 different epitope/donor combinations. These results were the first large-scale estimate of the epitope complexity of CD4 T cell responses to a human microbial pathogen in a patient population, and indicated that the majority of epitopes were associated with promiscuous HLA restriction, further demonstrating the feasibility of the approach.
FIGURE 2 | Prediction of HLA class II-restricted T cell epitopes. A strategy to globally predict epitopes recognized by human populations has been developed and validated using HLA class II binding prediction tools from the Immune Epitope Database and Analysis Resource (IEDB). A consensus percentile rank cutoff of ≤20 has been established. In addition, an artificial neural network model built using sets of dominant epitopes and negative peptides generates an "immunogenicity score" that predicts CD4 T cell immunogenicity in the absence of HLA data.
As an alternative, complementary approach, we developed a method called the Restrictor Analysis Tool for Epitopes (RATE), which infers HLA restriction from CD4 T cell response data obtained in HLA-typed individuals (66). The method, available online in the IEDB analysis resource, inspects, one epitope at a time, the HLA types present in individuals responding to that epitope. For each of these HLAs, it then determines whether the HLA is enriched in frequency among responders compared with non-responders to that epitope. The automated calculation yields a table of likely restrictions with odds ratios (ORs) and associated p-values. The method was validated by various experimental approaches, from which strategies and thresholds for optimal performance were derived (66,67). The method is most effective for monogamous (single-allele) restrictions and is by definition less able to detect promiscuous restrictions or to distinguish restriction from HLA frequency variations due to genetic linkage.
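RATE itself is available in the IEDB analysis resource, and the function below is not its implementation. It is only a minimal sketch of the kind of per-allele calculation described above: a 2x2 table of responders and non-responders with and without a given HLA, an odds ratio (with 0.5 added to each cell, an assumption that avoids division by zero), and a one-sided Fisher exact p-value computed from the hypergeometric distribution. The counts are invented.

```python
from math import comb


def restriction_stats(a: int, b: int, c: int, d: int):
    """Per-allele 2x2 table for one epitope:
    a = responders with the allele,      b = responders without it,
    c = non-responders with the allele,  d = non-responders without it.
    Returns (odds ratio with 0.5 added to each cell, one-sided Fisher exact p-value
    for enrichment of the allele among responders)."""
    odds_ratio = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
    responders, carriers, n = a + b, a + c, a + b + c + d
    # Hypergeometric tail: probability of >= a carriers among the responders.
    tail = sum(
        comb(carriers, k) * comb(n - carriers, responders - k)
        for k in range(a, min(responders, carriers) + 1)
    )
    return odds_ratio, tail / comb(n, responders)


# Hypothetical: 8 of 10 responders and 3 of 20 non-responders carry the allele.
print(restriction_stats(8, 2, 3, 17))  # OR = 17.0, p ~ 0.001
```

With 8 of 10 responders and 3 of 20 non-responders carrying the allele, the odds ratio is 17 and the one-sided p-value is roughly 0.001.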

ANALYSIS OF EPITOPE CONSERVATION
Several lines of evidence indicate that sequence variability and conservation can have a dramatic effect on the shaping and effectiveness of immune responses in general and T cell responses in particular. This influence is dynamic and can have both positive and negative effects.
One broad series of effects relates to immunological pressure exerted by antimicrobial responses; among the best-known cases is the widespread mutation of T cell epitopes observed in HIV and HCV (68,69). It should be emphasized that pathogen escape by mutation is most effective for microbes with small genomes, or against responses of limited breadth, since simultaneous escape from responses directed against many antigens/epitopes encoded by large genomes is by definition unlikely. Indeed, it has been proposed that the switch from wP to aP vaccines generated a less diverse response and created an opportunity for BP to escape vaccine-induced immunity (70)(71)(72). It will be important to compare mutation rates of epitopes and non-epitopes in wP and aP antigens to either refute or support this hypothesis.
We have also noted that sequence variation or conservation can profoundly shape T cell responses through a different set of mechanisms. In general, when individuals are exposed to different strains of the same species, or to different species of phylogenetically related microbes, the immune response tends to focus on conserved epitopes. This is because repeated exposure to different but cross-reactive microbes ends up "teasing out" T cells recognizing conserved/homologous epitopes. Specifically, this has been observed in the case of dengue virus (DENV), where repeated exposure to different serotypes focuses the response on conserved epitopes (73,74). The phenomenon is not limited to viruses, and is also observed in grass pollen- and ragweed pollen-specific allergic responses (75,76). In the case of bacterial genomes, we have shown that intragenus conservation among different mycobacterial species shapes T cell responses (10): epitopes shared between tuberculous mycobacteria and other, non-pathogenic mycobacteria are preferentially recognized, indicating that differential reactivity may be at least partially accounted for by environmental factors. It is currently unknown whether BP antigens and epitopes that share significant homology with other microbes encountered through environmental exposure are preferentially recognized.
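Epitope conservation across strains or related species can be quantified as the fraction of sequences containing the epitope at or above a chosen identity level. The sketch below is loosely modeled on what epitope conservancy tools report but is not any tool's implementation: it assumes ungapped matching, and the sequences are invented.

```python
def conservancy(epitope: str, sequences: dict, min_identity: float = 1.0) -> float:
    """Fraction of sequences with an ungapped window matching `epitope` at >= min_identity
    (fraction of identical residues at aligned positions)."""
    k = len(epitope)
    hits = 0
    for seq in sequences.values():
        best = 0.0
        for i in range(len(seq) - k + 1):
            matches = sum(a == b for a, b in zip(epitope, seq[i:i + k]))
            best = max(best, matches / k)
        if best >= min_identity:
            hits += 1
    return hits / len(sequences)


# Hypothetical protein fragments from three related strains/species.
strains = {"s1": "XXACDEFYY", "s2": "ACDEYZZ", "s3": "QQQQQQ"}
print(conservancy("ACDEF", strains))       # 1 of 3 sequences identical
print(conservancy("ACDEF", strains, 0.8))  # 2 of 3 with >= 80% identity
```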
In a separate study, we recently showed that sequence similarity between antigens and the human microbiome can either dampen or increase T cell epitope immunogenicity (77). In this study, we systematically evaluated the homology between human microbiome sequences and sets of control peptides and T cell epitopes from various autoantigens, allergens, and infectious pathogens. We expected the human adaptive immune system to be largely tolerant toward sequences identical or highly similar to those found in the human microbiome, and therefore predicted that such sequences would be found more frequently in the non-epitope category than in the dominant epitope category. For many epitope categories this was indeed the case, and reactivity was dampened (tolerogenic effect), suggesting that exposure to microbiome-derived sequence homologs might lead to T cell tolerization. However, in other cases, such as mycobacteria, and consistent with the studies mentioned above, reactivity was increased (inflammatory effect) when the epitope sequence was conserved in the microbiome. It is currently unknown whether BP antigens and epitopes that share significant homology with microbes in the human microbiome are preferentially recognized or, conversely, tolerized.

VALIDATION AND CHARACTERIZATION OF T CELL EPITOPES
T cell epitopes can be characterized by various techniques, such as mass spectrometry (MS), ELISPOT, intracellular cytokine staining (ICS), the activation-induced marker (AIM) assay, the antigen-reactive T cell enrichment (ARTE) assay, tetramer staining, multidimensional fluorescence-based flow cytometry, cytometry by time-of-flight (CyTOF), RNA sequencing (RNA-seq), and T cell receptor (TCR) sequencing (Figure 3). These techniques can also be combined, for example by performing TCR analysis of tetramer-positive cells, or by combining AIM/ARTE assays with ICS for particular cytokines.

In vitro vs. ex vivo Characterization
T cell responses can be characterized directly ex vivo or, when epitope-specific T cells are rare, after in vitro re-stimulation. Although in vitro re-stimulation allows for greater sensitivity, it may alter the phenotype of the responding T cells; the characterization of re-stimulated T cells therefore requires specific adjustments to the experimental strategy. Certain epitope characteristics are not altered by in vitro expansion, such as which particular TCR genes are expressed, HLA restriction, sequence conservation of the recognized epitope, or the pattern of cytokine polarization. On the other hand, memory, activation, and other phenotypic markers usually detected by flow cytometry are altered by the activation caused by cell culture. We have found that it is often possible to assess responses directly ex vivo by using pools of different epitopes or peptides, so that the overall frequency of responding cells is increased. This approach is particularly effective when combined with the AIM assay described below, and is particularly important for analyzing small-volume samples. More specifically, our group has developed a "megapool" approach, in which large numbers of peptides are combined in a single pool (78). These megapools have been used in several systems, such as allergies (79,80), tuberculosis (11), tetanus and pertussis (24,81), and DENV, for both CD8 and CD4 T cell epitopes (82)(83)(84).
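The rationale for pooling can be made explicit with simple arithmetic. Assuming that each epitope is recognized by a distinct, non-overlapping T cell population, the frequency of cells responding to a pool is the sum of the per-epitope frequencies. The frequencies and the detection limit below are hypothetical.

```python
def pooled_frequency(per_epitope_freqs):
    """Expected fraction of responding cells for a pool, assuming each epitope is
    recognized by a distinct, non-overlapping T cell population."""
    return sum(per_epitope_freqs)


# Hypothetical: 50 epitopes, each recognized by 0.005% (5e-5) of CD4 T cells.
freqs = [5e-5] * 50
DETECTION_LIMIT = 1e-3  # hypothetical assay limit (0.1% of CD4 T cells)

print(max(freqs) >= DETECTION_LIMIT)               # False: no single epitope is detectable
print(pooled_frequency(freqs) >= DETECTION_LIMIT)  # True: the pool reaches ~0.25%
```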

Mass Spectrometry
MS-based approaches have been used to identify and characterize T cell epitopes presented by MHC molecules since the 1990s (85,86). Briefly, MHC molecules are purified from cell lysates, and their associated peptides are isolated and analyzed by MS. Although this approach is powerful and widely used, a full discussion is beyond the scope of the current review; we refer readers to (87) for more details on MS-based immunopeptidomics.

ELISPOT and ICS Assays
In our experience, the ELISPOT assay is the most sensitive and high-throughput-friendly method to measure T cell cytokine production. Our group has extensive experience with this method, and we routinely use ELISPOT as a primary screen. In contrast, the ICS assay is better suited to evaluating T cell phenotype and polyfunctionality. Both ELISPOT and ICS assays can characterize responses to epitope pools even with small amounts of PBMCs. In our hands, T cell responses can be characterized with as little as 1 ml of peripheral blood.
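ELISPOT readouts are usually expressed as spot-forming cells (SFC) per 10^6 PBMC and called positive using preset criteria. The sketch below assumes two such criteria, a background-subtracted response of at least 20 SFC/10^6 and a stimulation index of at least 2; these thresholds, the function names, and the input counts are assumptions for illustration, not the criteria of any specific study described here.

```python
from statistics import mean


def sfc_per_million(spot_counts, cells_per_well):
    """Mean spot-forming cells (SFC) per 10^6 cells across replicate wells."""
    return mean(spot_counts) * 1_000_000 / cells_per_well


def call_response(stim_counts, bg_counts, cells_per_well,
                  min_net_sfc=20.0, min_stim_index=2.0):
    """Return (is_positive, net SFC/10^6) using assumed, illustrative thresholds."""
    stim = sfc_per_million(stim_counts, cells_per_well)
    bg = sfc_per_million(bg_counts, cells_per_well)
    net = stim - bg
    stim_index = stim / bg if bg > 0 else float("inf")
    return (net >= min_net_sfc and stim_index >= min_stim_index), net


# Hypothetical triplicate wells with 2 x 10^5 PBMC each.
print(call_response([30, 34, 26], [4, 6, 5], 200_000))  # (True, 125.0)
print(call_response([6, 5, 7], [4, 6, 5], 200_000))    # (False, 5.0)
```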

AIM and ARTE Assays
In addition to ICS, ex vivo-activated antigen-specific CD4 T cell populations can be identified by measuring activation molecules using the AIM assay [e.g., OX40 and CD25; this assay was co-developed by our group (81,88)] or the ARTE assay. The ARTE approach uses magnetic enrichment of T cells that upregulate CD154 (CD40L) to assess human antigen-specific CD4 T cells ex vivo (89). ARTE has been applied to identify antigen-specific T cells in several infections and can select rare antigen-specific T cells after a short stimulation period without the need for ICS (90).

Tetramer Staining
This approach identifies antigen-specific T cells using tetramer staining reagents (91,92). Furthermore, tetramer enrichment can be used when the frequency of antigen-specific T cells is low (93). However, a specific reagent must be produced for each unique HLA:epitope combination of interest. Tetramer staining is therefore usually used for in-depth characterization of T cells with selected epitope specificities and HLA restrictions.

Multidimensional Flow Cytometry and CyTOF
These are powerful techniques for characterizing cell samples by evaluating the expression of many different markers associated with cell lineages (94), activation and functional activities (95), memory cell subtypes, and chemokine receptor expression (96). Multicolor fluorescence-based flow cytometry is generally more user- and equipment-friendly, as antibodies are more widely available. In addition, this technique allows recovery of the cells by cell sorting and is thus readily coupled with transcriptomic analysis. In contrast to fluorescence-based flow cytometry, CyTOF detects, discriminates, and quantifies antibodies conjugated to various heavy-metal isotopes with high accuracy (97). This avoids spectral overlap between fluorophores and allows more cellular parameters to be measured simultaneously. High-dimensional phenotypic data can be visualized using algorithms such as viSNE, a visualization based on t-distributed stochastic neighbor embedding (t-SNE), and spanning-tree progression analysis of density-normalized events (SPADE) (98). We have used CyTOF and viSNE to visualize and characterize the heterogeneity of human CD4 effector memory T cells re-expressing CD45RA (Temra) (95).

Transcriptomic Profiling
Epitope-specific T cells can be further characterized in depth by transcriptomic profiling using deep-sequencing technologies, including bulk RNA-seq and single-cell RNA-seq (scRNA-seq). Compared with bulk RNA-seq, scRNA-seq is a more powerful tool for addressing cellular heterogeneity and identifying novel subpopulations in a "hypothesis-free" manner, since individual cells within the "same" population may differ dramatically (99)(100)(101). Gene expression profiling using these methodologies is routinely performed in our laboratory. Examples include the definition of signatures predictive of latent tuberculosis infection (13), the characterization of CD4 cytotoxic memory T cells (95,102,103), and the characterization of differential CD4 T cell responses to aP booster vaccination depending on the vaccine used for primary BP vaccination (23).

TCR Sequencing
In addition to functional and phenotypic characterization of epitope-specific T cell responses, one can further define their TCR repertoires by TCR sequencing (104). TCRs dictate the antigen specificity of T cells through their interactions with peptide-MHC complexes. By analyzing epitope-associated TCR repertoires, it is possible to investigate common features of TCRs specific for a particular epitope and to identify determinants that may predict specificity (105,106). This strategy will thus enable researchers to systematically link epitopes with their specific TCR sequences and associated T cell responses.
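One simple feature of epitope-associated repertoires is the set of CDR3 sequences shared across donors (public clonotypes). The sketch below assumes exact amino acid matching of CDR3-beta sequences, although more permissive similarity measures can also be used; the donors, sequences, and function name are hypothetical.

```python
from collections import Counter


def public_cdr3s(repertoires: dict, min_donors: int = 2):
    """CDR3 sequences observed in at least `min_donors` donors (exact matches)."""
    donor_counts = Counter()
    for cdr3s in repertoires.values():
        donor_counts.update(set(cdr3s))  # count each CDR3 once per donor
    return sorted(c for c, n in donor_counts.items() if n >= min_donors)


# Hypothetical CDR3-beta amino acid sequences from epitope-specific T cells.
repertoires = {
    "donor1": ["CASSLGQGYEQYF", "CASSPDRGNTEAFF", "CASSLGQGYEQYF"],
    "donor2": ["CASSLGQGYEQYF", "CASRTGELFF"],
    "donor3": ["CASRTGELFF", "CSARDRTGNGYTF"],
}
print(public_cdr3s(repertoires))  # ['CASRTGELFF', 'CASSLGQGYEQYF']
```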

GENOME-WIDE SCREEN OF Mtb HLA CLASS II EPITOPES
To illustrate how the various techniques can be used to tackle even large, complex microbial genomes, we briefly summarize the results of an Mtb genome-wide screen (5). Our general strategy was to first study in detail a limited number of well-characterized dominant antigens, in order to investigate the mechanisms associated with immunodominance and to provide a point of reference for the genome-wide screen (60). While several dominant antigens were known and well described, a truly systematic and unbiased screen had not been attempted before, owing to the complexity of the genome and the large number of ORFs. Next, as summarized below, we performed an unbiased genome-wide screen and, based on the results, selected the dominant epitopes and antigens (9). These were then used to characterize the epitopes and the phenotype of the associated T cells (8,10,11,14), and to develop an Mtb epitope megapool that has been used in numerous studies and has proven a valuable tool for analyzing responses in a number of different settings (8,11,13,105).
To perform a genome-wide screen of Mtb, we selected all full genome sequences available at that time and used the approaches described above to define a library of about 20,000 predicted promiscuous binders (5). These peptides were synthesized and screened first as pools and then in deconvolution experiments to identify the actual epitopes responsible for T cell activation. The library also included over 1,500 variants that were not fully conserved among the genomes analyzed. Notably, the capacity to readily test sequence variants is an advantage of our approach.
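The pool-then-deconvolute strategy can be sketched as a two-stage screen. The pool size, the peptide names, and the `responds` function (a stand-in for an ELISPOT readout) are hypothetical; the sketch also counts assays to show why pooling reduces the number of wells needed when epitopes are sparse.

```python
def deconvolute(peptides, pool_size, responds):
    """Two-stage screen: test pools, then each peptide from the positive pools.
    `responds(list_of_peptides) -> bool` stands in for an assay such as ELISPOT.
    Returns (positive peptides, number of assays run)."""
    pools = [peptides[i:i + pool_size] for i in range(0, len(peptides), pool_size)]
    assays, hits = len(pools), []
    for pool in pools:
        if responds(pool):
            assays += len(pool)
            hits.extend(p for p in pool if responds([p]))
    return hits, assays


# Hypothetical library of 100 peptides containing two true epitopes.
library = [f"pep{i}" for i in range(100)]
epitopes = {"pep7", "pep42"}
found, n_assays = deconvolute(library, 10, lambda pool: any(p in epitopes for p in pool))
print(found, n_assays)  # ['pep7', 'pep42'] 30
```

With 100 peptides in pools of 10 and two true epitopes falling in different pools, the screen uses 30 assays instead of 100.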
We identified hundreds of different epitopes. The response was remarkably broad: each individual recognized tens of different epitopes, and the dominant epitopes and antigens varied appreciably from one individual to the next. The overwhelming majority of the response was mediated by CD4 T cells, which was not unexpected since the epitopes were selected based on their predicted ability to bind HLA class II molecules. When the epitopes were mapped back to their antigens of origin using the H37Rv reference genome, a set of 82 antigens was identified as dominant, in that they accounted for about 80% of the total response. The majority of these antigens had not previously been identified as T cell antigens.
Further analysis revealed that the vast majority of the response mapped to discrete regions of the Mtb genome; in particular, three clusters of reactivity ("islands") within the genome accounted for close to half of the total reactivity. One of the islands contained the well-characterized antigens early secretory antigenic target 6 kDa (ESAT-6) and culture filtrate protein 10 kDa (CFP10), which are secreted by type VII secretion systems (T7SS or Esx systems). The other two islands also contained type VII secretion protein pairs. Highlighting the novelty of these observations, the antigens recognized as dominant were not limited to the secreted proteins but also included proteins of the secretion apparatus itself. Thus, the results illustrated the feasibility of the approach, while at the same time identifying a number of novel epitopes and antigens and providing new insights into the mechanisms of immunodominance.

THE RESURGENCE OF BP AS A PUBLIC HEALTH CONCERN
BP has been a health concern since the Middle Ages (107), and whooping cough was prevalent and associated with high morbidity and mortality until the introduction of widespread vaccination (108). Vaccination of the general population with the wP vaccine has greatly reduced the incidence of whooping cough since the 1950s. Nevertheless, the wP vaccine was associated with minor adverse reactions and very rare serious side effects, which led to its replacement by the aP vaccine in the United States (109,110). In spite of widespread vaccination, cases of whooping cough have recently been steadily increasing in the United States (www.cdc.gov). Epidemiological evidence indicates that the increased prevalence may be associated with the switch from the wP to the aP vaccine in the mid-1990s, further implicating a potential role for waning immunity (www.cdc.gov).
Although "waning BP immunity" is a serious issue (111), it is not straightforward to address, as it manifests more than 15 years after the initial vaccination. It is therefore crucial to understand the underlying mechanisms of waning immunity in order to guide the design of effective vaccines. In addition to qualitative differences in the response, other mechanisms may contribute, and two main additional hypotheses have been put forth (Figure 4). First, because the wP vaccine contains the products of >3,400 ORFs, whereas the aP vaccine includes only a few BP proteins, the wP and aP vaccines likely induce responses of different breadth (112). Furthermore, the chemically detoxified pertussis toxin (PT) contained in the aP vaccine may have altered antigenicity, which could influence vaccine efficacy (113,114). Second, it has been proposed that decreased vaccine efficacy might be due to antigenic drift (108)(115)(116)(117)(118)(119).
Both antibody and T cell responses are thought to contribute to the effectiveness of pertussis vaccination. Notably, protective immunity against BP persists even after antibody levels have declined (120)(121)(122), suggesting that T cells play a role in long-term protection against BP. Animal studies suggest that memory CD4 T cells of Th1 and Th17 phenotype, which are induced by both infection and wP vaccination, mediate long-term protection (123)(124)(125). In contrast, aP vaccination is associated with a predominantly Th2 response in humans (126)(127)(128)(129). Furthermore, a few studies have reported that aP vaccination induces qualitative changes in T cell responses, resulting in suboptimal efficacy (130)(131)(132)(133) (Figure 4).

GENOME-WIDE SCREEN OF T CELL RESPONSE TO BP
To date, whether the wP vaccine elicits strong T cell responses to an additional and different set of antigens from those elicited by aP vaccination, and if so to which antigens, has not been addressed. Given that our genome-wide screen of Mtb (5) revealed novel dominant antigens that had escaped detection despite decades of investigation of Mtb-specific T cell responses, we consider this possibility likely. By the same token, the breadth of responses induced by natural infection and clinical disease is not known. Here as well, additional antigens beyond those included in the current vaccine are likely to be important; for example, the adenylate cyclase toxin (ACT) has been shown to be targeted by BP-infected individuals, and the combination of PT and ACT results in superior protection from disease in animal models of BP infection (134-136). These considerations underscore the need for further investigation of BP antigens and T cell epitopes, as well as of correlates of protection.

DEFINITION OF aP EPITOPES FOLLOWING aP vs. wP VACCINATION
We previously completed a series of studies aimed at defining CD4 T cell epitopes derived from the antigens contained in the aP vaccine (24). These studies illustrate the general feasibility of studying epitopes and T cell reactivity in BP, and also provide a point of reference for interpreting the results of a potential genome-wide screen of BP T cell reactivity.
In those studies (24), PBMCs from aP- or wP-primed healthy adults who had recently received an aP booster were used to screen overlapping peptides derived from the protein components of the acellular vaccine: PT, pertactin, filamentous hemagglutinin, and fimbriae 2 and 3. We used high-throughput ex vivo ELISPOT assays to measure T cell production of the cytokines interferon-γ (IFN-γ) and IL-5, and deconvolution of positive peptide pools identified individual T cell epitopes. Epitope mapping revealed that the same epitopes were recognized by both aP- and wP-primed individuals (24). However, the ratios of IFN-γ to IL-5 revealed a Th1 bias in originally wP-primed donors and a dominance of IL-5 in individuals primed with aP (24). This differential polarization persists following booster vaccination, even decades after the original priming (24).
FIGURE 4 | Incidence of pertussis and proposed models of waning immunity. The resurgence of pertussis is a growing public health concern, even in countries with high vaccination coverage. It would be important to define the mechanisms associated with waning immunity based on our current knowledge of the qualitative and quantitative changes in both T cell responses and BP genetic evolution under vaccine pressure.
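The Th1 versus IL-5 (Th2) comparison in the aP/wP studies rests on a per-donor ratio of IFN-γ to IL-5 responses. A minimal sketch, assuming paired IFN-γ and IL-5 ELISPOT counts per donor: it computes a log2 IFN-γ/IL-5 index, with a hypothetical pseudocount of 1 to avoid division by zero, and summarizes each priming group by its median. The donor values are invented for illustration and are not data from (24).

```python
import math
from statistics import median


def polarization_index(ifng: float, il5: float, pseudocount: float = 1.0) -> float:
    """log2(IFN-γ / IL-5); > 0 suggests a Th1 bias, < 0 an IL-5 (Th2) bias."""
    return math.log2((ifng + pseudocount) / (il5 + pseudocount))


def group_median_index(donors: list[tuple[float, float]]) -> float:
    """Median polarization index over (IFN-γ, IL-5) pairs from one priming group."""
    return median(polarization_index(ifng, il5) for ifng, il5 in donors)


# Hypothetical spot counts per donor: (IFN-γ, IL-5)
wp_primed = [(255.0, 63.0), (127.0, 31.0), (511.0, 255.0)]
ap_primed = [(63.0, 255.0), (31.0, 127.0), (127.0, 63.0)]

print(group_median_index(wp_primed))  # 2.0
print(group_median_index(ap_primed))  # -2.0
```

The log scale makes a 4-fold IFN-γ excess and a 4-fold IL-5 excess symmetric (+2 and -2); an actual comparison between priming groups would also require an appropriate statistical test across donors.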

CHARACTERIZATION AND VALIDATION OF EPITOPES DERIVED FROM aP-ANTIGENS
Based on the studies described in the previous section, we defined a "megapool" encompassing the 132 most dominant epitopes, which allowed us to assess BP responses directly ex vivo using the AIM assay combined with ICS assays. Because no in vitro restimulation is needed, this approach allows direct phenotyping while avoiding the alterations induced by the restimulation step. This strategy was used to evaluate the phenotype and function of T cells in PBMCs from wP- or aP-primed donors 1-3 months after an aP booster, to allow memory T cells to return to steady-state conditions (23).
Using these ex vivo readouts, we again detected the persistent differential polarization previously observed after in vitro restimulation. Moreover, we detected differential polarization toward IL-4 and IL-9 in aP-primed donors and toward IFN-γ and IL-17 in wP-primed donors (23). This effect was specific for the vaccine antigens, since no difference was noted for control megapools of epitopes from the ubiquitous viruses CMV and EBV. IL-17 polarization following wP vaccination had previously been described in baboon models, but not in humans (124,125,137,138).
The observation of differential IL-9 polarization is a novel aspect of our study. In-depth phenotypic analysis of BP-specific memory T cells from aP vs. wP donors, combining ICS and transcriptomic analysis, revealed clear differences, especially at the level of effector memory T (Tem) cells. Thirteen differentially expressed genes were identified by comparing aP-Tem and wP-Tem cells, including IL9 and TGIF2, which is involved in the regulation of TGFβ-responsive genes (139,140). IL5, IL13, and TGFB1 were also up-regulated in samples from aP donors (23).
In contrast to aP priming, wP priming is associated with a substantially higher magnitude of CD4 T cell responses following an aP booster, when ex vivo responses were assayed in a time window ranging from a few days to several months. Consistent with these findings, in vitro proliferation assays showed that donors originally primed with aP had lower proliferative capacity (23). In conclusion, these results demonstrate that the techniques described above can be used to dissect and define the phenotype of BP-specific T cell responses, and that they reveal important biological differences.

CONSERVATION OF BP EPITOPES ACROSS BP VARIANTS
As mentioned before, previous studies (108)(115)(116)(117)(118)(119) indicated that mutations might be accumulating in the acellular vaccine antigens, and that this phenomenon might be related to the apparent waning of BP immunity. Separately, we have also recently shown that human microbiome composition modulates T cell responses via molecular mimicry (141).
A first line of preliminary analysis considers the possibility that new BP strains carrying mutations at key epitopes (pathogen escape) have evolved. Several studies have identified mutations in circulating BP strains that could result from pathogen adaptation to immune pressure. For example, Bart et al. (115) identified a total of 471 coding SNPs (genetic variations that result in amino acid changes in the encoded proteins) from prevaccination strains. Precise mapping of the T cell epitopes that are prevalently recognized in the human population, including for antigens that are not also targets of immune responses, will further elucidate whether the observed genetic variability of circulating BP strains is indeed a result of T cell immune pressure.
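Asking whether variation in circulating strains falls within T cell epitopes reduces, at its simplest, to two checks: whether an epitope sequence is present verbatim in each strain's version of the antigen, and whether known amino acid substitution positions fall inside mapped epitope coordinates. The sketch below assumes 1-based inclusive epitope coordinates on a reference protein; all sequences, positions, and names are toy placeholders, not BP data.

```python
def conservation(epitope: str, strain_proteins: dict[str, str]) -> float:
    """Fraction of strains whose antigen sequence contains the epitope exactly."""
    hits = sum(1 for sequence in strain_proteins.values() if epitope in sequence)
    return hits / len(strain_proteins)


def variants_in_epitopes(epitopes: dict[str, tuple[int, int]],
                         variant_positions: list[int]) -> dict[str, list[int]]:
    """Map each epitope (1-based inclusive start, end) to the variant positions it contains."""
    hits: dict[str, list[int]] = {}
    for name, (start, end) in epitopes.items():
        inside = [pos for pos in variant_positions if start <= pos <= end]
        if inside:
            hits[name] = inside
    return hits


# Toy placeholder sequences for one antigen in three hypothetical strains.
strains = {"strain1": "MKTAYIAKQRQISFVK",
           "strain2": "MKTAYIAKQRQISFVK",
           "strain3": "MKTGYIAKQRQISFVK"}   # A -> G substitution at position 4
print(conservation("TAYIAKQ", strains))    # present in 2 of 3 strains

# Toy epitope coordinates and substitution positions on the reference protein.
epitope_coords = {"ep1": (10, 24), "ep2": (40, 54), "ep3": (70, 84)}
print(variants_in_epitopes(epitope_coords, [12, 30, 41, 54]))
# {'ep1': [12], 'ep2': [41, 54]}
```

Exact-match conservation is the strictest criterion; a fuller analysis would also ask whether a substitution actually alters HLA binding or T cell recognition, which requires prediction or experimental testing of the variant peptide.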

CONCLUSION AND PERSPECTIVE
There is strong evidence that T cells have important functions in BP immunity and vaccine efficacy. However, the T cell epitopes elicited by either natural infection or whole cell (wP) vaccination have not been comprehensively defined, and the corresponding T cell phenotypes have not been characterized. Based on the success of the genome-wide screen of Mtb, we argue for performing a genome-wide screen of T cell responses in individuals vaccinated with wP vaccines and in individuals previously diagnosed with whooping cough, to understand the targets of cellular immunity in those conditions. Such an investigation could use techniques developed and validated over the years, including direct ex vivo assays such as the AIM assay and in vitro expansion of memory T cells using BP lysates. Although powerful, the genome-wide screen approach also has limitations. For instance, it will not capture noncanonical peptides presented by HLA molecules, such as peptides originating from non-coding regions and spliced peptides. In fact, recent studies (one of which we coauthored) indicate that a substantial fraction of the HLA peptidome (class I, but probably class II as well) is composed of hybrid peptides that originate from two different peptide fragments (so-called cis-spliced or trans-spliced peptides) (142,143). If general rules that predict the splicing mechanisms can be defined, these spliced peptides could be predicted and thus incorporated into the analysis.
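Why spliced peptides escape a canonical genome-wide library, and why predictive rules would be needed to include them, becomes clear when candidates are enumerated. The sketch below is deliberately simplified and its parameters are hypothetical: it considers only cis-splicing within one protein, only forward-order joining (the N-terminal fragment precedes the C-terminal one in the protein), a fixed peptide length, and a maximum number of excised residues. Even under these restrictions the candidate space is much larger than the set of contiguous peptides.

```python
def linear_peptides(protein: str, length: int) -> set[str]:
    """All contiguous peptides of the given length (the canonical search space)."""
    return {protein[i:i + length] for i in range(len(protein) - length + 1)}


def cis_spliced_peptides(protein: str, length: int, max_gap: int = 25) -> set[str]:
    """Peptides formed by joining two non-adjacent fragments of one protein.

    Simplifying assumptions (hypothetical): forward order only, each fragment
    at least one residue long, and 1..max_gap residues excised between them.
    """
    n = len(protein)
    candidates: set[str] = set()
    for left_len in range(1, length):
        right_len = length - left_len
        for i in range(n - left_len + 1):
            left_end = i + left_len                   # exclusive end of left fragment
            first_j = left_end + 1                    # at least one residue excised
            last_j = min(left_end + max_gap, n - right_len)
            for j in range(first_j, last_j + 1):
                candidates.add(protein[i:left_end] + protein[j:j + right_len])
    return candidates


toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical toy sequence
linear = linear_peptides(toy_protein, 9)
spliced_only = cis_spliced_peptides(toy_protein, 9) - linear
print(len(linear), len(spliced_only))
```

Allowing reverse-order joining, trans-splicing between proteins, or variable peptide lengths would expand this space further, which is why constraining rules would be needed before such peptides could be incorporated into a screen.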
T cell responses against the various epitopes and associated antigens can be characterized and validated using several complementary approaches. These include determining HLA restriction, measuring HLA binding affinity, characterizing memory phenotypes, functionality, and helper T cell subsets, and analyzing patterns of epitope sequence variation. Additionally, it would be of considerable interest to characterize the transcriptomic profiles associated with recognition of the newly identified epitopes, as compared with those currently included in the aP vaccine. These studies could potentially address several hypotheses proposed to explain the decreased efficacy of aP vaccines, namely differences in antigen specificity, differences in functionality, and/or mutations in the antigens or epitopes targeted by vaccine-induced responses. Furthermore, we anticipate that these approaches will be broadly applicable to other intracellular bacterial pathogens such as Salmonella and Brucella.

AUTHOR CONTRIBUTIONS
All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

FUNDING
This work was supported by National Institutes of Health contracts and grants HHSN272200900044C, HHSN272201200010C, HHSN272201400045C, U19 AI118626, and U01 AI141995.