Shotgun Immunoproteomic Approach for the Discovery of Linear B-Cell Epitopes in Biothreat Agents Francisella tularensis and Burkholderia pseudomallei

Peptide-based subunit vaccines are coming to the forefront of current vaccine approaches, with safety and cost-effective production among their top advantages. Peptide vaccine formulations consist of multiple synthetic linear epitopes that together trigger desired immune responses that can result in robust immune memory. The advantages of linear compared to conformational epitopes are their simple structure, ease of synthesis, and ability to stimulate immune responses by means that do not require complex 3D conformation. Prediction of linear epitopes through use of computational tools is fast and cost-effective, but typically of low accuracy, necessitating extensive experimentation to verify results. On the other hand, identification of linear epitopes through experimental screening has been an inefficient process that requires thorough characterization of previously identified full-length protein antigens, or laborious techniques involving genetic manipulation of organisms. In this study, we apply a newly developed generalizable screening method that enables efficient identification of B-cell epitopes in the proteomes of pathogenic bacteria. As a test case, we used this method to identify epitopes in the proteome of Francisella tularensis (Ft), a Select Agent with a well-characterized immunoproteome. Our screen identified many peptides that map to known antigens, including verified and predicted outer membrane proteins and extracellular proteins, validating the utility of this approach. We then used the method to identify seroreactive peptides in the less characterized immunoproteome of Select Agent Burkholderia pseudomallei (Bp). This screen revealed known Bp antigens as well as proteins that have not been previously identified as antigens. 
Although B-cell epitope prediction tools Bepipred 2.0 and iBCE-EL classified many of our seroreactive peptides as epitopes, they did not score them significantly higher than the non-reactive tryptic peptides in our study, nor did they assign higher scores to seroreactive peptides from known Ft or Bp antigens, highlighting the need for experimental data instead of relying on computational epitope predictions alone. The present workflow is easily adaptable to detecting peptide targets relevant to the immune systems of other mammalian species, including humans (depending upon the availability of convalescent sera from patients), and could aid in accelerating the discovery of B-cell epitopes and development of vaccines to counter emerging biological threats.


INTRODUCTION
Development of an effective vaccine against a biothreat agent or emerging pathogen is a costly and cumbersome process that can take years to decades to complete. The identification of antigens that stimulate protective immunity against a pathogen can represent a significant bottleneck in the vaccine development process, especially for bacterial or fungal pathogens, eukaryotic parasites, or even large DNA viruses, which can contain hundreds to thousands of potential antigens. Our study addressed the need to accelerate this process by testing the feasibility of a screening platform for efficient identification of immunoreactive peptides that could be utilized as candidates for development of peptide-based vaccines.
Peptide-based vaccines represent a potential solution to provide protection against biothreat and emerging pathogens to which current vaccine development strategies have failed. Peptide vaccine formulations consist of multiple synthetic linear epitopes that together trigger immune responses resulting in robust immune memory. This multi-epitope, multi-target approach has the potential to be broadly protective across divergent strains (e.g., the first universal influenza vaccine to enter phase III clinical trials was a peptide vaccine), and could be effective for pathogens with complex life cycles (e.g., several malaria peptide vaccines are currently in clinical trials) (1)(2)(3). Although it has been reported that conformational (discontinuous) epitopes make up the majority of B-cell epitopes (4), linear epitopes possess several advantages for vaccine design over conformational epitopes. Due to their short sequence and lack of complex secondary and tertiary structure, short antigenic peptides can be easily synthesized, and multiplexed into vaccine formulations, for high-throughput assessment of efficacy. Consequently, peptide-based vaccines are potentially powerful medical countermeasures that would seem amenable to rapid development in responding to infectious disease outbreaks.
Current strategies for epitope identification depend upon detection of epitopes within an individual full-length protein, a low-throughput approach that requires prior knowledge of the antigenic protein, its sequence, and its conformational structure. Technologies to screen for epitopes at the whole proteome level have been developed (e.g., proteomic microarrays, phage and yeast display); however, these technologies require extensive use of synthetic biology and other time-consuming methodologies (e.g., library construction, peptide/protein array preparation, heterologous protein expression) (3, 5-11). Another major disadvantage of display technologies and use of non-native expression systems is that these methods do not reliably replicate the native properties of the antigenic proteins, including their post-translational modifications, which can lead to inaccurate identification of epitopes.
In this study, proteome-wide screening for linear B-cell epitopes was achieved using total protein extracts isolated from the pathogen of interest, affinity purified using antibodies from convalescent sera from infected animals. This strategy holds several advantages over the currently available methods for epitope discovery: It does not require prior knowledge of antigenicity or antigen structure, and obviates need for complex and laborious experimental techniques such as preparation of display libraries and heterologous protein expression. As with other methods for epitope discovery from serum, it may be less well suited for pathogens for which natural infection does not confer immunity, such as HIV, malaria and TB, although even in those cases protective antibodies may be found in some subsets of patients or animal models (12)(13)(14).
Our approach was designed to enable identification of the protein antigen and, importantly, the antigenic regions within the identified antigen, such that these short linear peptides can be immediately synthesized and tested for efficacy in vaccine formulations. Note that several strategies have been previously developed for the identification of T-cell peptide epitopes (15,16), including techniques similar to that presented here involving purification of MHC-bound peptides and their subsequent identification via LC/MS/MS (17).
In this study, we focused on two intracellular bacterial pathogens, Francisella tularensis (Ft) and Burkholderia pseudomallei (Bp), organisms which pose a high risk for misuse as bioweapons and therefore are considered Tier 1 Select Agents by the US Centers for Disease Control and Prevention. The mortality rates of both pathogens are high, and there is currently no licensed vaccine available for either agent (18)(19)(20). Humoral immunity plays an important role in developing immune protection to both of these intracellular pathogens, making them good model organisms for the purposes of this study (21)(22)(23)(24)(25)(26). In addition, the immunoproteome of Ft has been thoroughly characterized (19,27,28), such that the previously published data could be compared to the datasets generated in our study. We leveraged a merged dataset of 164 previously identified antigens, corresponding to ~10% of the Ft proteome. The Bp immunoproteome is not as well characterized as that of Ft: our reference dataset contained only 61 previously identified seroreactive proteins, corresponding to ~1% of the Bp proteome (29,30). Consequently, analysis of the dataset resulting from the Bp screen has revealed many proteins that have not been previously categorized as antigens.

Bacterial Strains and Culture Conditions
Francisella tularensis SCHU S4DclpB ("Ft-DclpB") was a generous gift from Dr. Wayne Conlan (National Research Council Canada). Stock cultures were prepared by growing Ft-DclpB on Chocolate II Agar plates supplemented with hemoglobin and isovitalex (BD 221169) for 48 hours at 37°C. Bacteria were harvested by scraping confluent lawns into Mueller Hinton (MH) broth containing 20% (w/v) sucrose, and stored at -80°C at a concentration of 10^8-10^9 CFU/mL. Burkholderia pseudomallei mutant DpurM ("Bp82") was obtained from BEI resources (NR-51280). Frozen stocks were prepared by growing the bacteria to log phase in Luria-Bertani (LB) broth, adding glycerol to achieve 20% (w/v) with the bacteria at a final concentration of 10^8-10^9 CFU/mL, and storing aliquots at -80°C. For immunizations, the Ft-DclpB and Bp82 bacterial stocks were thawed and diluted in sterile phosphate-buffered saline (PBS) to the specified concentrations used for dosing. For protein extraction purposes, Ft-DclpB and Bp82 were propagated to log phase in MH and LB broth, respectively. Both bacterial strains used in this study are classified as Risk Group 2 organisms. All biological materials were handled under standard institutional biosafety and biosecurity procedures, as outlined in an approved Institutional Biosafety Committee (IBC) protocol.

Protein Extraction and Peptide Preparation
Ft-DclpB and Bp82 were grown to log phase in 300 mL of MH broth or LB broth, respectively, at 37°C with shaking (250 rpm). The bacteria were harvested by centrifugation at 3200 x g for 10 min at 4°C, washed once with 10 mL of PBS, and the pellet flash frozen using dry ice. The bacteria in the pellet were lysed by subjecting them to two freeze-thaw cycles (alternating between room temperature and dry ice). For protein extraction, the lysate was mixed with B-PER Complete Bacterial Protein Extraction Reagent (Thermo Fisher Scientific cat# 89822), and the mixture incubated at room temperature for 15 min with rotational shaking. The mixture was then subjected to two rounds of sonication (1 sec pulses, timed output 10 sec, at 50% power) using a Heat Systems Ultrasonics sonicator (model W-385), and centrifuged at 16,000 x g for 10 min. Proteins were precipitated with acetone and washed twice with ethanol. Air-dried protein pellets were solubilized using 8M urea and ProteaseMAX surfactant (Promega cat# V2071), then digested with trypsin (Promega cat# V5111) using the in-solution digestion protocol provided by the manufacturer (Promega cat# TB373). Completion of the trypsinization reaction was confirmed by gel electrophoresis. The trypsin-digested proteins were filtered using 10K MWCO concentrators (Pierce) at 10,000 x g for 20 min at 20°C, and the filtrates (purified peptides) stored at -20°C. These purified peptide preparations were used as inputs in subsequent experiments.

Mice and Immunizations
Mouse immunization studies were carried out in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals and the National Institutes of Health. Standard institutional safety and biosecurity procedures were followed for in vivo experiments. Appropriate efforts were made to minimize suffering of animals. All animals were housed in ABSL2 conditions in an AAALAC-accredited facility, and the protocol (Protocol 270, renumbered 284) was approved by the LLNL Institutional Animal Care and Use Committee (IACUC). For immunization, 6-week-old female specific-pathogen-free BALB/c-Elite and C57BL/6J-Elite mice (Charles River) were injected subcutaneously with 10^6 CFU Ft-DclpB (BALB/c and C57BL/6J), or intradermally with 10^7 CFU Bp82 (BALB/c), and boosted at 2 weeks. No adjuvants were used. Matched PBS-dosed controls were included for each injection route. Course of infection was monitored by performing daily health scoring and weight measurements. Mice that developed infection wounds (Ft only) were topically treated with Dakin's solution to encourage wound healing, and allowed to remain on test so long as they did not meet humane endpoint criteria (any mice with ~20% body weight loss or overt signs of morbidity were humanely euthanized). Sera from euthanized mice were excluded from analysis due to lack of immunity to the pathogen. Convalescent sera were harvested from resilient mice at 4 weeks post-infection, via cardiac puncture terminal bleeding under inhaled isoflurane anesthesia followed by blood fractionation [centrifugation at 3800 x g for 15 min in microtainer serum separator tubes (BD)]. Sera were stored at -80°C.

SDS-PAGE and Western Analysis
Western analysis was performed to confirm seropositivity of infected mice. Bacterial lysates were prepared using B-PER Complete Bacterial Protein Extraction Reagent (Thermo Fisher Scientific cat# 89822), combined with Laemmli loading buffer (BioRad), and boiled at 95°C for 5 min. Samples were loaded onto 4-15% acrylamide gels (Mini-Protean TGX, BioRad) and separated by electrophoresis at 120 V for 1 hr. The proteins were transferred from the gels to nitrocellulose membranes (BioRad). Membranes were blocked with Tris-buffered saline plus 0.05% Tween 20 (TBS-T) plus 5% nonfat dry milk, at room temperature for 1 hr or at 4°C for 16 hrs. The membranes were hybridized with mouse sera at 1:500 dilution in TBS-T plus 5% milk, at room temperature for 2 hrs; washed three times with TBS-T; and then incubated with goat anti-mouse antibodies conjugated to HRP (Pierce cat# 1858413), at 1:5000 dilution in TBS-T plus 5% milk, at room temperature for 1 hr. After three TBS-T washes, the membranes were developed using SuperSignal™ West Pico PLUS Chemiluminescent Substrate (Thermo Fisher Scientific).

Enzyme-Linked Immunosorbent Assay
ELISA was performed to assess the level of seropositivity of infected mice. Wells were coated with bacterial lysates and incubated at 4°C for 16 hrs. After three washes with PBS plus 0.1% Tween-20 (PBS-T), sera from infected mice diluted to 1:100 with PBS were added to the wells and incubated at room temperature for 1 hr. Following four PBS-T washes, the wells were incubated for 1 hr with Recombinant Protein A/G peroxidase (Pierce cat# 32490) diluted at 1:5000 with PBS.
After four PBS-T washes, 1-Step ABTS Substrate Solution (Pierce cat# 37615) was added, and after 15 min incubation any colorimetric changes in the wells were detected using a microplate reader (Tecan M200 Pro).

Affinity Purification of Immunoreactive Peptides
Magnetic beads coated with protein G (Invitrogen cat# 10007D) were used to capture antibodies from pools of sera obtained from either infected (experiment) mice or mock-infected (control) mice, following the manufacturer's protocol (MAN0017348). Each pool comprised sera recovered from 3-5 mice, with equal volumes used for each experiment-control pair. The antibody-coated beads were then incubated with peptide preparations (inputs) at room temperature for 45 min. Antibody-coated beads from each experiment-control pair were incubated with the same input peptides; in total, 6 input peptide preparations were used with the 8 Ft experiment-control pairs, and 5 with the 9 Bp experiment-control pairs. Following three PBS washes, immunoreactive peptides were eluted from the beads using citrate buffer (pH 3). Input, unbound, and eluted (output) peptides were flash frozen with dry ice and stored at -20°C.

Mass Spectrometry
The input, unbound, and eluted (output) peptides recovered from the antibody-coated beads (see preceding section) were desalted using an Empore SD solid phase extraction plate; lyophilized; reconstituted in 0.1% TFA; and analyzed via LC-MS/MS by MS Bioworks (Ann Arbor, Michigan), using a Waters M-Class UPLC system interfaced to a ThermoFisher Fusion Lumos mass spectrometer. Peptides were loaded on a trapping column and eluted over a 75 μm analytical column at 350 nL/min. Both columns were packed with Luna C18 resin (Phenomenex). A 2 hr gradient was employed. The mass spectrometer was operated in a data dependent HCD mode, with MS and MS/MS performed in the Orbitrap at 60,000 FWHM resolution and 15,000 FWHM resolution, respectively. The instrument was run with a 3 sec cycle for MS and MS/MS.

MS Data Processing
Data were analyzed using Mascot (Matrix Science) with the following parameters: Enzyme: Trypsin/P; Database: UniProt F. tularensis SCHU S4 or UniProt B. pseudomallei strain 1026b (forward and reverse appended with common contaminants and mouse IgG sequences); Fixed modification: Carbamidomethyl (C); Variable modifications: Oxidation (M), Acetyl (N-term), Pyro-Glu (N-term Q), Deamidation (N/Q); Mass values: Monoisotopic; Peptide Mass Tolerance: 10 ppm; Fragment Mass Tolerance: 0.02 Da; Max Missed Cleavages: 2. Mascot DAT files were parsed into Scaffold Proteome Software for validation, filtering, and creation of a non-redundant list per sample. Data were filtered using 1% protein and peptide FDR and requiring at least one unique peptide per protein.
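To illustrate what the "Trypsin/P" and "Max Missed Cleavages: 2" settings imply for the peptide search space, the following minimal Python sketch enumerates candidate peptides from a protein sequence. It is an illustration only, not Mascot's implementation; the function name and the toy sequence are hypothetical.

```python
def tryptic_peptides(sequence, max_missed=2):
    """Enumerate Trypsin/P peptides: cleavage after every K or R,
    including before proline, allowing up to `max_missed` missed cleavages."""
    # Step 1: split into fully cleaved fragments, keeping K/R at the fragment end.
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal fragment without K/R

    # Step 2: join up to `max_missed` + 1 consecutive fragments.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


# Toy sequence (hypothetical, not taken from the study)
print(tryptic_peptides("AKCRDE", max_missed=1))
```

Each additional allowed missed cleavage enlarges the set of candidate peptides that the search engine must consider for every protein in the database.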

Bioinformatic Analysis
Each experiment typically consisted of three sets of data: "Input" (total bacterial peptides without affinity purification), "Control" (peptides purified from beads coated with antibodies from uninfected mice), and "Experiment" (peptides purified from beads coated with antibodies from infected mice).
LC-MS/MS data were analyzed at the peptide level, rather than rolling up peptide scores into a protein abundance metric as would be done in standard proteomics. We used the Total Ion Current (TIC, total area under the MS2 curve) as a metric for the abundance of the peptide in each sample. Input datasets were first normalized against each other based on median ratios for the peptides occurring in every Input dataset. The sparser Control and Experiment datasets were then normalized against their respective Input dataset based on median ratios as well.
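The median-ratio normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions: each dataset is represented as a dictionary mapping peptide sequence to TIC, the first Input dataset serves as the reference, and all TIC values are positive. The function names are hypothetical and the original analysis code may differ.

```python
from statistics import median


def median_ratio(reference, sample, peptides=None):
    """Median of reference/sample abundance ratios over shared peptides."""
    shared = set(reference) & set(sample) if peptides is None else peptides
    # statistics.median raises StatisticsError if no peptides are shared.
    return median(reference[p] / sample[p] for p in shared)


def normalize_inputs(inputs):
    """Scale every Input dataset to the first one (assumed reference),
    using only the peptides that occur in every Input dataset."""
    common = set.intersection(*(set(d) for d in inputs))
    reference = inputs[0]
    normalized = []
    for dataset in inputs:
        factor = median_ratio(reference, dataset, common)
        normalized.append({p: tic * factor for p, tic in dataset.items()})
    return normalized


def normalize_to_input(sample, normalized_input):
    """Scale a sparser Control or Experiment dataset against its Input."""
    factor = median_ratio(normalized_input, sample)
    return {p: tic * factor for p, tic in sample.items()}


# Toy TIC values (hypothetical)
inputs = normalize_inputs([{"A": 10, "B": 20, "C": 30},
                           {"A": 20, "B": 40, "C": 60, "D": 8}])
experiment = normalize_to_input({"A": 1.0, "B": 4.0}, inputs[0])
```

Using the median rather than the mean keeps a handful of strongly enriched peptides from dominating the scaling factor.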
Since each animal can be expected to raise a different set of antibodies, we counted how often specific output peptides occurred more abundantly in the Experiment vs Control, rather than focusing on the average log fold change in abundance. For each peptide and each Experiment sample, we assigned an enrichment score of +1, 0, or -1 depending on whether the normalized peptide abundance was greater than, equal to, or lower in the Experiment than in the corresponding Control sample, creating a score matrix of peptides × Experiments. The total enrichment score for each peptide is then the sum of its enrichment scores across all Experiments. Statistical significance was evaluated by generating a number of randomized score matrices, where each peptide was randomly assigned a +1, 0, or -1 score for each Experiment, with the same probabilities as in the real matrix, and calculating how frequently peptides reach a specific total enrichment score. This gives us a background level of how many high-scoring peptides we would expect to see even if there was no correlation in peptide abundance across the different experiments, which can then be used to calculate the significance level of observing a given number of high-scoring peptides in the real data, using a simple binomial test comparing expected vs observed number of peptides exceeding a given score.
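The scoring and randomization procedure can be sketched in Python as below. Several details are assumptions made for illustration rather than statements about the original analysis: peptides absent from a sample are treated as having abundance 0, randomized scores are drawn from the empirical score frequencies of each Experiment column, and the binomial test is one-sided with success probability equal to the expected count divided by the number of peptides. All function names are hypothetical.

```python
import math
import random


def score_matrix(experiments, controls, min_experiments=2):
    """Return {peptide: [+1/0/-1 for each Experiment-Control pair]}.

    experiments, controls: paired lists of dicts, peptide -> normalized abundance.
    """
    matrix = {}
    for pep in set().union(*experiments):
        # Keep peptides recovered in at least `min_experiments` Experiment samples.
        if sum(pep in exp for exp in experiments) < min_experiments:
            continue
        row = []
        for exp, ctl in zip(experiments, controls):
            e, c = exp.get(pep, 0.0), ctl.get(pep, 0.0)  # missing -> 0 (assumption)
            row.append((e > c) - (e < c))
        matrix[pep] = row
    return matrix


def expected_count(matrix, cutoff, n_random=1000, seed=0):
    """Mean number of peptides with total score >= cutoff in randomized matrices."""
    rng = random.Random(seed)
    columns = list(zip(*matrix.values()))  # one tuple of scores per Experiment
    total = 0
    for _ in range(n_random):
        # Redraw every score from the empirical distribution of its column.
        random_cols = [rng.choices(col, k=len(col)) for col in columns]
        sums = (sum(row) for row in zip(*random_cols))
        total += sum(s >= cutoff for s in sums)
    return total / n_random


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    tail = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log(1.0 - p))
        tail += math.exp(log_term)
    return min(tail, 1.0)


# Toy data (hypothetical): two Experiment-Control pairs
experiments = [{"P1": 5.0, "P2": 1.0}, {"P1": 4.0, "P2": 1.0}]
controls = [{"P1": 1.0, "P2": 3.0}, {"P1": 1.0}]
matrix = score_matrix(experiments, controls)
totals = {pep: sum(row) for pep, row in matrix.items()}
background = expected_count(matrix, cutoff=2, n_random=200)
```

With real data, the observed number of peptides at or above a cutoff would be passed as `k`, the number of peptides in the matrix as `n`, and `expected_count(...) / n` as `p`.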
Amino Acid Conservation Scores were calculated using the ConSurf web server (31) with default parameter values, using near full-length protein structure homology models from SWISS-MODEL or crystal structures from PDB where available. These scores are normalized position-specific evolutionary rates, with negative scores indicating the most conserved amino acids. The Average Amino Acid Conservation Score (AAACS), proposed by Ren et al. as a useful tool to identify conserved epitopes that may be targeted by broadly neutralizing antibodies, is the average of the conservation score for the residues in an epitope, with negative scores indicating more highly conserved regions (32).
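As a minimal sketch of the AAACS calculation, the function below averages per-residue conservation scores over the span of a peptide within its parent protein. It assumes the per-residue scores have already been obtained (for example, exported from ConSurf) as one number per residue; the function name and toy values are hypothetical.

```python
def peptide_aaacs(protein_seq, residue_scores, peptide):
    """Average per-residue conservation score over the peptide's span.

    Returns None when the peptide is empty or not found in the protein.
    More negative values indicate more highly conserved regions.
    """
    if len(residue_scores) != len(protein_seq):
        raise ValueError("one score per residue is required")
    start = protein_seq.find(peptide)  # first occurrence only
    if not peptide or start < 0:
        return None
    window = residue_scores[start:start + len(peptide)]
    return sum(window) / len(window)


# Toy example (hypothetical sequence and scores)
scores = [0.5, -1.0, -0.5, 1.0, 0.0]
value = peptide_aaacs("MKTAY", scores, "KTA")
```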
In addition to AAACS, we also scored peptides based on how many complete sequenced genomes of pathogenic B. pseudomallei and F. tularensis they occurred in, similar to the conservation analysis in EpitoCore (33). We downloaded proteomes for all 110 B. pseudomallei strains with complete genome sequences available through NCBI. For F. tularensis, 36 strains with complete genomes were available through NCBI, but several of these corresponded to the less-pathogenic novicida, holarctica and mediasiatica subspecies, so we decided to focus exclusively on the 17 available F. tularensis subsp. tularensis complete genomes. We identified homologs with ≥90% sequence identity to the proteins containing our top scoring peptides in Tables 1, 2, and then scored each peptide based on how often it had a 100% identical hit in each homolog.
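The strain-level conservation score can be sketched as an exact-substring count across homologs. The sketch assumes that the homolog search at ≥90% identity has already been done by a separate tool and that its result is available as a mapping from strain name to homolog sequence (or None when no homolog passed the cutoff). The function name, strain names, and sequences are hypothetical toy data.

```python
def strain_conservation(peptide, homologs_by_strain):
    """Count strains whose homolog contains `peptide` as a 100% identical match.

    Returns (number of strains with an exact hit, total number of strains).
    """
    hits = [strain for strain, seq in homologs_by_strain.items()
            if seq and peptide in seq]
    return len(hits), len(homologs_by_strain)


# Toy homologs (hypothetical sequences); strain_B carries a substitution,
# strain_C has no homolog above the identity cutoff.
homologs = {
    "strain_A": "MAAKDNTTIIDGAGEKEA",
    "strain_B": "MAAKDNTTIVDGAGEKEA",
    "strain_C": None,
}
conserved, total = strain_conservation("DNTTIIDGAGEK", homologs)
```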
We used two state-of-the-art computational B-cell epitope prediction tools to evaluate all of the peptides in our proteomic data that match the proteins in Tables 1, 2. Peptides were submitted to the iBCE-EL web server for scoring (34). iBCE-EL is an ensemble-based method based on extremely randomized tree and gradient boosting classifiers, trained on 5,550 experimentally validated B-cell epitopes and 6,893 non-epitopes from the Immune Epitope Database, to identify linear B-cell epitopes. In addition, proteins were submitted to the Bepipred Linear Epitope Prediction 2.0 tool on the IEDB website (35), and peptides were then scored based on their average predicted residue score. Bepipred 2.0 is a random forest classifier trained on 160 non-redundant antigen-antibody crystal structures, to predict the probability that a given antigen residue is part of an epitope.

Overview of Immunoproteome Screen
In this study, we tested the feasibility of proteome-wide screening for linear B-cell epitopes using peptide extracts from target bacteria and sera from infected animals. The method requires: (1) isolation of peptides from lysates generated from the target bacteria; (2) challenge of the host (in this case, mouse) with the target bacteria, followed by collection of convalescent serum; (3) mixing of the bacterial peptides and convalescent serum, to allow peptide antigens to bind to their cognate antibodies in the serum; and (4) recovery of bound peptides for identification through mass spectrometry (Figure 1). We applied this method to two bacterial Select Agent pathogens: Francisella tularensis and Burkholderia pseudomallei. Infection with attenuated strains of these pathogens [F. tularensis SCHU S4DclpB and B. pseudomallei DpurM (strain Bp82)] has been shown to stimulate development of protective immunity against their corresponding fully-virulent parental strains (F. tularensis SCHU S4 and B. pseudomallei K96245, respectively) (36,37), suggesting that convalescent sera recovered from hosts infected with these attenuated pathogens must contain protective antibodies. Briefly, proteins purified from pathogen lysates were digested with trypsin to generate a peptide library. Mice were infected with a sublethal dose of Ft-DclpB or Bp82, and immune status assessed through observed weight loss and measurement of seroreactivity of mouse sera to pathogen lysates via enzyme-linked immunosorbent assay (ELISA) or Western blot analysis (Figure 2). Antibodies purified from the convalescent sera of infected mice were immobilized on magnetic beads and then incubated with pathogen-derived peptides to allow formation of antigen-antibody complexes. Peptides recovered from the immobilized antibodies were identified via liquid chromatography coupled with tandem mass spectrometry.

Bioinformatic Identification of Enriched Antigenic Peptides
The peptides recovered using pooled sera from infected mice (Experiment peptidome) were compared to those recovered from mock-infected mice (Control peptidome); a total of 8 pairs of Experiment-Control peptidomes were collected for Ft, and 9 pairs for Bp. For Ft, we found that out of the 1923 peptides that were recovered in at least two Experiment peptidomes, 44 had an enrichment score of 6 or greater (Table 1), whereas only 20.1 ± 6.1 peptides would be expected at random (p=1x10^-6). For Bp, out of 2902 peptides that were recovered in at least two Experiment peptidomes, 46 peptides had an enrichment score of 6 or greater (Table 2), whereas only 17.8 ± 4.3 peptides would be expected at random (p=1.9x10^-9). If a more stringent enrichment cutoff is desired, we found 16 Ft peptides with an enrichment score of 7 or greater, versus 3.5 ± 1.6 expected at random (p=1.8x10^-7), and 20 Bp peptides with an enrichment score of 7 or greater, versus 4.1 ± 2.1 expected at random (p=3.9x10^-9). The enriched peptides included some that were derived from protective antigens identified in previous studies, as well as predicted outer membrane and extracellular proteins (Tables 1, 2). There were many examples of multiple enriched peptides originating from the same protein (highlighted in bold in the tables), a further indication that enrichment was not random but rather due to immune response to a discrete set of bacterial proteins.
Note that we used C57BL/6J mice for two of the eight Ft experimental samples, because of previously reported differences in protection and antibody response after immunization of C57BL/6J and BALB/c mice with Ft-DclpB by Twine et al. (38). Analyzing the BALB/c Ft samples separately yielded a very similar set of results as in Table 1, but with lower p-value for the enrichment due to the smaller number of samples (data not shown). Therefore, we decided to combine the data and focus on antibody responses in common between both strains of mice. Although Twine et al. reported an antibody response against chaperonin protein GroL only in BALB/c mice, our data shows that there are several GroL epitopes that are enriched in samples from both mouse strains (see Table 1 and Figure 4).
Prior immunoproteomics analysis of the antibody response to F. tularensis using human or mouse sera has identified 164 antibody targets out of a total of 1667 proteins (~10% of the entire Ft proteome) (19,27,28). Out of the 1923 peptides that have hits in at least two Ft datasets, 876 peptides match known antigenic proteins. Given those numbers, we would expect only 20 such peptides to show up at random in our list of 44 in Table 1, but instead we observe that 38/44 peptides in the list correspond to known antigens, an almost two-fold enrichment (p=2.79x10^-9). Note that despite the extensive literature on antigens in Ft, only five B-cell epitopes have been experimentally determined (Figure 4B), justifying the need for a simple experimental epitope screening method. The immune response to B. pseudomallei has not been studied in as much depth as that to Francisella. Even though Bp, with 6203 protein-coding genes, has a genome that is more than three times as large as that of Ft, we found only 61 known antigens identified in previous studies (29,30) (~1% of the entire proteome). Our list of 46 top Bp peptides in Table 2 includes one known antigen, which does not qualify as a statistically significant enrichment, primarily because of the much smaller total number of known antigens for Bp.
Figure 3 shows all 46 Ft DnaK peptides that were detected in at least two Experiment samples, regardless of their degree of enrichment. Eight of these DnaK peptides are in our list of 44 enriched Ft peptides (Table 1 and red line segments in Figure 3A), including two that are enriched in all 8 Experiments (red line segments in Figure 3A). Note the lack of correlation between our experimental enrichment scores and the iBCE-EL and Bepipred scores (Figures 3B, C).
All but one of the 8 enriched peptides are conserved in all 17 fully sequenced Ft strains (Figure 3E), but some of the peptides towards the C-terminus show a greater evolutionary rate as measured by their Average Amino Acid Conservation Score (AAACS, Figure 3D) and thus may be more prone to immune escape mutations.
Figure 4 shows all 32 Ft GroL peptides that were detected in at least two Experiment samples in our study, regardless of the degree of their enrichment. Four of these GroL peptides are in our list of 44 enriched Ft peptides (Table 1 and red line segments in Figure 4A), including three that are enriched in all eight Experiments. Lu et al. (39) used hydrogen/deuterium exchange mass spectrometry (DXMS) to experimentally identify one discontinuous and four linear B-cell epitopes for a selection of mouse monoclonal antibodies against GroL (Figure 4B). Note that one of the four enriched peptides in Figure 4A (DNTTIIDGAGEK) overlaps with a linear epitope (NTTIIDGAGEKEAIAKRINVIK) and a discontinuous epitope (SEDLSMKLEETNM-NTTIIDGAGEKEAIA) identified by DXMS in Figure 4B, while a second enriched peptide (EGVITVEEGK) is directly adjacent to another of the linear DXMS epitopes (FEDEL). According to the Immune Epitope Database (IEDB) (40), these are the only experimentally validated B-cell epitopes for Ft (IEDB also lists four B. pseudomallei antigens that have been assayed for B-cell epitopes, none of which overlap with the proteins in Table 1).

[Figure 2 caption, partial] ...Seroreactivity of mouse sera to microwells coated with the corresponding pathogen lysate was assessed using protein-A/G-HRP and measuring sample absorbance (optical density). Sera of some mice infected with Ft did not yield positive results because Ft infection led to a lethal outcome and the mice had to be euthanized during the course of immunization. Graphs represent two technical replicates for sera collected from each mouse. Antibodies from sera with the strongest Western blot and ELISA signals were purified in this study and used to screen for immunogenic peptides.
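The known-antigen comparison reported above for Ft (876 of 1923 peptides matching known antigens, 38 of the 44 enriched peptides matching known antigens) can be reproduced arithmetically with the short Python sketch below. The expected count and fold enrichment follow directly from the reported numbers. The source does not state which test produced its p-value, so the one-sided hypergeometric tail computed here is only one natural choice and is an assumption, as are the function and variable names.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are drawn without replacement from a
    population containing `successes` items of interest."""
    upper = min(successes, draws)
    numerator = sum(comb(successes, i) * comb(population - successes, draws - i)
                    for i in range(max(k, 0), upper + 1))
    return numerator / comb(population, draws)


total_peptides = 1923  # Ft peptides with hits in at least two datasets
known_antigen = 876    # of those, peptides matching known antigenic proteins
enriched = 44          # enriched peptides listed in Table 1
observed = 38          # enriched peptides matching known antigens

expected = enriched * known_antigen / total_peptides  # about 20
fold = observed / expected                            # almost two-fold
p_value = hypergeom_tail(observed, total_peptides, known_antigen, enriched)
print(f"expected={expected:.1f}, fold={fold:.2f}, p={p_value:.2e}")
```

The expected count of about 20 and the roughly 1.9-fold enrichment agree with the figures quoted in the text; the printed p-value depends on the assumed test and need not match the reported value exactly.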

DISCUSSION
We have developed a widely applicable shotgun immunoproteomic method that enables efficient identification of B-cell epitopes in the proteomes of pathogens. The results of this study have revealed a significant enrichment of peptides derived from previously identified antigens and vaccine candidates, validating the method's efficacy. This method was designed to identify linear epitopes efficiently without the need for genetic manipulation or other experimental techniques that can be costly and labor intensive. Attenuated strains made the optimization of this proof-of-concept study more efficient; however, the availability of an attenuated strain for the target organism does not represent a limitation, as our strategy could be applied to fully virulent strains of pathogens as well. Although the present study was performed using a mouse model, the workflow could be easily adapted to detecting targets relevant to the human immune system, using convalescent sera from patients.
Utilizing peptide antigens for vaccine development has several advantages over typical vaccine development efforts. Similar to other types of subunit vaccines, peptide vaccines represent a safer alternative to attenuated vaccines due to the lack of any potentially infectious materials in the vaccine formulation. Use of short peptides sufficient for stimulation of immune response favors exclusion of deleterious sequences that may be present in full-length antigenic proteins. Peptide vaccine formulations are defined and their contents fully synthetic, which simplifies quality control procedures and thereby streamlines the regulatory approval process. Production of peptide vaccines is expected to be relatively fast and inexpensive, due to ease of synthesis and recent advances in improved peptide stability (3,41,42). Moreover, once antigenic peptides are identified, evaluation of their efficacy could represent a lesser challenge due to the possibility of multiplexing peptides during in vivo trials, rather than use of one-at-a-time testing.
Among Ft proteins, the present screen identified multiple peptides for two well-characterized antigens, 60 kDa chaperonin GroL (Q5NEE1) and chaperone protein DnaK (Q5NFG7). Both chaperonins have been previously implicated in virulence of Francisella (43)(44)(45), and are known to induce antibody production in mice and humans (27,46,47). These chaperonin proteins are important for facilitating folding of nascent proteins as well as post-translational modifications. They are also known as heat-shock proteins, as they protect cellular proteins from environmental stresses such as high temperature and low pH (47,48). Although their cellular localization is predicted to be cytoplasmic, they reportedly also associate with membrane proteins and are released into host cells during infection (47, 49-51), perhaps contributing to their ability to stimulate various immune functions, including innate immunity, humoral immunity and cell-mediated immunity (43, 47, 52-55). Heat-shock proteins are good candidates for subunit vaccine design due to their ability to stimulate various immune responses without the need for adjuvant; in fact, both GroL and DnaK have been exploited for vaccine development efforts targeting Francisella and other pathogens (39,47,56,57).

[Figure 3 caption, partial] ...Table 1 was an enrichment score of ≥6 (shown in red). (B) B-cell epitope prediction score generated using iBCE-EL. At the default iBCE-EL score threshold of 0.35, nearly three quarters of all peptides were predicted to be likely B-cell epitopes (shown in dark blue). (C) B-cell epitope prediction score generated using Bepipred 2.0. The per-amino acid scores are indicated by the line graph. At the default iBCE-EL score threshold of 0.35, 37% of all amino acids were predicted to be in B-cell epitopes (regions of the graph shown in yellow).
Highly virulent Type A Francisella strains such as SCHU S4 can bind host plasminogen to the bacterial cell surface where it can be converted to plasmin, a serine protease that degrades opsonizing antibodies, inhibiting antibody-mediated uptake by macrophages (58,59). Among the 25 Ft proteins listed in Table 1, we find at least three that are known to be involved in plasminogen binding in Francisella or other pathogens, including conserved hypothetical lipoprotein LpnA (Q5NGE4) (59), fructose-1,6-bisphosphate aldolase (Q5NF78) (60), and elongation factor Tu (Q5NID9) (61). These proteins could make particularly attractive vaccine targets: interfering with their function before the pathogen has activated its plasmin-mediated antibody evasion could also make it more susceptible to other antibodies.
Among the antigenic peptides identified in the Bp proteome are those belonging to Type VI secretion system component Hcp-1 and previously identified antigen 10 kDa chaperonin GroES (62). Hcp-1 was previously found to be a major virulence determinant in Burkholderia and to be recognized by sera from infected human patients and animals (63-65). For this reason, Hcp-1 has been investigated as a potential candidate for Burkholderia vaccine development (63-65). Additionally, a peptide from an ankyrin repeat-containing protein (A0A0H3HJC) came up as one of the highest scoring peptides in our study. Ankyrin repeats are typically eukaryotic protein domains involved in protein-protein interactions (66), but have been co-opted by many bacterial pathogens as type IV secreted effector proteins to mimic or manipulate various host functions (67).
Recovery of peptides derived from several supposedly cytosolic enzymes may seem puzzling. However, several "housekeeping" enzymes are known to be displayed on the surface of pathogens where they play a role in virulence (68). For example, our top scoring peptides from B. pseudomallei include two derived from enolase (A0A0H3HLA6). While enolase is primarily thought of as a key glycolytic enzyme, it is also expressed on the surface of a wide variety of bacterial and fungal pathogens, where it interacts with host plasminogen and is associated with invasion and virulence (69). Antibodies against enolase have been detected in a large variety of infectious and autoimmune diseases (70). It is as yet unknown whether enolase plays the same role in Burkholderia, but the protein is predicted to be present both in the cytoplasm and on the cell surface, and its production was found to be upregulated upon exposure to human lung epithelial cells (71). Other housekeeping proteins in our top scoring results whose homologs in other pathogens are known to play a role in adhesion, invasion, or virulence include elongation factor Tu (Q5NID9), malic enzyme/malate dehydrogenase (A0A0H3HP28, Q5NHC8), and fructose-1,6-bisphosphate aldolase (Q5NF78) (68).
[Figure caption, beginning missing in source] ... Table 1 was a score of ≥6 (shown in red). (B) Five B-cell epitopes identified by DXMS by Lu et al. (39), including one discontinuous epitope.
Overall, this immunoproteomic workflow has identified numerous peptides mapping to previously identified antigens and subunit vaccine targets, predicted membrane-associated proteins, as well as uncharacterized proteins. The Ft datasets revealed a significant enrichment of peptides belonging to previously identified antigenic proteins in Experiment samples relative to their respective Control samples, providing validation of this approach. Interestingly, several of these known antigens also yielded multiple top scoring peptides in our analysis. Despite the large amount of prior immunoproteomic analysis on Ft, covering 10% of the genome, experimentally validated B-cell epitopes are available for only a single protein, and our analysis captures two out of its five known epitopes. Due to the much smaller number of previously identified antigens for Burkholderia, we were not able to determine whether the enrichment in the Bp datasets was significant. Improved proteome coverage and more comprehensive immunogenic profiles could be achieved with the use of alternative enzymes with different specificities, since a given enzyme such as trypsin will ablate any epitope that contains one of its cut sites. Alternatively, performing incomplete digestion with one enzyme, or a cocktail of enzymes with different specificities, could increase the number of overlapping peptides and thereby improve the yield and diversity of identified epitopes. In addition, since the presented method depends upon extraction of proteins from whole cell lysates, it is conceivable that the proteome coverage could be biased toward highly abundant proteins or those proteins that are easier to extract. Despite this disadvantage, we have detected several membrane-bound antigens in this study.
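The effect of enzyme choice and incomplete digestion on epitope recovery can be illustrated with a minimal in silico digestion sketch. The protein sequence and epitope below are hypothetical toy examples, the Glu-C rule is deliberately simplified to cleavage after E only, and the function names are illustrative assumptions rather than part of the workflow used in this study.

```python
import re

# Simplified cleavage rules: each regex matches the residue after which the enzyme cuts.
TRYPSIN = r"[KR](?!P)"  # C-terminal to K or R, but not when followed by P
GLU_C = r"E"            # assumption: simplified Glu-C rule, C-terminal to E only


def digest(sequence, rule, missed_cleavages=0, min_len=1):
    """Return the set of peptides from an in silico digest of `sequence`."""
    cuts = [0] + [m.end() for m in re.finditer(rule, sequence)]
    if cuts[-1] != len(sequence):
        cuts.append(len(sequence))
    peptides = set()
    for i in range(len(cuts) - 1):
        # Join up to `missed_cleavages` extra fragments to model incomplete digestion.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(cuts))):
            peptide = sequence[cuts[i]:cuts[j]]
            if len(peptide) >= min_len:
                peptides.add(peptide)
    return peptides


def covered(peptides, epitope):
    """True if at least one peptide contains the epitope intact."""
    return any(epitope in p for p in peptides)


protein = "MAKELLQRPTEGSKDAVRLEEAK"  # hypothetical toy sequence
epitope = "SKDAV"                    # hypothetical epitope spanning a tryptic cut site

for label, rule, mc in [
    ("trypsin, complete digestion", TRYPSIN, 0),
    ("trypsin, 1 missed cleavage", TRYPSIN, 1),
    ("Glu-C, complete digestion", GLU_C, 0),
]:
    print(f"{label}: epitope intact = {covered(digest(protein, rule, mc), epitope)}")
```

In this toy case the epitope is split by complete tryptic digestion, but survives intact in a peptide with one missed cleavage and in a peptide produced by the alternative enzyme.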
A variety of computational B-cell epitope prediction tools have been developed to identify epitopes in antigens. However, accurate computational prediction of B-cell epitopes still poses a major challenge (72), with sensitivity or specificity typically below 60% (35,73-76), leading some recent in silico multiepitope vaccine design efforts to look at the consensus of up to 8 or 9 B-cell epitope prediction tools simultaneously (77,78). The recent development of prediction tools using state-of-the-art machine learning models that claim significantly higher performance on large benchmarking datasets seems promising (34,79). Here we compare the performance of Bepipred 2.0 (35), one of the most widely used B-cell epitope prediction tools, and iBCE-EL (34). Interestingly, we find no significant correlation between the peptides experimentally identified using the method described here and computationally predicted linear B-cell epitope scores generated by Bepipred 2.0 and iBCE-EL, even for those antibody-binding peptides belonging to known Ft or Bp antigens. Nor do we find any significant correlation between the Bepipred 2.0 and iBCE-EL scores themselves (see Supplementary Tables S1, S2, as well as Figures 3A-C for Ft DnaK). These findings highlight the value of an unbiased experimental method to screen for antibody targets, as presented here. At their default score thresholds, iBCE-EL correctly predicts 34/44 of the Ft peptides and 39/46 of the Bp peptides, while Bepipred 2.0 correctly predicts 21/44 Ft peptides and 13/46 Bp peptides. However, these rates are not significantly higher than would be expected at random, given the tools' hit rates on other un-enriched tryptic peptides in our dataset. Part of the discrepancy between the computational Bepipred 2.0 predictions and our experimental results may be due to the fact that Bepipred 2.0 is trained on antigen-antibody 3D structures, which likely contain a mix of conformational and linear epitopes.
In addition, Bepipred 2.0 has a relatively low self-reported 58.6% sensitivity and 57.2% specificity at the default score threshold of 0.5 (80), and thus is expected to produce a large number of false-positive and false-negative predictions. iBCE-EL is reported to have better sensitivity and specificity [73.2% and 72.4% (34)], but explicitly takes into account sequence features at the beginning and end of the epitope that may be missing in the tryptic peptides generated here, affecting their score. In cases where the tryptic peptide is too short to be used directly as a vaccine candidate (some are as short as 6 residues), we may in fact be able to use these computational tools to guide us in how to extend the boundaries of the peptide beyond its flanking trypsin cleavage sites.
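To make the consequence of these figures concrete, the following minimal sketch converts a reported sensitivity/specificity pair into expected prediction counts and a positive predictive value (PPV). The candidate pool of 100 true epitopes and 900 non-epitopes is a purely hypothetical assumption chosen for illustration, not a value from this study, and the function name is illustrative.

```python
def expected_predictions(sensitivity, specificity, n_true, n_false):
    """Expected confusion-matrix counts and PPV for a binary epitope predictor."""
    tp = sensitivity * n_true          # true epitopes correctly called
    fn = n_true - tp                   # true epitopes missed
    fp = (1 - specificity) * n_false   # non-epitopes wrongly called
    tn = n_false - fp                  # non-epitopes correctly rejected
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn, "PPV": ppv}


# Hypothetical pool: 100 true epitopes among 1,000 candidates (assumption).
N_TRUE, N_FALSE = 100, 900

# Sensitivity and specificity as reported in the text for each tool.
tools = {
    "Bepipred 2.0": (0.586, 0.572),
    "iBCE-EL": (0.732, 0.724),
}

for name, (sens, spec) in tools.items():
    result = expected_predictions(sens, spec, N_TRUE, N_FALSE)
    print(f"{name}: expected TP = {result['TP']:.1f}, "
          f"FP = {result['FP']:.1f}, PPV = {result['PPV']:.2f}")
```

Under this assumed prevalence, false positives outnumber true positives for both tools, which is why the reported sensitivity and specificity alone say little about how many predicted epitopes would survive experimental testing.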
Note that computational B-cell epitope prediction tools such as these are trained to distinguish epitopes from non-epitopes in known antigens, but are not an effective alternative to experimentally screening for epitopes across an entire bacterial proteome. For example, on a random selection of 100 Ft and Bp proteins, Bepipred 2.0 using its default epitope threshold of 0.5 classified 40% of all amino acids as being part of an epitope, including an average of 5.5 peptides of length 9 or longer per protein (data not shown). Likewise, on a random selection of 1000 tryptic peptides from all our proteomics data, iBCE-EL classified 81% as B-cell epitopes using its default score threshold of 0.35 (data not shown). Applied across the entire proteome, the computational approach would predict tens of thousands of putative B-cell epitopes, likely with a high false-positive rate and, regardless, providing little guidance in winnowing the possibilities for experimental verification.
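As a rough illustration of why the iBCE-EL hit rates on seroreactive peptides (34/44 for Ft, 39/46 for Bp) are unremarkable, the sketch below computes a one-sided binomial tail probability against the 81% background rate quoted above. Using that 81% figure as the null rate and a binomial test as the statistic are assumptions made for illustration; the test actually used in this study is not specified here.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: background rate of iBCE-EL positive calls on random tryptic peptides.
BASELINE = 0.81

for label, hits, total in [("Ft", 34, 44), ("Bp", 39, 46)]:
    p_value = binom_upper_tail(hits, total, BASELINE)
    print(f"{label}: {hits}/{total} predicted as epitopes "
          f"({hits / total:.0%}), P(X >= {hits}) = {p_value:.2f}")
```

Both tail probabilities are far above conventional significance thresholds, consistent with the observation that the tool does not favor the experimentally enriched peptides over background peptides.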
If so desired, peptides can be downselected for vaccine development by focusing only on those with the most stringent enrichment scores, or based on consensus with computational epitope prediction tools. Further downselection may include prioritizing highly conserved epitopes that can induce broadly protective immunity, and reduce the risk that emergence of pathogen variants will render the vaccine ineffective (81).
Ninety percent of the top scoring peptides were found to be present in 90% or more of the fully sequenced pathogenic F. tularensis and B. pseudomallei strains (see Supplementary Tables S1, S2, and Figure 3F for the case of Ft DnaK). In addition, we can target peptides that show even deeper evolutionary conservation based on their Average Amino Acid Conservation Score (AAACS), reflecting parts of the protein that may be important for its function (31) (see Supplementary Tables S1, S2, and Figure 3E for the case of Ft DnaK). Peptides that differ by only one or two amino acids from human or mouse versions are likely less suitable as vaccine candidates and are marked with a subscript 1 or 2, respectively, in Tables 1, 2. Note that while some of the proteins in Tables 1, 2 have homologs in human and mouse (e.g., mitochondrial DnaK), the peptides recovered here are unique to the bacterial versions. For vaccine design, we may also want to prioritize peptides that do not tend to occur in healthy human microbiomes, by comparing them against some of the large human metaproteomics datasets recently generated (82-86).
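The two filters described above, conservation across strains and near-identity to host sequences, can be sketched as follows. All sequences are hypothetical toy data, the function names are illustrative, and the host comparison is a simple gap-free mismatch count, which is an assumption and may differ from the comparison used to annotate Tables 1, 2.

```python
def strain_conservation(peptide, strain_proteomes):
    """Fraction of strains whose proteome contains the peptide exactly."""
    hits = sum(
        any(peptide in protein for protein in proteome)
        for proteome in strain_proteomes.values()
    )
    return hits / len(strain_proteomes)


def min_mismatches(peptide, host_proteins):
    """Smallest number of mismatches between the peptide and any
    same-length, gap-free window in the host proteins."""
    k = len(peptide)
    best = k  # worst case: every position differs (or no window fits)
    for protein in host_proteins:
        for start in range(len(protein) - k + 1):
            window = protein[start:start + k]
            best = min(best, sum(a != b for a, b in zip(peptide, window)))
    return best


# Hypothetical toy data (not real sequences).
peptide = "DAVRLEEAK"
strains = {
    "strain_A": ["MAKDAVRLEEAKQ"],
    "strain_B": ["GSDAVRLEEAKTT"],
    "strain_C": ["MAKDAVRLDEAKQ"],  # carries one substitution in the peptide
}
host = ["GGDAVRLEEGKTT"]  # differs from the peptide at a single position

print(f"conserved in {strain_conservation(peptide, strains):.0%} of strains")
print(f"minimum mismatches to host: {min_mismatches(peptide, host)}")
```

A peptide would then be flagged, in the manner of the subscripts in Tables 1, 2, whenever the minimum mismatch count to host proteins is 1 or 2.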
Further confirmation that the identified sequences are B-cell epitopes could be achieved through additional in vitro and in vivo experimentation (e.g., testing the reactivity of immune sera with synthesized candidate epitopes via ELISA or immunization studies). High throughput screening of peptides for efficacy is feasible due to recent advancements in solid phase peptide synthesis (SPPS), which enables efficient and cost-effective production of peptide candidates (3). For immunization studies, pools of multiple peptides could be incorporated into vaccine delivery systems containing adjuvants and T-helper epitopes known to stimulate the induction of adaptive immune response against peptide antigens, as reviewed in Skwarczynski et al. (3).
The method presented here identifies peptides that are immunoreactive, that is, they interact with antibodies in serum from previously infected individuals. Further experimental testing would be needed to confirm immunogenicity, that is, whether they can stimulate antibody production themselves, and protectivity, that is, whether they can protect against infection or disease after immunization. Our immunoproteomic method represents a new tool for precise mapping of linear B-cell epitopes. Generation of such immunogenic profiles for pathogens could provide an ample pool of candidates for further experimental validation and efficient vaccine development. Accelerating the discovery of B-cell epitopes in the proteomes of pathogens will help fuel the development of peptide-based vaccines that have the potential to provide rapid solutions to biothreat agents and emerging pathogens.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ebi.ac.uk/pride/archive/, PXD026300 (87).

ETHICS STATEMENT
The animal study was reviewed and approved by LLNL Institutional Animal Care and Use Committee.

AUTHOR CONTRIBUTIONS
PD'h, NC, and MF contributed to the conception and design of the study. NC performed the in vivo experiments. VL provided laboratory support. VL and MF performed in vitro experimentation. PD performed the bioinformatics analysis. BS and SB provided critical input. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We thank Dr. Wayne Conlan (National Research Council Canada) for providing the Francisella tularensis SCHU S4 ΔclpB strain. The Bp82 reagent was obtained through BEI Resources, NIAID, NIH: Burkholderia pseudomallei, Strain Bp82 (ΔpurM), NR-51280. Our thanks go to Michael Ford and the MS Bioworks team for help with sample preparation troubleshooting and specialized mass spectrometry analyses. We also thank past and present members of our laboratories, Drs. Sahar El-Etr, José Peña, Amy Rasley, and Emilio Garcia, for useful discussions and critical input.