# REVERSE VACCINOLOGY

EDITED BY: Pedro A. Reche, Richard Moxon and Rino Rappuoli

PUBLISHED IN: Frontiers in Immunology

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-385-2 DOI 10.3389/978-2-88963-385-2

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# REVERSE VACCINOLOGY

Topic Editors:

Pedro A. Reche, Complutense University of Madrid, Spain

Richard Moxon, University of Oxford, United Kingdom

Rino Rappuoli, GlaxoSmithKline (Italy), Italy

Citation: Reche, P. A., Moxon, R., Rappuoli, R., eds. (2020). Reverse Vaccinology. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-385-2

# Table of Contents

*Editorial: Reverse Vaccinology*

Richard Moxon, Pedro A. Reche and Rino Rappuoli

Gandharva Nagpal, Salman Sadullah Usmani and Gajendra P. S. Raghava

*Bacterial Vaccine Antigen Discovery in the Reverse Vaccinology 2.0 Era: Progress and Challenges*

Fadil A. Bidmos, Sara Siris, Camilla A. Gladstone and Paul R. Langford

*A Review on T Cell Epitopes Identified Using Prediction and Cell-Mediated Immune Models for* Mycobacterium tuberculosis *and* Bordetella pertussis

Yuan Tian, Ricardo da Silva Antunes, John Sidney, Cecilia S. Lindestam Arlehamn, Alba Grifoni, Sandeep Kumar Dhanda, Sinu Paul, Bjoern Peters, Daniela Weiskopf and Alessandro Sette

Martin Christopher James Maiden

*Comparison of Open-Source Reverse Vaccinology Programs for Bacterial Vaccine Antigen Discovery*

Mattia Dalsass, Alessandro Brozzi, Duccio Medini and Rino Rappuoli

*The Development of a Vaccine Against Meningococcus B Using Reverse Vaccinology*

Vega Masignani, Mariagrazia Pizza and E. Richard Moxon

*Comprehensive Evaluation of the Expressed CD8+ T Cell Epitope Space Using High-Throughput Epitope Mapping*

Paul V. Lehmann, Maneewan Suwansaard, Ting Zhang, Diana R. Roen, Greg A. Kirchenbaum, Alexey Y. Karulin, Alexander Lehmann and Pedro A. Reche

*Application of Modeling Approaches to Explore Vaccine Adjuvant Mode-of-Action*

Paul R. Buckley, Kieran Alden, Margherita Coccia, Aurélie Chalon, Catherine Collignon, Stéphane T. Temmerman, Arnaud M. Didierlaurent, Robbert van der Most, Jon Timmis, Claus A. Andersen and Mark C. Coles

# Editorial: Reverse Vaccinology

Richard Moxon<sup>1</sup>\*†, Pedro A. Reche<sup>2</sup>\*† and Rino Rappuoli<sup>3,4</sup>\*†

*<sup>1</sup> Department of Paediatrics, University of Oxford, Oxford, United Kingdom, <sup>2</sup> Department of Immunology & O2, Complutense University of Madrid, Madrid, Spain, <sup>3</sup> GSK, Siena, Italy, <sup>4</sup> Faculty of Medicine, Imperial College, London, United Kingdom*

Keywords: infectious diseases, vaccines, reverse vaccinology, microbiology, vaccinology

**Editorial on the Research Topic**

#### **Reverse Vaccinology**

For many, the semantics of the term "reverse vaccinology" may be puzzling. Literally, it implies a complete change of direction or action in the study of vaccines. The non-obvious point is that this volte-face first came about through whole genome sequencing (WGS). WGS revolutionized biology, including microbiology. Specifically, it introduced a top-down, computational, data-based approach to the discovery of candidate vaccine antigens: highly sensitive, but not specific and, crucially, not hypothesis-driven. This contrasted with the classical laboratory-based, hypothesis-driven analysis of microbes to identify components that could elicit protective immunity. Reverse vaccinology relies on computational methods and tools to identify vaccine candidates for further experimentation; refining these tools is crucial for their optimal use, as argued and detailed by Dalsass et al. These computational tools serve to anticipate antigens that are likely to induce protective responses, as well as the precise antigen regions (epitopes) recognized by the immune system (1).

#### Edited and reviewed by:

*Denise Doolan, James Cook University, Australia*

#### \*Correspondence:

*Richard Moxon richard.moxon@paediatrics.ox.ac.uk Pedro A. Reche parecheg@med.ucm.es Rino Rappuoli rino.r.rappuoli@gsk.com*

*†These authors have contributed equally to this work*

#### Specialty section:

*This article was submitted to Vaccines and Molecular Therapeutics, a section of the journal Frontiers in Immunology*

> Received: *03 November 2019* Accepted: *13 November 2019* Published: *03 December 2019*

#### Citation:

*Moxon R, Reche PA and Rappuoli R (2019) Editorial: Reverse Vaccinology. Front. Immunol. 10:2776. doi: 10.3389/fimmu.2019.02776*

Reverse vaccinology was first used in the 1990s to predict potential antigens for a vaccine against the B strains of Neisseria meningitidis (meningococci), as reviewed by Masignani et al. It is worth emphasizing that the formulation of this complex vaccine could not have been achieved without the systematic, WGS-based approach to population biology succinctly captured in the review by Maiden, a pioneer in laying the foundations of the molecular epidemiological tools so crucial to the design of vaccines for both infectious and non-infectious diseases. More recently, Bianconi et al. have successfully applied the classical reverse vaccinology approach to Pseudomonas aeruginosa. Starting from the 5,570 open reading frames of the genome, they selected 52 vaccine candidates by applying a number of filters that excluded proteins predicted to be absent from the bacterial surface, to be variable among strains, to have homology to human proteins, or to be homologous to E. coli proteins. Of the 52 predicted vaccine antigens, 30 were successfully expressed, and several of those conferred remarkable protection in the mouse challenge model. However, one of the main aims of this series on reverse vaccinology is to highlight how many new concepts and technologies have been recruited to facilitate vaccine design, including contributions from proteomics, immunology, structural biology, systems biology, and mathematical modeling. Thus today, the change of direction and action in vaccine research, captured in the term reverse vaccinology, embodies much more than innovation in antigen discovery.
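The exclusion filters described above for the Pseudomonas aeruginosa study can be pictured as a simple screening step over annotated open reading frames. The following is only an illustrative sketch: the `Orf` record, its field names, and the toy entries are hypothetical and do not reproduce the authors' actual pipeline or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    """Hypothetical annotation record for one open reading frame."""
    name: str
    surface_exposed: bool   # predicted to be present on the bacterial surface
    conserved: bool         # predicted not to vary among strains
    human_homolog: bool     # has homology to a human protein
    ecoli_homolog: bool     # is homologous to an E. coli protein


def select_candidates(orfs):
    """Keep only ORFs that pass all four exclusion filters."""
    return [
        orf for orf in orfs
        if orf.surface_exposed
        and orf.conserved
        and not orf.human_homolog
        and not orf.ecoli_homolog
    ]


# Toy input; names and annotations are invented for illustration.
toy_genome = [
    Orf("orf_1", True, True, False, False),   # passes every filter
    Orf("orf_2", False, True, False, False),  # not surface exposed
    Orf("orf_3", True, False, False, False),  # variable among strains
    Orf("orf_4", True, True, True, False),    # human homolog
    Orf("orf_5", True, True, False, True),    # E. coli homolog
]
candidates = select_candidates(toy_genome)
```

In practice each boolean would come from a separate prediction tool, and candidates passing the filters would still need to be expressed and tested experimentally.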

Bidmos et al. describe the isolation and recombinant expression of the variable regions of heavy (VH) and light (VL = κ or λ) chain genes of immunoglobulin (IgG) using a variety of molecular tools. Referred to as reverse vaccinology 2.0, this permits the high-throughput screening of large numbers of antibody-secreting cells and was employed to identify functional anti-Staphylococcus aureus monoclonal antibodies induced during bacteremia (2) and anti-MTB surface antigen antibodies cloned from patient-derived plasmablasts of reactivated memory B-cell origins (3). In a further study, Bidmos et al. describe successful efforts to utilize the reverse vaccinology 2.0 approach to identify novel functional meningococcal antigens with the potential to expand the coverage of currently licensed meningococcal B vaccines.


The synergism of immune-information and systems immune-biology with WGS provides crucial tools that consider not just the challenges of the identification and molecular diversity of target antigens, but also the importance of expression levels and how these variables, along with host genetic variation, affect B-cell immune responsiveness. Immunologists do their best to identify the optimal epitopes of antigens for candidate vaccines, as exemplified by the work of Nagpal et al. These authors applied an immunoinformatic pipeline that identified epitope-based vaccine candidates against 14 pathogenic bacteria and made them available through a web resource named VacTarBac. Bacteria are complex pathogens encompassing numerous protein antigens that, when targeted for epitope prediction, yield a huge number of candidates. However, this plethora of information makes it a daunting challenge to decide what can reasonably be subjected to further rigorous investigation. Therefore, to simplify further experimental advances, the authors implemented a system to identify and prioritize virulence factors or other essential genes required for pathogenicity, while also discarding epitopes cross-reactive with self-proteins. The application of stringent prioritization criteria to the selected 14 pathogenic bacteria led to the identification of just 252 unique B-cell and T-cell epitopes.

T-cell epitopes can be predicted starting from WGS. Tian et al. show how they made a full map of the T-cell epitopes starting from the 4,000 open reading frames of Mycobacterium tuberculosis. A metric (immunogenicity score) was devised based on predictions of their immunodominance, promiscuity, HLA restriction and conservation. In a second example, they describe how the prediction of the T-cell epitopes of Bordetella pertussis antigens, not just those included in currently licensed acellular vaccines, may help to design novel formulations based on Th1 and Th17 immunity to overcome the limitations of the existing vaccines, which induce mostly Th2-based immunity. Degoot et al. describe a new method to predict peptide binding to major histocompatibility complex class two (MHC-II) molecules, which is the main basis for anticipating CD4 T cell epitopes. The method is based on structural analyses of peptide–MHC-II interactions and can predict peptide binding for all three human MHC-II loci (HLA-DR, HLA-DP, and HLA-DQ). The authors report that the performance of the method is in general comparable to neural network methods and is superior in predicting peptide binding to HLA-DP molecules. The main advantage reported for this approach over other machine learning models is that it is rooted in actual physicochemical peptide–MHC-II binding interactions. A main handicap, however, is that the authors have not made the method available for rigorous independent comparisons.

Sánchez-Ramón et al. make a well-argued case for trained immunity-based vaccines (TIbV). These are vaccines that induce an innate, non-specific immunity for long periods of time. A typical example of a TIbV is BCG, which induces two types of immunity: one based on adaptive immunity specific for Mycobacterium tuberculosis, and the other based on innate immunity, which is non-specific but so effective that it is also recommended for the treatment of bladder cancer. This immunity induces activation of dendritic cells and of non-specific effector responses of innate immune cells such as monocytes and macrophages, and is maintained over time by epigenetic changes. Vaccine adjuvants and non-toxic derivatives of toxins (4) inducing non-specific protection against bacteria or viruses can be considered a proxy of TIbV.

Buckley et al. describe modeling approaches that provide exciting insights into AS01, one of the most successful adjuvants licensed for human use, and how such an adjuvant may work. According to Sánchez-Ramón et al., in addition to adaptive immune responses, it is also likely to induce trained immunity. AS01 has been licensed for the RTS,S vaccine against malaria and the Shingrix vaccine against shingles, and is part of the first successful clinical trial showing protection from disease in people infected by Mycobacterium tuberculosis.

A quarter of a century after WGS revolutionized biology, this series is a timely and exciting opportunity to reflect on what has been achieved by reverse vaccinology and how best to galvanize future efforts to improve global public health through rigorous and imaginative exploitation of the explosion in technologies that can be used to develop a broad range of novel vaccines.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### REFERENCES

disease. J Immunol. (2004) 173:7435–43. doi: 10.4049/jimmunol.173.12.7435

**Conflict of Interest:** RR was employed by GSK and RM holds a consultancy agreement as a scientific adviser to GSK.

The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Moxon, Reche and Rappuoli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **Trans-Allelic Model for Prediction of Peptide:MHC-II Interactions**

*Abdoelnaser M. Degoot<sup>1,2,3</sup>\*, Faraimunashe Chirove<sup>2</sup> and Wilfred Ndifon<sup>1</sup>\**

*<sup>1</sup>African Institute of Mathematical Sciences (AIMS), Muizenberg, South Africa, <sup>2</sup>School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa, <sup>3</sup>DST-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS), Gauteng, South Africa*

Major histocompatibility complex class two (MHC-II) molecules are trans-membrane proteins and key components of the cellular immune system. Upon recognition of foreign peptides presented in the MHC-II binding groove, CD4<sup>+</sup> T cells mount an immune response against invading pathogens. Therefore, mechanistic identification and knowledge of the physicochemical features that govern interactions between peptides and MHC-II molecules is useful for the design of effective epitope-based vaccines, as well as for understanding immune responses. In this article, we present a comprehensive trans-allelic prediction model, a generalized version of our previous biophysical model, that can predict peptide interactions for all three human MHC-II loci (HLA-DR, HLA-DP, and HLA-DQ), using both peptide sequence data and structural information of MHC-II molecules. The advantage of this approach over other machine learning models is that it offers a simple and plausible physical explanation for peptide–MHC-II interactions. We train the model using a benchmark experimental dataset and measure its predictive performance using novel data. Despite its relative simplicity, we find that the model has performance comparable to that of the state-of-the-art NetMHCIIpan method. Focusing on the physical basis of peptide–MHC binding, we find support for previous theoretical predictions about the contributions of certain binding pockets to the binding energy. In addition, we find that binding pocket *P*5 of HLA-DP, which was not previously considered a primary anchor, does make a strong contribution to the binding energy. Together, the results indicate that our model can serve as a useful complement to alternative approaches to predicting peptide–MHC interactions.

**Keywords: major histocompatibility complex (MHC), modeling peptide–MHC-II interactions, antigen presentation, machine learning, inverse statistical mechanics**

# **1. INTRODUCTION**

Major histocompatibility complex class two (MHC-II) molecules are surface proteins that exist on the membrane of antigen presenting cells (APCs) such as macrophages, dendritic cells, and B cells. They bind short peptide fragments derived from exogenous proteins and present them to CD4<sup>+</sup> helper T cells. Upon the recognition of foreign peptides presented by MHC-II molecules, the helper T cells (precisely speaking, CD4<sup>+</sup> effector T cells) will initiate proper adaptive immune responses, including enabling sufficient maturation of B cells and cytotoxic CD8<sup>+</sup> T cells (1). Therefore, the binding of peptide to MHC-II molecules is considered to be a fundamental and pre-requisite step in the initiation of adaptive immunity (2, 3). As such, mechanistic identification of the basic determinants of peptide–MHC-II interactions presents potential for understanding the immune system's mechanisms and improving the process of designing peptide- and protein-based vaccines.

#### *Edited by:*

*Pedro A. Reche, Complutense University of Madrid, Spain*

#### *Reviewed by:*

*Anne Searls De Groot, EpiVax, United States Morten Nielsen, Technical University of Denmark, Denmark*

> *\*Correspondence: Abdoelnaser M. Degoot degoot@aims.ac.za; Wilfred Ndifon wndifon@aims.ac.za*

#### *Specialty section:*

*This article was submitted to Vaccines and Molecular Therapeutics, a section of the journal Frontiers in Immunology*

> *Received: 06 February 2018 Accepted: 06 June 2018 Published: 20 June 2018*

#### *Citation:*

*Degoot AM, Chirove F and Ndifon W (2018) Trans-Allelic Model for Prediction of Peptide:MHC-II Interactions. Front. Immunol. 9:1410. doi: 10.3389/fimmu.2018.01410*


MHC genes for humans, referred to as human leukocyte antigen (HLA), are among the most polymorphic genetic elements found within a long continuous stretch of DNA on chromosome 6 (4). Such high polymorphism reflects the immense contribution of MHC molecules to the adaptive immune system and underpins their capacity to recognize a wide range of pathogens. Nonetheless, some viruses, such as hepatitis C, avian/swine influenza, and human immunodeficiency virus (HIV), undergo extensive mutations that allow them to partially escape recognition by the MHC molecules (5). MHC genes can be divided into HLA class I, II, and III. Loci corresponding to HLA class I are A, B, and C; HLA class II loci are DP, DQ, and DR; HLA class III genes encode for several other immune-related proteins and provide support for the former two classes (1, 4).

MHC-II molecules influence the likelihood of success of organ transplantation, and there are well-established associations between many disorders and particular classes of MHC-II molecules. These include the contribution of HLA-DQ genes to insulin-dependent diabetes (6) and of HLA-DR genes to multiple sclerosis and narcolepsy (7), along with other autoimmune diseases resulting from degeneracy and misregulation in the process of peptide presentation (8). Moreover, genetic and epidemiological data have implicated MHC-II molecules in susceptibility to many infectious diseases, such as HIV/AIDS and malaria (9), and to cancer (10).

Experimental assays for characterizing peptide–MHC-II interactions often face important obstacles, including the substantial resources needed for laboratory work and high demands on time and labor. This is particularly the case for experimental work aimed at finding out which promiscuous epitopes bind to specific MHC molecules, a necessary step in the design of peptide-based vaccines that protect against a broad range of pathogen variants. Computational methods, which are more efficient and less costly than biological assays, have been employed to complement these assays. Due to advances in sequencing technologies, immunological data have grown at an unprecedented pace and continue to accrue. This has been exploited in systematic computational analyses of the genomes of multiple pathogens to determine which subunits might induce a potent immune response. The result has been the design and development of new vaccine candidates against HIV, influenza, and other hyper-variable viruses (11). Use of computational methods has reduced experimental effort and costs by up to 85% (12).

Many immunoinformatics methods for prediction of peptide–MHC interactions, for both class I and II, have been developed based on machine learning approaches such as simple pattern motif (13), support vector machine (SVM) (14), hidden Markov model (HMM) (15), neural network (NN) models (16–18), quantitative structure–activity relationship (QSAR) analysis (19), structure-based methods, and biophysical methods (2, 20, 21; Degoot et al., unpublished). These methods can be divided into two categories, namely, intra-allele (allele-specific) and trans-allele (pan-specific) methods. Intra-allelic methods are trained for a specific MHC molecule on a limited set of experimental peptide-binding data and applied for prediction of peptides binding to that molecule. Because of the extreme polymorphism of MHC molecules, the existence of thousands of allele variants, combined with the lack of sufficient experimental binding data, it is impossible to build a prediction model for each allele. Thus, trans-allele and general purpose (22) methods such as *MULTIRTA* (2), *NetMHCIIpan* (18), and *TEPITOPEpan* (23) have been developed using richer peptide-binding data expanding over many alleles or across species (18). Similar methods for MHC-I are also available such as NetMHCpan (24, 25) and KISS (26).

The trans-allelic models are often designed to extrapolate either structural similarities or shared physicochemical binding determinants among HLA genes, to predict affinities for alleles that are not part of the training dataset. These models generally have better predictive performance for new alleles and a wide range of potential applications compared with the intra-allelic models.

Most of the existing trans-allelic models for MHC-II are extended versions of their earlier intra-allelic counterparts: *TEPITOPEpan* (23) was extended from *TEPITOPE* (27); *MULTIRTA* (2) evolved from *RTA* (20); and the *NetMHCIIpan* series (1.0, 2.0, 3.0, and 3.1) (17, 18, 28, 29) was generalized from the NN-align (30) method. In the same vein, in this article, we present a trans-allele method, an extension of our previous method (Degoot et al., unpublished), for prediction of peptide–HLA class II interactions based on biophysical ideas.

The remarkable strength of the method presented here over other existing advanced data-driven approaches is its physical basis. We formulate the binding of a peptide to an MHC-II molecule as an inverse problem of statistical physics. From the observable macroscopic (bound and unbound) states in the experimental data, we compute the microscopic parameters (Hamiltonians for amino acid residues involved in the interaction) that govern the system. In fact, many problems in computational biology can be solved in such a way (31, 32), taking advantage of the availability of vast amounts of genomic data and high-resolution structural information. Solutions obtained using this approach are more plausible and physically interpretable than those obtained using purely sequence-based methods (2; Degoot et al., unpublished). In addition, because sparsity is a hallmark feature of biological processes, we regularize the model's parameters by incorporating an L<sup>1</sup> regularization term into the model. The L<sup>1</sup> constraint, commonly named *Lasso*, encourages sparsity and improves the predictive performance of the model on novel data.

The rest of this article is organized as follows: in Section 2.1, we describe the idea of MHC-II polymorphic residue groups, which is employed to capture structural similarity among MHC-II alleles. In Section 2.2, we define our methodology and formulate the learning function. After that, we briefly describe the benchmark dataset used to test the predictive performance of the model in Section 2.3 and present the results in Section 3. Finally, in Section 3.3, we summarize and discuss our results and compare our method with the state-of-the-art method.

#### **2. MATERIALS AND METHODS**

## **2.1. MHC-II Polymorphic Residue Groups**

Crystal structures revealed that an MHC molecule is a combination of two domains, an *α* helix and a *β* sheet, linked together to form a Y-shaped groove that holds the peptide, and that both domains contribute equally to the binding affinity. For HLA-I molecules, the *β* domain is largely conserved, and variation occurs mostly in the *α* domain. On the other hand, polymorphism occurs in both domains of HLA-II molecules, except for HLA-DR alleles, where the variation takes place only in the *β* domain. In addition, the peptide-binding groove of HLA-II is open at both ends, which allows it to bind peptides of variable lengths (33), ranging from 9 to 30 amino acid residues, or even an entire protein (29, 34). This is in contrast to the peptide-binding groove of the HLA-I alleles, which accommodates only short peptides of 8 to 11 amino acids. This loose constraint on peptide length, together with the immense polymorphism of MHC-II molecules, contributes to a lower predictive performance of computational methods for peptide–MHC-II interactions compared with MHC-I methods (2, 22).

The notion of MHC polymorphic residue groups, introduced by Bordner and Mittelmann (2), is based on a simple observation of an intrinsic (peptide-independent) feature of the MHC-II binding groove. Although a peptide could bind to an MHC-II molecule in various registers, due to the open-ended nature of the MHC-II binding groove, the strength of the binding affinity is primarily determined by the 9 residues occupying the binding groove pockets. Interestingly, most of the polymorphism in MHC-II genes occurs at these binding pockets (see the discussion in Section 3.3).
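The idea of binding registers can be made concrete with a short sketch: a peptide of length L has L − 9 + 1 contiguous 9-residue windows, each of which could in principle occupy the nine pockets. The function name and the example sequence below are illustrative assumptions, not part of the published method.

```python
CORE_LENGTH = 9  # number of residues occupying the binding-groove pockets


def binding_registers(peptide, core_length=CORE_LENGTH):
    """Return (offset, core) for every contiguous core window in the peptide."""
    if len(peptide) < core_length:
        return []  # too short to fill all pockets
    return [
        (start, peptide[start:start + core_length])
        for start in range(len(peptide) - core_length + 1)
    ]


# Arbitrary 15-residue example sequence, used only for illustration.
registers = binding_registers("PKYVKQNTLKLATGM")
```

A 15-residue peptide therefore gives seven candidate registers, which is why MHC-II prediction has to reason over several possible alignments of each peptide to the groove.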

Sets of important positions for the polymorphic residues in the binding groove can be extracted from the limited crystallographic structural data available for peptide–MHC-II complexes in the Protein Data Bank (PDB) (35) (summarized in Table S1 in Supplementary Material). A position is selected if its residue contacts one or more peptide-binding cores, that is, lies within a distance of not more than 4 Å (2, 18, 36), in one or more of the MHC-II complex structures. Then, by extrapolating the similarities among MHC molecules, the corresponding residues in different genes are determined using multiple sequence alignment (MSA) (37). Exploiting the fact that HLA-DR alleles are polymorphic only in the *β* domain and have the same *α* domain, the polymorphic residue groups for HLA-DR are extracted from its *β* domain sequences. Similarly, assuming that the *β* domains are sufficient for predicting MHC–peptide binding preferences (2), and for the sake of simplicity of the model, residue groups for HLA-DP and HLA-DQ were also extracted from the *β* domain.
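The 4 Å contact criterion can be sketched as a minimum-distance test between atoms. This is a simplified illustration under stated assumptions: coordinates are plain (x, y, z) tuples in ångströms, the position numbers and coordinates are invented, and no real PDB parsing is performed.

```python
import math

CONTACT_CUTOFF = 4.0  # distance threshold in angstroms, as stated in the text


def contact_positions(mhc_atoms, core_atoms, cutoff=CONTACT_CUTOFF):
    """Return sorted MHC positions with any atom within `cutoff` of the core.

    mhc_atoms:  dict mapping a sequence position to a list of (x, y, z) atoms
    core_atoms: list of (x, y, z) atoms belonging to the peptide-binding core
    """
    contacts = []
    for position, atoms in mhc_atoms.items():
        if any(math.dist(a, b) <= cutoff for a in atoms for b in core_atoms):
            contacts.append(position)
    return sorted(contacts)


# Toy coordinates chosen only to exercise the cutoff.
toy_mhc = {
    9: [(0.0, 0.0, 0.0)],
    11: [(10.0, 0.0, 0.0)],
    13: [(20.0, 20.0, 20.0), (0.0, 2.0, 3.0)],
}
toy_core = [(0.0, 0.0, 3.0)]
found = contact_positions(toy_mhc, toy_core)
```

With real structures, the same test would be repeated over every available complex and the union of contacting positions retained.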

Next, the set of polymorphic residues that always co-occur at the specified positions are clustered into the same group. The rationale of clustering polymorphic residue groups, rather than individual residues, is to avoid over-parametrization of the model. Table S2 in Supplementary Material shows such polymorphic residue groups for HLA-DRB, HLA-DP, and HLA-DQ alleles, assembled by the procedures described earlier.
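One way to picture the grouping step is to treat the residues that an allele carries at the selected positions of a pocket as a single key, so that alleles sharing the same combination fall into the same group. The sketch below rests on that assumption; the allele names, position numbers, and residues are hypothetical placeholders rather than entries from Table S2.

```python
def residue_groups(allele_residues, pocket_positions):
    """Group alleles by the residues they carry at one pocket's positions.

    allele_residues:  dict mapping allele name -> {position: residue}
    pocket_positions: positions that define one binding pocket
    Returns a dict mapping each residue combination to the alleles carrying it.
    """
    groups = {}
    for allele, residues in allele_residues.items():
        key = tuple(residues[p] for p in pocket_positions)
        groups.setdefault(key, []).append(allele)
    return groups


# Hypothetical alleles and residues, for illustration only.
toy_alleles = {
    "allele_A": {13: "S", 71: "R"},
    "allele_B": {13: "S", 71: "R"},
    "allele_C": {13: "G", 71: "K"},
}
groups = residue_groups(toy_alleles, pocket_positions=[13, 71])
number_of_groups = len(groups)  # plays the role of G(j) for this pocket
```

Treating the combination as one unit means one parameter set per group instead of one per individual residue, which is the over-parametrization the text aims to avoid.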

#### **2.2. Trans-Allele Model**

In our previous intra-allele model (Degoot et al., unpublished), the probability that peptide $\mathbb{P}^{(k)}$ binds MHC molecule $\mathbb{M}^{(\mathsf{T}(k))}$ was computed as follows:

$$\pi\left(\mathbb{P}^{(k)}, \mathbb{M}^{(\mathsf{T}(k))}\right) = \frac{1}{1 + e^{\delta \mathsf{E}^{(k)}}},\tag{1}$$

where $\delta\mathsf{E}^{(k)}$ is the change in binding energy, expressed as the sum of the differences of first- and second-order Hamiltonians between the bound and unbound states. Specifically, $\delta\mathsf{E}^{(k)}$ is given by the following equation:

$$\begin{aligned}
\delta \mathsf{E}^{(k)} ={}& \underbrace{\sum_{i=1}^{|\mathbb{P}^{(k)}|} \delta H^{(1)}\left(\mathbf{a}_{i}\right) + \sum_{i=1}^{9} \delta H^{(1)}\left(\mathbf{b}_{i}\right) + \delta S}_{\text{first-order Hamiltonians}} \\
&+ \underbrace{\sum_{i=1}^{|\mathbb{P}^{(k)}|} \sum_{j=1}^{9} \sum_{r=1}^{\mathbf{R}} \delta H^{(2)}\left(\mathbf{a}_{ir}^{(k)}, \mathbf{b}_{j}\right)}_{\text{second-order Hamiltonians}},
\end{aligned}\tag{2}$$

in which $|\mathbb{P}^{(k)}|$ is the length of peptide $k$, $\mathbf{R}$ is the number of all possible configurations (registers) in which the peptide binds to the particular MHC molecule, and $\delta S$ is the difference in entropy between the bound and unbound states.
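Equation (1) is a logistic function of the energy change, so a negative energy change gives a binding probability above 0.5 and a positive one gives a probability below 0.5. A minimal sketch follows; the function name is an assumption, and the two-branch form is used only to avoid floating-point overflow for large energy values.

```python
import math


def binding_probability(delta_e):
    """Evaluate equation (1): 1 / (1 + exp(delta_e)), in a numerically stable way."""
    if delta_e >= 0:
        # Rewrite as exp(-dE) / (1 + exp(-dE)) so exp never overflows.
        e = math.exp(-delta_e)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(delta_e))


p_favorable = binding_probability(-2.0)    # bound state lower in energy
p_neutral = binding_probability(0.0)       # no energy difference
p_unfavorable = binding_probability(2.0)   # bound state higher in energy
```

The probabilities for energy changes of equal magnitude and opposite sign sum to one, which reflects the symmetry of the logistic form.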

For the trans-allele model, two changes were introduced into the second term of equation (2). First, instead of the residue–residue interaction, $\delta H^{(2)}(\mathbf{a}_{ir}^{(k)}, \mathbf{b}_{j})$, with $\mathbf{a}_{ir}^{(k)}$ on the peptide sequence and $\mathbf{b}_{j}$ on the MHC binding pocket, we focus on the residue–polymorphic group interaction, $\delta H^{(2)}(\mathbf{a}_{ir}^{(k)}, \mathbf{g}_{jn})$, where $\mathbf{g}_{jn}$ is residue group number $n$ of position $j$ as defined in Section 2.1. Next, we introduce a binary operator $\mathsf{T}(k, j, n)$ that equals 1 if the MHC molecule type, $\mathbb{M}^{(\mathsf{T}(k))}$, corresponding to peptide $\mathbb{P}^{(k)}$ contains polymorphic residue group $n$ at the set of pre-determined positions of pocket $j$, and equals 0 otherwise. Hence, $\delta\mathsf{E}^{(k)}$ is given by the following equation:

$$\delta \mathbf{E}^{(k)} = \underbrace{\sum\_{i=1}^{|\mathbf{P}^{(k)}|} \delta H^{(1)}\left(\mathbf{a}\_{i}\right) + \sum\_{i=1}^{9} \delta H^{(1)}\left(\mathbf{b}\_{i}\right) + \delta S}\_{\text{first-order Hamiltonians}}$$

$$+ \underbrace{\sum\_{i=1}^{|\mathbf{P}^{(k)}|} \sum\_{j=1}^{9} \sum\_{r=1}^{\mathbf{R}} \sum\_{n=1}^{\mathbf{G}\left(j\right)} \delta H^{(2)}\left(\mathbf{a}\_{ir}^{(k)}, \mathbf{g}\_{jn}\right) \mathsf{T}\left(k, j, n\right)}\_{\text{second-order Hamiltonians}},\tag{3}$$

where G(j) is the number of polymorphic residue groups for binding pocket *j*. Column two of Table S2 in Supplementary Material shows G(j), *j* = 1, 2, *. . .*, 9, for HLA-DR, HLA-DP, and HLA-DQ alleles.
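The role of the indicator T(k, j, n) in equation (3), and the mapping from energy to probability in equation (1), can be illustrated with a minimal Python sketch. This is not the authors' implementation: the parameter table `h2`, the dictionary `allele_groups`, and all function names are hypothetical, and only the inner sum over groups *n* for a single residue-pocket pair is shown.

```python
import math
from typing import Dict, Set, Tuple

# Hypothetical parameter table: (peptide residue, pocket j, group n) -> delta H^(2).
H2Table = Dict[Tuple[str, int, int], float]


def indicator(allele_groups: Dict[int, Set[int]], j: int, n: int) -> int:
    """T(k, j, n): 1 if the allele carries group n at pocket j, else 0."""
    return 1 if n in allele_groups.get(j, set()) else 0


def pocket_energy(residue: str, j: int, n_groups: int,
                  allele_groups: Dict[int, Set[int]], h2: H2Table) -> float:
    """Inner sum over n = 1..G(j) of delta H^(2)(a, g_jn) * T(k, j, n)."""
    return sum(
        h2.get((residue, j, n), 0.0) * indicator(allele_groups, j, n)
        for n in range(1, n_groups + 1)
    )


def binding_probability(delta_e: float) -> float:
    """Equation (1): pi = 1 / (1 + exp(delta_E))."""
    return 1.0 / (1.0 + math.exp(delta_e))


# Toy values (assumed, for illustration only).
h2 = {("L", 1, 1): -1.0, ("L", 1, 2): 2.0}
allele_groups = {1: {1}}  # this allele carries group 1 at pocket 1
energy = pocket_energy("L", 1, 2, allele_groups, h2)
print(energy, binding_probability(energy))
```

Because groups that the allele does not carry are masked out, parameters are shared only between alleles with a common polymorphic residue group. A negative energy change gives a binding probability above 0.5.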

Let ∆ denote the model's parameters. Using equations (1) and (3), we formulate, through the maximum likelihood approach, the following cost function:

$$\mathcal{L}(\mathsf{P}, \mathsf{M}|\Delta) = \underset{\{\Delta\}}{\operatorname{argmin}} \left( \sum\_{k=1}^{\mathsf{K}} \mathsf{G}^{k} \left( \Delta \right) + \lambda \mathcal{P} \left( \Delta \right) \right), \tag{4}$$

where G<sup>k</sup>(∆) is the empirical loss function given by the following equation:

$$\mathbf{G}^{\mathbf{k}}\left(\Delta\right) = \mathbf{y}^{\mathbf{k}}\log(\pi^{\mathbf{k}}\left(\Delta\right)) + \left(1 - \mathbf{y}^{\mathbf{k}}\right)\log(1 - \pi^{\mathbf{k}}\left(\Delta\right)), \quad \text{(5)}$$

and *y*<sup>k</sup> ∈ {0, 1} is the experimental value; *y*<sup>k</sup> = 1 for binding peptides and *y*<sup>k</sup> = 0 for non-binding ones. *λP*(∆) is a regularization term with the following form:

$$
\lambda \mathcal{P} \left( \Delta \right) = \lambda ||\Delta||\_1 = \lambda \sum\_{i=1}^d \left| \Delta\_i \right|, \tag{6}
$$

where *λ* > 0 is a hyper-parameter and *d* is the dimension of parameter vector ∆, which varies depending on the type of MHC-II molecule. The *L*<sup>1</sup> penalty term *P*(∆), also known as the Lasso (38), has an important role in the model. Because the model is defined on a large number of parameters (*d* = 2,321, 561, and 401 for HLA-DR, HLA-DP, and HLA-DQ molecules, respectively), only a few parameters are expected to contribute to the binding affinity while the rest are expected to be noise. The Lasso filters out the noisy parameters by inducing sparsity in the model: it shrinks most of the parameter values to 0 and thereby helps to avoid overfitting. The hyper-parameter *λ* controls the degree of sparsity of the model; the larger the value of *λ*, the sparser the model. The objective in equation (4) is non-linear and, because of the *L*<sup>1</sup> penalty, non-smooth. It is nevertheless convex, and we minimized it, after quadratic approximation, by means of an iterative, cyclic coordinate descent approach using a soft-thresholding operator. This learning function takes the form of a generalized linear model and the algorithm we used to solve it is both fast and efficient. Details of this optimization method are found in Friedman et al. (39) and are summarized in the supplementary material.
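The optimization strategy described above (quadratic approximation followed by cyclic coordinate descent with soft-thresholding) can be sketched in a few lines of pure Python. This is a toy illustration under stated assumptions, not the authors' code: the function names are hypothetical, there is no intercept, the loss is the negative log-likelihood averaged over *N* samples, and the linear predictor is taken as η = −*δ*E so that the logistic function reproduces equation (1).

```python
import math


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, n_outer=25, n_inner=50, eps=1e-5):
    """L1-penalized logistic regression via quadratic approximation
    (outer loop) and cyclic coordinate descent (inner loop)."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(n_outer):
        # Quadratic approximation around the current estimate:
        # weights w and working response z.
        eta = [sum(x[j] * beta[j] for j in range(d)) for x in X]
        p = [min(max(sigmoid(e), eps), 1.0 - eps) for e in eta]
        w = [pi * (1.0 - pi) for pi in p]
        z = [eta[i] + (y[i] - p[i]) / w[i] for i in range(n)]
        for _ in range(n_inner):
            for j in range(d):
                rho = 0.0   # weighted correlation with the partial residual
                curv = 0.0  # weighted curvature for coordinate j
                for i in range(n):
                    fit_i = sum(X[i][k] * beta[k] for k in range(d))
                    partial = z[i] - fit_i + X[i][j] * beta[j]
                    rho += w[i] * X[i][j] * partial
                    curv += w[i] * X[i][j] ** 2
                beta[j] = soft_threshold(rho / n, lam) / (curv / n) if curv > 0 else 0.0
    return beta


# Toy data (assumed): feature 0 is informative, feature 1 is pure noise.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1, 1, 0, 0]
print(fit_l1_logistic(X, y, lam=0.1))  # noise coefficient is shrunk to zero
print(fit_l1_logistic(X, y, lam=1.0))  # a large penalty zeroes all coefficients
```

The sketch recomputes the full fit inside the innermost loop for readability; a practical implementation would maintain a running residual instead.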

#### **2.3. Binding Affinity Dataset**

The model has been developed by using both quantitative peptide-binding data and MHC-II molecule sequences. We obtained a total of 51,023 peptide-binding measurements for 24 HLA-DR, 5 HLA-DP, and 6 HLA-DQ alleles from the IEDB database (40). This is a well-curated dataset and was used to develop NetMHCIIpan (18), the state-of-the-art method. The binding affinity data were given in the form of log-transformed measurements of the IC<sub>50</sub> (half-maximal inhibitory concentration) according to the formula 1 − log(IC<sub>50</sub>)/log(50,000) (16). We dichotomized these data using a moderate threshold of IC<sub>50</sub> = 500 nM (≡0.426 on the log-transformed scale). Peptides with an IC<sub>50</sub> less than or equal to 500 nM (≥0.426 on the log-transformed scale) were considered binders, and all others non-binders. This moderate threshold, which has been used in previous methods including the state-of-the-art method (20, 29, 30, 41), allows us to make direct comparisons.
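As a quick check of the transformation and threshold quoted above, the following sketch applies the formula 1 − log(IC<sub>50</sub>)/log(50,000) and the 500 nM cutoff. The function and constant names are illustrative choices, not taken from the original work.

```python
import math

MAX_IC50_NM = 50_000.0     # upper bound used in the log transformation
BINDER_CUTOFF_NM = 500.0   # dichotomization threshold


def log_transform(ic50_nm: float) -> float:
    """Return 1 - log(IC50) / log(50,000) for an IC50 given in nM."""
    return 1.0 - math.log(ic50_nm) / math.log(MAX_IC50_NM)


def is_binder(ic50_nm: float, cutoff_nm: float = BINDER_CUTOFF_NM) -> bool:
    """A peptide counts as a binder if its IC50 is at or below the cutoff."""
    return ic50_nm <= cutoff_nm


print(round(log_transform(500.0), 3))     # transformed value at the cutoff
print(is_binder(120.0), is_binder(5000.0))
```

Lower IC<sub>50</sub> values (stronger binding) map to larger transformed values, which is why the binder criterion flips from "≤ 500 nM" to "≥ 0.426" on the transformed scale.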

Amino acid sequences for the MHC-II alleles used in this study were obtained from the EMBL-EBI online-database (42). **Table 1** gives a summary of the peptide-binding dataset used to develop the method.

#### **3. RESULTS**

This section presents prediction results of the model obtained from the dataset of three MHC-II allotypes as described in Section 2.3. We applied a fivefold cross validation analysis to the model and compared it against its intra-allelic version (Table S3 in Supplementary Material). We also examine its predictive performance on data which were previously unseen by the model.

#### **3.1. Performance of the Trans-Allele Model**

We tested the predictive performance of the model by using fivefold cross validation. The partitioning of the data used in fivefold cross validation was previously done by Andreatta et al. (29), by clustering together peptides in a way that minimizes over-estimation of predictive performance, using the technique described by Nielsen et al. (30). **Figure 1** shows results of the test done using alleles belonging to the three MHC-II loci considered in this study. The performance was measured in terms of area under the curve (AUC) (43) values, which range between 0 and 1. The higher the AUC value, the better the predictive performance of the model. A value of 0.5 corresponds to random prediction, and values below 0.5 reflect a performance worse than random. The model has an excellent performance for HLA-DP molecules (average AUC value = 0.930), and a good predictive power for both HLA-DQ and HLA-DR molecules (average AUC values = 0.830 and 0.802, respectively). The surprisingly excellent performance for HLA-DP could be the result of both a higher structural similarity (see Section 3.3) and a higher number of peptides per allele for HLA-DP. Indeed, for all HLA-DP alleles, the number of available peptides exceeds the empirically required number of peptide-binding measurements (≈200 peptides (22)), but this is not the case for all HLA-DR alleles. HLA-DQ alleles have a sufficient number of peptide measurements, but these alleles have a lower structural similarity than HLA-DP alleles (see Section 3.3).
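For readers unfamiliar with the AUC, the sketch below computes it directly from its probabilistic interpretation: the chance that a randomly chosen binder receives a higher score than a randomly chosen non-binder, with ties counted as one half. It is a minimal illustration with made-up scores, not the evaluation code used in the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of binders (label 1)
    and non-binders (label 0); ties contribute 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted binding probabilities for four peptides.
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

The pairwise form is quadratic in the number of peptides; for large datasets a rank-based computation gives the same value more efficiently.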

#### **3.2. Comparing the Intra-Allele vs Trans-Allele Methods**

Table S3 in Supplementary Material shows AUC values obtained with the intra-allele and trans-allele versions of the model. For the intra-allele version, the model was evaluated on peptide-binding data corresponding to an individual allele only. On average, the performance of the trans-allele model is comparable to that of the intra-allele model for HLA-DP (0.930 vs 0.928), worse for HLA-DQ (0.830 vs 0.857), and better for HLA-DR (0.780 vs 0.771) (**Figure 2**).

These results support two important observations. First, there is a common binding preference among MHC-II loci, which is the basis of all trans-allelic models, and it has been successfully captured by the definition of MHC-II polymorphic groups for the HLA-DP locus and, to a lesser extent, for HLA-DQ and HLA-DR. Second, the trans-allelic model is able to extrapolate similarities among the MHC-II allotypes and achieve good predictive performance. As a result, the overall performance of the trans-allelic model is comparable to that of the intra-allele model, even though the former is applied to a much more diverse set of MHC-II sequences.

A decreased performance of the trans-allelic model when compared with the intra-allelic method for HLA-DQ molecules is consistent with results reported in NetMHCIIpan (18). Here we suggest that this is probably because of the limited structural information available for HLA-DQ alleles. In fact, because of this limited structural information there are only 17 polymorphic residue groups for all the 9 binding pockets defined for HLA-DQ alleles. By contrast, there are 25 and 115 polymorphic residue groups defined for HLA-DP and HLA-DR molecules, respectively.

**TABLE 1** | Overview of the MHC-II peptide-binding data utilized in this study.


*The first column gives the names of the 35 genes used to develop the method, distributed as 24, 5, and 6 for HLA-DR, HLA-DP, and HLA-DQ genes, respectively. The second column represents the index for each allele in the EMBL-EBI database (42). The third and fourth columns give the total number of peptides and the number of binder peptides, respectively, per allele. The last column shows the percentage of binder peptides. Binder peptides were identified using an IC<sub>50</sub> binding cutoff of 500 nM, as in previous studies (2, 17, 18, 30). The last row presents the overall statistics for the last three columns.*

Another reason for the reduction of the trans-allelic model's performance for HLA-DQ alleles is that there is a large sequence diversity of MHC-II molecules belonging to this locus. We will examine the empirical support for this assertion in Section 3.3.

#### **3.3. Prediction on a Novel Dataset**

We examined the predictive power of the model on a blind dataset, i.e., a dataset which was not used in the training phase. More precisely, to make peptide-binding predictions for a particular allele, we train the model on an entirely different allele. The allele used for training was chosen based on its similarity to the focal allele as quantified using three different approaches: the nearest-neighbor metric, the Hamming distance, and the Leave-One-Out (LOO) approach.

In the nearest-neighbor approach the distance between two MHC molecules is defined (17) as follows:

$$d\left(A,B\right) = 1 - \frac{\mathcal{S}\left(A,B\right)}{\sqrt{\mathcal{S}\left(A,A\right)\mathcal{S}\left(B,B\right)}}\tag{7}$$

in which *S*(*A*, *B*) is the score of the BLOSUM50 (44) metric between the amino acid sequences of *A* and *B*. The BLOSUM50 metric measures genetic distance between two sequences by quantifying the likelihood that one amino acid will be substituted by another amino acid on evolutionary time scales. The Hamming distance simply counts the positions at which the corresponding amino acid residues of two sequences differ. For both the nearest-neighbor and Hamming metrics, we train the model on peptide data belonging to the corresponding nearest allele to parameterize the model, and then we assess its accuracy in terms of AUC values calculated based on peptide data belonging to the focal allele using those parameters.

HLA-DR with AUC = 0.802.
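The two sequence-based distances can be sketched as follows. The sketch makes several assumptions that are not stated in the text: the two sequences are already aligned and of equal length, *S*(*A*, *B*) is taken as the position-wise sum of substitution scores, and `toy_score` is a stand-in identity matrix used only so that the example is self-contained (a real analysis would plug in BLOSUM50 scores).

```python
import math
from typing import Callable


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def similarity(a: str, b: str, score: Callable[[str, str], float]) -> float:
    """S(A, B): sum of substitution scores over aligned positions (assumed form)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(score(x, y) for x, y in zip(a, b))


def nn_distance(a: str, b: str, score: Callable[[str, str], float]) -> float:
    """Equation (7): d(A, B) = 1 - S(A, B) / sqrt(S(A, A) * S(B, B))."""
    return 1.0 - similarity(a, b, score) / math.sqrt(
        similarity(a, a, score) * similarity(b, b, score)
    )


def toy_score(x: str, y: str) -> float:
    """Placeholder scoring function (1 for a match, 0 otherwise); not BLOSUM50."""
    return 1.0 if x == y else 0.0


# Made-up four-residue fragments, for illustration only.
print(hamming("ACDE", "ACDF"), nn_distance("ACDE", "ACDF", toy_score))
```

The normalization by the self-scores makes the distance 0 for identical sequences regardless of their length or composition.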

However, unlike TEPITOPE and the series of NetMHCIIpan methods, which define the nearest neighbor at pocket level, we derive both the nearest-neighbor metric and the Hamming distance at residue level. Our choice is based on the fact that accounting for the entire MHC-II sequence provides a broader allele coverage (2) and hence extends the model's applicability. Computing sequence similarity at residue level is an intuitive and natural approach to comparative sequence analysis, in contrast to more artificial approaches that may be more computationally efficient. We found that 71% (for HLA-DR), 60% (HLA-DP), and 67% (HLA-DQ) of alleles used for training were consistent between the residue-level and pocket-level approaches. These statistics indicate that, as mentioned before, most MHC-II polymorphisms occur at the binding pockets.

The LOO approach involved partitioning the data into two parts: the peptide-binding data not belonging to the allele under consideration are used to learn the model's parameters, and the remaining data, i.e., the peptide-binding data belonging to the focal allele, are used as test data. **Figure 3** shows a comparison of results from these three approaches (details are in Table S4 in Supplementary Material). The results show that, regardless of the metric we used, the trans-allele method has a high predictive power for HLA-DP alleles and a moderate predictive power for the other alleles.
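The LOO partition described above amounts to a simple split by allele. In the sketch below, the record layout `(allele, peptide, label)`, the function name, and the example peptides are all hypothetical placeholders.

```python
from typing import List, Tuple

Record = Tuple[str, str, int]  # (allele name, peptide sequence, binder label)


def leave_one_allele_out(records: List[Record],
                         focal_allele: str) -> Tuple[List[Record], List[Record]]:
    """Train on every allele except the focal one; test on the focal allele."""
    train = [r for r in records if r[0] != focal_allele]
    test = [r for r in records if r[0] == focal_allele]
    return train, test


# Placeholder records; the peptide sequences and labels are invented.
records = [
    ("DRB1*01:01", "AAAKAAAAA", 1),
    ("DRB1*03:01", "GGGSGGGGG", 0),
    ("DRB1*04:01", "LLLKLLLLL", 1),
]
train, test = leave_one_allele_out(records, "DRB1*03:01")
print(len(train), len(test))
```

No peptide measured on the focal allele enters training, which is what makes the evaluation a test of extrapolation to uncharacterized molecules.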

The much higher predictive power for HLA-DP compared with the other alleles is likely due to the comparatively lower sequence diversity of HLA-DP alleles. To make this assertion more precise we carried out a regression analysis by defining the AUC values from the LOO approach as functions of both the NN and Hamming distances. **Figure 4** gives results of our analysis. As seen in **Figure 4**, all HLA-DQ alleles fall below the least squares lines for both metrics (blue points). We also found that model performance for HLA-DP alleles (red points) increases as the distance between alleles decreases. The authors of NetMHCIIpan also arrived at the same conclusion (18), but only for the NN metric.

#### **3.4. Analysis of the Model's Parameters**

To determine the key factors that contribute to the binding affinities for the three MHC-II alleles considered in this study, we calculated the Hamiltonians corresponding to each amino acid residue and the 9 binding pockets of the MHC-II binding groove. These values were then averaged over only the polymorphic residue groups defined for each pocket containing the particular amino acid.

difference in the HLA-DP loci is limited.

**FIGURE 3** | Average performance results of the model in terms of AUC values for the three metrics: NN approach (gray bars), Hamming metric (blue bars), and the LOO method (red bars). Except for HLA-DQ loci, the LOO approach significantly outperforms the other two metrics. Such results indicate that this method performs better than a random test even for uncharacterized MHC-II molecules.

Analysis of HLA-DR parameters revealed that pocket *P*1 has moderate attractive interactions with the peptide (negative energies indicated by blue color in **Figure 5**), via hydrophobic **(I**, **L**, **W**, **Y)** side chains and, to a lesser extent, via the aromatic **(F**, **W)** amino acids and a single hydrophilic residue **(K)**. Remarkably, previous studies (2, 46) arrived at a similar conclusion of a large tendency of position *P*1 toward interactions involving the hydrophobic side chains. The repulsive interactions (positive energies indicated by red color in **Figure 5**) of pocket *P*1 mostly occur with the hydrophilic side chains **(D**, **E**, **N**, **S**, **T)** and the aliphatic residue **(A)**. Generally, most of the primary anchor pockets (*P*1, *P*4, *P*6, *P*7, *P*9) confer attractive interactions, but the pocket *P*1 makes the largest contribution. This is consistent with results obtained using the MULTIRTA method (2). Among the secondary anchors, we found that pocket *P*2 has attractive interactions with aromatic **(F**, **Y)** and the hydrophobic **(I**, **M**, **Y)** side chains. The most repulsive interactions come from the pocket *P*8, which has strong unfavorable interactions involving the side chains of residues **C**, **D**, **E**, **F**, **G**, **I**, **L**, **W**, and **Y** (see **Figure 5A**).

**FIGURE 4** | Regression analysis of AUC values from the LOO approach as a function of: **(A)** nearest-neighbor and **(B)** Hamming distances. The negative-slope lines in both graphs were obtained by the least squares fit method, with p-values of 0.185 and 0.033 for the two metrics, respectively. These lines and their associated p-values were produced using the glm2 package in R (45).

For HLA-DP, we found that pocket *P*9 has significantly attractive interactions involving the hydrophobic residue (**L**). This is consistent with the previous results of Ref. (47) (see **Figure 5B**). Also, we found that pockets *P*4 and *P*5 have important attractive interactions with the peptide via hydrophobic **(Y)** and aromatic **(F)** side chains, respectively. The contribution of the pocket *P*4 is concordant with other studies such as (41), but the contribution of the pocket *P*5 was not reported in the study of Andreatta and Nielsen (47), which was specifically dedicated to HLA-DQ and HLA-DP alleles. Furthermore, we found that the other two pockets *P*1 and *P*6, which were reported as primary anchors in that study, have a moderate contribution to the calculated binding energies (see **Figure 5B**).

The pattern of energetic contributions for HLA-DQ alleles is less ordered. There is no common pattern except the observation of a significant attractive interaction of pocket *P*1 via the hydrophobic residue **(W)** and the repulsive interaction of pocket *P*4 via the side chains **C**, **E**, and **D** (see **Figure 5C**). This finding is in line with the observations of Andreatta and Nielsen (47).

# **3.5. Discussion**

Interactions between peptides and MHC-II molecules are central to the adaptive immune system. Precise prediction and knowledge of the physicochemical determinants that govern such interactions are useful in designing effective and affordable epitope-based vaccines, in providing insights into the immune system's mechanisms, and in understanding the pathogenesis of diseases. In this study, we have developed a trans-allelic model that can predict peptide interactions with the three human MHC-II loci. It can be readily applied to MHC-II molecules of other species provided that the relevant structural information is available. This method is based on biophysical ideas, an alternative to the dominant machine learning approaches.

The model presented here is, in addition to NetMHCIIpan, only the second trans-allelic method that allows comprehensive prediction analysis of peptide binding to all three human MHC-II loci. Most trans-allelic models for MHC-II peptides are restricted to HLA-DR and HLA-DP alleles. The TEPITOPEpan method (23), which is popular among immunologists and is the successor of a pioneer method in this field, is limited to HLA-DR alleles.

In this work we employed the definition of MHC polymorphic residue groups of the MULTIRTA method (2), which is more intuitive and inclusive than the MHC pseudo sequences of NetMHCIIpan (18), in developing our trans-allelic model. Utilizing new structural data for MHC-II complexes, which were not present when MULTIRTA was being developed, we extended that idea to cover all three human MHC-II loci. There exist similar exercises for capturing structural similarity among MHC molecules. The earlier works of Murthy and Stern (48) and Sinigaglia and Hammer (49) were mostly limited to HLA-DR molecules. But in a previous study (2), the "polymorphic residue groups" were shown to be useful for inferring the interaction energy. This physical way of capturing structural similarity among MHC molecules works well in our biophysical approach.

sensitivity and specificity) by calculating an AUC value. The higher the AUC value the better the predictive performance. The plot shows the average difference between the AUC values for alleles belonging to the same locus obtained using our model vs. the corresponding values obtained using NetMHCIIpan, when similarity is defined based on either **(A)** the NN or **(B)** the LOO metric. Error bars denote SDs. Strikingly, our model performs better than NetMHCIIpan when predicting peptide binding to HLA-DQ using the NN metric (p-value = 0.015). For all other cases, both models have equivalent performance.

We compared how well our model predicts the MHC-II allele binding preferences of a novel peptide dataset vs. how well the state-of-the-art NetMHCIIpan method performs the same task. In this comparison we applied both our model and NetMHCIIpan to predict binding preferences for peptides known to either bind or not bind a reference allele after training both models using peptide-binding data for a second allele. For a given MHC-II locus, the second allele was the one that was most similar to the reference allele. Similarity was quantified based on either a leave-one-out approach or a nearest-neighbor approach (see Section 3.3). When using the nearest-neighbor approach, we found that our model performs significantly better than NetMHCIIpan in predicting peptide-binding preferences for HLA-DQ alleles (P-value = 0.015; **Figure 6A**). Furthermore, at the 95% confidence level, for all other cases, we found no significant difference between the performances of the two models (**Figure 6**).

These results are reassuring and indicate that our inverse-physics approach constitutes a promising complement to the widely used pattern-based approach to peptide–MHC-II binding predictions. The outstanding predictive accuracy of NetMHCIIpan is not the result of its theoretical basis. Rather, it derives from the use of sophisticated ensembles of neural networks, which are very powerful. However, our method has a distinguishing advantage over all the advanced machine learning models in that it is more physically meaningful. It is worth noting that our prediction results of peptide–MHC-II interaction were based on *in silico* analysis of real data. Additional *in vivo* and *in vitro* investigations are needed to further validate the reported predictive performance.

# **REFERENCES**


# **DATA AVAILABILITY STATEMENT**

The peptide dataset used to evaluate this method can be found in the [IEDB] (http://tools.iedb.org/main/datasets/), and the MHC-II sequence data can be found in the [EMBL-EBI] (ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/fasta/DRB\_prot.fasta).

# **AUTHOR CONTRIBUTIONS**

All authors contributed equally to this work.

## **ACKNOWLEDGMENTS**

The support of the DST-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS) toward this research is hereby acknowledged. Opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to the CoE-MaSS. We also gratefully acknowledge the support of the Centre for High Performance Computing (CHPC) at Cape Town, South Africa, for providing us access to their computational facilities.

# **FUNDING**

This work was supported by the following grants: AD is funded by DST-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS, award number BA2017/050) and the African Institute for Mathematical Sciences (AIMS) South Africa; WN is funded by the AIMS Global Secretariat.

# **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at https://www.frontiersin.org/articles/10.3389/fimmu.2018.01410/full#supplementary-material.


class II molecules. *J Biosci Bioeng* (2002) 94(3):264–70. doi:10.1016/S1389-1723(02)80160-8


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Degoot, Chirove and Ndifon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Cross-Reactive Bactericidal Antimeningococcal Antibodies Can Be Isolated From Convalescing Invasive Meningococcal Disease Patients Using Reverse Vaccinology 2.0

*Fadil A. Bidmos<sup>1</sup>\*, Simon Nadel<sup>1,2</sup>, Gavin R. Screaton<sup>1</sup>, J. Simon Kroll<sup>1</sup> and Paul R. Langford<sup>1</sup>*

*<sup>1</sup>Section of Paediatrics, Department of Medicine, Imperial College London, London, United Kingdom, <sup>2</sup>St. Mary's Hospital, Paddington, London, United Kingdom*

#### *Edited by:*

*Rino Rappuoli, GlaxoSmithKline, Italy*

#### *Reviewed by:*

*Sanjay Ram, University of Massachusetts Medical School, United States Mariagrazia Pizza, GlaxoSmithKline, Italy*

> *\*Correspondence: Fadil A. Bidmos f.bidmos@imperial.ac.uk*

#### *Specialty section:*

*This article was submitted to Vaccines and Molecular Therapeutics, a section of the journal Frontiers in Immunology*

> *Received: 17 April 2018 Accepted: 29 June 2018 Published: 16 July 2018*

#### *Citation:*

*Bidmos FA, Nadel S, Screaton GR, Kroll JS and Langford PR (2018) Cross-Reactive Bactericidal Antimeningococcal Antibodies Can Be Isolated From Convalescing Invasive Meningococcal Disease Patients Using Reverse Vaccinology 2.0. Front. Immunol. 9:1621. doi: 10.3389/fimmu.2018.01621*

The threat from invasive meningococcal disease (IMD) remains a serious source of concern despite the licensure and availability of vaccines. A limitation of current serogroup B vaccines is the breadth of coverage afforded, resulting from the capacity for extensive variation of the meningococcus and its huge potential for the generation of further diversity. Thus, the continuous search for candidate antigens that will compose supplementary or replacement vaccines is mandated. Here, we describe successful efforts to utilize the reverse vaccinology 2.0 approach to identify novel functional meningococcal antigens. In this study, eight broadly cross-reactive sequence-specific antimeningococcal human monoclonal antibodies (hmAbs) were cloned from 4 ml of blood taken from a 7-month-old sufferer of IMD. Three of these hmAbs possessed human complement-dependent bactericidal activity against meningococcal serogroup B strains of disparate PorA and 4CMenB antigen sequence types, strongly suggesting that the target(s) of these bactericidal hmAbs are not PorA (the immunodominant meningococcal antigen), factor-H binding protein, or other components of current meningococcal vaccines. Reactivity of the bactericidal hmAbs was confirmed to a single ca. 35 kDa protein in western blots. Unequivocal identification of this antigen is currently ongoing. Collectively, our results provide proof-of-principle for the use of reverse vaccinology 2.0 as a powerful tool in the search for alternative meningococcal vaccine candidate antigens.

Keywords: *Neisseria meningitidis*, invasive meningococcal disease, reverse vaccinology 2.0, human monoclonal antibodies, vaccines

# INTRODUCTION

*Neisseria meningitidis* is a major obligate human pathogen that frequently colonizes the nasopharynx asymptomatically, in a state known as carriage (1). Occasionally, invasive meningococcal disease (IMD) occurs, through invasion of pharyngeal tissues, proliferation in blood (meningococcal septicemia), and crossing of the blood–brain barrier leading to meningitis (2, 3). More than 70,000 cases of IMD are reported annually worldwide with case fatality ratios between 5 and 15%, even with therapeutic intervention (4, 5). Debilitating neurological sequelae are common among survivors of IMD (6–8).

The use of currently available polysaccharide conjugate meningococcal vaccines has been effective against targeted serogroups (mainly serogroups A, C, W, and Y) within vaccinated populations (9, 10). For serogroup B strains, which account for more than 60% of IMD in the UK and Europe (11, 12), vaccine development has focused heavily on sub-capsular vaccine components owing to the unsuitability of the serogroup B capsule (13). One of these vaccines, 4CMenB (or Bexsero®), is a protein-based vaccine whose major components are the factor-H binding protein (fHbp) (variant 1.1), Neisserial heparin-binding antigen (NHBA variant 2), Neisserial adhesin A (NadA variant 3), and the detergent-extracted outer membrane vesicle component of the New Zealand epidemic strain (with PorA variant P1.4) (14). Like the polysaccharide conjugate vaccines, accruing data shows high effectiveness of 4CMenB (15). However, we are seeing a gradual recrudescence of carriage and disease to pre-vaccine levels through vaccine-driven strain replacement (16–19). In addition, there are concerns that the changing epidemiology of IMD (20–22) may lead to a significant reduction in the efficacy of the vaccines in the long term. These limitations, coupled with the huge potential of the meningococcus to generate extensive antigenic diversity (leading to vaccine/immune escape) (23) justify the search for novel vaccine candidate antigens.

Preclinical vaccine development methods are enriched by detailed analysis of the human immune response to etiological agents of infectious diseases. For example, with the development of high-throughput technologies, deep sequencing of the gene segments encoding the variable regions of antibody heavy (VH) and light (VL = κ or λ) chains in a given B cell repertoire is providing valuable information useful in understanding adaptive immunity to infections, autoimmunity, and malignancies (24, 25). Identifying the targets of antibodies of interest by cloning and *in vitro* expression of VH and VL chains of B-cell antibodies is a powerful approach, which can be utilized to inform on the functional immunogenicity of both known and novel antigens. The use of this approach, termed reverse vaccinology 2.0 (26), in the cloning of neutralizing human recombinant monoclonal antibodies [human monoclonal antibodies (hmAbs)] from patients convalescing from viral infectious diseases is well documented; the first studies in the use of reverse vaccinology 2.0 focused on the isolation and functional characterization of antibodies targeting the dengue, HIV, and influenza viruses (27–29). The power of the approach lies in the expression of paired VH and VL regions from individual plasmablasts or memory B cells; the output being the expression of hmAbs mimicking natural VH + VL combinations induced in the host.

Because of the transience of peak plasmablast circulation [reviewed in Ref. (30)] and the higher incidence of IMD among infants and toddlers (placing a limitation on blood sample volume), we aimed to assess whether reverse vaccinology 2.0 could be employed in the discovery of novel meningococcal antigens of vaccine potential. In this brief research report, we will outline findings relating to the following aims: (i) whether cross-reactive antimeningococcal hmAbs targeting surface proteins could be cloned from patient samples; and (ii) if these hmAbs possessed bactericidal activity against a wide panel of strains, specifically those not covered by the protein-based meningococcal vaccines.

# MATERIALS AND METHODS

#### Ethics Statement and Study Participants

Studies with human blood samples were approved by the London—Fulham Research Ethics Committee (Ref.: 11/LO/1982). Informed written consent was obtained from patients or their representatives in accordance with the Declaration of Helsinki. Patients were recruited following admission to the Imperial Healthcare (St. Mary's Hospital) Paediatric Intensive Care Unit (PICU), London, UK.

Patient SM-P02 was a 7-month-old baby who presented with fever, irritability, and reduced oral intake. Rapidly spreading petechial rash was detected a few hours after arrival at the hospital. Blood cultures confirmed meningococcal septicemia. Molecular typing of the isolate, M14-240312, revealed that it was a serogroup B strain (MenB). A 4 ml blood sample was collected 7 days post-admission to St. Mary's Hospital PICU, following confirmation of negative blood cultures.

#### Bacterial Strains

Assays performed with whole bacteria or cell lysates involved patient isolate M14-240312 and a 16-strain meningococcal panel reflecting the current genetic epidemiological prevalence in the UK (obtained from the UK Meningococcal Reference Unit, Public Health England, Manchester) (**Table 1**). This 16-strain panel includes 10 strains genotypically mismatched for all the major 4CMenB antigens. Typing information was obtained from the *Neisseria* MLST database hosted on https://pubmlst.org/neisseria/. Meningococci were routinely grown on 5% horse blood agar or in brain heart infusion broth supplemented with 5% Levinthal's.

Table 1 | MenB strains composing the 17-strain screening panel.

*<sup>a</sup>Dash (–) denotes absence of NadA in isolate.*

*<sup>b</sup>ND denotes no data, i.e., no typing information available for this antigen.*

# Cell Sorting, cDNA Synthesis, and VH/VL Cloning

Peripheral blood mononuclear cells (PBMCs) were extracted from a 4 ml blood sample using a density gradient centrifugation method (Leucosep) and sorted singly into individual wells of 96-well plates containing catch buffer (10 mM Tris pH 8.0, 10 U RNasin) as previously described (31). PBMCs were incubated with a cocktail of anti-human monoclonal antibodies targeting CD3 (PerCP/Cy5.5), CD14 (PerCP/Cy5.5), CD19 (RPE), CD20 (PerCP/Cy5.5), CD27 (FITC), CD38 (APC), and CD56 (PerCP/Cy5.5) for 30 min on ice, in the dark. Stained PBMCs were analyzed using a BD FACSAria III cell sorter. Plasmablasts were gated as follows: CD3<sup>−</sup>, CD14<sup>−</sup>, CD19<sup>+</sup>, CD20<sup>−</sup>, CD56<sup>−</sup>, CD27<sup>high</sup>, and CD38<sup>high</sup>. A single freeze-thaw cycle was used to lyse cells and release RNA. cDNA was synthesized from released RNA using the QIAGEN OneStep RT-PCR kit, as per manufacturer's protocols. A nested PCR was used to amplify the VH and VL regions; primers contained restriction endonuclease sites, which facilitated cloning into respective AbVec-IgH (*AgeI* + *SalI*), AbVec-Igκ (*AgeI* + *BsiWI*), or AbVec-Igλ (*AgeI* + *XhoI*) expression vectors.

# Expression of hmAbs in Human Embryonic Kidney (HEK-293) Cells

HEK-293 cells were transiently transfected with cognate plasmid pairs using polyethyleneimine as transfection agent, as previously described (31). Culture supernatants were harvested after 72 h. HmAbs were purified from culture supernatants with single-use Protein G spin columns (Ab SpinTrap, GE Healthcare), as per manufacturer's instructions.

# Assessment of hmAb Specificity

HmAb reactivity to surface-bound meningococcal antigens was assessed using indirect cell-based ELISAs, as previously described (32). Briefly, inactivated bacterial whole cell suspensions were normalized to an OD600 of ~0.5. Whole cells (100 µl of cell suspensions) or 1 µg of 4CMenB in Carbonate-Bicarbonate buffer (Sigma-Aldrich, cat. no. C3041) were transferred into designated wells of flat-bottom polystyrene 96-well plates and incubated at 4°C overnight. Wells were subsequently incubated with 200 µl of blocking buffer (PBS, 0.05% Tween-20, 1% BSA) for 1 h at room temperature prior to addition of 100 µl of an appropriate dilution of hmAbs or plasma IgG. Following a 1-h incubation, wells were washed thrice to remove unbound antibodies. A 1:2,000 dilution of an anti-IgG alkaline phosphatase conjugate (Sigma-Aldrich, cat. no. A9544) was used to probe wells for 1 h at room temperature. Wells were washed five times before addition of a phosphatase substrate (Bio-Rad, cat. no. 172-1063) as per the manufacturer's instructions. Signal detection was performed at OD405 using a microplate reader.

Further investigations into hmAb reactivity were performed using western blotting, as described elsewhere (33).

# Serum Bactericidal Activity (SBA) Assay

Functional activity of antimeningococcal IgG was determined using the standardized SBA assay (34). Briefly, 10<sup>3</sup> CFU meningococci were incubated with IgG (hmAb or purified total plasma IgG) (final assay concentration of 25%) and exogenous human complement (25%) for 90 min in a humidified CO<sub>2</sub> incubator at 37°C, with gentle shaking. IgG that effected a reduction in meningococcal CFU by >50% after 90 min compared to negative controls (complement only; IgG only) was considered to possess SBA activity.

# RESULTS

# Induction of a Functional Immune Response in Patient SM-P02

Flow cytometric analysis showed that the circulating plasmablast population in patient blood was low, measured at 0.3% (**Figure 1**). To assess whether these plasmablasts represented a functional response to the meningococcal infection, total IgG purified from patient plasma was assayed for bactericidal activity. Purified total IgG, rather than plasma, was assayed because of the presence of administered antibiotics in patient plasma. Reactivity of purified SM-P02 IgG to the patient isolate (M14-240312) and two other strains (MC58, which expresses the fHbp variant 1.1 in 4CMenB; and M08-240276, which is mismatched for all 4CMenB antigens) was assessed in ELISAs prior to assessment of SBA activity against the three strains. SM-P02 IgG bound to the infecting isolate, M14-240312, in cell-based ELISAs with corresponding SBA activity against the isolate. Despite significant binding of SM-P02 IgG to MenB strains MC58 and M08-240276 in ELISAs, no bactericidal activity was discerned against these strains. Thus, the induction of a specific bactericidal response in the patient to M14-240312 was confirmed (**Figure 2A**). Furthermore, the antimeningococcal immune response in patient SM-P02 was exclusive of the antigenic components of 4CMenB as no reactivity of SM-P02 plasma IgG to the vaccine antigens was discerned in ELISAs, under the experimental conditions employed in this study (**Figure 2B**).

## *In Vitro* Cloning and Expression of Antimeningococcal Antibodies From Patient Plasmablasts

Amplification of the VH and VL (κ or λ) gene segments from 336 plasmablasts occurred at a PCR efficiency of 80%. Sequencing of these variable region gene segments showed biased usage of the IGHV3-7:IGλV2-8 gene pair, accounting for 16.7% of V gene usage. Of 139 recombinant hmAbs that were successfully expressed in HEK-293 cells, eight (**Figure 3A**) were reactive with the infecting MenB strain (M14-240312) and, in varying degrees, with the other members of the MenB strain panel (**Figure 3B**). None of these eight hmAbs were reactive with *Actinobacillus pleuropneumoniae* cells (a porcine respiratory pathogen). All antimeningococcal hmAbs possessed the IGHV3 class gene, albeit with different subclasses and light chain gene pairs; the exception being P02-4F2 (IGHV4-59:IGκV1/1D-39). Two of these hmAbs (P02-5E10 and P02-6E9), however, possessed identical V gene pairs (IGHV3-30:IGκV4-1).

Specificity of the antimeningococcal hmAbs for the serogroup B capsule was ruled out using the periodate assay described in Ref. (35) (data not shown). Denaturing western blot data showed highly specific reactivity of three hmAbs (P02-1A1, P02-5E10, and P02-6E9) with a ~35 kDa meningococcal protein present in six members of the 17-strain panel (~35%). The target epitope of hmAb P02-1A1 was present in more strains (*n* = 6) than those of hmAbs P02-5E10 (*n* = 3) and P02-6E9 (*n* = 2) (**Figure 4**). The other five hmAbs, which were broadly cross-reactive with the strain panel (collectively recognizing 12 out of 17 strains), were non-reactive in denaturing western blots.

## Cloned Antimeningococcal hmAbs Possess SBA Activity Against Patient Isolate M14-240312

To assess the functional activity of cloned recombinant hmAbs, SBAs were performed using purified hmAbs at a final assay concentration of 80 µg/ml. Three hmAbs (P02-1A1, P02-5E10, and P02-6E9), in synergy with exogenous human complement, possessed SBA activity against strain M14-240312. All other hmAbs were non-bactericidal (**Figure 5**).

# SBA Activity of hmAbs Is Strongly Linked to Surface Expression of Target Epitopes

To assess the breadth of SBA exhibited by hmAbs P02-1A1, P02-5E10, and P02-6E9, each hmAb was assayed for bactericidal activity against the 17-strain panel. HmAb P02-1A1 mediated killing of the patient isolate, M14-240312, and four other strains to which it bound in immunoassays (M07-240646, M07-240657, M08-240014, and M10-240474). Only one strain, M07-240669, which showed positive reactivity with hmAb P02-1A1 in immunoassays, was resistant to SBA activity of P02-1A1.

12. M08-240164; 13. M08-240276; 14. M10-240474; 15. M10-240476; 16. M10-240480; 17. M11-240016; 18. M11-240123; 19. *A. pleuropneumoniae*.

Consistent with results obtained with P02-1A1, hmAb P02-5E10 possessed SBA activity against strain M10-240474 (to which it bound in immunoassays) but not strain M07-240669, suggesting resistance of M07-240669 to complement-dependent killing. Interestingly, all three bactericidal hmAbs could mediate killing of other strains to which no discernible reactivity was found in immunoassays; hmAb P02-1A1 reproducibly mediated killing of strain M08-240276 while both P02-5E10 and P02-6E9 mediated killing of strain M10-240474 (**Figure 6**; **Table 2**).

# DISCUSSION

Despite vaccination, the ongoing threat from IMD is unquestionable, especially in the African meningitis belt, justifying the search for novel approaches to vaccine candidate discovery. Reverse vaccinology 2.0 has been useful in understanding the human adaptive immune response to disseminated viral (36, 37) and bacterial infections (38, 39). Hence, it presents as a potentially powerful tool that may identify novel meningococcal vaccine candidate antigens or reinforce the candidacy of some known antigens.

In the present study, we succeeded in isolating functional hmAbs from a 7-month-old patient convalescing from IMD despite challenging practical issues. The most affected age group for IMD is 6 to 24 months, and patients presenting at the St. Mary's PICU were mostly in this age group: children from whom the availability of sufficient quantities of blood sample for the isolation of PBMCs was very limited (4 ml). The isolation of specific antimeningococcal plasmablasts was further complicated by a paucity of knowledge on the magnitude and timing of peak plasmablast responses following primary meningococcal infection in infants. The plasmablast population measured in the single sample obtained at 7 days post-admission analyzed in this study was 0.5%. The plasmablast population following acute infection with dengue virus (37) peaked at 6–7 days postinfection. While the peak plasmablast response following infection with nosocomially acquired bacteria such as *Acinetobacter baumannii* occurred in most patients at 8–16 days, with 40–80% of the total B-cell population being plasmablasts, plasmablasts composed >2% of the B-cell population in patient samples obtained at 0–7 days postinfection (38). The low plasmablast induction seen in IMD patient SM-P02 may, therefore, be patient-specific and the timing of the peak level of cycling plasmablasts was likely missed (less than or more than 7 days post-admission). It is also plausible that low plasmablast induction is characteristic of a population at particular risk of IMD. Larger patient cohorts will be required to investigate this theory. Optimization of the IgG-cloning technique to increase the number of antimeningococcal plasmablasts using a recently published Ig capture-based assay (40) should improve productivity of the approach.

Figure 4 | Specificity of patient-derived antimeningococcal (hmAbs) for linear epitopes in denaturing western blots. Lysates from normalized suspensions of selected MenB strains were used as template in western blot experiments. A 1:1,000 dilution of each hmAb was employed.

Reactivity of the antimeningococcal hmAbs isolated in this study with heterologous MenB strains possessing disparate PorA types strongly suggests that the antigen target of the hmAbs is not PorA, the immunodominant meningococcal antigen. It is likely that the three bactericidal hmAbs are reactive with a similar linear epitope contained in a yet-to-be identified ca. 35 kDa antigen. This 35 kDa antigen is surface-expressed in ~35% of the MenB strains, with heterologous 4CMenB antigen types, employed in this study. It is pertinent to note, however, that the ~35% presence of the linear epitope is specific to hmAbs generated in this study, and the antigen which it composes may possess more immunogenic surface-exposed epitopes and exhibit a wider presence among MenB and other meningococcal strains. One of these bactericidal hmAbs, P02-1A1, is reactive with a more diverse strain panel and possesses a higher bactericidal titer than hmAbs P02-5E10 and P02-6E9. This could be a result of somatic hypermutation of the P02-5E10 and P02-6E9 antibodies leading to the production of hmAb P02-1A1 with enhanced binding efficiency (affinity and/or avidity). Determination of the unequivocal identities of these hmAbs and their reactivities/SBA with other serogroups is currently ongoing. Taken together, data generated so far on these bactericidal hmAbs (P02-1A1, P02-5E10, and P02-6E9) strongly suggest that their target is not PorA, fHbp, or any of the 4CMenB recombinant antigens, signifying its novelty and, most importantly, candidacy for inclusion in future vaccine preparations. It is acknowledged, however, that while our current data suggest that the 35 kDa antigen is absent from the NZ OMV component of 4CMenB, further work is required to determine its unequivocal absence. Given the need for protein antigens that would compose improved or entirely novel cross-serogroup antimeningococcal vaccines, data from this study show that reverse vaccinology 2.0 can be employed as a useful tool in identifying functionally immunogenic antimeningococcal antigens.

hmAbs normalized to final assay concentration of 80 µg/ml, in three biological replicates. Percentage survival after 60-min incubation (T60) in the presence of 25% human complement versus inoculum (T0) is shown.

Table 2 | Summary table showing reactivity of the bactericidal human monoclonal antibodies (P02-1A1, P02-5E10, and P02-6E9) in immunoassays (ELISA and western blot), and SBA activity versus the 17-strain MenB panel.

*I, immunoassay (ELISA and/or western blotting); SBA, serum bactericidal assay. Plus (*+*) denotes positive reactivity in immunoassays or bactericidal activity in SBA. Dash (–) denotes no discernible reactivity in immunoassays or bactericidal activity in SBA.*

# ETHICS STATEMENT

Studies with human blood samples were approved by the London—Fulham Research Ethics Committee (Ref.: 11/LO/ 1982). Informed written consent was obtained from patients or their representatives in accordance with the Declaration of Helsinki. Patients were recruited following admission to the Imperial Healthcare (St. Mary's Hospital) Paediatric Intensive Care Unit (PICU), London, UK.

# AUTHOR CONTRIBUTIONS

Conceptualization: PL, JK, SN, and GS; investigation: FB and PL; writing—original draft: FB; writing—reviewing and editing: FB, PL, SK, SN, and GS.

# ACKNOWLEDGMENTS

We thank Stuart Gormley, Sobia Mustafa, and Michael Levin for their help in obtaining the patient sample, Hedda Wardemann for permitting the use of the AbVec vectors, and the Mongkolsapaya group for their help with the hmAb cloning protocol. This work was funded by a research grant from John and Michelle Bresnahan *via* MeningitisNow (awarded to PL, JK, SN, and GS) and an Imperial College Confidence-in-Concept Award (PL and FB).

# REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer MP and handling Editor declared their shared affiliation.

*Copyright © 2018 Bidmos, Nadel, Screaton, Kroll and Langford. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A Web Resource for Designing Subunit Vaccine Against Major Pathogenic Species of Bacteria

Gandharva Nagpal<sup>1,2†</sup>, Salman Sadullah Usmani<sup>1,3†</sup> and Gajendra P. S. Raghava<sup>1,3</sup>\*

*<sup>1</sup> Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India, <sup>2</sup> Centre for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India, <sup>3</sup> Center for Computational Biology, Indraprastha Institute of Information Technology, Okhla, India*

Evolution has led to the expansion of survival strategies in pathogens including bacteria, and the emergence of drug-resistant strains has proved to be a major global threat. Vaccination is a promising strategy to protect the human population. Reverse vaccinology is a more robust vaccine development approach, especially with the availability of large-scale sequencing data and the rapidly dropping cost of the techniques for acquiring such data from various organisms. The present study implements an immunoinformatic approach for screening the possible antigenic proteins among various pathogenic bacteria to systematically arrive at epitope-based vaccine candidates against 14 pathogenic bacteria. A total of 1459 virulence factors and 546 products of essential genes were appraised as target proteins to predict epitopes with the potential to stimulate different arms of the immune system. To address self-tolerance, self-epitopes were identified by mapping on the proteomes of the 1000 human genomes and were removed. Our analysis revealed that 21 proteins from 5 bacterial species are both virulence factors and essential to survival, making them the most suitable vaccine targets against these species. In addition to the prediction of MHC-II binders, B cell and T cell epitopes, as well as adjuvants individually from proteins of all 14 bacterial species, stringent criteria led us to identify 252 unique epitopes, which are predicted to be T-cell epitopes, B-cell epitopes, MHC II binders and vaccine adjuvants. In order to provide a service to the scientific community, we developed a web server, VacTarBac, for designing vaccines against the above species of bacteria. This platform integrates a number of tools, including visualization tools to present antigenicity/epitope density on an antigenic sequence. These tools will help users to identify the most promiscuous vaccine candidates in a pathogenic antigen. The VacTarBac server is available at http://webs.iiitd.edu.in/raghava/vactarbac/.

Keywords: reverse vaccinology, vaccine designing, immunotherapeutic, epitopes, antigen, virulence factor, essential genes

#### Edited by:

*Pedro A. Reche, Complutense University of Madrid, Spain*

#### Reviewed by:

*Giampiero Pietrocola, University of Pavia, Italy Ghita Ghislat, INSERM U1104 Centre d'immunologie de Marseille-Luminy, France*

> \*Correspondence: *Gajendra P. S. Raghava raghava@iiitd.ac.in*

*†These authors have contributed equally to this work*

#### Specialty section:

*This article was submitted to Vaccines and Molecular Therapeutics, a section of the journal Frontiers in Immunology*

> Received: *29 June 2018* Accepted: *13 September 2018* Published: *02 October 2018*

#### Citation:

*Nagpal G, Usmani SS and Raghava GPS (2018) A Web Resource for Designing Subunit Vaccine Against Major Pathogenic Species of Bacteria. Front. Immunol. 9:2280. doi: 10.3389/fimmu.2018.02280*

# INTRODUCTION

Evolution of existing bacterial pathogens and emergence of new pathogenic strains are continuously causing problems to mankind. To address the pathogenic challenges to human health, researchers developed various vaccines and antibiotics during the twentieth century. Worldwide usage of such therapeutic strategies led to the expansion of structural as well as behavioral properties in various bacterial species. Diverse defense mechanisms were acquired by the known bacterial species that resulted in antibiotic tolerance or drug resistance, currently a major concern during administration of known antibiotics in pathogenic conditions. Lord Jim O'Neill and his team estimated that antimicrobial resistance could cause 10 million deaths a year by 2050 (1).

Vaccination has proved to be a promising strategy to protect the human population from dreadful diseases like smallpox, polio, etc. Yet, vaccines are currently unavailable for many infectious diseases such as melioidosis, caused by Burkholderia pseudomallei (2). Even among the existing vaccines, BCG, a heavily used vaccine against tuberculosis, is less effective in the immunocompromised and in adults in tropical and subtropical regions (3), and its efficacy is rapidly decreasing (4). Anthrax, DPT, Hib, Meningococcal vaccine, Pneumococcus heptavalent conjugate polysaccharide vaccine (PCV7), Pneumococcus 13-valent conjugate polysaccharide vaccine (PCV13), etc. are generally considered effective at controlling disease symptoms. However, a major concern of recent vaccination programs is the selection of vaccine escape mutants, i.e., the repertoire of immune targets in the pathogen is different from that used in the vaccine formulation (5). Antigenic variation and continuous bacterial evolution have necessitated the design of better vaccine candidates.

Virulence factors (VFs) have been explored to design vaccines against several pathogens such as porcine enteric coronaviruses (CoVs) (6), Vibrio cholerae, enterotoxigenic Escherichia coli (7), etc. Bacterial VFs are of particular interest, as they play a major role in the establishment of infection and cause pathological conditions while the pathogen survives in a hostile environment (8). Virulence factors are thus defined as the gene products enabling the pathogen to enter, replicate, and persist in a host even from a small inoculum. These include cell surface proteins that facilitate bacterial attachment, cell surface carbohydrates that protect a bacterium, bacterial toxins, and hydrolytic enzymes that contribute to the pathogenicity of the bacterium (8).

In addition to virulence factors, genes essential for the survival of the cell have proved to be attractive drug targets in various pathogens such as the fungus Aspergillus fumigatus (9, 10), protozoan Leishmania species (11), as well as bacteria like Streptococcus pneumoniae, Haemophilus influenzae, and Moraxella catarrhalis (11). Essential genes are indispensable for cellular metabolism.

The last two decades witnessed the adoption of a more revolutionary vaccine development approach, namely the information-driven "reverse vaccinology", which involves identification of vaccine candidates by genome or proteome sequence analysis as the primary step. High cost and practical limitations of feasibility and time have engendered a predilection toward computational techniques for rational reduction in the number of vaccine candidates to be experimentally tested. Epitope prediction methods have made it possible to search sequence sets as large as the whole proteomes of organisms with various restriction filters to arrive at novel vaccine candidates.

To assist the scientific community, researchers have developed numerous pipelines, databases, and tools for identifying vaccine candidates against specific types of diseases (12, 13). Recently, researchers have proposed potential vaccine candidates against diverse pathogens like Ebola (14), Zika (15), M. tuberculosis (16), Shigella (17), Hepatitis B Virus (18), Helicobacter pylori (19), Vibrio cholerae (20), Dengue (21), etc. using in silico predictions. These predicted candidates help reduce the burden on experimental labs, thus saving time and cost in vaccine development. Most of the above-mentioned studies focus on only a single pathogen and lack identification of adjuvant candidates.

In this study, we have systematically arrived at epitope-based vaccine candidates against major pathogenic bacterial species. **Table 1** provides a summary of the major pathogenic manifestations of the bacterial species considered in the present study. Both the virulence factors and the essential genes have been considered to predict epitopes. It is well known that diverse bacterial species and strains have evolved multiple mechanisms for evading the host immune responses, not only by producing the likes of host inhibition pathway receptor ligands ("molecular mimicry") but also by producing virulence factors resembling the intermediates of the host inhibitory pathways. Hence, peptides found in proteomes of the 1000 human genomes were removed while implementing the pipeline. To assist the scientific community, all the potential vaccine candidates have been compiled and presented in the form of a database (http://webs.iiitd.edu.in/raghava/vactarbac/).

# MATERIALS AND METHODS

#### Raw Data Source

Proteins were taken as raw data to predict the potential vaccine candidates. We considered both virulence factors as well as essential genes of pathogenic bacteria.

#### Virulence Factors

The Virulence Factor Database (VFDB) is a resource housing the virulence factors (VFs) that enable a microorganism to establish itself on or within a host, thus bestowing the pathogen with the potential to cause disease (22). VFDB provides two hierarchical non-redundant datasets, i.e., sets A and B. Set A is a core dataset, covering genes associated with experimentally verified virulence factors. Therefore, we selected the protein sequences of core dataset A for further analysis. The link http://www.mgc.ac.cn/VFs/Down/VFDB\_setA\_pro.fas.gz provides protein sequences of the VFDB database in FASTA format when uncompressed using the command "gunzip" on a Unix command line or using the tool "WinZip" on a Windows platform.
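As an alternative to uncompressing the archive by hand, the compressed FASTA file can be read directly. The following is a minimal sketch, not the authors' code: the function name `read_fasta_gz` is hypothetical, and the only assumption about the file is that it is a gzip-compressed FASTA file in which each record starts with a `>` header line.

```python
import gzip
from typing import Dict


def read_fasta_gz(path: str) -> Dict[str, str]:
    """Return {header: sequence} from a gzip-compressed FASTA file.

    Hypothetical helper for illustration; headers are stored without
    the leading '>' and multi-line sequences are concatenated.
    """
    records: Dict[str, str] = {}
    header = None
    with gzip.open(path, "rt") as handle:  # "rt" decodes the bytes as text
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                records[header] = ""
            elif header is not None:
                records[header] += line
    return records
```

For example, `read_fasta_gz("VFDB_setA_pro.fas.gz")` would return one entry per protein once the file has been downloaded to the working directory.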

#### Essential Genes/ Proteins

The Database of Essential Genes (DEG) maintains currently known essential genomic elements, such as protein-coding genes and non-coding RNAs, among bacteria, archaea, and eukaryotes (22). These genes are mandatory for survival of the corresponding bacterium. DEG contains 18286 proteins encoded by genes experimentally found to be essential for the survival of 32 bacteria. For our study, we restricted the analysis to bacterial essential genes belonging to the 14 species that are also included in VFDB, for the convenience of comparison and combination with the VFDB analysis.

TABLE 1 | Bacterial species considered in the study with the pathological conditions caused by them.

#### Target Selection for Vaccine Identification

The most important aspect of subunit vaccine design is vaccine target selection. The data downloaded from VFDB (set A) contain 2595 proteins (11) from 46 bacterial species. We matched these bacterial species against the DEG database, which resulted in 14 common bacterial species. Therefore, our dataset contains only those bacterial species that have at least one virulence factor in VFDB and at least one protein encoded by an essential gene in DEG. DEG contains 9664 proteins corresponding to these pathogenic bacteria. These are the essential proteins of pathogenic bacteria. Some specific bacterial components, because of their function, have proved to be better candidates for designing subunit vaccines (20, 23–25); therefore, we selected specific categories of proteins.

#### Membrane Protein

These proteins originate from the membrane of various pathogenic bacteria. There is evidence that membrane proteins mediate pathogen entry and colonization and are also the primary target of the adaptive immune response (24). A total of 420 membrane proteins were selected from DEG for further processing.

#### Envelope Protein

These proteins are derived from the envelope of various pathogenic bacteria and are essential for their survival. They are important for attachment to the host cell surface and are also responsible for facilitating an immune response in the host cell (25). A total of 36 proteins were screened for further processing.

#### Secretion Protein

Fifty-two secretory proteins corresponding to the bacterial species of interest are deposited in DEG and play an important role in their survival. The secretory systems of pathogenic bacteria have been heavily exploited in screening for new possible vaccines (26, 27). These proteins play many roles in promoting bacterial virulence, from enhancing attachment to eukaryotic cells, to scavenging resources in an environmental niche, to directly intoxicating target cells and disrupting their functions (28).

#### Repair Protein

Bacterial damage repair mechanisms have broader roles encompassing responses to virulence as well as stress (29). Therefore, we also considered 38 proteins associated with the repair mechanisms of various pathogenic bacteria.

# Epitope Prediction Pipeline

#### Generation of Nona-Peptides

To design new vaccine candidates, we first generated nona-peptides. Nona-peptides are 9-mer sequences (continuous stretches of 9 residues) derived from the essential or virulent proteins selected for the study. To avoid redundancy, we removed all duplicated nona-peptides.
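The generation of non-redundant nona-peptides amounts to sliding a 9-residue window along each protein and collecting the fragments in a set. The sketch below illustrates this under that reading; the function names are hypothetical and this is not the authors' implementation.

```python
from typing import Iterable, Set


def nonapeptides(sequence: str, k: int = 9) -> Set[str]:
    """Return every distinct k-residue window of one protein sequence."""
    seq = sequence.strip().upper()
    # A sequence shorter than k yields an empty range and thus an empty set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_nonapeptides(sequences: Iterable[str], k: int = 9) -> Set[str]:
    """Pool the k-mers of many proteins; the set removes duplicates."""
    peptides: Set[str] = set()
    for seq in sequences:
        peptides |= nonapeptides(seq, k)
    return peptides
```

A protein of length *n* contributes at most *n* − 8 nona-peptides, so a 10-residue sequence yields two overlapping 9-mers.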

#### Removal of Self-Epitopes

Self-tolerance must also be considered in vaccine design, as the body's immune system rarely acts against self-epitopes. Therefore, all nona-peptides that are also present in the human body need to be removed. To achieve this, we mapped the nona-peptides on the proteomes of the 1000 human genomes and removed all nona-peptides that were 100% identical.
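Because only 100% identical peptides are removed, the self-epitope filter can be illustrated as an exact set-membership test against all 9-mers of the human sequences. This is a sketch under that assumption; the function name and the in-memory representation of the human proteins are hypothetical, and the text does not state how the mapping was actually implemented.

```python
from typing import Iterable, Set


def remove_self_peptides(
    candidates: Set[str], human_proteins: Iterable[str], k: int = 9
) -> Set[str]:
    """Drop candidate peptides that occur verbatim in any human protein."""
    human_kmers: Set[str] = set()
    for protein in human_proteins:
        p = protein.strip().upper()
        for i in range(len(p) - k + 1):
            human_kmers.add(p[i:i + k])
    # Exact (100% identity) matching only; near-identical peptides are kept.
    return {pep for pep in candidates if pep not in human_kmers}
```

Precomputing the human 9-mers once makes each lookup a constant-time set operation, which matters when many candidate peptides are screened.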

#### Epitope Prediction

Our subsequent objective was to select peptides that could activate the human immune system and generate memory cells. Therefore, we used a pipeline for predicting different kinds of immune epitopes among these nona-peptides. This pipeline predicts (i) B-cell epitopes using LBtope (30), (ii) MHC class II binders using ProPred (31), (iii) T-cell epitopes using CTLpred (32), and (iv) vaccine adjuvants using VaxinPAD (1).

#### IEDB Mapping

In addition to these tools, the experimentally reported epitopes present in the Immune Epitope Database (IEDB) were also mapped on the vaccine target proteins selected for this study. **Figure 1** shows the complete workflow of this study.

#### Database Web Interface

All the predicted epitopes have been stored and presented on a web portal named VacTarBac. It is built on Apache HTTP Server (version 2.2.17), installed on a machine with Ubuntu as the operating system. The responsive front end, which is suitable for mobile, tablet, and desktop, was developed using HTML5, CSS3, PHP5, and JavaScript. MySQL (a relational database management system, version 5.0.51b) was used at the back end to manage the data.

# RESULTS

# Screening of Bacterial Species

The VFDB virulence factor proteins can be downloaded from http://www.mgc.ac.cn/VFs/Down/VFDB\_setA\_pro.fas.gz, which provides all the sequences in FASTA format, with headers giving information on each protein, for example, the VFDB id, source organism, etc. From these headers, the source organism names were extracted. DEG is also available for download as FASTA sequences at http://www.essentialgene.org. Only those DEG sequences whose source species (indicated by the headers) was also present in the VFDB headers were extracted. Finally, the protein sequences from VFDB and DEG belonged to 14 bacterial species.
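The species matching described above is a set intersection over organism names parsed from FASTA headers. The sketch below assumes, purely for illustration, that the organism name appears in the final pair of square brackets of each header and that the first two words (genus and species) identify the species; the real VFDB and DEG header layouts may differ, and all function names and example headers are hypothetical.

```python
import re
from typing import Iterable, Optional, Set

# Assumption: the organism is given in the last "[...]" of the header.
_ORGANISM = re.compile(r"\[([^\[\]]+)\]\s*$")


def species_from_header(header: str) -> Optional[str]:
    """Return 'Genus species' from a header, or None if not parseable."""
    match = _ORGANISM.search(header)
    if match is None:
        return None
    words = match.group(1).split()
    if len(words) < 2:
        return None
    return " ".join(words[:2])  # strain designations are discarded


def shared_species(
    vfdb_headers: Iterable[str], deg_headers: Iterable[str]
) -> Set[str]:
    """Species that occur in both header collections."""
    vfdb = {s for s in map(species_from_header, vfdb_headers) if s}
    deg = {s for s in map(species_from_header, deg_headers) if s}
    return vfdb & deg
```

Given the two header lists, `shared_species` returns the species common to both resources; sequences from any other species would then be discarded.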

# Screening of Target Proteins

These 14 bacterial species contain 1459 virulence factors, as stated in VFDB. All 1459 virulence factors were selected for further study. Previous studies have advocated that bacterial proteins at specific cellular locations (20, 23–25, 33) or performing specific functions essential for the cell (34) are potential candidates for designing subunit vaccines. Therefore, on the basis of function and localization within the cell, 546 proteins from the 14 bacterial species were categorized as membrane, envelope, repair, and secretory proteins. **Table 2** shows the distribution of screened target proteins among the selected 14 bacteria.

The most important aspect of this rational determination of vaccine targets was to extract proteins common to both the VFDB and DEG datasets. Our analysis shows that 21 IDs from the VFDB dataset were identical to 35 IDs of the DEG dataset (**Supplementary Table 1**). These are 21 proteins from 5 bacterial species. We consider these the most suitable vaccine targets, being both virulence factors and gene products essential for survival.

### Epitope Predictions

After generating the nona-peptides, redundant nona-peptides as well as self-antigens were removed by mapping on the proteomes of the 1000 human genomes. For the rest of the peptides, we predicted immunogenicity using our prediction pipeline. LBtope, ProPred, CTLpred, and VaxinPAD helped us identify B-cell epitopes, MHC class II binders, T-cell epitopes, and adjuvants, respectively. These tools, integrated and implemented as a pipeline, helped identify the immunogenic regions of the individual proteins in the form of nona-mer epitopes. On the other hand, the visualization tools integrated in the platform "VacTarBac" helped envisage the stretches of lengths longer than nona-mer having a high density of predicted epitopes. The user may anticipate that these regions could prove to be antigens effective in the form of vaccines. Thus, for experimental investigation of the sequences that may prove to be effective vaccines, sequences longer than 9-mers containing numerous predicted epitopes could be taken up as a better strategy for the development of vaccines.
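One way to picture epitope density along a protein is to count, for each residue, how many predicted 9-mers cover it, and then to look for the stretch with the highest total coverage. The following sketch illustrates that idea only; it is not the VacTarBac visualization code, and the function names and the choice of window width are assumptions.

```python
from typing import List, Set, Tuple


def epitope_coverage(sequence: str, epitopes: Set[str], k: int = 9) -> List[int]:
    """Per-residue count of predicted k-mer epitopes covering that residue."""
    coverage = [0] * len(sequence)
    for start in range(len(sequence) - k + 1):
        if sequence[start:start + k] in epitopes:
            for pos in range(start, start + k):
                coverage[pos] += 1
    return coverage


def densest_window(coverage: List[int], width: int) -> Tuple[int, int]:
    """Return (start, total coverage) of the densest window of given width."""
    if width <= 0 or width > len(coverage):
        raise ValueError("width must be between 1 and len(coverage)")
    current = sum(coverage[:width])
    best_start, best_total = 0, current
    for start in range(1, len(coverage) - width + 1):
        # Slide the window: add the entering residue, drop the leaving one.
        current += coverage[start + width - 1] - coverage[start - 1]
        if current > best_total:
            best_start, best_total = start, current
    return best_start, best_total
```

The window returned by `densest_window` marks a candidate region longer than a single nona-peptide in which overlapping predicted epitopes cluster.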

The epitopes predicted individually by each tool of our pipeline are too many to be tested experimentally (**Table 3**). Therefore, we applied an intuitive method to arrive at a reasonable number (**Figure 2** and **Supplementary Table 2**). Epitopes were identified separately for the T-cell and B-cell categories. For T-cell epitopes, only those peptides were retained that were predicted to be T-cell epitopes, MHC-II binders, and adjuvants. For B-cell epitopes, peptides predicted to be B-cell epitopes, MHC-II binders, and adjuvants were finalized for recommendation as therapeutics.
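The filtering rule above can be expressed as set intersections over the peptides flagged by each predictor. The sketch below assumes the output of each tool has already been reduced to a set of positively predicted peptide strings; the function name and argument names are hypothetical, and no interface of LBtope, ProPred, CTLpred, or VaxinPAD is implied.

```python
from typing import Set, Tuple


def consensus_candidates(
    b_cell: Set[str], t_cell: Set[str], mhc2: Set[str], adjuvants: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (T-cell candidates, B-cell candidates) after filtering.

    A T-cell candidate must also be an MHC-II binder and an adjuvant;
    a B-cell candidate must also be an MHC-II binder and an adjuvant.
    """
    t_final = t_cell & mhc2 & adjuvants
    b_final = b_cell & mhc2 & adjuvants
    return t_final, b_final
```

Peptides positive in all four predictions are then simply the intersection of the two returned sets.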

#### Epitopes Assessed to be the Best Vaccine Candidates

The aim of this study is to enhance vaccine design against various pathogenic bacteria by predicting potential MHC binders, B-cell and T-cell epitopes, and vaccine adjuvants using a prediction pipeline. The idea of using a pipeline, rather than the prediction tools individually, was to impose stepwise filters on the numerous peptides that may be predicted as epitopes for different arms of the immune system. These filters reduce the number of epitopes to be tested experimentally and lead to a final set of peptide sequences more likely to activate the immune system, as they are recommended by all the individual tools in the prediction pipeline. Epitopes positively predicted by all the prediction tools provided coverage of 13 out of 14 bacterial species. An organism-wise as well as vaccine target protein category-wise representation, shown in **Table 4**, provides a more comprehensive analysis of the epitopes assessed to be the "best vaccine candidate" using all the prediction tools.

TABLE 2 | Bacterial species-wise distribution of Target Proteins.

TABLE 3 | Distribution of proteins, nona-peptides and predicted epitopes among various categories.

#### Best Possible Antigenic Proteins

As stated earlier, the study started with 1459 virulence factors and 546 essential proteins, and we predicted epitopes from all of the generated nona-peptides. Some proteins have a high number of predicted MHC binders, T-cell epitopes, B-cell epitopes, and vaccine adjuvants. For example, the peptide synthase (pyoverdine) protein, a virulence factor of Pseudomonas aeruginosa, yields 8197 predicted epitopes. Such proteins, or regions within them, could be used as potential antigens to stimulate the immune system (**Supplementary Table 3**). **Figure 3** shows the portion of an essential protein from Bacillus subtilis that has a high number of predicted epitopes. This indicates that, instead of experimentally validating a single nona-peptide, taking a longer portion of the protein that contains overlapping predicted epitopes could prove to be a better approach.
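One simple way to locate such epitope-dense stretches is to slide a fixed-length window along the protein and count the predicted 9-mers that fall entirely inside it. The sketch below is an illustration under stated assumptions, not the method used by the VacTarBac visualization: the 30-residue default window and the function name are hypothetical choices.

```python
def densest_window(protein_length, epitope_starts, window=30, k=9):
    """Return (start, end, count) for the first window that fully contains
    the largest number of predicted k-mer epitopes.

    Positions are 1-based. `epitope_starts` holds the start position of
    each predicted k-mer on the protein.
    """
    starts = sorted(set(epitope_starts))
    best = (1, min(window, protein_length), 0)
    last_start = max(protein_length - window + 1, 1)
    for w_start in range(1, last_start + 1):
        w_end = w_start + window - 1
        # Count epitopes whose whole k-residue span lies inside the window.
        count = sum(1 for s in starts if s >= w_start and s + k - 1 <= w_end)
        if count > best[2]:
            best = (w_start, min(w_end, protein_length), count)
    return best


# Toy example: five predicted 9-mers on a 100-residue protein.
region = densest_window(100, [5, 10, 12, 15, 60], window=20)
```

For the toy input, the four overlapping epitopes starting at positions 5, 10, 12, and 15 all fit in residues 4 to 23, so that stretch is reported.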



*Considering all the epitopes together in this table, these belonged to 13 bacterial species out of the total 14 species considered for the study.*

FIGURE 3 | Mapping of predicted epitopes on one of the essential proteins of *Bacillus subtilis* as (A) user friendly interactive Java-enabled view and (B) traditional simpler view. The blue colored sequences are the predicted 9-mer epitopes starting from red colored amino acid.

#### Webserver Implementation

To assist the scientific community in expediting peptide-based vaccine design against pathogenic bacteria, all the predicted potential epitopes were compiled in the form of a database and provided as a web-based service (http://webs.iiitd.edu.in/raghava/vactarbac/). A user can browse all the recommended potential B-cell and T-cell epitopes designed by targeting virulence factors as well as proteins encoded by essential genes of all 14 bacterial species. Browsing by pathogenic bacterium and by the targeted proteins considered during epitope design is also implemented. A user can also browse the top 5 antigens from the listed bacteria. In addition, a list of proposed vaccine candidates is provided in the browse option (**Figure 4**). For ease of use, results are displayed in tabular and graphical form as well as in an interactive visualization.

# DISCUSSION AND CONCLUSION

The twentieth century witnessed remarkable success in the development of antibiotics against various pathogenic bacteria, but the emergence of drug resistance and toxicity caused a shift in antimicrobial strategies toward peptide-based therapies. Several anti-bacterial peptides have been studied to check the harmful effects of pathogenic bacteria (35, 36). Rapid bactericidal activity and low propensity for resistance development are some of the major assets of these peptides, but high cost, limited stability, and unknown toxicology and pharmacokinetics are the major disadvantages (37). In contrast to antibiotics or peptide-based drugs, instigating host immune responses against the pathogen offers multifaceted management of the invasion and has been achieved successfully in the past with vaccines such as DPT and BCG against a few deadly infectious diseases such as tetanus, diphtheria, and tuberculosis. Consequently, an active search for vaccines is currently underway for many infectious diseases caused by pathogenic bacteria, for example, melioidosis caused by Burkholderia pseudomallei (2). Even in the case of tuberculosis, for which multiple lines of drugs and the BCG vaccine are available, the reduced efficacy of BCG and antibiotic resistance have forced the investigation of better and more potent vaccines. A recent computational effort went down to strain-level comparison and considered as vaccine targets those proteins that were shared among tuberculoid, non-tuberculoid, and vaccine strains (16). Researchers continue to work to identify novel vaccine candidates against several pathogens such as Ebola virus (14), Zika virus (15), vaccinia virus (38), Neisseria meningitidis (39), Corynebacterium pseudotuberculosis (40), and the fish pathogens Edwardsiella tarda and Flavobacterium columnare (41, 42).

In the past, several in silico protein-based vaccine candidates have been identified using peptide conservation scores and predicted peptide-MHC binding (43), as well as molecular docking and MHC-peptide complex stabilization assays (18). In this study, we employed immunoinformatic tools to identify antigen- and epitope-based vaccine candidates with the potential to evoke one of the many arms of the immune system. The present study includes proteins from multiple pathogenic bacteria: virulence factors as well as proteins encoded by genes essential for survival. The pipeline created for the identification of vaccine candidates consists of widely used, accurate, and recommended tools (12, 13). Our pipeline also includes a tool to identify peptide-based adjuvant candidates, a major advantage over other studies. The epitopes predicted by the individual tools within the pipeline that are not present in the proteome derived from the 1000 human genomes numbered in the thousands. To arrive at an economical number of epitopes that could be tested experimentally, an intuitive rational criterion was to extract epitopes predicted positive by more than one tool in the pipeline. This effectively means that a selected epitope is predicted to be capable of evoking different arms of the immune system.

Upon applying the rational criteria, the 1459 virulence factors from VFDB yielded 3622 nona-peptides predicted to bind MHC-II, activate a T-cell response, and act as self-adjuvants. Besides this, peptides were predicted to bind MHC-II, activate a B-cell response, and act as self-adjuvants. Similarly, 420 membrane proteins from the DEG provided 1307 predicted T-cell epitopes and 155 predicted B-cell epitopes that are also MHC II binders and adjuvants. The 36 DEG envelope proteins, when subjected to the epitope prediction pipeline, yielded 104 predicted T-cell epitopes and 20 predicted B-cell epitopes that are also predicted to bind MHC class II molecules and to be adjuvants. For the 52 secretory proteins and 38 repair proteins taken from the DEG, the predicted T-cell and B-cell epitopes numbered 173 and 23, respectively, for the secretory proteins and 138 and 44 for the repair proteins, all of them also positively predicted as MHC II binders and adjuvants.

In conclusion, the ultimate aim of the present study was to identify epitopes capable of activating multiple arms of the human immune system by filtering the epitopes through stringent criteria. Combining the results of the VFDB and DEG datasets, this study identified 252 unique epitopes, extracted from proteins (essential gene products and/or virulence factors) of 13 bacterial species, that are predicted to be T-cell epitopes, B-cell epitopes, MHC II binders, and vaccine adjuvants and are not present in the human proteome (belonging to the 1000 genomes). All the recommended vaccine candidates have been stored in a repository, VacTarBac (http://webs.iiitd.edu.in/raghava/vactarbac/), freely available on the world-wide web.

Peptides are promising candidates as immunotherapeutics but are less immunogenic when used alone as vaccines. They need potent immunostimulatory adjuvants to effectively activate the innate and adaptive arms of the immune system. In this study, we implemented VaxinPAD in our prediction pipeline, which aids in predicting peptide-based vaccine adjuvants, thus strengthening the study. Besides this, peptide formulation is a critical task, and a formulation scientist must overcome the chemical instability of peptides. Their conformational fluxionality and propensity to self-associate make peptides more difficult to formulate. Various computational tools, such as Zyggregator (44) and PASTA 2.0 (45), predict the aggregation propensities of polypeptide chains. These can be used prior to formulating a peptide-based vaccine. THPdb is a database of FDA-approved protein and peptide therapeutics and provides detailed formulation strategies of peptide-based drugs available in the market (46). Although little literature exists on formulating peptide drug products, the strategies used in successful peptide drugs will be helpful in designing newer formulation techniques and constituents for peptide-based vaccines.

We have predicted epitopes based on highly cited, published, and accurate immune epitope prediction tools. Yet these prediction algorithms have their own limitations. Thus, an antigen or epitope should be experimentally validated before it is suggested for medical purposes. Apart from this, the platform, VacTarBac, will require continuous updating owing to the rapidly growing sequencing data, particularly those of novel and emerging pathogenic strains. Future work for this platform may include comparison of the pathogenic proteomes at the strain level. This may lead to the identification of recently acquired proteins/peptides of the pathogen that may have rendered existing therapeutic strategies against the emergent strain ineffective. Moreover, VacTarBac still lacks information on vaccine formulations, which may be added in the future to help in the actual development of vaccines. Despite such limitations, we anticipate that the current study and the current data in the repository VacTarBac will be helpful for researchers and will boost and hasten vaccine design against the pathogenic bacteria considered in the study.

## AVAILABILITY AND UPDATE OF THE RESOURCE

VacTarBac is freely available at http://webs.iiitd.edu.in/raghava/vactarbac/. We will update the platform every 6 months, depending upon the availability of virulence factor and essential gene information in other resources.

## AUTHOR CONTRIBUTIONS

SU and GN downloaded and processed the data. GN prepared the pipeline and predicted epitopes. SU and GN analyzed results and prepared tables and figures. SU, GN, and GR wrote the manuscript. SU developed the web interface. GR conceived the idea and coordinated the project.

## FUNDING

The authors are thankful to the funding agencies J.C. Bose National Fellowship (DST), Department of Biotechnology (DBT), and Council of Scientific and Industrial Research (CSIR) for fellowships and financial support.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2018.02280/full#supplementary-material

## REFERENCES

1. O'Neill J. Antimicrobial Resistance: Tackling a Crisis for the Health and Wealth of Nations. The Review on Antimicrobial Resistance (2014).

2. Peacock SJ, Limmathurotsakul D, Lubell Y, Koh GCKW, White LJ, Day NPJ, et al. Melioidosis vaccines: a systematic review and appraisal of the potential to exploit biodefense vaccines for public health purposes. PLoS Negl Trop Dis. (2012) 6:e1488. doi: 10.1371/journal.pntd.0001488


multi-epitope peptide vaccine against Staphylococcus aureus. Infect Genet Evol. (2017) 48:83–94. doi: 10.1016/j.meegid.2016.12.010


binding and peptide conservation scores. PLoS ONE (2014) 9:e115745. doi: 10.1371/journal.pone.0115745


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Nagpal, Usmani and Raghava. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Bacterial Vaccine Antigen Discovery in the Reverse Vaccinology 2.0 Era: Progress and Challenges

Fadil A. Bidmos\*, Sara Siris, Camilla A. Gladstone and Paul R. Langford

Department of Medicine, Imperial College London, London, United Kingdom

The ongoing, and very serious, threat from antimicrobial resistance necessitates the development and use of preventative measures, predominantly vaccination. Polysaccharide-based vaccines have provided a degree of success in limiting morbidity from disseminated bacterial infections, including those caused by the major human obligate pathogens, Neisseria meningitidis, and Streptococcus pneumoniae. Limitations of these polysaccharide vaccines, such as partial coverage and induced escape leading to persistence of disease, provide a compelling argument for the development of protein vaccines. In this review, we briefly chronicle approaches that have yielded licensed vaccines before highlighting reverse vaccinology 2.0 and its potential application in the discovery of novel bacterial protein vaccine candidates. Technical challenges and research gaps are also discussed.

#### Edited by:

Pedro A. Reche, Complutense University of Madrid, Spain

#### Reviewed by:

Sudheer Gupta, All India Institute of Medical Sciences Bhopal, India; Pietro Speziale, Università degli Studi di Pavia, Italy; Paola Massari, Tufts University School of Medicine, United States

> \*Correspondence: Fadil A. Bidmos f.bidmos@imperial.ac.uk

#### Specialty section:

This article was submitted to Vaccines and Molecular Therapeutics, a section of the journal Frontiers in Immunology

> Received: 30 June 2018 Accepted: 17 September 2018 Published: 08 October 2018

#### Citation:

Bidmos FA, Siris S, Gladstone CA and Langford PR (2018) Bacterial Vaccine Antigen Discovery in the Reverse Vaccinology 2.0 Era: Progress and Challenges. Front. Immunol. 9:2315. doi: 10.3389/fimmu.2018.02315

Keywords: reverse vaccinology 2.0, human monoclonal antibodies, bacterial pathogens, vaccine candidate antigens, immunotherapy

## BACKGROUND: VACCINE DISCOVERY IN THE PRE-WHOLE GENOME SEQUENCING (WGS) ERA

Prior to advances in genomics, vaccines were developed based on Pasteur's rules of vaccinology. A 230-fold serial passage of a bovine bacillus in bile medium produced the live attenuated Bacillus Calmette-Guérin (BCG) vaccine against Mycobacterium tuberculosis (MTB) (1). Later, a trivalent blend of poliovirus inactivated in <0.5% formalin was used by Salk et al. as a safe and effective vaccine against poliovirus (2). The inability to culture some pathogens in vitro (owing to safety or lack of suitable culture conditions), extensive antigenic variability, and molecular mimicry limit the broad applicability of traditional culture-based techniques in the development of vaccines targeting other economically-important pathogens such as Mycobacterium leprae and Neisseria meningitidis.

The inadequacies of culture-based techniques caused a shift in focus to the use of subunit components as vaccine candidates. Identification of these subunit vaccine candidates was largely hypothesis-driven and targeted cellular components that were often well-known virulence factors: for example, the pertussis toxin and fimbriae in the acellular pertussis vaccine (3); and the meningococcal porin, PorA, in the epidemic-specific detergent-extracted outer membrane vesicle (OMV) vaccines of Chile, Norway, and New Zealand (4–6). With complex biosynthetic methods, bacterial capsular polysaccharides served as prime components of effective vaccines, used singly, for example, in the 23-valent pneumococcal polysaccharide vaccine, PPSV23 (7). In addition, these capsular antigens have been conjugated to carrier proteins in a dose-sensitive manner for enhanced immunogenicity, as found in the Haemophilus influenzae type b (Hib) (8), pneumococcal (9), and meningococcal (10) conjugate vaccines. The limited capacity of these hypothesis-driven studies, which could focus on only a handful of candidates at a time, was costly in time, labor, and financial terms. This is especially so because of the large pool from which prospective candidates for individual bacterial pathogens are screened, coupled with the low likelihood of targets satisfying key vaccine candidacy criteria (abundantly-expressed, surface-exposed, functionally-immunogenic, and highly-conserved). Thus, alternative high-throughput methods were sought to accelerate the pre-clinical vaccine development phase, especially in situations requiring rapid curtailment of disease transmission.

## WHOLE GENOMIC AND PROTEOMIC APPROACHES

#### Reverse Vaccinology (RV)

The publication of the first complete bacterial genome sequence in 1995 [for H. influenzae (11)] heralded a revolution in approaches to vaccine development. By using genomic data and preset bioinformatic screens, putative surface-associated antigens of a pathogen were identified. The subsequent recombinant expression of these genes and immunization of animals with recombinant proteins, for the determination of active and passive levels of protection, provided data that substantiated or annulled the vaccine candidacy of selected antigens (12, 13). This "classical" RV approach led to the development of the multicomponent meningococcal serogroup B vaccine (4CMenB) (14). While 4CMenB has potential for cross-serogroup protection (15), it has been argued that pan-genomic in silico analysis is more appropriate because of the high degree of intraspecific diversity exhibited by many bacterial pathogens (16). Using this pan-genomic approach, Maione et al. (17) identified four protective antigens from the analysis of an octa-genomic panel derived from the most prevalent disease-causing Streptococcus agalactiae strains. The main attraction of RV lies in its applicability to any pathogen for which WGS data are available and for which antibody-mediated immunity is crucial for protection against disease. Its use in the discovery of candidate antigens for vaccines targeting other bacterial pathogens, including the multidrug-resistant Acinetobacter baumannii, has been demonstrated (18–20). However, important non-classical surface-associated proteins may be missed due to the parameters of the bioinformatic screen(s).

Related to RV is the use of transcriptomics to identify novel vaccine antigens. For example, the comparative analysis of the meningococcal transcriptome in ex vivo human whole blood and in vitro nasopharyngeal colonization models revealed three antigens that were differentially regulated between invasive disease and asymptomatic colonization, and were thus subjects for further vaccine candidacy studies (21). However, this transcriptomics-based approach has not been widely employed.

#### Surfome and Secretome Analysis

Whole proteomic approaches, involving enzymatic processing of whole cells or extracellular exudates followed by liquid-chromatography mass spectrometry (LC-MS) or peptide fragment fingerprinting, also allow for high-throughput screening of the antigenic repertoire of a pathogen (22). The power of these proteomic methods in identifying rare protective antigens missed by the in silico screens of RV makes them appealing [as exemplified by the case of the cell wall-anchored antigen, SAN\_1485, of S. agalactiae (23)]. In contrast to RV, proteolytic digestion is better suited to Gram-positive bacteria, since Gram-negative bacteria are more susceptible to proteolysis-induced cell lysis.

## REVERSE VACCINOLOGY 2.0

The majority of currently-available bacterial vaccines provide protection by inducing pathogen-specific antibodies. Therefore, harnessing the antibody component of a potent human humoral response to disseminated infection is valuable for the identification of novel protective antigens. This approach, termed reverse vaccinology 2.0 (RV 2.0) (24, 25), relies on the isolation and recombinant expression of the variable regions of heavy (VH) and light (VL = κ or λ) chain genes of immunoglobulin (focus has centered on IgG) using a variety of molecular tools. Enriched by the development of high-throughput technologies, the screening of large numbers of antibody-secreting cells (ASCs) is also advancing knowledge of host-pathogen interactive biology and auto-immunity (26, 27).

#### Monoclonal Antibody (mAb) Generation From ASCs

The first, and perhaps most crucial, phase of RV 2.0 is the cloning of human monoclonal antibodies (mAbs) from ASCs. Previously, immortalization of these ASCs via myeloma fusions or Epstein Barr virus (EBV) transformation was valuable to mAb production (28, 29). Because these were culture-based methods, the survival of all B-cells was not guaranteed and the omission of ASCs expressing antibodies cognate to crucial antigens was probable. Other techniques such as phage-display technology (30) and proteomic mining (31, 32) circumvent the unique issues affecting ASC immortalization techniques by focusing on recombinant antibody expression. However, the small proportion of antigen-specific antibodies (estimated at 10–15%) that are produced (33) because of the random pairing of VH and VL sequences makes phage display and proteomic mining imprecise.

A more favored approach to mAb cloning is the single-cell sorting of ASCs into multi-well plates using flow cytometry, followed by the cloning of mAbs from each well (34, 35). To clone a high proportion of antigen-specific antibodies, this approach, termed expression cloning, requires blood sampling during the peak immune response and is thus more suited to short-lived plasmablasts (CD3−, CD14−, CD19+, CD20−, CD56−, CD27high, and CD38high), since higher circulating numbers of these are indicative of a very recent history of infection (36). Notwithstanding, several studies have demonstrated its applicability to memory B-cells (37, 38). Further in vitro selection of antigen-specific plasmablasts or memory B-cells using eGFP-bound viral-like particles (39) or labeled-antigen probes (40, 41), or in vivo antigen-specific plasmablast enrichment in irradiated SCID/beige chimera mice (42), enhances the pathogen-specific mAb output of the approach. In contrast to phage display and proteomic mining technologies, expression cloning yields mAbs with natural, host-like VH+VL pairings. Further refinements to this elegant method include: substituting restriction endonuclease cloning with Gibson assembly to enhance cloning precision (43); assembly of both VH and VL fragments into a single expression vector (44); and following cell sorting with paired-chain antibody repertoire sequencing, thereby encompassing all V gene families, including unique clones expressed at low frequencies (33, 45, 46).

#### Assessment of Recombinant mAb Function

Subsequent to cloning, the clinical relevance of mAbs is assessed in in vivo investigations of passive immunity (47) or in vitro functional assays: for example, the well-established viral neutralization (48) and serum bactericidal assays (49), some of which have provided data employed in vaccine licensure (50). The cognate antigens targeted by functional mAbs can subsequently be determined using protein array screens or classical immunoproteomic approaches.

#### Application of RV 2.0 to Viral Vaccine Development

The power of RV 2.0 (see **Figure 1**) in the identification of viral vaccine candidates has been demonstrated in several studies focussing on human cytomegalovirus (HCMV), respiratory syncytial virus (RSV), HIV, influenza and dengue viruses (25, 51). Some of these candidate antigens, discovered using RV 2.0, include a novel pentameric glycoprotein complex, the gHgLpUL128L pentamer, which induces high neutralizing titres against HCMV in mice (52) and the F protein of RSV stabilized to the prefusion conformation (53, 54). Accruing data from phase 1b/2a clinical trials show that a mAb (MEDI8897) reactive with prefusion F epitopes is effective when used prophylactically in preterm infants (55). Like MEDI8897, mAb MHAA4549A, which was cloned from a healthy vaccinee and targets and neutralizes all known influenza A strains (56), demonstrated significant antiviral activity in a phase 2 human influenza A virus challenge (57). Thus, these studies have demonstrated the use of RV 2.0 in producing broadly-neutralizing mAbs for post-infection prophylaxis in addition to identifying functionally-immunogenic vaccine candidates.

## POTENTIAL APPLICATION TO ANTIBACTERIAL VACCINE DISCOVERY

Judging by the progress made with the development of novel and effective viral immunotherapies, RV 2.0 is showing promise and is equally applicable to bacterial vaccinology. RV 2.0 was employed by Lu et al. (58) to identify functional anti-Staphylococcus aureus mAbs induced during bacteraemia. A total of ten mAbs were produced, four of which enhanced opsonophagocytosis of Wood46, an S. aureus reference strain. While three of the four functional mAbs targeted S. aureus antigens with known identities, the fourth mAb reacted with a novel antigen. Recently, Zimmermann et al. (59) also demonstrated that functional anti-MTB surface antigen antibodies can be cloned from patient-derived plasmablasts of reactivated memory B-cell origins, providing further evidence of a role for antibodies in the modulation of potent immune responses toward MTB. Taken together with other studies investigating the importance of antibody-mediated neutralization of intracellular pathogens (60), these findings provide evidence of a role for the vaccine-induced generation of antibodies against pathogens such as MTB and Chlamydia trachomatis, using antigens derived with RV 2.0. Similarly, Bidmos et al. (61) and Blum et al. (45) cloned functional antibodies from sufferers of meningococcal and Lyme disease, respectively, thus underscoring the utility of the approach for identifying novel targets in different classes of bacteria. Continued use of RV 2.0 in bacterial vaccine discovery is, therefore, encouraged once technical challenges are surmounted and research gaps are filled. In the following sections of this mini-review, emphasis will be placed on human mAb cloning and serological correlates of protection, since other related technical aspects of RV 2.0 such as recombinant protein expression, high-throughput sequencing of bacterial genomes and antibody repertoires, antigen identity determination and structure-based antigen design have been reviewed elsewhere (62–66).

#### Pathogen-Specific mAb Output

To identify novel antigens using the expression cloning method with precision, plasmablasts from patients convalescing from bacterial disease are required. Fundamental to the application of the expression cloning approach, therefore, is the determination of the magnitude and peak duration of the plasmablast response in these patients. Information on the duration of peak plasmablast circulation informs the optimum sampling time, which in turn affects the precision of pathogen-specific mAb generation. Studies assessing the magnitude of circulating plasmablasts following bacterial infection have reported similar durations of peak response to those reported for primary or secondary viral infections [6–7 days for primary infections and ∼10 days post-infection for secondary infections; reviewed in (36)]. Recently, Band et al. (67) reported a significant induction of differentiating (Ki-67+) plasmablasts in patients with nosocomial bacterial infections compared to healthy controls. This induction peaked between days 8 and 16 post-culture positivity in A. baumannii-infected patients, reaching levels as high as 21% of the total lymphocyte population. Perhaps unsurprisingly, it was also observed that this induction differed markedly between individuals, reflecting differences in immunocompetence, as peak plasmablast levels ranged from 1 to 21% among A. baumannii-infected patients and from 5 to 40% in Escherichia coli-infected patients. Consistent with the findings of Band et al. (67), a plasmablast response was observed in: MTB infection, in 38% of a patient cohort, with levels ranging from 1 to 4% in those with strong serum IgG responses (59); S. aureus bacteraemia, with mean levels of ∼3.2% (1–7% range) (58); and untreated sufferers of Lyme borreliosis, at up to 4% of circulating CD19+ cells (45).

The implications of these data for the precision of the pathogen-specific mAb output are considerable. Firstly, there is a paucity of information in published literature on how many plasmablasts are pathogen-specific (bacterial) following inductions in patients, owing to the unavailability of suitable molecular probes that will enhance the Fluorescent Activated Cell Sorting (FACS) gating strategy. In the absence of such data, strategies such as the Ig-capture based technique described by Pinder et al. (40) could be employed to enrich for specific plasmablasts. When these strategies are adapted, complex antigens (whole bacterial cells, OMV, or outer membrane preparations) are likely to be more beneficial than single-antigen probes for obtaining a plasmablast population targeting a wider antigen pool. It is noteworthy that in cases where patients are subjected to immediate antibiotic therapy on hospital admission because of rapid progression of disease (e.g., septicaemia and meningitis), clinical isolates may be unobtainable, making the design of plasmablast enrichment probes difficult (this is also the reason behind the unsuitability of memory B-cells in the absence of enrichment strategies). While clinical isolates from other disease sufferers could be utilized, they are non-ideal because rare mAb epitopes specific for the infecting strain will be missed. Secondly, considering differences in the magnitude of the plasmablast response and for logistic reasons (for example, restrictions on blood sample volume in pediatric cases), pooling of patient samples may be required for the generation of a highly-diverse plasmablast pool targeting several antigens (and their variants) of the same pathogen. This is especially necessary for pathogens in which certain antigens are immunodominant, such as PorA of N. meningitidis, which may mask immunity to rare but equally protective antigens.

If a total plasmablast sort approach is warranted (i.e., inclusive of non-pathogen specific plasmablasts), an attractive option is the rational selection of over-represented VH+VL combinations for mAb cloning, based on the assumption that over-representation of V families, specifically among plasmablasts, is an indicator of preferential usage in response to a pathogen. Adequate depth of sequencing is, however, required in order to avoid excluding clonal V families expressed at lower frequencies (58). In silico analysis should also include antibodies with similar complementarity-determining region H3 loops [key to antibody conformation and affinity (68)] in addition to the exploitation of de-noising algorithms, which would minimize the presence of errors introduced by sequencing (69).
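The selection of over-represented VH+VL combinations can be illustrated with a short frequency count. This is a minimal sketch under stated assumptions, not a published protocol: it assumes that each sorted cell has already been annotated with one heavy-chain and one light-chain V gene, and the function name, the toy gene calls, and the frequency threshold are all hypothetical.

```python
from collections import Counter


def overrepresented_pairs(paired_calls, min_fraction=0.05):
    """Rank VH+VL combinations by how often they occur among sorted cells.

    paired_calls: iterable of (vh_gene, vl_gene) tuples, one per cell.
    min_fraction: illustrative cut-off; pairs below it are not reported.
    Returns a list of (pair, count, fraction), most frequent first.
    """
    counts = Counter(paired_calls)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (pair, n, n / total)
        for pair, n in counts.most_common()
        if n / total >= min_fraction
    ]


# Toy data: ten cells with three distinct pairings (illustration only).
cells = (
    [("IGHV3-23", "IGKV1-39")] * 6
    + [("IGHV1-69", "IGLV2-14")] * 3
    + [("IGHV4-34", "IGKV3-20")] * 1
)
ranked = overrepresented_pairs(cells, min_fraction=0.25)
```

With shallow sequencing, a fixed threshold like this one would discard the low-frequency pairing in the toy data, which is the risk of exclusion that the text warns about.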

#### In vitro Serological Correlates of Protection

Assessment of pathogen-specific mAb function is performed via standardized assays. Given the differences in biology of bacterial pathogens, these assays are specifically tailored to reflect the mode of clearance of the pathogen from systemic circulation. Antibody-driven, complement-dependent bactericidal activity is measured in the standardized serum bactericidal assays designed for the meningococcus (49), while phagocytosis by neutrophils and macrophages enhanced by opsonic antibodies is assessed in the opsonophagocytic assays used in pneumococcal vaccine development (70). Similar assays have been employed in the assessment of functional immunity against Campylobacter jejuni, Group B Streptococcus, typhoidal and non-typhoidal Salmonella and Neisseria gonorrhoeae (71–75). While standardization of some of these pathogen-specific assays is pending, de novo design of in vitro correlates assessing functional activity of antibodies is not as straightforward for other pathogens, such as Bordetella pertussis (76). For facultative intracellular pathogens such as Francisella tularensis and MTB, for example, current correlate strategies in development are not suitable for assessments of mAb function as they involve peripheral blood lymphocytes only (77, 78). An added benefit of in vitro assessments of cloned mAb function, as a component of RV 2.0, is that the use of experimental animals becomes unnecessary or is significantly reduced. Efforts to develop and standardize in vitro correlates to assess mAb function are therefore merited.

Beyond bactericidal or opsonic functions, mAbs exhibit a variety of functions, including the modulation of cellular immune responses [extensively reviewed in Cooper (79), Amanna and Slifka (80)], which require assessment. These functions also include toxin neutralization (useful in pertussis and diphtheritic infections, for example) (81, 82) and increase in cellular cytotoxicity affecting intracellular pathogens such as C. trachomatis (83). Hence, non-bactericidal or non-opsonic mAbs, if exhibitive of these other functions, can still be utilized in other immunotherapeutic avenues.

### CONCLUSION

With the increase in multidrug resistance among bacterial pathogens, the development of further effective preventive measures will be of significant benefit to public health. RV 2.0, a conceptually-advanced approach with the advantages of employing the natural host response (patient VH-VL combinations), relative speed, and reduction in animal use, has the potential to be a powerful tool in bacterial vaccine development. However, use of RV 2.0 is dependent on optimization of the technical aspects, and there are excellent prospects that this is achievable.

#### AUTHOR CONTRIBUTIONS

FB and PL conceptualization. FB, SS, and CG writing—original draft. FB and PL writing—reviewing and editing.

#### ACKNOWLEDGMENTS

We thank Dr. Mubarak Bidmos (Qatar University) and Dr. Victoria Wright (Imperial College London) for their critical but constructive review of the manuscript. PL and FB have received research grants from John and Michelle Bresnahan via MeningitisNow, and the Imperial College Confidence-in-Concept Award; and CG is funded by a BBSRC grant BB/R505742/1 from the National Productivity Investment Fund (NPIF), for work related to this manuscript.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Bidmos, Siris, Gladstone and Langford. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Review on T Cell Epitopes Identified Using Prediction and Cell-Mediated Immune Models for Mycobacterium tuberculosis and Bordetella pertussis

Yuan Tian<sup>1</sup>, Ricardo da Silva Antunes<sup>1</sup>, John Sidney<sup>1</sup>, Cecilia S. Lindestam Arlehamn<sup>1</sup>, Alba Grifoni<sup>1</sup>, Sandeep Kumar Dhanda<sup>1</sup>, Sinu Paul<sup>1</sup>, Bjoern Peters<sup>1,2</sup>, Daniela Weiskopf<sup>1</sup> and Alessandro Sette<sup>1,2</sup>\*

<sup>1</sup> Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, CA, United States, <sup>2</sup> Department of Medicine, University of California San Diego, La Jolla, CA, United States

#### Edited by:

Pedro A. Reche, Complutense University of Madrid, Spain

#### Reviewed by:

Pandjassarame Kangueane, Biomedical Informatics (P) Ltd., India; Etienne Caron, Université de Montréal, Canada

> \*Correspondence: Alessandro Sette alex@lji.org

#### Specialty section:

This article was submitted to Vaccines and Molecular Therapeutics, a section of the journal Frontiers in Immunology

> Received: 23 August 2018 Accepted: 12 November 2018 Published: 29 November 2018

#### Citation:

Tian Y, da Silva Antunes R, Sidney J, Lindestam Arlehamn CS, Grifoni A, Dhanda SK, Paul S, Peters B, Weiskopf D and Sette A (2018) A Review on T Cell Epitopes Identified Using Prediction and Cell-Mediated Immune Models for Mycobacterium tuberculosis and Bordetella pertussis. Front. Immunol. 9:2778. doi: 10.3389/fimmu.2018.02778

In the present review, we summarize work from our own as well as other groups related to the characterization of bacterial T cell epitopes, with a specific focus on two important pathogens, namely, Mycobacterium tuberculosis (Mtb), the bacterium that causes tuberculosis (TB), and Bordetella pertussis (BP), the bacterium that causes whooping cough. Both bacteria and their associated diseases are of large societal significance. Although vaccines exist for both pathogens, their efficacy is incomplete. It is widely thought that defects and/or alterations in T cell compartments are associated with limited vaccine effectiveness. As discussed below, a full genome-wide mapping was performed in the case of Mtb. For BP, our focus has thus far been on the antigens contained in the acellular vaccine; a full genome-wide screen is in the planning stage. Nevertheless, the sum total of the results in the two different bacterial systems allows us to exemplify approaches and techniques that we believe are generally applicable to the mapping and characterization of human immune responses to bacterial pathogens. Finally, we add, as a disclaimer, that this review by design is focused on the work produced by our laboratory as an illustration of approaches to the study of T cell responses to Mtb and BP, and is not meant to be comprehensive, nor to detract from the excellent work performed by many other groups.

Keywords: epitope, T cell, HLA, bacteria, Mycobacterium tuberculosis, Bordetella pertussis

#### EPITOPE IDENTIFICATION FOR MYCOBACTERIA

Our group has a long-standing interest in tuberculosis (TB). Early studies characterized a DR17-restricted epitope (1) and the effect of different T cell subsets on intracellular infection (2, 3). More recent studies characterized CD8 T cells (4). We then described what is, to the best of our knowledge, the first true genome-wide screen for CD4 T cell epitopes (5). Later studies reported that CD4 T cells can recognize and target epitopes derived from mycobacterial ribosomal proteins and provide protective functions (6, 7).

The epitopes were further characterized by studies on the recognition of Mycobacterium tuberculosis (Mtb)-derived epitopes in a cohort with latent Mtb infection (8) and in diverse populations from five continents (9), and on their intragenus conservation (10). We presented an analysis of the complexity of Mtb-specific epitopes in Mtb-infected South Africans (11) and provided evidence that bi-allelic RORC mutations are detrimental to host immunity against Mtb (12). We further showed that transcriptomic analysis revealed novel immune signatures associated with TB (13–15) and that the differentiation and function of T cells are influenced by the availability of antigens (16).

In particular, previous studies (5) demonstrated the feasibility of utilizing a genome-wide screen to identify human leukocyte antigen (HLA) class II epitopes derived from Mtb, based on combined bioinformatic predictions and high-throughput ex vivo ELISPOT assays. Feasibility of the approach had previously been demonstrated for viral targets; however, tackling a bacterial genome expressing over 4,000 open reading frames (ORFs) had not been attempted. Genome-wide screens have also been conducted to identify CD8 T cell Mtb epitopes (17–20). Notably, immunodominant CD8 T cell epitopes are enriched in cell wall and secreted proteins (18, 19). Future studies will utilize the same approach to focus on Bordetella pertussis (BP), which causes whooping cough.

#### EPITOPE IDENTIFICATION FOR OTHER BACTERIA ESPECIALLY BP

While our initial focus was mostly directed toward the study of Mtb, other microbes have also been studied. Maybeno et al. (21) described Salmonella epitopes, and Cannella et al. reported studies in Brucella (22). In the context of BP, we showed that initial whole-cell pertussis (wP) vaccination results in long-term Th1/Th17 polarization even with subsequent acellular boosters (23, 24).

It is hypothesized that the recent reemergence of BP infection is linked to the adoption of acellular pertussis (aP) vaccines based on specific BP antigens (FHA, Fim2/3, PRN, and PT). It is possible that the previous whole cell inactivated (wP) vaccine elicited a broader reactivity and targeted additional antigens, some of which might be of particular relevance and linked to superior vaccine performance. The extent and targets of T cell immunity in the context of natural infection and clinical disease are likewise not yet defined in a comprehensive fashion.

These considerations argue for performing broad epitope identification and characterization studies in BP as well. In the following sections we describe the techniques we have developed for the purpose of epitope identification and characterization, and then we describe specific applications to the TB and BP systems.

#### MEASURING HLA EPITOPE AFFINITY

Activation of alpha/beta classical T cells in general requires recognition of a specific peptide epitope, bound to specific major histocompatibility complex (MHC) molecules, a phenomenon classically named "HLA-restriction." The methods used to establish restriction are described in a separate section below. Here we focus on the fact that, since HLA binding is a prerequisite for a peptide being actually recognized as an epitope, measuring its HLA binding affinity is a powerful method to select epitope candidates. The relevant quantitative binding thresholds have been defined for both class I (25) and class II (26–28).

Our group has been a pioneer in the development of techniques to measure the binding of peptides to MHC molecules, termed HLA molecules in humans. Over the course of the last 30 years we have measured almost half a million MHC peptide binding constants for over 100,000 peptide/MHC combinations, and our group contributed a chapter describing our assay platform in detail to the laboratory compendium Current Protocols in Immunology (29).

The results obtained with this assay have been published in several hundred peer-reviewed journal articles. Our current assay panel allows measurements of binding to over 40 different HLA class I molecules and 35 HLA class II molecules. MHC binding is evaluated using a classical competition assay in which peptides of interest compete with a radiolabeled probe peptide for MHC binding (**Figure 1**). An ample supply of purified MHC molecules, as well as labeled and unlabeled peptides, is necessary to establish and run an MHC-peptide binding assay. Thus, our immunochemistry group has established an ongoing operation where cell lines expressing different alleles are expanded to allow for large-scale HLA purification by affinity chromatography.

Each cell line is rigorously characterized by HLA typing to ensure identity of the HLA allelic variant, and expression is monitored by flow cytometry. Standardized MHC purification protocols using affinity chromatography are in place. Furthermore, the purity and quantity of purified products are assessed by gel filtration and bicinchoninic acid (BCA) assays on each preparation. High affinity probes have been identified for each HLA allelic variant and are used in a classic quantitative receptor-ligand inhibition assay where IC50 values are used to approximate true K<sub>D</sub> values. Bound and unbound radiolabeled peptides are separated following incubation for 2 days, and their relative abundance is quantified. The specificity of each of the assays was rigorously determined by demonstrating high affinity binding of known independently defined eluted peptides or T cell epitopes restricted by the allelic variant in question.
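To make the relationship between IC50 and K<sub>D</sub> concrete, the following minimal sketch interpolates an IC50 from a competitor titration and applies the standard Cheng-Prusoff correction for competitive binding, K<sub>i</sub> = IC50 / (1 + [probe]/K<sub>D,probe</sub>). When the probe concentration is far below its own K<sub>D</sub>, the correction factor is close to 1 and the IC50 approximates the true affinity. The titration values, probe concentration, and function names are hypothetical; this is not the analysis pipeline of the assay described above.

```python
import math


def ic50_from_titration(concs_nM, pct_bound):
    """Interpolate the IC50 (nM) on a log10 concentration axis.

    pct_bound is the labeled probe bound at each competitor concentration,
    expressed as a percentage of the no-competitor control.
    Returns None if the curve never crosses 50%.
    """
    points = sorted(zip(concs_nM, pct_bound))
    for (c_lo, b_lo), (c_hi, b_hi) in zip(points, points[1:]):
        if b_lo >= 50.0 >= b_hi and b_lo != b_hi:
            frac = (b_lo - 50.0) / (b_lo - b_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


def cheng_prusoff_ki(ic50_nM, probe_nM, probe_kd_nM):
    """Cheng-Prusoff correction for a competitive inhibitor."""
    return ic50_nM / (1.0 + probe_nM / probe_kd_nM)


# Hypothetical titration of an unlabeled test peptide.
concs = [1, 10, 100, 1000]          # nM competitor
bound = [95.0, 80.0, 20.0, 5.0]     # % of control probe binding

ic50 = ic50_from_titration(concs, bound)
ki = cheng_prusoff_ki(ic50, probe_nM=0.1, probe_kd_nM=10.0)  # assumed probe values
print(round(ic50, 1), round(ki, 1))
```

With the assumed probe concentration at 1% of its K<sub>D</sub>, the corrected value differs from the IC50 by about 1%, which illustrates why the IC50 can stand in for K<sub>D</sub> under such conditions.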

#### HLA POLYMORPHISM AND BINDING PROMISCUITY

Thousands of different class I and class II HLA types exist in human populations. Most of the polymorphic residues line specific pockets in the HLA, which are involved in the peptide-HLA binding interaction. Accordingly, different HLA molecules have, in general, different binding specificities or "motifs" (30, 31). In fact, the capacity of general populations to bind a plethora of different sequences is the evolutionary force driving HLA polymorphism, to oppose the potential of pathogens to escape immune recognition by avoiding presentation of their peptides.

This extensive polymorphism poses a fundamental challenge for epitope identification and validation. A comprehensive effort targeting thousands of variants would be impractical and unfeasible. A simple pragmatic solution would be to target HLA alleles that are present with the highest frequencies. However, this approach has to factor in the reality that the frequencies of different HLA can vary dramatically across different ethnicities. Thus, comprehensive coverage of ethnically diverse populations requires careful analysis.

A phenomenon that counterbalances these difficulties and was originally recognized as a means to simplify the task is linked to HLA promiscuity. Indeed, it was already observed by the early 1990s that the same peptide could bind and be recognized by multiple HLA types (32–37), denoted as epitope "promiscuity." Subsequent studies demonstrated that both classes I and II HLAs can be grouped into supertypes, defined on the basis of overlapping peptide binding repertoires (38–46). This is specifically applicable to HLA class II of the DR loci (33, 47, 48), and also DP (27, 49–51) and DQ (26, 51, 52).

A subsequent study used binding data of HLA DR, DQ, and DP to quantitatively assess how much promiscuity existed in HLA class II molecules. We found that these HLAs could be divided into seven major supertypes and, rather surprisingly, the repertoire overlap of class II supertypes was five to ten-fold higher than that of class I supertypes (51). These results indicated that if promiscuous binding would translate into promiscuous T cell recognition, then promiscuous epitopes might constitute a significant proportion of the total response.

#### PROMISCUOUS HLA CLASS II EPITOPES

In several independent studies and antigen systems we tested the hypotheses that promiscuous epitopes account for a significant fraction of the specific immune response and that promiscuous epitopes can be identified by bioinformatic approaches. The notion that HLA class II promiscuous epitopes correspond to dominant epitopes accounting for a large fraction of the antigen specific response was initially evaluated with a set of overlapping 15-mer peptides spanning the Erythropoietin (EPO) protein (53). A large volume of subsequent data has demonstrated that these findings are generalizable to other systems, including proteins derived from infectious agents and allergens.

One series of experiments analyzed in detail the reactivity of allergic donors to the common allergen timothy grass (Phleum pratense), a mediator of hay fever (54). Over 40 different epitope regions were recognized, but upon closer inspection it was determined that only nine of them were required to cover 51% of the total response. These dominant regions were shown to correspond to promiscuously recognized epitopes, and to be predicted by bioinformatics algorithms targeting the most common DR, DP, and DQ variants.
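The notion that a handful of regions covers a given fraction of the total response can be expressed as a cumulative sum over regions ranked by response magnitude. The sketch below shows that calculation; the response values are hypothetical and are not the timothy grass data.

```python
def regions_needed(responses, target_fraction):
    """Number of top-ranked regions whose summed response reaches
    target_fraction of the total response."""
    ranked = sorted(responses, reverse=True)
    total = sum(ranked)
    running = 0.0
    for k, value in enumerate(ranked, start=1):
        running += value
        if running >= target_fraction * total:
            return k
    return len(ranked)


# Hypothetical summed responses per epitope region (e.g., spot-forming cells).
region_responses = [400, 250, 150, 100, 50, 30, 20]

half = regions_needed(region_responses, 0.5)
most = regions_needed(region_responses, 0.85)
print(half, most)
```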

This result was not limited to the timothy grass system. Indeed, similar results were obtained in several different allergen systems, including the Blattella germanica (Bla g) antigens associated with cockroach allergies (55). In a broader study, a panel of 133 allergens derived from 28 different sources, including fungi, trees, grasses, weeds, and indoor allergens, was surveyed utilizing predicted promiscuous HLA class II-binding peptides and ELISPOT assays with PBMC from allergic donors, resulting in the identification of 257 T cell epitopes (56).

In conclusion, a number of studies have shown that many peptides with highly promiscuous binding capacity are frequently recognized by immune individuals, and that promiscuous recognition in the context of multiple HLA class II molecules may be a mechanism significantly contributing to epitope immunodominance (33, 53, 57–60). This might be related to the fact that promiscuous epitopes tend to bind HLA with high affinity, or simply that binding to multiple HLAs gives an epitope multiple contexts where it can be associated with immunogenicity. Several studies by our group have demonstrated that bioinformatic predictions directed toward selection of the most promiscuous binding peptides can identify a significant fraction of the pathogen or allergen specific response (54–56, 61).

#### DEVELOPMENT OF TOOLS TO PREDICT PROMISCUOUS EPITOPES

In the next series of investigations, we sought to derive and optimize a universal prediction schema based on data where sets of 15-mers overlapping by 10 amino acids representing the entire sequences of over 30 different allergens and bacterial proteins had been tested for T cell reactivity in human patients (62) (**Figure 2**). We specifically wanted to answer the question of how many predictions needed to be combined for maximal efficacy in real human patient populations, and furthermore, we wanted to determine which specific alleles should be included in such an optimal prediction tool. We defined optimal prediction parameters, and the resulting strategy was validated using a blind set of immunogenicity data that had not been utilized to derive the prediction scheme. We found that a 20th percentile IEDB consensus rank, combining predictions for a particular set of seven HLA class II, can predict about half of the total response. This approach can therefore be utilized as a universal prediction scheme, as it has been validated in a broad set of antigenic systems and in genetically diverse human patient populations.
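The two mechanical steps of this scheme, generating 15-mers that overlap by 10 residues and applying a 20th percentile cut-off across seven alleles, can be sketched as follows. The sketch assumes that per-allele percentile ranks have already been obtained from a prediction tool and that they are combined by taking the median; the combination rule, the function names, and all data shown are assumptions made for illustration, and the seven alleles are deliberately not named here.

```python
from statistics import median


def overlapping_peptides(sequence, length=15, overlap=10):
    """Split a protein into peptides of the given length that overlap by
    `overlap` residues (step = length - overlap), covering the C-terminus."""
    step = length - overlap
    last_start = max(len(sequence) - length, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # extra peptide so the C-terminus is covered
    return [sequence[s:s + length] for s in starts]


def select_promiscuous(ranks_by_peptide, threshold=20.0):
    """Keep peptides whose median percentile rank across alleles is at or
    below the threshold (a lower percentile means stronger predicted binding)."""
    return [p for p, ranks in ranks_by_peptide.items() if median(ranks) <= threshold]


# Hypothetical 27-residue sequence.
peptides = overlapping_peptides("ACDEFGHIKLMNPQRSTVWYACDEFGH")

# Hypothetical percentile ranks for two peptides across seven class II alleles.
ranks = {
    "pepA": [5, 12, 18, 30, 8, 15, 60],    # median 15 -> selected
    "pepB": [40, 55, 10, 70, 35, 90, 25],  # median 40 -> not selected
}
selected = select_promiscuous(ranks)
print(len(peptides), selected)
```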

In the study referenced above, the use of actual binding data instead of predictions did not improve the efficacy of the scheme, nor did performing allele-specific predictions based on the particular HLA expressed in each individual patient. This indicated that, as expected, the limited efficacy of the prediction was not due to the limitations of the algorithms, but rather that HLA binding predictions are associated with a rather high false positive rate, which fits well with the understanding that HLA binding is necessary but not sufficient for immunogenicity. Thus, other factors related to T cell repertoire and antigen processing also play a prominent role.

To address this issue, we used matched sets of dominant epitopes and negative peptides curated from the literature to train neural networks (63). The resulting "immunogenicity score" was further validated on 57 additional datasets (**Figure 2**). In all, data derived from more than 1,500 human donors and 2,000 peptides were considered in this training and validation effort. The results demonstrated that this agnostic "immunogenicity score" was effective in predicting dominant epitopes and human immunogenicity data. Surprisingly, the combination of immunogenicity score and HLA promiscuous predictions was associated with limited overall predictive improvement, suggesting, as previously noted, that antigen processing/T cell repertoire selection and HLA binding capacity might be influenced by coordinate evolution (64). Taken together, these results highlight that the bioinformatic tools necessary to identify promiscuous epitopes are available and have been validated in several independent studies, in different antigen systems and different ethnicities.

#### DETERMINING HLA RESTRICTION OF T CELL RESPONSES

Determination of HLA restriction is a key element of epitope characterization, and precise knowledge of HLA restriction is also necessary to derive tetrameric staining reagents. HLA restriction was originally determined by the use of antibodies specific for different HLAs, in conjunction with antigen presentation assays; the essence of this strategy is to identify an antibody that blocks presentation of a given peptide to a given defined source of responding T cells (60). While straightforward in principle, this assay is often challenging, since antibodies with suitable specificity and selectivity are often not available, and T cells might promiscuously recognize the same peptide presented by multiple HLAs, yielding results difficult to interpret. Furthermore, the antibody and epitope concentration in the assay must be carefully controlled as excessive amounts of antibodies will inactivate the antigen-presenting cells (APCs) non-specifically, and high concentration of epitope will lead to self-presentation from the responding T cells.

An alternative is to determine whether the peptide can be presented by panels of partially HLA matched/mismatched cell lines/PBMCs. This is a powerful and simple approach, but it can be limited by the availability of suitable cell lines, and complicated again by promiscuous presentation and by the fact that certain HLA combinations are in tight linkage disequilibrium. To overcome this limitation, we described an approach to define HLA class II restriction covering DP, DQ, and DR allelic variants that are most commonly represented in the general population (65). We specifically selected 46 DP, DQ, and DR HLAs which were projected to cover ∼90% of these loci and constitute >66% of all the genes at each of these loci. Utilizing HLA data of actual populations from different geographical locations in the USA and Africa, we verified that these projections were accurate. A panel of single HLA transfected cell lines was developed and validated in a series of experiments, involving assessing HLA expression, identity, peptide binding, and epitope presentation (65).
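One way to see how a gene-level figure (>66% of all genes at a locus) can relate to an individual-level figure (∼90% coverage) is to assume Hardy-Weinberg proportions, under which the fraction of individuals carrying at least one panel allele is 1 − (1 − f)<sup>2</sup>, where f is the summed frequency of the panel alleles. This is an assumption of the sketch below and not a description of how the projection in (65) was computed; the allele frequencies are hypothetical.

```python
def carrier_fraction(allele_freqs):
    """Fraction of individuals carrying at least one of the listed alleles at
    a single locus, assuming Hardy-Weinberg proportions (two independent
    draws from the allele pool per individual)."""
    covered = sum(allele_freqs)
    if covered < 0.0 or covered > 1.0 + 1e-9:
        raise ValueError("allele frequencies must sum to a value in [0, 1]")
    return 1.0 - (1.0 - min(covered, 1.0)) ** 2


# Hypothetical panel alleles at one locus, summing to 66% of all genes.
panel_freqs = [0.30, 0.20, 0.16]
coverage = carrier_fraction(panel_freqs)
print(round(coverage, 3))
```

Under this assumption, alleles accounting for 66% of the genes at a locus are carried by roughly 88% of individuals, which is of the same order as the ∼90% figure quoted above.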

The utility of this panel was further demonstrated by a quantitative study of HLA restriction and antigen-specific responses in a cohort of Mtb-immune individuals (11). Using APCs transfected with the panel of HLA class II molecules described above, HLA restrictions for nearly 300 different epitope/donor combinations were mapped. These results were the first large scale estimate of epitope complexity of CD4 T cell responses in a patient population and a microbial human pathogen, and indicated that the majority of epitopes were associated with promiscuous HLA restriction, further demonstrating the feasibility of the approach developed.

established. In addition, an artificial neural network model using sets of dominant epitopes and negative peptides has been built to generate an "immunogenicity score" that predicts CD4 T cell immunogenicity in the absence of HLA data.

As an alternative, complementary approach, we developed a method called Restrictor Analysis Tool for Epitopes (RATE) that can infer HLA restriction using CD4 T cell response data from HLA-typed individuals (66). The method, available online in the IEDB analysis resource, starts by inspecting, one epitope at a time, the HLA types present in individuals responding to each epitope. Then, for each of these HLAs, it determines which are enriched in frequency by comparing responders and non-responders to the specific epitope. The automated calculation yields a table of likely restrictions, odds ratios (ORs), and associated p-values. The method was validated by various experimental approaches, which derived strategies and thresholds for optimal performance (66, 67). The method is most effective for monogamous restrictions and by definition less able to detect promiscuous restrictions and HLA frequency variations due to genetic linkage.
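The kind of enrichment statistic described here can be illustrated with a 2×2 table of responders/non-responders against carriers/non-carriers of a given HLA. The sketch below computes an odds ratio and a one-sided Fisher exact p-value using only the standard library. It is a generic illustration under our own assumptions; it is not the RATE implementation, whose exact test and thresholds are described in (66, 67), and the counts are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table:
    a = responders with the HLA,     b = responders without it,
    c = non-responders with the HLA, d = non-responders without it."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact p-value: probability of observing at least `a`
    HLA-positive responders given fixed row and column totals."""
    n_resp, n_hla, n = a + b, a + c, a + b + c + d
    upper = min(n_resp, n_hla)
    tail = sum(
        comb(n_hla, k) * comb(n - n_hla, n_resp - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n_resp)


# Hypothetical counts for one epitope and one HLA allele in 25 typed donors.
a, b, c, d = 8, 2, 3, 12
or_value = odds_ratio(a, b, c, d)
p_value = fisher_enrichment_p(a, b, c, d)
print(or_value, round(p_value, 4))
```

In practice such a test would be repeated for every epitope and HLA combination, so some form of multiple-testing control would be needed before interpreting individual p-values.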

### ANALYSIS OF EPITOPE CONSERVATION

Several lines of evidence indicate that sequence variability and conservation can have a dramatic effect on the shaping and effectiveness of immune responses in general and T cell responses in particular. This influence is dynamic and can have both positive and negative effects.

One broad series of effects relates to immunological pressure exerted by antimicrobial responses, among the best-known examples of which is the widespread mutation of T cell epitopes observed in HIV and HCV (68, 69). It should be underlined that pathogen escape by mutation is most effective for microbes with small genomes, or against responses of limited breadth, since simultaneous escape from responses directed against large genomes and a large number of antigens/epitopes is by definition unlikely. It has indeed been proposed that the switch from wP to aP generated a response that is less diverse and created an opportunity for BP to escape vaccine responses (70–72). It will be important to compare mutation rates of epitopes and non-epitopes, in wP and aP antigens, to either support or refute this hypothesis.

We have also noted that sequence variation or conservation can have a profound influence in shaping T cell responses by a different set of mechanisms. In general, we have noted that when individuals are exposed to different strains of the same species, or different species of phylogenetically related microbes, the immune response tends to focus on conserved epitopes. This is because repeated exposure to different but cross-reactive microbes ends up "teasing out" T cells recognizing conserved/homologous epitopes. Specifically, this has been observed in the case of dengue virus (DENV), where repeated exposure to different serotypes focuses the response on conserved epitopes (73, 74). The phenomenon is not limited to viruses, and is also observed in the case of grass pollen and ragweed pollen specific allergic responses (75, 76). In the case of bacterial genomes, we have shown that intragenus conservation within different mycobacteria species shapes T cell responses (10), and epitopes shared between mycobacteria tubercoloid species and other non-pathogenic mycobacteria are preferentially recognized, indicating that differential reactivity may be at least partially accounted for by environmental factors. It is currently unknown whether BP antigens and epitopes that share significant homology to other microbes encountered as a result of environmental exposure might be preferentially recognized.

In a separate study, we have recently shown that the sequence similarity between antigens and the human microbiome can either dampen or increase T cell epitope immunogenicity (77). In this study, we systematically evaluated the homology of human microbiome sequences and sets of control peptides and T cell epitopes of various autoantigens, allergens, and infectious pathogens. We expected that the human adaptive immune system would be largely tolerant toward sequences identical or highly similar to those found in the human microbiome. We therefore predicted that these sequences would be more frequently found in the non-epitope category, as compared to the dominant epitope category. For many epitope categories this was indeed the case, and reactivity was dampened (tolerogenic effect), suggesting that exposure to microbiome-derived sequence homologs might lead to T cell tolerization. However, in other cases, such as mycobacteria, and consistent with the studies mentioned above, the reactivity was increased (inflammatory effect) when the epitope sequence was conserved in the microbiome. It is currently unknown whether BP antigens and epitopes that share significant homology to other microbes contained in the human microbiome might be preferentially recognized or, conversely, tolerized.
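The conservation and homology analyses discussed in this section rest on a simple operation: finding the best match of an epitope within each of a set of reference sequences and counting how many sequences contain a sufficiently similar match. The sketch below uses ungapped, same-length windows and a fractional identity cut-off; the cut-off, the function names, and the sequences are hypothetical, and published analyses may use alignment tools and thresholds that differ from this illustration.

```python
def best_identity(epitope, protein):
    """Highest fraction of identical positions between the epitope and any
    same-length, ungapped window of the protein."""
    k = len(epitope)
    if k == 0 or len(protein) < k:
        return 0.0
    best = 0
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        matches = sum(1 for x, y in zip(epitope, window) if x == y)
        best = max(best, matches)
    return best / k


def conservation(epitope, proteins, min_identity=0.8):
    """Fraction of reference sequences that contain a window matching the
    epitope at or above min_identity."""
    if not proteins:
        return 0.0
    hits = sum(
        1 for seq in proteins.values()
        if best_identity(epitope, seq) >= min_identity
    )
    return hits / len(proteins)


# Hypothetical epitope and homologous sequences from three strains/species.
epitope = "ACDEFGHIKL"
homologs = {
    "strain_1": "MMACDEFGHIKLMM",  # identical match
    "strain_2": "MMACDEFGHIKVMM",  # one substitution (90% identity)
    "strain_3": "MMAWWEFGHIKVMM",  # three substitutions (70% identity)
}
conserved_fraction = conservation(epitope, homologs)
print(round(conserved_fraction, 2))
```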

### VALIDATION AND CHARACTERIZATION OF T CELL EPITOPES

T cell epitopes can be characterized by various techniques such as mass spectrometry (MS), ELISPOT, intracellular cytokine staining (ICS), activation induced marker (AIM) assay, antigen-reactive T cell enrichment (ARTE) assay, tetramer staining, multidimensional fluorescence-based flow cytometry, cytometry by time-of-flight (CyTOF), RNA-Sequencing (RNA-seq), and T cell receptor (TCR) sequencing (**Figure 3**). These techniques can also be combined, for example by performing TCR analysis of tetramer-positive cells, or by combining AIM/ARTE assays with ICS for particular cytokines.

#### In vitro vs. ex vivo Characterization

T cell responses can be characterized directly ex vivo or, when epitope-specific T cells are rare, after in vitro re-stimulation. Though in vitro re-stimulation allows for greater sensitivity, it may alter the phenotype of responding T cells; thus, the characterization of re-stimulated T cells requires specific adjustments to the experimental strategy. Certain epitope characteristics are not altered by in vitro expansion, such as which particular TCR genes are expressed, HLA restriction, sequence conservation of the epitope recognized, or the pattern of cytokine polarization. On the other hand, memory and activation markers and other phenotypic markers usually detected by flow cytometry analysis are altered by the activation caused by cell culture. We have found that it is often possible to assess responses directly ex vivo by using pools of different epitopes or peptides, so that the overall frequency of responding cells is enhanced. This approach is particularly effective when combined with the AIM assay described below, and particularly useful for analyzing samples of small volume. More specifically, our group has developed a "megapool" approach, in which large numbers of peptides are combined into a single pool (78). These megapools have been utilized in several systems such as allergies (79, 80), tuberculosis (11), tetanus and pertussis (24, 81), and DENV for both CD8 and CD4 T cell epitopes (82–84).

#### Mass Spectrometry

MS-based approaches have been utilized to identify and characterize T cell epitopes presented by MHC molecules since the 1990s (85, 86). Briefly, MHC molecules are purified from cell lysates and their associated peptides are isolated and analyzed by MS. Although this approach is powerful and widely used, a full discussion of it is beyond the scope of the current review. Thus, we refer readers to (87) for more details on MS-based immunopeptidomics.

### ELISPOT and ICS Assays

In our experience, the ELISPOT assay is the most sensitive and high throughput–friendly method to measure T cell cytokine production. Our group has extensive experience using this method and we routinely utilize ELISPOT as a primary screen. In contrast, the ICS assay is better at evaluating T cell phenotype and polyfunctionality. Both ELISPOT and ICS assays can characterize epitope pools even with small amounts of PBMCs. In our hands, T cell responses can be characterized with as little as 1 ml of peripheral blood.

#### AIM and ARTE Assays

In addition to ICS, the selection of ex vivo activated antigen-specific CD4 T cell populations can also be performed by measuring different activation molecules using the AIM assay [e.g., OX40 and CD25; an assay co-developed by our group (81, 88)] or the ARTE assay. The ARTE approach utilizes magnetic enrichment of T cells that upregulate CD154 (CD40L) to assess human antigen-specific CD4 T cells ex vivo (89). ARTE has been applied to identify antigen-specific T cells for several infections and can select rare antigen-specific T cells after a short stimulation period without the need for ICS (90).

#### Tetramer Staining

This approach identifies antigen-specific T cells using tetramer staining reagents (91, 92). Furthermore, a tetramer enrichment technique can be utilized if the frequency of antigen-specific T cells is low (93). However, specific reagents must be produced for each unique HLA:epitope combination of interest in order to use this approach. Thus, it is usually used for in-depth characterization of T cells of selected epitope specificities and HLA restrictions.

#### Multidimensional Flow Cytometry and CyTOF

These are powerful techniques to characterize cell samples by evaluating the expression of many different markers associated with cell lineages (94), activation and functional activities (95), memory cell subtypes and chemokine receptor expression (96). Multicolor fluorescence-based flow cytometry is in general more user and equipment friendly, as antibodies are more generally available. In addition, this technique allows recovery of the cells by cell sorting, and thus is readily coupled with transcriptomic analysis. In contrast to flow cytometry, CyTOF can detect, discriminate, and quantify antibodies that are conjugated to various heavy-metal isotopes with high accuracy (97). This avoids spectral overlap between fluorophores and allows measuring more cellular parameters simultaneously. High-dimensional phenotypic data can be visualized using algorithms such as visualization of stochastic neighbor embedding (viSNE) and spanning-tree progression analysis of density-normalized events (SPADE) (98). We have utilized CyTOF and viSNE to visualize and characterize the heterogeneity of human CD4 effector memory T re-expressing CD45RA (Temra) cells (95).

#### Transcriptomic Profiling

Epitope-specific T cells can be further characterized in-depth by transcriptomic profiling that uses deep-sequencing technologies, including bulk RNA-seq or single-cell RNA-seq (scRNA-seq). By comparison with bulk RNA-seq, scRNA-seq is a more powerful tool to address cellular heterogeneity and to identify novel subpopulations in a "hypothesis free" manner, since individual cells within the "same" population may differ dramatically (99–101). Gene expression profiling using these methodologies is routinely utilized in our laboratory. Examples include the definition of signatures predictive of latent tuberculosis infection (13), the characterization of CD4 cytotoxic memory T cells (95, 102, 103) or CD4 differential responses to BP primary vaccination after aP boost vaccination (23).

#### TCR Sequencing

In addition to functional and phenotypic characterization of epitope-specific T cell responses, one can further define their TCR repertoires by TCR sequencing (104). TCRs dictate the antigen specificity of T cells through their interactions with peptide and major histocompatibility complexes. By analyzing epitope-associated TCR repertoires, it is possible to investigate common features of TCRs that are specific for a particular epitope and identify determinants that may predict specificity (105, 106). Thus, this strategy will enable researchers to systematically integrate epitopes with their specific TCR sequences as well as their associated T cell responses.

# GENOME-WIDE SCREEN OF Mtb HLA CLASS II EPITOPES

As a way to illustrate how the various techniques can be utilized to tackle even large, complex microbial genomes, we briefly summarize the results of an Mtb genome-wide screen (5). Our general strategy has been to first study in detail a limited number of well-characterized dominant antigens, to investigate the mechanisms associated with immunodominance, and to provide a point of reference for the genome-wide screen (60). While several dominant antigens were known and well described, a truly systematic and unbiased screen had not been attempted before, due to the complexity of the genome and the large number of ORFs. Next, as summarized below, we performed an unbiased genome-wide screen, and based on the results we selected the dominant epitopes and antigens (9). These were then utilized to characterize the epitopes and the phenotype of the associated T cells (8, 10, 11, 14), and also to develop an Mtb epitope megapool that was utilized in numerous studies and has proven a valuable tool to analyze responses in a number of different settings (8, 11, 13, 105).

To perform a genome-wide screen of Mtb, we selected all full genome sequences available at that point and utilized the approaches described above to define a library of about 20,000 predicted promiscuous binders (5). These were synthesized and screened first as pools and then in deconvolution experiments to identify the actual epitopes responsible for T cell activation. The library also included over 1,500 different variants not totally conserved amongst the genomes analyzed. It should be noted that the capacity to readily test for sequence variants is an advantage of our approach.
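The pool-then-deconvolute logic described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the protocol used in (5): the pool size, the peptide identifiers, and the helper names `make_pools` and `peptides_to_deconvolute` are all hypothetical.

```python
from typing import Dict, List, Set


def make_pools(peptides: List[str], pool_size: int) -> Dict[str, List[str]]:
    """Split a peptide library into consecutive pools of at most pool_size peptides."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    pools: Dict[str, List[str]] = {}
    for start in range(0, len(peptides), pool_size):
        pool_id = f"pool_{start // pool_size + 1}"
        pools[pool_id] = peptides[start:start + pool_size]
    return pools


def peptides_to_deconvolute(pools: Dict[str, List[str]],
                            positive_pools: Set[str]) -> List[str]:
    """Return the peptides from positive pools, which are then retested individually."""
    selected: List[str] = []
    for pool_id, members in pools.items():
        if pool_id in positive_pools:
            selected.extend(members)
    return selected


# Hypothetical library of six peptides screened in pools of three (made-up names).
library = ["pep_A", "pep_B", "pep_C", "pep_D", "pep_E", "pep_F"]
pools = make_pools(library, pool_size=3)
# Assume only the second pool scored positive in the primary screen.
retest = peptides_to_deconvolute(pools, positive_pools={"pool_2"})
print(retest)  # ['pep_D', 'pep_E', 'pep_F']
```

Pooling reduces the number of primary assays roughly by the pool size; only members of positive pools need to be tested individually in the second round.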

We identified hundreds of different epitopes. The response was thus remarkably broad: each individual recognized tens of different epitopes, and the dominant epitopes and antigens varied appreciably from one individual to the next. The overwhelming majority of the response was CD4 restricted, which was not unexpected since the epitopes were identified based on their predicted ability to bind to HLA class II molecules. When the epitopes were mapped back to their antigen of origin using the H37Rv reference genome, a set of 82 antigens was identified as dominant, in that they accounted for about 80% of the total response. The majority of these antigens were not previously identified as T cell antigens.
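The notion of a dominant antigen set "accounting for about 80% of the total response" can be made concrete with a cumulative ranking. The sketch below is an assumption about how such a set could be computed, not the analysis performed in (5); the function name `dominant_antigens` and the response values are invented for illustration.

```python
from typing import Dict, List


def dominant_antigens(responses: Dict[str, float], fraction: float = 0.8) -> List[str]:
    """Return the top-ranked antigens whose summed response reaches
    the requested fraction of the total response."""
    total = sum(responses.values())
    if total <= 0:
        return []
    # Rank antigens from the strongest to the weakest total response.
    ranked = sorted(responses.items(), key=lambda item: item[1], reverse=True)
    selected: List[str] = []
    running = 0.0
    for antigen, magnitude in ranked:
        selected.append(antigen)
        running += magnitude
        if running >= fraction * total:
            break
    return selected


# Made-up response magnitudes per antigen (e.g., summed spot-forming cells).
example = {"ag1": 500.0, "ag2": 350.0, "ag3": 90.0, "ag4": 40.0, "ag5": 20.0}
print(dominant_antigens(example))  # ['ag1', 'ag2']
```

In this toy example two of five antigens carry 85% of the response, which mirrors the skewed distribution described for the Mtb screen.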

Further analysis revealed that the vast majority of the response mapped to very discrete regions of the Mtb genome, and specifically to three clusters (islands) of reactivity within the genome, which encoded close to half of the total reactivity. One of the islands contained the well-characterized antigens early secretory antigenic target 6 kDa (ESAT-6) and culture filtrate protein 10 kDa (CFP10), secreted by Type VII secretion systems (T7SS or Esx system). The other two islands also contained Type VII secretion protein pairs. To further highlight the novelty of these observations, we discovered that the antigens that were recognized as dominant were not limited to the secreted proteins, but also included proteins from the actual secretion apparatus. Thus, the results obtained illustrated the feasibility of the approach, while at the same time identifying a number of novel epitopes and antigens, and providing new insights into the mechanisms of immunodominance.

# THE RESURGENCE OF BP AS A PUBLIC HEALTH CONCERN

BP has been a health concern since the Middle Ages (107), and whooping cough was prevalent and associated with high morbidity and mortality until the introduction of widespread vaccination (108). Vaccination of the general population with the wP vaccine has greatly reduced whooping cough since the 1950s. Nevertheless, the wP vaccine was associated with minor adverse reactions and very rare serious side-effects, which resulted in its replacement by the aP vaccine in the United States (109, 110). In spite of widespread vaccination, cases of whooping cough have recently been steadily increasing in the United States (www.cdc.gov). Epidemiological evidence indicates that the increased prevalence may be associated with the switch from wP to aP vaccine in the mid-1990s, further implicating a potential role for waning immunity (www.cdc.gov).

Although the phenomenon of "waning BP immunity" is a serious issue (111), it is not straightforward to address as its manifestation appears more than 15 years following the initial vaccination. Therefore, it is crucial to understand the underlying mechanisms of waning immunity in order to guide the design of effective vaccines. In addition to qualitative differences in the response, several other mechanisms may exist. Two main additional hypotheses have been put forth (**Figure 4**). First, as the wP vaccine contains >3,400 ORFs, whereas the aP vaccine includes only a few BP proteins, it is likely that a differential breadth of response is induced by the wP and aP vaccines (112). Furthermore, the chemically detoxified pertussis toxin (PT) contained in the aP vaccine may have altered antigenicity and could potentially influence vaccine efficacy (113, 114). Second, it has been proposed that decreased vaccine efficacy might be due to antigenic drift (108, 115–119).

Both antibody and T cell responses are thought to be associated with the effectiveness of pertussis vaccination. Notably, protective immunity against BP persists even after antibody levels have declined (120–122), suggesting that T cells play a role in long-term protection against BP. Animal studies suggest that long-term protection is mediated by memory CD4 T cells of Th1 and Th17 phenotype, which are induced by infection as well as by wP vaccination (123–125). In contrast, aP vaccination is associated with a predominant Th2 response in humans (126–129). Furthermore, a few studies have reported that aP vaccination induces qualitative changes in T cell responses, resulting in suboptimal efficacy (130–133) (**Figure 4**).

## GENOME-WIDE SCREEN OF T CELL RESPONSE TO BP

To date, the question of whether the wP vaccine elicits strong T cell responses to an additional and different set of antigens from those elicited by aP vaccination, and if so to which antigens, has not been addressed. Given that our genome-wide screen of Mtb (5) revealed novel dominant antigens that had escaped detection despite decades of investigation of Mtb-specific T cell responses, we consider this possibility likely. By the same token, the breadth of responses induced by natural infection and clinical disease is not known. Here as well, it is likely that additional antigens beyond those included in the current vaccine are of importance; for example, the ACT toxin has been shown to be targeted by BP-infected individuals, and the combination of PT and ACT results in superior protection from disease in animal models of BP infection and disease (134–136). These considerations underscore the need for further investigation of BP antigens and T cell epitopes, as well as correlates of protection.

#### DEFINITION OF aP EPITOPES FOLLOWING aP vs. wP VACCINATION

We previously completed a series of studies aimed at the definition of CD4 T cell epitopes derived from the antigens contained in the aP vaccine (24). These illustrate the general feasibility of the study of epitopes and T cell reactivity in BP, and also provide a point of reference to interpret the results obtained in a potential genomic screen of BP T cell reactivity.

In those studies (24), PBMCs from either aP- or wP-primed healthy adults with a recent aP booster were used to screen overlapping peptides derived from the protein components that are the foundation of the acellular vaccine (PT, Pertactin, Filamentous hemagglutinin, and Fimbriae 2 & 3). We utilized high-throughput ex vivo ELISPOT assays to measure T cell production of interferon-γ (IFN-γ) and IL-5, and deconvolution of positive peptide pools identified individual T cell epitopes. Epitope mapping revealed that the same epitopes were recognized by both aP- and wP-primed individuals (24). However, the ratios of IFN-γ to IL-5 revealed a Th1 bias in originally wP-primed donors, and dominance of IL-5 in individuals primed with aP (24). This differential polarization persists following the booster, even decades after the original priming (24).
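The cytokine ratio used above to describe polarization can be expressed as a simple log ratio. The following is a minimal sketch, assuming ELISPOT readouts in spot-forming cells (SFC); the function name `polarization_log_ratio`, the pseudocount, and the donor values are illustrative assumptions and are not taken from (24).

```python
import math


def polarization_log_ratio(ifng_sfc: float, il5_sfc: float,
                           pseudocount: float = 1.0) -> float:
    """log2 ratio of IFN-gamma to IL-5 responses.

    Positive values indicate an IFN-gamma (Th1-like) bias, negative values
    an IL-5 (Th2-like) bias. The pseudocount (an assumption of this sketch)
    avoids division by zero when one cytokine is undetectable.
    """
    return math.log2((ifng_sfc + pseudocount) / (il5_sfc + pseudocount))


# Made-up SFC values for two hypothetical donors.
wp_like = polarization_log_ratio(ifng_sfc=127, il5_sfc=31)
ap_like = polarization_log_ratio(ifng_sfc=15, il5_sfc=63)
print(wp_like, ap_like)  # 2.0 -2.0
```

Working on a log scale makes a fourfold IFN-γ excess and a fourfold IL-5 excess symmetric around zero, which simplifies comparison between donor groups.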

## CHARACTERIZATION AND VALIDATION OF EPITOPES DERIVED FROM aP-ANTIGENS

As a result of the studies described in the previous section, we defined a "megapool" encompassing the 132 most dominant epitopes recognized. This megapool allowed us to assess BP responses directly ex vivo using the AIM assay combined with ICS assays, without the need for in vitro re-stimulation, and thus allowed direct phenotyping that avoids the alterations induced by the in vitro re-stimulation step. This strategy was utilized to evaluate the phenotype and function of T cells in PBMCs from either wP- or aP-primed donors, 1–3 months after an aP booster vaccination, to allow memory T cells to return to steady-state conditions (23).

Using the ex vivo readouts, we still detected the persistent differential polarization previously detected after in vitro re-stimulation. Moreover, we detected differential polarization toward IL-4 and IL-9 in aP-primed donors and toward IFN-γ and IL-17 in wP-primed donors (23). This effect was specific for the vaccine antigens, since no difference was noted for other epitopes, such as megapools from the ubiquitous antigens CMV and EBV. The IL-17 polarization of wP vaccination had been previously reported in baboon models, but not in humans (124, 125, 137, 138). The observation of IL-9 differential polarization is a novel aspect of our study. In-depth phenotypic analysis using combined ICS and transcriptomic analysis of BP-specific memory T cells from aP vs. wP donors revealed clear differences, especially at the level of effector memory T (Tem) cells. Thirteen differentially expressed genes were identified by comparing aP-Tem and wP-Tem cells, including IL9 and TGIF2, the latter of which is related to the regulation of TGFβ-responsive genes (139, 140). IL5, IL13, and TGFB1 were also up-regulated in samples from aP donors (23).

In contrast to aP priming, wP priming is associated with a substantially higher magnitude of CD4 T cell responses following the aP booster, when ex vivo responses were assayed in a time window ranging from a few days to several months. Consistent with these findings, in vitro proliferation assays showed that originally aP-primed donors were associated with lower proliferative capacity (23). In conclusion, these results demonstrate that the various techniques described above can be used to dissect and define the phenotype associated with BP-specific T cell responses, and reveal important biological differences.

# CONSERVATION OF BP EPITOPES ACROSS BP VARIANTS

As mentioned before, previous studies (108, 115–119) indicated that mutations might be accumulating in the acellular vaccine antigens, and that this phenomenon might be related to the apparent waning of BP immunity. Conversely, we also have recently shown that the human microbiome composition modulates T cell responses via molecular mimicry (141).

A first line of preliminary analysis considers the possibility that new BP strains carrying mutations at key epitopes (pathogen escape) have evolved. Several studies have identified mutations in circulating BP strains that could be the result of pathogen adaptation to immune pressure. For example, Bart et al. (115) identified a total of 471 coding SNPs (genetic variations that result in amino acid changes in the encoded proteins) from prevaccination strains. Precise mapping of the T cell epitopes that are prevalently recognized in the human population, including for antigens that are not also targets of immune responses, will further elucidate if the observed genetic variability of circulating BP strains is indeed a result of T cell immune pressure.
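One simple way to relate strain variability to mapped epitopes is to ask in what fraction of strains an epitope sequence is exactly conserved. The sketch below is a hypothetical illustration, not an analysis from the cited studies: the function name `epitope_conservation`, the epitope, and the strain sequences are all made up.

```python
from typing import Dict


def epitope_conservation(epitope: str, strain_proteins: Dict[str, str]) -> float:
    """Fraction of strains whose protein sequence contains the exact epitope."""
    if not strain_proteins:
        raise ValueError("at least one strain sequence is required")
    hits = sum(1 for sequence in strain_proteins.values() if epitope in sequence)
    return hits / len(strain_proteins)


# Made-up protein fragments from four hypothetical strains.
strains = {
    "strain_1": "MKLVAEGT",
    "strain_2": "MKLVAEGT",
    "strain_3": "MKLIAEGT",  # V-to-I substitution inside the epitope
    "strain_4": "MKLVAEGS",  # substitution outside the epitope
}
print(epitope_conservation("KLVAE", strains))  # 0.75
```

An exact-match criterion is deliberately strict; a real analysis might instead tolerate conservative substitutions or weigh strains by their prevalence.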

#### CONCLUSION AND PERSPECTIVE

There is strong evidence suggesting that T cells have important functions in BP immunity and vaccine efficacy. However, T cell epitopes elicited by either natural infection or whole cell (wP) vaccination have not been comprehensively defined, and the corresponding T cell phenotypes have not been characterized. Based upon the success of the genome-wide screen of Mtb, we make an argument to support performing a genome-wide screen of T cell responses in individuals vaccinated with wP vaccines, and individuals previously diagnosed with whooping cough disease, to understand the targets of cellular immunity in those conditions. Such an investigation could utilize techniques developed and validated over the years, which include both direct ex vivo assays such as the AIM assay and in vitro expansion of memory T cells utilizing BP lysates. Although powerful, the full genome-wide screen approach also has its limitations. For instance, this approach will not select noncanonical peptides presented by HLA molecules such as peptides originating from non-coding regions and spliced peptides. In fact, recent studies (one of which we coauthored) indicate that a substantial fraction of the HLA peptidome (class I but probably class II as well) is composed of hybrid peptides that originate from two different peptide fragments (so-called cis-spliced or trans-spliced peptides) (142, 143). If general rules that predict the splicing mechanisms can be defined, these spliced peptides could be predicted and thus incorporated in the analysis.

T cell responses against the various epitopes and associated antigens can be characterized and validated using several different complementary approaches. These include determining HLA restriction, and measuring HLA binding affinity, characterizing memory phenotypes, functionality and helper T cell subsets, and patterns of epitope sequence variation. Additionally, it would be of considerable interest to characterize transcriptomic profiles associated with recognition of the new epitopes identified, as compared to the ones currently included in the aP vaccine. These studies could potentially address several hypotheses proposed to explain the decreased efficacy of aP vaccines, namely differences in antigen specificity, differences in functionality, and/or mutations associated with the antigen/epitope associated with vaccine responses. Furthermore, we anticipate that these studies will be broadly applicable to other intracellular bacterial pathogens such as Salmonella and Brucella.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### FUNDING

This work was supported by National Institutes of Health contracts and grants HHSN272200900044C, HHSN272201200010C, HHSN272201400045C, U19 AI118626, and U01 AI141995.

#### ACKNOWLEDGMENTS

We thank Eugene L. Moore for designing **Figure 1**.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Tian, da Silva Antunes, Sidney, Lindestam Arlehamn, Grifoni, Dhanda, Paul, Peters, Weiskopf and Sette. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Trained Immunity-Based Vaccines: A New Paradigm for the Development of Broad-Spectrum Anti-infectious Formulations

Silvia Sánchez-Ramón<sup>1,2</sup>\*, Laura Conejero<sup>3</sup>, Mihai G. Netea<sup>4,5</sup>, David Sancho<sup>6</sup>, Óscar Palomares<sup>7</sup> and José Luis Subiza<sup>3</sup>

<sup>1</sup> Department of Clinical Immunology and IdISSC, Hospital Clínico San Carlos, Madrid, Spain, <sup>2</sup> Department of Immunology, ENT and Ophthalmology, Complutense University School of Medicine, Madrid, Spain, <sup>3</sup> Inmunotek, Alcalá de Henares, Spain, <sup>4</sup> Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, Netherlands, <sup>5</sup> Department for Genomics and Immunoregulation, Life and Medical Sciences Institute, University of Bonn, Bonn, Germany, <sup>6</sup> Immunobiology Laboratory, Centro Nacional de Investigaciones Cardiovasculares, Madrid, Spain, <sup>7</sup> Department of Biochemistry and Molecular Biology, School of Chemistry, Complutense University of Madrid, Madrid, Spain

#### Edited by:

Pedro A. Reche, Complutense University of Madrid, Spain

#### Reviewed by:

Randy A. Albrecht, Icahn School of Medicine at Mount Sinai, United States Michael Schotsaert, Icahn School of Medicine at Mount Sinai, United States

> \*Correspondence: Silvia Sánchez-Ramón ssramon@salud.madrid.org

#### Specialty section:

This article was submitted to Vaccines and Molecular Therapeutics, a section of the journal Frontiers in Immunology

> Received: 04 September 2018 Accepted: 29 November 2018 Published: 17 December 2018

#### Citation:

Sánchez-Ramón S, Conejero L, Netea MG, Sancho D, Palomares Ó and Subiza JL (2018) Trained Immunity-Based Vaccines: A New Paradigm for the Development of Broad-Spectrum Anti-infectious Formulations. Front. Immunol. 9:2936. doi: 10.3389/fimmu.2018.02936

Challenge with specific microbial stimuli induces long lasting epigenetic changes in innate immune cells that result in their enhanced response to a second challenge by the same or unrelated microbial insult, a process referred to as trained immunity. This opens a new avenue in vaccinology to develop Trained Immunity-based Vaccines (TIbV), defined as vaccine formulations that induce training in innate immune cells. Unlike conventional vaccines, which are aimed to elicit only specific responses to vaccine-related antigens, TIbV aim to stimulate broader responses. As trained immunity is generally triggered by pattern recognition receptors (PRRs), TIbV should be formulated with microbial structures containing suitable PRR-ligands. The TIbV concept we describe here may be used for the development of vaccines focused to promote host resistance against a wide spectrum of pathogens. Under the umbrella of trained immunity, a broad protection can be achieved by: (i) increasing the nonspecific effector response of innate immune cells (e.g., monocyte/macrophages) to pathogens, (ii) harnessing the activation state of dendritic cells to enhance adaptive T cell responses to both specific and nonrelated (bystander) antigens. This capacity of TIbV to promote responses beyond their nominal antigens may be particularly useful when conventional vaccines are not available or when multiple coinfections and/or recurrent infections arise in susceptible individuals. As the set of PRR-ligands chosen is essential not only for stimulating trained immunity but also to drive adaptive immunity, the precise design of TIbV will improve with the knowledge on the functional relationship among the different PRRs. While the TIbV concept is emerging, a number of the current anti-infectious vaccines, immunostimulants, and even vaccine adjuvants may already fall in the TIbV category. This may apply to increase immunogenicity of novel vaccine design approaches based on small molecules, like those achieved by reverse vaccinology.

Keywords: adjuvants, innate immunity, immunostimulants, pattern recognition receptors (PRRs), PRR-ligands, trained immunity, trained immunity-based vaccines (TIbV), vaccines

# BACKGROUND

Conventional anti-infectious vaccines are primarily intended to target specific pathogens by enhancing an antigen-specific adaptive immune response. This response is based on triggering B and T lymphocytes that, by virtue of their clonally segregated antigen receptors, generate effector and memory cells. Proliferation and differentiation of specific lymphocytes is the basis of immunological memory, a hallmark of the adaptive immune response and the rationale behind conventional vaccines. Louis Pasteur built on the work of Edward Jenner to develop the principles of using attenuated live microbes to prevent the disease caused by the pathogen (1). Pre-exposure vaccination was a major breakthrough in the prevention of many infectious diseases. Pasteur's use of his rabies vaccine for post-exposure prophylaxis in the severely ill boy Joseph Meister in 1885 raised the concept of therapeutic vaccines (2).

Immunological adaptive memory has been traditionally defined as long-term acquired memory against an antigen encountered through infection or immunization, leading to a quicker and heightened immune response upon a subsequent encounter (3). Resting clones of memory B and T cells can survive in different compartments for several decades until reactivation by recall responses (4, 5). Recent epidemiological studies highlight the role of subclinical infections or repeated endemic exposure for the maintenance of protective antigen-specific antibodies and T cells, indicating the dependency of this adaptive memory on antigen re-exposure (6, 7). Besides, the persistent specific T and B lymphocyte activation can also favor "infectious immunity," a process by which innate immune responses are enhanced by mechanisms depending on the persistence of the activation of adaptive immunity (3).

### Trained Immunity

Importantly, however, solid epidemiological data have also demonstrated that certain mild infections or vaccinations, such as with bacillus Calmette-Guérin (BCG), lead to protection against heterologous infections, with a strong impact on overall mortality due to infection for up to 1 year (8–11). When vaccination against smallpox was introduced around 200 years ago, positive side-effects such as protection against measles, scarlet fever and whooping cough, among others, were noticed (12). These and many other clinical observations pointed to a long-lasting nonspecific collateral benefit associated with these vaccines, regardless of specific priming and subsequent clonal selection of T and B lymphocytes specific for the nominal antigens present in the vaccine. In recent years, it has become evident that cells of the innate immune system may be primed upon encounter with certain pathogens or pathogen-associated molecular patterns (PAMPs), acquiring a higher resistance to a second infection by the same or unrelated pathogens (cross-protection) for a relatively long time (13, 14). This concept has been termed "trained (innate) immunity." Trained immunity implies adaptation of innate immunity processes in a de facto innate immunological memory, and plays an essential role in vertebrates (15), which is similar to that described for bacteria, plants and invertebrates (16).

Mechanistically, trained immunity is defined by immunological, metabolic and epigenetic hallmarks (17–20). Several studies have shown that metabolic reprogramming through a shift from oxidative phosphorylation to aerobic glycolysis (the Warburg effect) mediated by the Akt/mTOR/HIF-1α pathway is a key mechanism for trained immunity responses (21, 22). The glycolysis, glutaminolysis, and cholesterol synthesis pathways in monocytes and macrophages were identified as the essential underlying mechanism linking epigenetic rewiring and the induction of improved innate immunity (22, 23). Thus, changes in cellular metabolism influence the epigenetic reprogramming of innate immune cells, with an impact on the production of cytokines and reactive oxygen species. In this regard, trained immunity regulates epigenetic changes such as H3K4 trimethylation and H3K27 acetylation, both associated with active chromatin, and H3K9 trimethylation, a repressive mark (19, 22, 24). Trained immunity favors the production and release of proinflammatory cytokines such as TNF-α, IL-6 and IL-1β by innate immune cells upon exposure to a second stimulus (14, 24, 25). Most of these features differ from what classically has been postulated for the innate immune system, as trained immunity induces functional reprogramming within innate immune cells that maintains these cells in a "ready-to-react" functional state over extended periods of time. Interestingly, although a maximum duration of trained immunity effects has been reported up to 3 months (24), a long lasting effect of trained cells with the capacity to enhance T cell responses up to 1 year is feasible (26), thus bridging innate training with adaptive responses. Moreover, the storage of specific long peptides for later, long-lasting cross-presentation to elicit cytotoxic T lymphocytes is an additional feature that may be associated with trained immunity in monocytes and could bridge innate and adaptive imprinting (27).

A striking example indicating a durable change within the innate immunity compartment is the imprinting of BCG on bone marrow hematopoietic stem cells and multipotent progenitors, giving rise to epigenetically modified macrophages that provide better protection against virulent M. tuberculosis than naïve macrophages (28). Unlike the classical memory following the adaptive immune response, long-term responses associated to trained immunity are not based on a clonal expansion of lymphocytes but on reprogramming myeloid cells by stable epigenetic changes (**Table 1**).

Pattern recognition receptors (PRRs) expressed on innate immune cells, including long-lived macrophages and their precursors, are involved in the stimulation of trained immunity. Different PRRs have been implicated in this task, such as C-type lectin receptors (CLRs) and Nod-like receptors (NLRs). Training of innate immunity is therefore based on boosting nonspecific immunity to re-infection by bacteria, fungi or viruses by means of certain pathogen-derived components (19). There are many examples of pathogen-associated molecules with evidence of cross-protection in experimental models (**Table 2**). The increased host defense induced by trained immunity, while effective against a range of pathogens, is non-specific as it is mediated by the release of proinflammatory cytokines such as IL-1α and TNF-α and/or reactive oxygen species (ROS) (17). The role of IL-1α in host resistance to infection and how TNF-α protects against infections have been recently updated (39, 40). Higher resistance due to trained immunity does not, however, mean absolute resistance to every type of second infection, which may be favored by factors beyond the innate immune response. This might account for why a natural infection, such as primary infection by influenza A virus, can result in bacterial pneumonia resulting from superinfection by Streptococcus pneumoniae or other bacterial strains.

TABLE 2 | Examples of pathogen-associated molecules with experimental evidence of cross-protection.

## MOVING AHEAD CONVENTIONAL VACCINES: TRAINED IMMUNITY-BASED VACCINES (TIbV)

The exploitation of the principles of trained immunity may result in a next generation of anti-infectious vaccines (41–43). Trained immunity-based vaccines (TIbV) may confer broad protection far beyond the nominal antigens they contain. By proper targeting of innate immune cells to stimulate trained immunity, both nonspecific and specific immune responses can be enhanced by TIbV. Such responses can also be driven against bystander pathogens encountered by the host during the window of trained immunity.

Vaccines using attenuated and/or inactivated pathogens may be examples of TIbV as long as they contain PAMPs able to trigger PRRs inducing trained immunity. Different PRR ligands have been described as trained immunity stimuli, like Candida-derived β-glucan or BCG-derived muramyl dipeptide, triggering CLRs (dectin-1) or NLRs (NOD2), respectively. Of note, slight differences in how these training stimuli modify the cellular metabolism of innate immune cells have been described (22). This opens the possibility that different trained immunity outcomes may be achieved by varying the set of PRR ligands used in the TIbV. At this point, it should be noted that, within the context of a vaccine, the fact that trained innate immune cells may enhance adaptive responses is essential. In this regard, the role played by DCs, the cellular link between innate and adaptive immunity (44), must be pivotal. Although trained immunity is linked to innate cells such as monocytes, macrophages and NK cells (19), the trained immunity-promoting BCG vaccine can also enhance heterologous T cell responses (26, 45, 46). It has been speculated that the increased expression of certain PRRs in trained innate cells, as well as the release of typical innate immunity cytokines, such as IL-1β, contribute to enhancing adaptive T cell responses (26). In this regard, the polybacterial sublingual vaccine MV130, which contains whole cell heat-inactivated bacteria, has been shown to enhance in vivo T cell responses to unrelated antigens, while priming DCs and inducing IL-1β release in vitro (47). DCs with high immunostimulatory properties that enhance adaptive immune responses via IL-1β release have been described (48). Moreover, the role of inflammasome-associated IL-1 family cytokines in delineating the adaptive immune responses is established regarding differentiation of Th17 cells and promoting effector functions of Th1 cells (49). Interestingly, these DCs are rendered "hyperactive" by releasing IL-1β in the absence of cell death by virtue of an alternative inflammasome pathway modulated by certain TLR ligands of microbial origin, like LPS and peptidoglycans (49). Thus, strong adaptive immune responses may be stimulated by certain PRR stimuli that may complement those inducing trained immunity.

One of the most interesting aspects of trained immunity is that innate immune cells maintain a primed functional state for a fairly long period of time (50). As this state may last for several months (26), the enhanced immune responses induced by a given TIbV may also apply to bystander pathogens encountered by the host during this time frame. Thus, another relevant aspect of TIbV is that they may promote both nonspecific and specific resistance to unrelated pathogens while trained immunity is still present. This may be of particular interest when co-infections with the pathogens included in the TIbV are likely, especially in the context of recurrent infections. Thus, under the umbrella of trained immunity, the TIbV concept emerges as a new paradigm of vaccines seeking to increase host resistance against a broad spectrum of pathogens.

With the above concepts in mind, the proposed term of TIbV can be applied for those anti-infectious vaccines composed of whole microorganisms or derived products that display the following features (**Figure 1**):


In contrast to conventional vaccines, TIbV efficacy cannot be measured solely in terms of specific responses to the nominal antigens included in the vaccine. The clinical outcome, scored as a lower infection rate in particular clinical settings, must also be assessed.

## Examples of Non-conventional Vaccines That Can be Ascribed as TIbV

Vaccines that may fall in the category of TIbV include those bacterial preparations used for recurrent infections of either the respiratory or urinary tract (51–53). Previous and recent studies provide many clinical observations that combinations of inactivated bacterial vaccines induce cross-protection against infections produced by quite different microorganisms (41, 54). In the case of the sublingual vaccine MV130, designed to prevent recurrent respiratory tract infections, a significant reduction in the patients' rate of infection was observed (51). Besides inducing specific T cell immunity against the bacteria included in MV130, treated patients showed an enhanced T cell response to unrelated flu antigens (51). MV130 triggers TLR and NLR signaling pathways on DCs, releasing trained immunity hallmark cytokines (TNFα, IL-6, and IL-1β) (47). In addition, MV130 promoted the generation of Th1 and Th17 responses with high levels of IL-10 both in vitro and in vivo to MV130-related and bystander antigens (47). A recent clinical trial performed in children with recurrent wheezing attacks (mostly of viral etiology) has shown the clinical benefit of MV130 as well as the protection in experimental models of respiratory viral infections by trained immunity mechanisms (Nieto et al., manuscript in preparation). Similarly, MV140, another sublingual whole cell heat-inactivated bacterial vaccine designed to prevent recurrent urinary tract infections (52, 53, 55), was also effective against urobacteria species not included in its composition (53). MV140 also triggers the release of TNFα, IL-6 and IL-1β by DCs, albeit using different signaling pathways from MV130, and induces Th1 and Th17 responses by mechanisms mediated by CLRs and TLRs as well (56). Thus, both MV130 and MV140 vaccines induce the release of a similar set of cytokines ascribed to trained immunity, and favor heterologous Th1 and Th17 responses in vivo as described for vaccines stimulating trained immunity (26). **Figure 2** summarizes the proposed action mechanisms of either MV130 or MV140 as putative TIbV. Other bacterial preparations used as nonspecific immunostimulants may also act as trained immunity inducers (see below).

FIGURE 1 | Trained immunity-based vaccine components. TIbV consist of two essential components: (a) Trained immunity (TI) inducers: a range of PAMPs that target a variety of PRRs triggering different signaling pathways that mediate trained immunity. (b) TIbV-related Ags: the antigens associated with the pathogens acting as TI-inducers to which an adaptive immunity is aimed. Thus, TIbV are characterized by conferring Ag-nonspecific resistance directly dependent on trained immunity stimulation plus an Ag-specific resistance dependent on adaptive immunity against the TIbV components and eventual bystander pathogens. PAMP, pathogen-associated molecular pattern; PRR, pattern recognition receptor; Ag, antigen.

FIGURE 2 | Trained immunity-based vaccine mechanisms of action (A) and clinical outcome (B). (A) TIbV act on cells of the innate immune system, such as macrophages/monocytes and DCs, inducing trained immunity, which in turn leads to nonspecific resistance and pathogen clearance. In addition, trained DCs enhance T cell responses and T helper differentiation (e.g., Th1 and Th17) against TIbV-related and unrelated (bystander pathogen) antigens. (B) In the context of recurrent respiratory or urinary tract infections, TIbV have the potential to induce a protective period during which the host is resistant to TIbV-related and bystander pathogens, reducing the infection rate. DC, dendritic cells; Mo, monocyte; PRR, pattern recognition receptor; Th, T helper cell; Th0, naïve T cells; TIbV, trained immunity-based vaccine. MV130, polybacterial vaccine containing whole cell heat-inactivated bacteria that produce frequent infections in the respiratory tract. MV140, polybacterial vaccine containing whole cell heat-inactivated bacteria that produce frequent infections in the urinary tract.

# Conventional Vaccines With Associated Trained-Immunity Effects as Potential TIbV

A number of conventional anti-infectious vaccines, most of them containing live-attenuated pathogens, have been shown to induce, in addition to the intended specific memory, broad protection by nonspecific mechanisms (46, 58). Therefore, they can be considered within the category of TIbV provided that such mechanisms are related to trained immunity. If this were the case, harnessing innate memory as part of a vaccination strategy with these vaccines may be considered.

#### BCG

As mentioned above, most studies that examined and elucidated the mechanisms of trained immunity have been performed with BCG as a model. Randomized controlled trials carried out in Guinea-Bissau in low-birth-weight infants vaccinated early against tuberculosis with BCG demonstrated clear beneficial effects, reducing all-cause mortality, especially that due to neonatal sepsis, respiratory infections, and fever (9). However, neonatal BCG vaccination of infants in Denmark showed limited impact on T and B lymphocyte subsets (57) and did not affect parent-reported infections (59). It is not known whether the clinical setting in different populations, exposed to a high vs. low rate of pathogens, might account for these divergent outcomes. Recently, Arts et al. demonstrated that BCG vaccination confers protection against viral infection (25). In a placebo-controlled clinical trial with BCG, all volunteers received the yellow fever vaccine 1 month after BCG, as an experimental mild viral infection. BCG-vaccinated volunteers displayed a significant reduction of viremia compared to the placebo group, which highly correlated with enhanced IL-1β production (25). BCG immunization has been reported to confer non-specific protection in some, but not all, experimental models of virus infection, suggesting that the route and dose of BCG administration may be important (60).

BCG is currently used as local immunotherapy in bladder cancer (61). Interestingly, Buffen et al. have demonstrated that the anticancer effects of BCG were dependent on trained immunity (62). In addition to exerting a nonspecific cytotoxic effect on tumor cells, BCG-trained innate cells may enhance tumor-specific T cell responses, as a massive accumulation of tumor-specific T lymphocytes is recovered in urine after successful BCG therapy (63). As endogenous tumor antigens may act as bystander antigens under the umbrella of trained immunity, this opens the possibility of using TIbV as immunostimulants beyond anti-infectious purposes, e.g., in tumor immunotherapy. In fact, the pioneering studies of William Coley in cancer immunotherapy were based on administering bacterial products to cancer patients (64).

#### Vaccinia Virus

Live vaccinia virus was successfully used against smallpox until its eradication in 1977. Two observational studies carried out in Africa concluded that adults who had been vaccinated against smallpox had a significantly lower mortality risk, with a stronger effect observed in women than in men (65, 66). Since both studies were carried out when smallpox was already eradicated and, therefore, in the absence of the targeted infection, the beneficial effects are necessarily non-specific. The capacity of a subset of NK cells to exhibit certain aspects of innate memory following infection with vaccinia virus was found by Gillard et al. in 2011. They demonstrated that this innate memory provides host protection against a subsequent systemic infection with a lethal dose of vaccinia virus, in some cases resulting in the complete clearance of detectable virus (67).

#### Influenza Vaccine

Trivalent live attenuated influenza vaccine has been shown to confer indirect protection from respiratory illness among children (68).

Respiratory syncytial virus (RSV) and influenza virus share common features, including innate immunity activation via PRRs, such as TLR3, TLR7, and retinoic acid-inducible gene I (RIG-I) (69, 70). The cold-adapted, live attenuated influenza vaccine (CAIV) has been shown to provide non-specific cross-protection against RSV in a murine model of infection (71). The results demonstrated that this vaccination induces local immune responses that provide a broad range of antiviral immunity, including protection against RSV, and that TLR3- and TLR7-mediated innate immunity plays an important role in protection against RSV (71).

#### Immunostimulants and Adjuvants as Putative TIbV

In addition to the examples described above, it is likely that other bacterial, fungal and viral preparations used as immunostimulants for different conditions might promote trained immunity if they contain suitable inducers. In this regard, Candida-derived β-glucan is a paradigmatic example, as it is a well-known inducer of trained immunity via dectin-1 (14). At this point, it should be noted that trained immunity-based immunostimulants might be considered within the TIbV concept because, under the umbrella of trained immunity, they may enhance innate and adaptive responses to bystander pathogens and their corresponding antigens (**Figure 3A**).

In 1986, Bistoni et al. demonstrated that systemic infection of mice with an avirulent C. albicans strain conferred protection not only against subsequent intravenous challenge with a pathogenic C. albicans strain but against S. aureus as well (72). More recently, it has been shown that protection from secondary lethal infection can be achieved with β-glucan and is dependent on epigenetic reprogramming linked to trained immunity (14). Although the immunostimulating effect of β-glucans has been known for decades (73), the molecular mechanisms involved have only started to be understood in the last few years (19, 21). The potential of oral β-glucan as an "immune trainer" has been assayed in a pilot study in healthy volunteers (74). Innate immune responses were subsequently evaluated in peripheral blood mononuclear cells re-stimulated in vitro with C. albicans. However, the results showed no enhancement of cytokine production or microbicidal activity, which could be due to several reasons, including the dose and route of administration (74).

The immunostimulant OM-85 is a mixture of bacterial lysates for oral administration able to increase protection in a murine model of respiratory viral infection, reducing viral load in the lung following experimental infection (75). It also reduced rhinovirus infection of lung epithelial cells (76) and had a protective role in models of viral/bacterial respiratory infections, reducing disease symptoms and improving survival (77). OM-85 has demonstrated clinical efficacy reducing the incidence, prevalence and/or duration of infections in children and adults (78–80). It is not known whether the mechanisms behind cross-protection of this and similar immunostimulants are dependent on trained immunity, but it is likely in analogy with the bacterial vaccines described above.

Another aspect of trained immunity-based immunostimulants is that they might be considered adjuvants when combined with other antigens to which an enhanced immune response is expected (**Figure 3B**; AgX). Thus, TIbV containing exogenous or chimeric antigens can be formulated. This possibility has been recently described for BCG used as an adjuvant for recombinant hepatitis B surface antigen (rHBsAg) vaccination (81). The antigen may even be administered in a second step, once trained immunity has been induced (**Figure 3B**; AgY). In this composition, the TIbV is split into two separate elements: the trained immunity inducer and the antigen itself. In this sense, the influence of BCG on antibody and cytokine responses to human neonatal vaccination has already been described (82). Either combination can be used for the development of novel vaccines based on highly specific but weakly antigenic molecules, such as synthetic peptides designed by reverse vaccinology.

## CLINICAL APPLICATIONS OF TIBV

The development of TIbV may represent an advantage over conventional vaccines in certain settings. Under the trained immunity umbrella, broader and stronger immune responses may be expected, without the limitations of antigen specificity. Some applications include the following:


In addition, the TIbV concept can be applied to the design of vaccines directed to the antigen of interest, whether or not it is combined with the trained immunity stimuli. As mentioned above, this concept is also broad, and TIbV may be considered immunostimulants for endogenous (bystander) pathogens, or a type of adjuvant for any antigen. While all these applications are of primary clinical interest, caution is needed about the potential deleterious function of trained immunity in patients suffering from diseases characterized by excessive inflammation. This potential deleterious effect of trained immunity could apply to atherosclerosis (95), cardiovascular events (96), gout (97), and a variety of autoimmune diseases and autoinflammatory disorders such as rheumatoid arthritis, systemic lupus erythematosus or hyper-IgD syndrome (98), where monocytes/macrophages show a detrimental trained immunity-like phenotype. Similarly, trained microglia have been linked to neurological disorders and stroke (99). Current knowledge, however, does not support such a deleterious role for TIbV: (a) due to its nature, trained immunity is likely to have a transitory rather than a permanent effect, giving the system the required plasticity to avoid potentially deleterious long-term effects; (b) at least in two models (MV130 and MV140), TIbV have been shown to induce the production of the regulatory cytokine IL-10 by DCs and T cells, both in vitro and in vivo (47, 56); (c) MV130 has been shown to reduce recurrent infections in patients with rheumatoid arthritis without adverse effects (Candelas et al., manuscript in preparation), an observation also noted with other bacterial-derived immunostimulants (100, 101).

Many currently licensed vaccines consist of whole or inactivated pathogens; however, there has been a recent shift toward using simpler molecules such as highly purified antigens, recombinant or synthetic peptides, or DNA vaccines. Computational analysis of genetic sequences is now used to predict just a few T-cell epitopes fitting most HLA molecules (102), as well as to search for antigens by so-called reverse vaccinology (103). While this new generation of vaccines is focused on obtaining a highly specific response, these vaccines are poorly immunogenic in the absence of proper adjuvants. Thus, one strategy to confer protective immunity might be to combine the adjuvant potential of trained immunity with the selected antigen epitopes. An important aspect to take into account is that it is not yet known whether all trained immunity stimuli produce the same functional behavior in innate immune cells with regard to driving T cell responses. As different PRR ligands may trigger different cell activation pathways with additive, synergistic or opposing effects on key cell functions (44), it is likely that there is more than one trained immunity functional program. Thus, the same antigen molecule could eventually be combined with different trained immunity stimuli to tailor the desired T cell response, as is currently done with other adjuvants. In this regard, TLR8 agonists that mimic the immunomodulating effects of BCG, and enhance innate and adaptive immune responses have been described recently (104).

#### FUTURE DIRECTIONS

Much more knowledge is required in this field to successfully develop the potential of TIbV. Their main advantage is that they act broadly on different pathogens, and their potential as a novel immunotherapy approach in infectious and even non-infectious immune-related diseases, such as cancer, is obvious; however, their extent and limitations may depend on their composition. The pattern of response induced in the host innate immune cells by the specific training stimulus may also dictate the duration of the effect and its interaction with ongoing specific responses. Despite the beneficial effects of TIbV inducing trained immunity as a host defense mechanism, a possible harmful effect on the induction and/or maintenance of autoimmune disorders cannot be ruled out, even though current evidence does not support such a deleterious role. More studies will have to be carried out to fully understand the advantages and limitations of TIbV. Moreover, the indication of TIbV should take into account several factors, such as drug intake (105) or diet (106), that may affect the induction of trained immunity or its activation status, respectively. Finally, trained immunity-based stimulants and/or adjuvants widen the spectrum of in silico models for predicting immune responses. Thus, the search for new candidate combinations of trained immunity-based adjuvants for improved immunization, as TIbV, emerges as a novel and promising approach to vaccine design.

#### AUTHOR CONTRIBUTIONS

SS-R, LC, and JS conducted literature searches, selected the studies, and wrote the manuscript. MN, DS, and ÓP critically contributed to the editing of the manuscript, final version and approval.

#### ACKNOWLEDGMENTS

We are grateful to Dr. Miguel Casanovas for helpful discussion. MN was supported by a Spinoza grant of the Netherlands Organization for Scientific Research.

#### REFERENCES

virus disease enhancement for decades following infection. J Virol. (2012) 86:2665–75. doi: 10.1128/JVI.06335-11

beneficial nonspecific effects in the neonatal period? J Infect Dis. (2011) 204:245–52. doi: 10.1093/infdis/jir240

Staphylococcus aureus mammary infection in mice. Front Immunol. (2017) 8:833. doi: 10.3389/fimmu.2017.00833

An observational study from Guinea-Bissau. Vaccine (2006) 24:5718–25. doi: 10.1016/j.vaccine.2006.04.045


**Conflict of Interest Statement:** JS is the CEO of Inmunotek SL, a pharmaceutical company that manufactures bacterial vaccines. LC is an employee of Inmunotek. SS-R, DS, and ÓP have received research grants from Inmunotek.

The handling Editor declared a shared affiliation, though no other collaboration, with two of the authors ÓP and SS-R.

The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Sánchez-Ramón, Conejero, Netea, Sancho, Palomares and Subiza. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genome-Based Approach Delivers Vaccine Candidates Against *Pseudomonas aeruginosa*

Irene Bianconi<sup>1†</sup>, Beatriz Alcalá-Franco<sup>1†</sup>, Maria Scarselli<sup>2†</sup>, Mattia Dalsass<sup>2,3</sup>, Scilla Buccato<sup>2</sup>, Annalisa Colaprico<sup>2</sup>, Sara Marchi<sup>2</sup>, Vega Masignani<sup>2‡</sup> and Alessandra Bragonzi<sup>1\*‡</sup>

<sup>1</sup> Infection and Cystic Fibrosis Unit, Division of Immunology, Transplantation and Infectious Diseases, IRCCS San Raffaele Scientific Institute, Milan, Italy, <sup>2</sup> GSK, Siena, Italy, <sup>3</sup> Dipartimento di Scienze Cliniche e Biologiche, Università degli Studi di Torino, Turin, Italy

#### *Edited by:*

Pedro A. Reche, Complutense University of Madrid, Spain

#### *Reviewed by:*

Paola Massari, Tufts University School of Medicine, United States

Giampiero Pietrocola, University of Pavia, Italy

#### *\*Correspondence:*

Alessandra Bragonzi bragonzi.alessandra@hsr.it

†These authors have contributed equally to this work and share first authorship

‡These authors have contributed equally to this work and share senior authorship

#### *Specialty section:*

This article was submitted to Vaccines and Molecular Therapeutics, a section of the journal Frontiers in Immunology

> *Received:* 17 September 2018 *Accepted:* 06 December 2018 *Published:* 09 January 2019

#### *Citation:*

Bianconi I, Alcalá-Franco B, Scarselli M, Dalsass M, Buccato S, Colaprico A, Marchi S, Masignani V and Bragonzi A (2019) Genome-Based Approach Delivers Vaccine Candidates Against Pseudomonas aeruginosa. Front. Immunol. 9:3021. doi: 10.3389/fimmu.2018.03021

High incidence, severity and increasing antibiotic resistance characterize Pseudomonas aeruginosa infections, highlighting the need for new therapeutic options. Vaccination strategies to prevent or limit P. aeruginosa infections represent a rational approach to positively impact the clinical outcome of at-risk patients; nevertheless, this bacterium remains a challenging vaccine target. To identify novel vaccine candidates, we started from the genome sequence analysis of the P. aeruginosa reference strain PAO1, exploring the reverse vaccinology approach integrated with additional bioinformatic tools. The bioinformatic approaches resulted in the selection of 52 potential antigens. These vaccine candidates were conserved in P. aeruginosa genomes of different origins and among strains isolated longitudinally from cystic fibrosis patients. To assess the immune protection conferred by single antigens or antigen combinations against P. aeruginosa infection, a vaccination protocol was established in a murine model of acute respiratory infection. Combinations of selected candidates, rather than single antigens, effectively controlled P. aeruginosa infection in the in vivo model of murine pneumonia. Five combinations were capable of significantly increasing the survival rate among challenged mice, and all included PA5340, a hypothetical protein exclusively present in P. aeruginosa. PA5340 combined with PA3526-MotY gave the maximum protection. Both proteins were shown to be surface exposed by immunofluorescence and triggered a specific immune response. The combination of these two protein antigens could represent a potential vaccine to prevent P. aeruginosa infection.

Keywords: *Pseudomonas aeruginosa*, reverse vaccinology, vaccine, respiratory infection, mouse model

# INTRODUCTION

P. aeruginosa infections are among the most severe public health issues. This opportunistic bacterium belongs to the multi-drug resistant (MDR) ESKAPE pathogens, along with Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, and Enterobacter species. According to data from the Centers for Disease Control, P. aeruginosa is responsible for millions of infections each year in the community and for 10–15% of all healthcare-associated infections, with more than 300,000 cases annually in the EU, USA and Japan (1). Patients hospitalized in intensive care units (ICU) run a high risk of acquiring P. aeruginosa, as they may develop ventilator-associated pneumonia (VAP) and sepsis (2–4). Other risk groups are patients with a compromised immune system, either from immunosuppressive therapies or from underlying diseases such as cancer, AIDS or hereditary cystic fibrosis (CF). This high prevalence is partly due to the vast arsenal of virulence factors that facilitates acute infections and the propensity of P. aeruginosa to form highly structured biofilm communities that cause chronic infections (5).

Taking the severity of the illness into account, current treatment guidelines for the management of bacterial infection recommend single antibiotic or combination therapy. Despite the wide arsenal of drugs for P. aeruginosa infections available on the market, inefficacy of these treatments is commonplace. Resistance rapidly emerges, usually linked to intrinsic bacterial resistance mechanisms, development of new antibiotic resistance, and/or limited penetration of antibiotics into biofilms (3, 6). Development of antibiotics with a novel mode of action and/or alternative therapies remains an urgent need for patients. Antibacterial agents launched in recent decades were modifications of existing molecules; the development of entirely new classes of antibiotics has been largely abandoned (7). Immunotherapy for preventing pulmonary infection has also been tested (8), but clinical efficacy has been disappointing (9, 10). Immunization strategies do not cover P. aeruginosa infection in healthcare practices.

In recent years, remarkable progress has been made in the identification of P. aeruginosa virulence factors and their variations among different infection processes. It has been more accurately recognized that P. aeruginosa is an antigenically variable microorganism that adapts easily to different growth conditions and escapes host immune recognition. The high variability of the proteins among different P. aeruginosa strains and within the same strain, grown in diverse environmental conditions, may represent a serious obstacle to the development of a globally effective anti–P. aeruginosa vaccine (10). So far, P. aeruginosa vaccine candidates have been found by classical approaches, either by identifying more abundant surface proteins and oligosaccharides or by selecting specific virulence factors according to their relevance in the disease outcome. Integrated genomics and proteomics approaches have been recently used to predict vaccine candidates against P. aeruginosa (11). Although several vaccine formulations have been tested clinically, none has been licensed (10, 12). P. aeruginosa vaccines tested so far in humans consisted of antigens targeting single rather than multiple virulence mechanisms: an OprF-OprI fusion (13), flagella (14), O antigen-conjugated vaccines (15), and high molecular weight alginate (16). Further success in P. aeruginosa vaccine development may require a different approach, including bacterial genome evaluation to identify novel antigen combinations potentially addressing multiple virulence mechanisms, such as initial bacterial colonization, immune evasion, colony aggregation and cytotoxicity.

During the past two decades, reverse vaccinology has revolutionized the approach to vaccine research (17, 18), ultimately leading to the development of new generation vaccines based on antigens previously unrecognized by other approaches. One is 4CMenB, the first universal vaccine against serogroup B Neisseria meningitidis (MenB), now licensed in several countries worldwide (19, 20). Reverse vaccinology aims at identifying surface-exposed proteins, ideally playing a relevant role in pathogenesis, which can serve as targets of the host immune system. This approach has not yet been implemented for P. aeruginosa. In this study, a reverse vaccinology approach was combined with advanced genomic technologies to select new protein antigens against P. aeruginosa. We report that combinations of selected candidates, more than single antigens, effectively control P. aeruginosa infections in a mouse model of acute pneumonia.

#### RESULTS

#### *P. aeruginosa* Antigens Selection by Genome-Wide Screening

Among 5,570 open reading frames (ORFs) encoded by P. aeruginosa PAO1 strain (5), we predicted a total of 2,430 surface or membrane-associated proteins by using high-throughput bioinformatic localization prediction tools. In particular, subcellular localization was predicted with the PSORTb software. For predicted non-cytoplasmic polypeptides, the presence of a signal peptide and the location of its cleavage site were predicted with SignalP. N-terminal signatures predictive of lipoproteins were identified by using the LipoP server, and putative transmembrane regions were predicted with TMpred. Among them, 307 were classified as outer membrane proteins or lipoproteins, 583 as periplasmic proteins, and 2,109 as inner membrane proteins. The remaining 2,562 ORFs were predicted to encode cytoplasmic proteins. Since inner membrane proteins are barely exposed on the outside of the bacterium and are difficult to express and purify, all were discarded from selection except those with sequence similarities to known virulence factors or extracellular proteins from other bacterial pathogens. The final selection totaled 950 ORFs (**Figure 1**).
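The localization-based selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Orf` record, the localization labels, and the `virulence_like` flag are hypothetical names introduced here, and they do not reproduce the output format of PSORTb, SignalP, LipoP, or TMpred.

```python
from dataclasses import dataclass

# Compartments kept by default. The labels are illustrative placeholders,
# not the literal strings produced by any prediction tool.
KEPT_LOCALIZATIONS = {"outer_membrane", "lipoprotein", "periplasmic"}


@dataclass(frozen=True)
class Orf:
    locus: str                    # hypothetical identifier
    localization: str             # predicted compartment
    virulence_like: bool = False  # similar to a known virulence factor or
                                  # extracellular protein of another pathogen


def select_surface_candidates(orfs):
    """Keep exposed proteins; rescue inner-membrane ORFs only if virulence-like."""
    selected = []
    for orf in orfs:
        if orf.localization in KEPT_LOCALIZATIONS:
            selected.append(orf)
        elif orf.localization == "inner_membrane" and orf.virulence_like:
            selected.append(orf)
        # Cytoplasmic ORFs and other inner-membrane ORFs are discarded.
    return selected


if __name__ == "__main__":
    demo = [
        Orf("ORF_A", "outer_membrane"),
        Orf("ORF_B", "inner_membrane"),
        Orf("ORF_C", "inner_membrane", virulence_like=True),
        Orf("ORF_D", "cytoplasmic"),
    ]
    print([orf.locus for orf in select_surface_candidates(demo)])
```

The rescue rule for inner-membrane proteins is kept as a separate branch so that the exception stated in the text stays visible in the code.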

In a second step of prioritization and to avoid selection of potential self-antigens, we excluded all proteins containing domains with sequence similarity (E-value>1e-10) to human and/or mouse proteins, narrowing the selection to 824 proteins. To identify proteins more directly related to P. aeruginosa pathogenesis and fitness, and to avoid widespread bacterial housekeeping factors, sequence comparison to E. coli K12 whole proteome was performed; we discarded P. aeruginosa proteins sharing more than 40% sequence similarity over at least 70% of the length of the E. coli counterpart. As sequence conservation is highly desirable for broad-spectrum vaccine candidates, a comparative analysis was performed with the genome sequence of seven published P. aeruginosa strains including clinical isolates of different origins (PA14, LESB58, PA7, 2192, C3719, PACS2, RP73); only proteins belonging to the core genome were kept. Candidate selection was also refined by removing short peptides (<150 aa long), eventually leading to a total of 531 hits. To further prioritize the candidates and reduce the final pool of proteins for experimental testing, a PSI-Blast analysis was conducted; results were manually curated to remove any residual protein putatively involved in intermediate metabolism, DNA synthesis, translation and repair, protein synthesis and transport, and more generally, in any cellular process confined to the bacterial cytoplasm. The final pool of in silico selected candidates included 52 antigens: 31 proteins of known function and 21 of unknown function (**Figure 1** and **Table S1**). The presence of well-known virulence factors like ExoA and ExoT, as well as relevant outer membrane proteins like OprF and OprH in the final list of candidates confirmed the reliability of the selection strategy.
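The automated part of this prioritization cascade can be sketched as a chain of filters. The sketch below is a minimal illustration, not the authors' pipeline: the `Candidate` fields and function names are assumptions introduced here, host similarity is reduced to a precomputed boolean, and the manual curation of PSI-Blast results is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    locus: str               # hypothetical identifier
    length_aa: int           # protein length in amino acids
    host_similar: bool       # has a domain similar to a human/mouse protein
    ecoli_similarity: float  # % similarity to the best E. coli K12 hit (0 if none)
    ecoli_coverage: float    # % of the E. coli counterpart covered by the alignment
    in_core_genome: bool     # found in every compared P. aeruginosa genome


def is_housekeeping_like(c, min_similarity=40.0, min_coverage=70.0):
    """More than 40% similarity over at least 70% of the E. coli counterpart."""
    return c.ecoli_similarity > min_similarity and c.ecoli_coverage >= min_coverage


def prioritize(candidates, min_length=150):
    """Apply the filters in the order given in the text."""
    kept = []
    for c in candidates:
        if c.host_similar:            # potential self-antigen
            continue
        if is_housekeeping_like(c):   # widespread bacterial housekeeping factor
            continue
        if not c.in_core_genome:      # not conserved across strains
            continue
        if c.length_aa < min_length:  # short peptide (<150 aa)
            continue
        kept.append(c)
    return kept


if __name__ == "__main__":
    demo = [
        Candidate("ORF_A", 320, False, 0.0, 0.0, True),
        Candidate("ORF_B", 320, True, 0.0, 0.0, True),
        Candidate("ORF_C", 320, False, 55.0, 80.0, True),
        Candidate("ORF_D", 120, False, 0.0, 0.0, True),
    ]
    print([c.locus for c in prioritize(demo)])
```

Because each filter is an independent early exit, thresholds such as `min_length` can be varied to see how strongly each criterion shrinks the candidate pool.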

When tested against an extended panel of 104 P. aeruginosa complete genomes, it emerged that all candidates share a mean identity/coverage ratio ranging from 0.78 to 1.00 (**Figure S1**) confirming that a vast proportion of epitopes potentially presented by each candidate to the immune system is conserved across the natural P. aeruginosa population.

## Evaluation of Candidates in a Murine Model of Pneumonia

Of 52 P. aeruginosa vaccine candidates (**Table S1**), 30 ORFs (57.7%) were successfully expressed in E. coli BL21 as His-tag fusions. An OprF-OprI fusion, designed according to the known construct used in the recent clinical trial (21), was included in this study. The ability of these antigens to protect against P. aeruginosa infection was tested in a mouse model of acute pneumonia (22). C57Bl/6 mice were immunized intraperitoneally (i.p.) with 10 µg of each protein formulated with aluminum hydroxide (Alum) as adjuvant at 0, 21, and 35 days. At day 50, mice were challenged intratracheally (i.t.) with 5 × 10<sup>6</sup> CFU/lung of P. aeruginosa reference strain PAO1 and monitored twice a day for 1 week for health parameters indicative of animal wellness. In this model, all mice immunized with Alum alone (negative control group) consistently showed symptoms of severe clinical disease and died within 48 h, whereas mice immunized with whole cell inactivated PAO1 bacteria were consistently protected against homologous challenge in a dose-dependent manner. Of the 30 antigens tested, 10 showed a modest increase in survival compared to the negative control (up to 20% at day 5) and were investigated further (**Table 1** and **Figure 2**). The other 20 proteins tested did not differ substantially from the negative control and were discarded (data not shown). Survival of mice vaccinated with the OprF-OprI fusion (10.7%) was comparable to that observed after vaccination with the 10 selected proteins.
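Survival percentages such as those quoted above are read off Kaplan–Meier curves. As an illustration of how such a curve is built, the sketch below implements the product-limit estimator in plain Python. The function name and the ten-animal data set are invented for the example and do not reproduce any group from this study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : observation time (days) for each animal
    events : True if the animal died at that time, False if it was
             still alive when observation ended (censored)
    Returns a list of (time, survival_fraction) at each death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical group: 3 deaths on day 1, 5 on day 2, 2 survivors at day 7.
times = [1, 1, 1, 2, 2, 2, 2, 2, 7, 7]
events = [True] * 8 + [False] * 2
curve = kaplan_meier(times, events)
for day, fraction in curve:
    print(f"day {day}: {fraction:.0%} surviving")
```

With no animals lost to follow-up before the end of monitoring, the estimate reduces to the plain fraction of animals alive at each time point.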

### Evaluation of Combinations of Candidates in a Murine Model of Pneumonia

In order to increase the survival rate of vaccinated mice, proteins conferring higher protection were pooled in groups of two, and 22 combinations were tested. Five combinations, all containing the antigen candidate PA5340, were the most promising (PA5340+PA1178-OprH, PA5340+PA3526-MotY, PA5340+PA5112-EstA, PA5340+PA5047, PA5340+PA0328-AaaA). These combinations showed a significant increase in both survival curves (Mantel-Cox test p < 0.0002, 0.0019, 0.0027, 0.015, and 0.015, respectively) and mean survival time (one-way ANOVA p-value < 0.01) when compared with negative controls (**Figure 3** and **Figure S2**). The best antigen combination was PA5340+PA3526-MotY, with survival increased up to 50%. Three of the combinations described above, PA5340+PA3526-MotY, PA5340+PA5112-EstA, and PA5340+PA0328-AaaA, increased survival significantly when compared with OprF-OprI (Mantel-Cox test p = 0.0091, 0.0009, and 0.012, respectively). Interestingly, an increase in survival rate was also observed (though not statistically significant) when OprF-OprI was combined with PA5340, going from 10.7% for the fusion alone to 40% when tested in combination (Mantel-Cox test p = 0.058).

#### TABLE 1 | Top line vaccine candidates of P. aeruginosa.

<sup>a</sup>Surface exposure suggested by immunofluorescence microscopy co-localization (see text). <sup>b</sup>Evaluated by Western Blot against recombinant proteins, P. aeruginosa strain PAO1 and clinical isolate MDR-RP73 (see text). Sequence conservation expressed as mean percentage of amino acid identity ± standard deviation calculated among a collection of CF clinical strains<sup>c</sup> and on the public collection of 104 completed P. aeruginosa genomes available in GenBank (see text)<sup>d</sup>.

FIGURE 2 | Survival curves of groups of mice immunized with single antigens. C57Bl/6 male mice were challenged with PAO1 (5\*10<sup>6</sup> cfu) 2 weeks after last vaccination with ten single antigens. Comparisons were performed with mice immunized with Alum alone (negative control) and PAO1 heat-inactivated (h.i.) groups (positive control). An additional group was vaccinated with OprF-OprI, tested as clinical vaccine candidate. Data were pooled from at least two/three independent experiments (n = 16–40). Results are represented as Kaplan–Meier survival curves and analyzed by the Mantel-Cox test against negative control group. N refers to the number of animals.

#### *In vitro* Characterization of Selected Antigens

To characterize the antigenic potential and cellular localization of the selected antigens (PA1178-OprH, PA1248-AprF, PA5112-EstA, PA0328-AaaA, PA2407-FpvC, PA3526-MotY, PA4082-CupB5, PA4765-OmlA, PA5047, and PA5340), the antisera obtained by immunizing with recombinant proteins were used in Western Blot and immunofluorescence microscopy (**Table 1** and **Figure 4**). All the antisera recognized the recombinant proteins, the homologous P. aeruginosa strain PAO1, and the heterologous clinical isolate MDR-RP73; this demonstrates the capacity of the vaccine candidates to induce specific antibodies that recognize the native proteins.

To determine whether the selected proteins were effectively expressed and exposed on the surface of bacterial cells, a double immunofluorescence assay was carried out, as an initial characterization, with the murine antisera and a specific anti-P. aeruginosa cell wall antibody. Co-localization of the two signals would suggest that the proteins are present at the bacterial cell surface. As expected, sera of naïve mice did not recognize antigens, while antisera of mice immunized with whole cell inactivated PAO1 showed a strong co-localization signal. The same staining was performed with the antisera of mice immunized with the 10 selected antigens and OprF-OprI (**Table 1** and **Figure 4**). According to the immunofluorescence microscopy results, all the selected antigens co-localized with the cell surface antibody, with the exception of PA5112-EstA. In particular, a similar co-localization pattern was observed with antisera against PA5340, PA3526-MotY, and OprF-OprI.

FIGURE 3 | Survival curves of groups of mice immunized with combined antigens selected as vaccine candidates. C57Bl/6 male mice were challenged with PAO1 (5\*10<sup>6</sup> cfu) 2 weeks after last vaccination with combined antigens. Comparisons were performed with mice immunized with Alum alone (negative control) and PAO1 heat inactivated (h.i.) groups (positive control). An additional group was vaccinated with OprF-OprI, tested as clinical vaccine candidate. Data were pooled from at least two/three independent experiments (n = 17–33). Results are represented as Kaplan–Meier survival curves and analyzed by the Mantel-Cox test against negative control group: \*p < 0.05, \*\*p < 0.01, \*\*\*p < 0.001. N refers to the number of animals.

FIGURE 4 | Cellular localization of vaccine candidates PA5340 and PA3526-MotY and controls by immunofluorescence microscopy. Immunofluorescence staining with confocal microscopy shows the localization of antigens (green) (A–E) and the PAO1 cell wall (red) (F–J). For antigens localization the antisera of naïve mice (A) or immunized with PA3526-MotY (B), PA5340 (C), OprF-OprI (D) or heat inactivated PAO1 (E) were used. Merged images show the co-localization of the two signals (yellow) (K–O) suggesting that proteins could be surface exposed. Detailed co-localization of antigens of interest is shown in the magnification (L, M, N).

### Conservation Profile of Selected Antigens Among a Collection of CF Clinical Isolates

Conservation of the selected antigens among P. aeruginosa genomes was initially considered as a selection criterion, and the candidates were checked against the ensemble of publicly available complete genome sequences. Moreover, we investigated the sequence conservation of the ten most protective antigens in a collection of 19 clinical strains isolated from CF patients at the onset of infection and after years of chronic colonization (**Figure S3**). Full-length sequences of the ten genes were obtained by PCR from most of the 19 strains. Nucleotide sequences were translated into amino acid (aa) sequences and compared with the PAO1 protein sequence. Overall, identity levels were higher than 98% (**Table 1** and **Table S2**), confirming a strong potential of these proteins to induce effective and cross-protective immunity among circulating P. aeruginosa strains.
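The conservation figures reported here (mean percentage of amino acid identity ± standard deviation) can be illustrated with a short calculation. The sketch below assumes the strain sequences have already been aligned to the PAO1 reference; the function name and the ten-residue toy peptides are hypothetical and carry no biological meaning.

```python
from statistics import mean, stdev


def percent_identity(ref, seq):
    """Percent identity between two pre-aligned sequences ('-' marks a gap)."""
    if len(ref) != len(seq):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(ref, seq) if not (a == "-" and b == "-")]
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Invented reference and strain peptides, for illustration only.
reference = "MKTLLAVSAA"
strains = {
    "strain_1": "MKTLLAVSAA",  # identical
    "strain_2": "MKTLLAVSGA",  # one substitution
    "strain_3": "MKTLLAVSAA",  # identical
}
identities = [percent_identity(reference, s) for s in strains.values()]
avg, sd = mean(identities), stdev(identities)
print(f"{avg:.1f} ± {sd:.1f} % identity")
```

In this sketch a gap opposite a residue counts as a mismatch; other conventions exist, and the text does not state which one was applied.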

# DISCUSSION

As an alternative to antibiotic treatment, immunotherapy should represent an option to prevent MDR infections by P. aeruginosa. A universal protein-based vaccine against P. aeruginosa remains a critical unmet medical need. To identify possible antigens suitable for the development of a P. aeruginosa vaccine, we explored the reverse vaccinology approach integrated with additional genomic and bioinformatic approaches. This strategy involved an initial screening of target antigens on the basis of their putative cellular localization; this identified a large number of proteins (2,430 ORFs) predicted to be surface or membrane-associated to various extents. Note that the P. aeruginosa genome (6.3 Mb) is larger than that of most sequenced bacteria (5), and the resulting high number of 5,570 predicted protein-encoding genes challenges this approach. Hence, additional criteria were included for antigen selection, with the aim of reducing the pool to a reasonable number of candidates and rationalizing the subsequent experimental steps. In particular, sequence similarity to proteins encoded by the commensal E. coli K12 strain, as well as sequence similarity to proteins involved in primary housekeeping and cell metabolism functions, was used as an exclusion criterion to narrow the shortlist of antigens. Proteins displaying epitopes very similar to those present in widely conserved proteins of human and mouse origin were also excluded as candidates, as they might be poorly immunogenic and have a high probability of behaving as self-antigens.

Considering that large sequence diversity characterizes the P. aeruginosa genome, we tested the conservation of selected candidates among the complete P. aeruginosa genome sequences available in public databases. This analysis confirmed the presence and conservation of selected antigens in the core genome. Further genomic analysis considered the intraclonal sequence diversity of selected antigens in clinical strains isolated from CF patients. It is recognized that P. aeruginosa is an antigenically variable microorganism and can undergo phenotypic variation under changing environmental conditions, such as those in the airways of CF patients (23). In particular, the adaptation process generates unique lineages of P. aeruginosa pathogenic variants that differ systematically from environmentally acquired strains and can escape immune recognition (24). We considered this question worthy of investigation and extended the comparative sequence analysis to a collection of 19 clinical strains isolated from CF patients at the onset of infection and after years of chronic colonization (25, 26). The selected antigens were conserved among CF clinical isolates, indicating that the corresponding genes were not under positive selection and that these antigens could be useful for targeting both the initial infecting strains and those promoting progression toward chronic infection. Considering the epidemiology of P. aeruginosa infection, both environment-to-host and patient-to-patient transmission have been described, and it appears likely that highly conserved, non-adapted antigens might have superior clinical relevance.

Previous studies that tested P. aeruginosa antigens provided valuable information on the feasibility of vaccination but were limited either by the number of antigens tested or by redundancy in their selection (10). Abundant surface proteins, oligosaccharides, and particular virulence factors have been considered previously. However, none of these studies started from a large-scale screening, performed a comparative analysis of the P. aeruginosa protein antigens, and tested them in animal models for preclinical studies. Our screening identified 52 antigens, 31 with known and 21 with unknown functions. The presence of known virulence factors in the final list of candidates confirmed the reliability of this general selection strategy. We identified a number of proteins, like ExoA and ExoT, as well as relevant outer membrane proteins, like OprF and OprH, already shown to be required for virulence in P. aeruginosa. The outer membrane protein PscC precursor (PA1716) identified in this study was previously identified by an integrated genomics and proteomics approach (11). Furthermore, different proteins involved in the chaperone/usher pathways (CupA-E) were identified both in this study and by Rashid et al. (11). A large share of the candidate antigens identified belonged to the functional category of hypothetical, unknown, and unclassified genes (21 of 52 genes, 40.4% of the total), suggesting that there is still a large proportion of potentially immunogenic antigens to be discovered within the unexplored part of the P. aeruginosa genome.

Several diverse animal models have been used in preclinical studies of vaccination to evaluate in vivo protection against P. aeruginosa infection. Mouse models, including burned animals, immunocompromised animals with intraperitoneal infection, and animals with acute pneumonia, have primarily been used in the past. These models were used for preclinical evaluation of candidates like the flagellum, the alginate exopolysaccharide conjugated to tetanus toxoid, polysaccharides, and outer membrane proteins such as OprF and OprI (27–31). Based on our previous experience, we consider respiratory infection in immunocompetent mice a highly robust and appropriate model to predict the efficacy of vaccine candidates for further clinical testing in patients at risk of respiratory infection, such as VAP or CF (22, 32). The mouse model of acute infection has been extensively employed as the standard in P. aeruginosa pathogenesis and efficacy studies (24, 33–35). In this work, C57Bl/6 mice succumbed following infection with a high dose of the virulent P. aeruginosa strain PAO1. Vaccination with whole cell inactivated P. aeruginosa induced an effective immunological response, as demonstrated by full protection of the mice. Conversely, immunization of mice with the adjuvant alone did not provide any protection, as all mice showed severe clinical disease. These data strongly support the choice of this robust mouse model for screening and selecting the best vaccine candidates. Vaccination with the ten purified soluble proteins on our shortlist produced distinct disease phenotypes, ranging from severe pneumonia to partial protection from a lethal dose of P. aeruginosa. We report that half of the vaccine candidates screened in this study were more effective when compared to OprF/OprI. It is worth noting that the OprF/OprI fusion was one of the treatments evaluated in a clinical trial, although clinical efficacy has been disappointing (36).

Given that vaccines containing several antigens have been shown to confer better protection than those containing only one antigen (20, 37), we decided to assess antigen combinations. Antigens to combine were selected from the shortlist of the ten most promising antigens, and the combinations were tested with the aim of further increasing vaccine efficacy and survival rate in the murine model. This systematic screening identified five combinations capable of significantly increasing the survival rate among challenged mice. All combinations included PA5340, a hypothetical protein exclusively present in P. aeruginosa. The maximum proportion of mice protected against challenge was 50%, achieved with PA5340 combined with PA3526-MotY. These two proteins were capable of triggering a specific immune response, and initial characterization could indicate surface exposure. Nevertheless, as their function is still undetermined, it is unlikely that either protein would ever have been selected by a traditional approach. Overall, this study confirms the capability of reverse vaccinology to give new impetus to the search for vaccines against P. aeruginosa infection through the rapid identification of novel vaccine candidates.

# MATERIALS AND METHODS

#### Ethics Statement

Animal studies adhered to the Italian Ministry of Health guidelines for the use and care of experimental animals (protocol #443). The use of the clinical data is in line with study no. 3739 that was approved by the Ethics Commission of Hannover Medical School.

#### *In silico* Analysis and Computational Tools

PSORTb (38) was used for subcellular localization prediction, SignalP (39) to predict signal peptides (SPs) and their probable cleavage sites in secreted proteins, the TatP prediction server to predict the presence of bacterial Tat signal peptides (40), LipoP (41) to predict lipoproteins, and TMpred to predict transmembrane segments (42).

To check the presence and conservation of vaccine candidates in other P. aeruginosa genomes, comparative protein sequence analysis against the sequenced genomes was performed using BLAST. The amino acid sequences of strain PAO1 were aligned against the protein translations of complete genome sequences with BLASTp (BLAST 2.2.26+) and against the human and mouse genomes with VAXGEN (http://www.violinet.org/vaxgen/index.php) (43).
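As a rough illustration of how BLASTp results can be summarized per candidate, the sketch below parses BLAST+ tabular output with the default twelve columns (`-outfmt 6`) and reports, for the best-scoring hit of each query, the percent identity and the fraction of the query covered. That tabular output was used here is an assumption, and the helper names, sequence identifiers, and numbers are invented; this is not the authors' script.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Keep the highest-bitscore hit for each query sequence."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        query = hit["qseqid"]
        if query not in best or float(hit["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = hit
    return best


def summarize(tabular_text, query_lengths):
    """Return {query: (subject, percent identity, query coverage)}."""
    summary = {}
    for query, hit in best_hits(tabular_text).items():
        covered = int(hit["qend"]) - int(hit["qstart"]) + 1
        summary[query] = (hit["sseqid"], float(hit["pident"]),
                          covered / query_lengths[query])
    return summary


# Invented hits for two hypothetical candidates against one genome.
example = (
    "candA\tgenome1_p1\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-120\t400\n"
    "candA\tgenome1_p2\t40.00\t100\t60\t0\t50\t149\t10\t109\t1e-10\t80\n"
    "candB\tgenome1_p9\t75.00\t150\t37\t1\t1\t150\t5\t154\t1e-60\t210\n"
)
summary = summarize(example, {"candA": 200, "candB": 300})
print(summary)
```

Query lengths are passed in separately because they are not among the default tabular columns. Repeating the summary for each genome would give per-candidate identity and coverage values across the whole panel.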

#### Gene Sequencing

Reference strain PAO1 (5) and 19 P. aeruginosa clinical strains isolated from CF patients attending the Medizinische Hochschule Hannover and described previously (25, 26) were used to sequence the genes of vaccine candidates. PCR amplification of the genes was carried out using the primers listed in the **Supplementary Information**.

### Cloning, Expression, and Purification of *P. aeruginosa* Recombinant Proteins

Polypeptide antigens from P. aeruginosa PAO1 were PCR-amplified using specific oligonucleotides and P. aeruginosa chromosomal DNA as template. The resulting PCR products were cloned in pET15b (Novagen) using the PIPE method (44). To express cloned proteins, BL21(DE3)T1<sup>r</sup> clones containing pET15b constructs were grown in LB medium containing 100 µg/ml ampicillin at 37°C until OD<sub>600</sub> = 0.5. Protein expression was induced by adding 1 mM IPTG and growing at the same temperature for an additional 3 h. Conventional protein extractions and SDS-PAGE were performed to check protein expression. Western Blot was used to confirm proper expression of the tested P. aeruginosa antigens.

Protein purification was performed as reported previously (45). Briefly, bacterial cells underwent mechanical or chemical lysis, and recombinant polypeptides were recovered from crude cell extracts by immobilized-metal ion affinity chromatography (IMAC) using His MultiTrap™ HP 50 mL Ni Sepharose High-Performance 96-well vacuum plates (GE Healthcare). Polypeptides expressed as insoluble inclusion bodies were solubilized in 50 mM Tris–HCl elution buffer, pH 8.8, containing 8 M urea, 1 mM TCEP-HCl, and 250 mM imidazole. Renaturation was performed by dialysis in 50 mM NaH<sub>2</sub>PO<sub>4</sub>, pH 8.8, containing 10% (v/v) glycerol, 0.5 M arginine, 5.0 mM reduced glutathione, and 0.5 mM oxidized glutathione in the presence of 4, 2, or 0 M urea. Protein concentration was determined using the Micro BCA protein assay reagent kit (Pierce). Protein purity was checked by SDS-PAGE on CRITERION XT Precast Gels (Biorad) followed by Coomassie blue staining.

## Mouse Immunizations and Protection Model

C57BL/6NCrlBR 5-week-old male mice (Charles River) were immunized i.p. at days 0, 21, and 35 with recombinant proteins formulated with Alum, either individually or as a combination of proteins. The formulations were optimized for pH and osmolarity. Each antigen was used at 10 µg/formulation/animal. The final concentration of Alum was 2 mg/ml in 10 mM histidine buffer (pH 6.5). Negative control mice were immunized with Alum alone, while positive control mice were boosted with whole cell heat-inactivated PAO1 strain at different doses (10<sup>5</sup>–10<sup>7</sup> CFU). To obtain the antisera, mice were bled at days −1, 34, and 49. At day 50, mice were challenged with 5 × 10<sup>6</sup> CFU of P. aeruginosa PAO1 strain by acute infection and monitored every 12 h for general wellness as detailed in the **Supplementary Information**.

#### Western Blot and Immunofluorescence Microscopy

Specific antisera from immunized mice were used to confirm protein expression by Western Blot and surface localization of antigens by immunofluorescence as detailed in the **Supplementary Information**.

#### Statistical Analysis

Statistical analyses were performed using the Mantel-Cox test and one-way ANOVA, considering p < 0.05 as the limit of statistical significance.
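For readers unfamiliar with the Mantel-Cox (log-rank) test, the sketch below shows the two-group calculation in plain Python: at each death time the observed deaths in one group are compared with the number expected if both groups shared the same hazard, and the summed difference is referred to a chi-square distribution with one degree of freedom. The function name and the two ten-animal groups are invented for the example; the text does not state which software was used, and this is not a substitute for a statistics package.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) test.

    times_*  : observation time for each animal
    events_* : True for a death, False for an animal censored at that time
    Returns (chi_square, p_value) with one degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e}):
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the deaths in group A at time t.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical groups: all controls die by day 2; half of the vaccinated
# animals die on day 2 and half are still alive at day 7.
control_times, control_events = [1] * 5 + [2] * 5, [True] * 10
vaccine_times, vaccine_events = [2] * 5 + [7] * 5, [True] * 5 + [False] * 5
chi_square, p_value = logrank_test(control_times, control_events,
                                   vaccine_times, vaccine_events)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

Animals still alive at the end of monitoring are treated as censored, which is why they contribute to the numbers at risk but never to the death counts.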

# AUTHOR CONTRIBUTIONS

AB and VM: conceiving and designing the experiments. BA-F, IB, MS, SB, SM, AC, and MD: performing experiments. AB, BA-F, IB, MS, and VM: analyzing data and interpreting the experimental results. AB, BA-F, IB, MS, and VM: writing the manuscript.

#### REFERENCES


#### FUNDING

This work was sponsored by Novartis Vaccines and Diagnostics Srl; in March 2015 the Novartis non-influenza Vaccines business was acquired by the GSK group of companies. The sponsor was involved in all stages of the study conduct and analysis. This work was supported in part by the Italian Cystic Fibrosis Research Foundation (FFC#08/2006 and FFC#10/2009) to AB with the contribution of the Delegazioni FFC of Como, Catania, Vittoria Ragusa, Latina and LIFC onlus Associazione regionale siciliana, in memory of Simone.

#### ACKNOWLEDGMENTS

The authors thank B. Tummler (Klinische Forschergruppe, Medizinische Hochschule Hannover, Germany) for supplying the P. aeruginosa clinical strains and G. B. Pier (Brigham and Women's Hospital, Channing Labs, Boston, USA) for providing the specific rabbit anti-P. aeruginosa cell wall antibody.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2018.03021/full#supplementary-material




**Conflict of Interest Statement:** MS, SM, SB, AC, and VM were employees of Novartis Vaccines and Diagnostics Srl at the time of the study (now part of the GSK group of companies). They are now employees of the GSK group of companies. MD is a student at the University of Torino and participated in a post graduate studentship program at GSK.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Bianconi, Alcalá-Franco, Scarselli, Dalsass, Buccato, Colaprico, Marchi, Masignani and Bragonzi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Impact of Nucleotide Sequence Analysis on Meningococcal Vaccine Development and Assessment

Martin Christopher James Maiden\*

*Department of Zoology, University of Oxford, Oxford, United Kingdom*

Since it became available as a routine tool in biology, the determination and analysis of nucleotide sequences has been applied to the design of vaccines and the investigation of their effectiveness. As vaccination is primarily concerned with the interaction of biological molecules with the immune system, the utility of sequence data is not immediately obvious and, indeed, nucleotide sequence data are most effective when used to complement more conventional immunological approaches. Here, the impact of sequencing on the field of vaccinology will be illustrated with reference to the development and implementation of vaccines against *Neisseria meningitidis* (the meningococcus) over the 30-year period from the late-1980s to the late-2010s. Nucleotide sequence-based studies have been important in the fight against this aggressive pathogen largely because of its high genetic and antigenic diversity, properties that were only fully appreciated because of sequence-based studies. Four aspects will be considered, the use of sequence data to: (i) discover vaccine antigens; (ii) assess the diversity and distribution of vaccine antigens; (iii) determine the evolutionary and population biology of the organism and their implications for immunization; and (iv) develop molecular approaches to investigate pre- and post-vaccine pathogen populations to assess vaccine impact. One of the great advantages of nucleotide sequence data has been its scalability, which has meant that increasingly large data sets have been available, which has proved invaluable in the investigation of an organism as diverse and enigmatic as the meningococcus.

Keywords: Neisseria meningitidis, conjugate polysaccharide vaccines, outer membrane vesicle vaccines, population biology, herd immunity, efficacy

#### Edited by:

*Pedro A. Reche, Complutense University of Madrid, Spain*

#### Reviewed by:

*Scott D. Gray-Owen, University of Toronto, Canada*

*William William Shafer, Emory University School of Medicine, United States*

\*Correspondence: *Martin Christopher James Maiden martin.maiden@zoo.ox.ac.uk*

#### Specialty section:

*This article was submitted to Vaccines and Molecular Therapeutics, a section of the journal Frontiers in Immunology*

> Received: *19 October 2018* Accepted: *20 December 2018* Published: *15 January 2019*

#### Citation:

*Maiden MCJ (2019) The Impact of Nucleotide Sequence Analysis on Meningococcal Vaccine Development and Assessment. Front. Immunol. 9:3151. doi: 10.3389/fimmu.2018.03151*

#### INTRODUCTION

The 40 years following the introduction of the Sanger dideoxy method in 1977 (1) saw a revolution in biology, which was driven by the improvement of nucleotide sequencing technologies. At the start of this period, determining a DNA or RNA sequence was a highly specialized task, which was undertaken in very few laboratories, most often using their own home-made equipment and reagents. Only individual genes or viruses could be sequenced, and then at great expense. By 2018, nucleotide sequencing was conducted on an industrial scale, employing mass-produced reagents and highly automated equipment, often in large factory-like installations. Complete genome sequences were assembled routinely for tens of thousands, even hundreds of thousands of organisms, including those with the largest genomes. Major advances had also been made in the computer equipment and software available to interpret the data produced, although it is fair to say that while issues of data generation were for all practical purposes resolved, data interpretation remained a major hurdle. In common with most areas of biology, vaccine development and evaluation were transformed by these advances, and this transformation will be illustrated here using meningococcal vaccines as an example.

Many of the most successful bacterial and viral vaccines were developed in the mid- to late-twentieth century, without recourse to the detailed genetic information that nucleotide sequencing provides; however, most of these conventionally developed vaccines targeted antigenically stable pathogens, such as the smallpox virus, or those that rely on a single, stable molecule for their pathogenicity, such as Corynebacterium diphtheriae (diphtheria) and Clostridium tetani (tetanus). The bacterium Neisseria meningitidis, the meningococcus, very nearly falls into this category, as almost all invasive meningococcal disease (IMD) is caused by bacteria that express one of only six capsular polysaccharide antigens, referred to as serogroups A, B, C, W, X, and Y. Polysaccharide and, especially, protein-conjugate polysaccharide vaccines are effective in protecting against disease for five of these (serogroups A, C, W, X, and Y) and provide a means of eliminating most IMD worldwide (2). Unfortunately, however, vaccines directed against meningococcal serogroup B polysaccharide have not been developed. This is a consequence of poor immunogenicity and fear of inducing host autoimmune disease, due to similarities of the serogroup B polysaccharide with human antigens (3, 4). The search for alternative antigens that target serogroup B meningococci has been complicated by the high variability of virtually all other possible meningococcal vaccine components. Ever-increasing volumes of nucleotide sequence data have been used in this search.

As successful immunization is almost always a population process, population studies based on nucleotide sequences have important applications in the development, deployment, and evaluation of vaccines. Here I shall describe how nucleotide sequence technologies have contributed to vaccine development and the assessment of vaccines, outlining the development and implementation of meningococcal vaccines from the late 1970s to the time of writing.

#### STRUCTURE AND VARIATION IN INDIVIDUAL ANTIGENS

Determining the nucleotide sequence of a gene encoding a vaccine antigen permits the deduction of a wealth of information concerning the protein and enables a wide range of follow-up studies. For example, the PorA protein was established as a potential vaccine component for "serogroup B substitute" vaccines (i.e., vaccines developed as an alternative to those including serogroup B capsular polysaccharide as an antigen) in the late 1970s and early 1980s, on the basis of immunological and biochemical studies (5, 6). The cloning and sequencing of the porA gene in 1991 provided much additional information, for example confirming that it was a typical porin related to those found in the gonococcus (7). The meningococcal and gonococcal sequences were used to design oligonucleotide primers for the then new PCR technique, enabling the amplification and sequencing of the porA gene from multiple meningococcal isolates (8), illustrating how sequence technologies lead to cumulative gains in knowledge. These comparative studies enabled structural models to be proposed and established that the antigenic variability identified by subtype-specific monoclonal antibodies was mostly determined by the peptide sequences of two major variable regions (VR1 and VR2) and one less variable region (VR3 or sVR) of the PorA protein. Combining sequencing and immunological studies enabled these antigens and their interaction with immune molecules to be defined (8–10).

The combination of PCR amplification and Sanger sequencing enables multiple variants of a given gene to be characterized accurately and rapidly at high volume. This permits the antigenic variability of a given protein vaccine component to be established. In the case of meningococcal PorA, the variants of which were initially identified with monoclonal antibodies (11), an ever-increasing diversity of variants have been identified by sequencing in the past 30 years. This required that the original nomenclature, which was based on monoclonal antibody reactivity, needed to be replaced with an updated nomenclature scheme that was based on the peptide sequences themselves, rather than the antibodies that reacted with them (12). It was important that this scheme was backwards compatible with the designations of the established antibody-based system. The scheme needed to be infinitely expandable, and to this end

# BACTERIAL POPULATION BIOLOGY AND EVOLUTIONARY STUDIES

Knowledge of the extent of variation in vaccine antigens inevitably poses questions of how this variation comes about and how it moves through the population. Both topics have important implications for vaccine design, as it is essential to know how effective a given vaccine will be once it has been introduced and how easily vaccine escape variants might arise and spread. Population studies of the pathogenic Neisseria, both the meningococcus and Neisseria gonorrheae (the gonococcus), have played a major role in developing bacterial population genetics. In the case of the meningococcus, these studies have also been central in the design and implementation of new vaccines.

Bacterial population studies predate the sequencing era, with seminal investigations of the meningococcus using a combination of multi-locus enzyme electrophoresis (MLEE) (26) and immunological typing (27–29), establishing fundamental concepts that have been built on subsequently by sequence-based investigations. MLEE investigations conducted on isolates from cases of invasive disease showed that a limited number of groups of closely related organisms, known as "clones" or "genetic lineages," each of which was associated with particular antigenic characteristics, were predominant causes of disease (27, 29–32). By contrast, meningococci isolated from asymptomatic carriers were much more diverse (33, 34). This led to the concept of "hypervirulent" or "hyperinvasive" meningococci: persistent

sequences can be recorded on open-access web-based databases (e.g., https://pubmlst.org/neisseria/PorA/), enabling easy access to the nomenclature (13). Similar approaches have since been used to catalog the variation of a number of meningococcal vaccine candidates including: factor H binding protein (fHbp) (14); the ferric enterochelin receptor, FetA (15); Neisseria adhesin A (NadA) (16); and the haemoglobin receptor (HmbR) (17).

In addition to describing the nature and extent of variation of genes, including those encoding vaccine antigens, sequence analyses help to reveal the mechanisms whereby this variation occurs. In the case of PorA, the immunogenic VRs are relatively short continuous sequences, encoding surface-exposed parts of the porin structure (18), which vary by point mutation, insertion, and deletion. Each of these processes can have an impact on the binding of immune molecules such as antibodies to the expressed protein. These impacts have been assessed by a combination of sequence comparison, biochemical, structural, and immunological analyses (19, 20) (**Figure 1**). Sequencing studies have also established that protein antigen expression can be influenced by the sequence of control regions, with polynucleotide tracts playing an important role in the expression of a number of meningococcal antigens (21). Another mechanism of variation that sequencing studies identified is the exchange of genetic material via horizontal genetic transfer (HGT), and the gene encoding the PorA protein was one of the first bacterial genes in which this was extensively documented (22). The recognition of the importance of HGT in bacterial evolution came from studies of antibiotic resistance and vaccine antigens

genotypes that were especially likely to cause IMD (35). One such lineage, associated with electrophoretic type 5 (ET-5), caused an international outbreak of serogroup B IMD (30), resulting in the development and implementation of specific outer membrane vesicle (OMV) vaccines generated against the local outbreak strain in Norway (36) and Cuba (37). This approach was repeated in New Zealand 15 years later (38).

The high levels of HGT observed in genes encoding antigens (22, 39) and antibiotic resistance (23), sometimes involving interspecies transfer (23, 40), are also observed in "housekeeping" genes: those expected to be subject to stabilizing selection for the conservation of metabolic function (41). This demonstrates the importance of HGT in bacterial evolution generally, and especially in the Neisseria. One consequence of this is the range of population structures observed in the meningococcus (25). MLEE technology was difficult to implement and compare across studies and, as sequencing technology developed, it was replaced by a nucleotide sequence-based approach, multilocus sequence typing (MLST), which indexed the variation in seven ∼400 bp fragments of housekeeping genes: abcZ; adk; aroE; fumC; gdh; pdhC; and pgm. MLST was much easier to implement at scale and had the advantage of being easily portable among laboratories (42). Originally developed for Neisseria meningitidis, MLST has had wide application to many bacterial species and, in combination with web-accessible databases, has become a widely used method, with applications in, for example: evolutionary studies; epidemiology; population biology; and taxonomy (42). MLST studies enabled genetic lineages to be defined in terms of groups of related sequence types (STs) called clonal complexes (ccs), which were named after the ST that was used to define that complex. Thus, ST-11 is the defining sequence type for clonal complex 11 (cc11), also known as the "ST-11 complex." These molecular designations have been incorporated with other typing schemes such as serogrouping (for capsule) (43) (**Figure 2**), subtyping (for PorA) (12) and typing of the FetA antigen (**Figure 1**) (15) into a standardized typing nomenclature with the form: B: P1.19,15: F5-1: ST-33 (cc32), referring to: group (in this case B); PorA VR1 and VR2 type (P1.19,15); FetA VR type (F5-1); sequence type (ST-33); and clonal complex (cc32), respectively (44).
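The standardized designation described above is regular enough to be split into its components programmatically. The following Python sketch is illustrative only: the class name, field names, and regular expression are assumptions made for this example, and the pattern covers only the simple form quoted in the text (it does not attempt to handle missing, unassigned, or otherwise irregular fields).

```python
import re
from typing import NamedTuple


class StrainDesignation(NamedTuple):
    """Components of a designation such as 'B: P1.19,15: F5-1: ST-33 (cc32)'."""
    serogroup: str
    pora_vr1: str
    pora_vr2: str
    feta_vr: str
    sequence_type: int
    clonal_complex: str


# Hypothetical pattern: group, PorA VR1 and VR2, FetA VR, ST, clonal complex.
_PATTERN = re.compile(
    r"^\s*(?P<group>[A-Z0-9]+)\s*:\s*"
    r"P1\.(?P<vr1>[\d-]+)\s*,\s*(?P<vr2>[\d-]+)\s*:\s*"
    r"(?P<feta>F[\d-]+)\s*:\s*"
    r"ST-(?P<st>\d+)\s*\((?P<cc>cc\d+)\)\s*$"
)


def parse_designation(text: str) -> StrainDesignation:
    """Split a designation string into its typed components."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized designation: {text!r}")
    return StrainDesignation(
        serogroup=match["group"],
        pora_vr1=match["vr1"],
        pora_vr2=match["vr2"],
        feta_vr=match["feta"],
        sequence_type=int(match["st"]),
        clonal_complex=match["cc"],
    )


if __name__ == "__main__":
    print(parse_designation("B: P1.19,15: F5-1: ST-33 (cc32)"))
```

Keeping the components as separate fields makes it straightforward to group isolates by any single element, for example by clonal complex, without re-parsing the string.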

### THE POPULATION APPROACH TO VACCINE ASSESSMENT

High-throughput sequencing approaches, which permit the characterization of hundreds or thousands of bacterial isolates at multiple loci, enable a population genetic approach to be taken to the assessment of vaccination programs. In the case of N. meningitidis, these approaches played a central role in understanding the impact of conjugate polysaccharide vaccines, which targeted the meningococcal capsular antigen (2). In late 1999, just after the publication of the MLST approach in 1998, the United Kingdom (UK) Department of Health implemented a novel monovalent serogroup C conjugate polysaccharide vaccine, in response to a dramatic increase in serogroup C meningococcal disease in the UK (45). The protein-polysaccharide vaccines had significant advantages over the older "plain" polysaccharide vaccines, in that the conjugate vaccines generated more effective immune responses. This included eliciting protective responses in younger individuals, affinity maturation, and a memory response, none of which were generated by the "plain" polysaccharide vaccines (46). A combination of the urgency of the public health response and the epidemiology of IMD meant that the new meningococcal C conjugate (MCC) vaccines were implemented without the benefit of a phase III efficacy trial (2). This led to several uncertainties, not least that it was unknown whether the vaccine would affect asymptomatic carriage. Although the prospect of directly preventing disease was wholly positive, there was a concern that population effects might lead to the emergence of vaccine escape variants: variants of the epidemic clone that had acquired a different capsule by

HGT (47). Investigation of this required monitoring not only of disease isolates, to establish the impact on disease, but also of carried meningococci during the vaccine implementation period.

The United Kingdom Meningococcal Carriage (UKMenCar) study was established to measure the impact of MCC on carriage and surveyed 47,765 teenagers from 1999 to 2001, immediately before and for 2 years after vaccine implementation (48–50). Oropharyngeal swab samples and conventional culturing were used to isolate meningococci from the throats of the subjects. In combination with phenotyping, high-throughput single-locus sequencing of the capsular region combined with MLST analysis enabled the nature and status of the serogroups to be determined, together with the clonal complex of each isolate. These data showed that, before vaccine implementation, the point prevalence of carriage of the epidemic-causing meningococcus (C:cc11) was low (0.31% of individuals), which is characteristic of a highly hyperinvasive meningococcus. In 2001, 2 years after implementation, this carriage rate decreased by 80% (to 0.04% of individuals). As a proportion of the meningococci isolated, the C:cc11 epidemic strain dropped by nearly 90%, from 1.83 to 0.21%, with a coincident reduction in the proportion expressing the capsular antigen of around 50% (48). This resulted in a significant herd immunity (protection) of the unvaccinated (51), which was fortunate as the initial vaccination schedule used for infants (2, 3, and 4 months) provided no direct protection more than a year after immunization (52). The vaccination program had no significant effect on the carriage of other genotypes and serogroups, although there was some evidence for secular changes over the period of the surveys (50).

The success of the monovalent MCC vaccines, demonstrated to be in large degree due to their ability to induce herd immunity, catalyzed interest in developing a similar vaccine to target the periodic very large epidemics seen in the African Meningitis Belt. These represent one of the most important manifestations of IMD globally and were first described by Lapeyssonnie (53). Since that time, repeated serogroup A epidemics had been observed, with especially large outbreaks in the late 1990s (54). The Meningitis Vaccine Project (MVP) was established in 2001 as a partnership between PATH (formerly the Program for Appropriate Technology in Health) and the World Health Organization (WHO), funded by the Bill and Melinda Gates Foundation. The aim of the MVP was the elimination of meningococcal epidemics in Africa by means of an affordable serogroup A protein-polysaccharide conjugate vaccine. Employing an innovative public-private approach with northern and southern partners, this goal was achieved, with the vaccine PsA-TT, "MenAfriVac®," introduced in December 2010, with a plan to immunize all African Meningitis Belt countries by 2013 (55, 56).

The Meningococcal Carriage in Africa (MenAfriCar) consortium was established in 2008 to measure the impact of the PsA-TT vaccine on carriage and the impact of herd immunity (57). Up until that time, knowledge of the carriage of meningococci in this region was incomplete, with a number of studies undertaken at various times employing a variety of techniques. This limited the collation of consistent information and a wide range of carriage prevalence rates (3–30%) had been reported (58). The MenAfriCar consortium aimed to conduct large-scale carriage surveys across the meningitis belt before and

after the introduction of the PsA-TT vaccine using consistent methods. Risk factor data for carriage were simultaneously collected, as in the UKMenCar surveys, although the risk factors included were somewhat different from those seen in the UK (59, 60). One unknown was the impact of the carriage of other Neisseria species, especially Neisseria lactamica, which is commonly isolated from individuals in Africa (61). This was challenging as members of the genus are very similar: they are poorly distinguishable by 16S rRNA sequencing (62), for example. The large number of isolates that would have to be processed by laboratories in resource-limited environments presented further challenges that were met by the exploitation of nucleotide sequence-based approaches, made possible by the availability of genomic technologies.

The MenAfriCar surveys employed a combination of conventional culture, biochemical, and sequence-based methodologies. As in the UKMenCar surveys, oropharyngeal swab samples were collected and cultured on selective media, with putative Neisseria identified using biochemical tests in local laboratories (60). From these cultures, boiled cell preparations were made, which were shipped to Oxford where the molecular analyses were performed (61). To solve the speciation problem a novel assay was designed. This took advantage of an extended MLST scheme, ribosomal MLST (rMLST), which indexes the 53 genes encoding ribosomal proteins, and enables bacterial isolate characterization "from domain to strain" (63). From the rMLST sequences extracted from WGS data from 44 isolates of diverse species, a 413 bp fragment of the rplF gene was identified, the sequence of which was diagnostic for each Neisseria species. This fragment could be amplified and sequenced at high throughput, enabling Neisseria species identification which was rapid, accurate, and cost-effective (64). The capsule genes were identified with a real-time PCR assay and fine typing performed by sequencing the porA and fetA loci (60).
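In computational terms, a diagnostic fragment of this kind reduces species assignment to a lookup of the sequenced fragment in a reference table. The Python sketch below illustrates the idea under simplifying assumptions: it uses exact matching only, the function and variable names are hypothetical, and the toy reference contains 8-base placeholders rather than real rplF sequences (the real fragment described above is 413 bp).

```python
from typing import Dict, Optional


def identify_species(fragment: str,
                     reference: Dict[str, str],
                     expected_length: int = 413) -> Optional[str]:
    """Return the species whose reference fragment matches exactly, else None.

    `reference` maps upper-case fragment sequences to species names.
    """
    # Normalize the read: drop whitespace/newlines and ignore case.
    seq = "".join(fragment.split()).upper()
    if len(seq) != expected_length:
        raise ValueError(f"expected {expected_length} bases, got {len(seq)}")
    return reference.get(seq)


# Toy reference with 8-base placeholders (NOT real rplF sequences).
TOY_REFERENCE = {
    "ACGTACGT": "Neisseria meningitidis",
    "ACGTTCGT": "Neisseria lactamica",
}

if __name__ == "__main__":
    print(identify_species("acgt acgt", TOY_REFERENCE, expected_length=8))
```

A fragment that matches no reference entry returns `None`, which in practice would flag the isolate for further characterization rather than force an assignment.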

The MenAfriCar studies showed great diversity of meningococcal carriage across the belt and over time, which differed from the patterns of carriage typically observed in high-income countries with temperate climates, where carriage is more consistent (65). The age profile of carriers was also different, with meningococcal carriage highest in individuals aged 5–14 (60), rather than in adults and adolescents as typically seen elsewhere (65). Transmission among children within households was shown to be important (66), again different from other settings, where social interactions outside the family are important (59). There was also much more diversity in the non-meningococcal Neisseria species isolated, the carriage of which was also age-related (61). This different dynamic was consistent with the unique epidemiology observed in African meningitis belt countries, and with the idea that the occurrence of seasonal epidemics in the meningitis belt was dependent on the transmission of epidemic clones.

Most of the countries in which PsA-TT was introduced, including the first, Burkina Faso, were not experiencing an epidemic of serogroup A IMD at the time of introduction, making assessment of the herd effects of the vaccine difficult, although effects on disease and carriage rates were consistent with such effects (67). An epidemic in Chad during the introduction there, however, enabled a direct demonstration of efficacy against IMD and carriage of the epidemic strain (68) (**Figure 3**). The rollout of the vaccines in different districts over two epidemic years, combined with the MenAfriCar sampling and sequence-based isolate characterization protocols, demonstrated high efficacy of the vaccination against both IMD and carriage (68). Samples collected in this study also demonstrated that even at genomic levels of isolate characterization, there were no consistent differences between carried and invasive meningococci (69).

The marked successes of the conjugate vaccines in different settings provided the prospect of a "meningitis free world," so long as an effective group B vaccine could be developed (70); however, at the time of writing, no vaccine against the meningococcal serogroup B polysaccharide, either plain or protein-conjugate, had been developed and none was under development (3). This was due to the poor immunogenicity of the antigen combined with fears of the induction of auto-immunity, as a consequence of the similarity of the antigen to human neural polysaccharides (4). Consequently, even with the prospect of conjugate polysaccharide vaccines that target serogroups A, C, W, X, and Y (71), there is a need for alternative or "substitute serogroup B" vaccines, if IMD is to be comprehensively prevented (3).

## GENOMIC DISEASE SURVEILLANCE: UNDERSTANDING AND COMBATTING VIRULENCE AND VACCINE ESCAPE

The development and validation of next generation sequencing approaches for the determination of high-quality draft WGSs of meningococci (72) led to the establishment of the Meningitis Research Foundation Meningococcus Genome Library (MRF-MGL) (73) (**Figure 4**). This repository contained the WGS data for all meningococci isolated from cases of IMD in the UK. The MRF-MGL provides the opportunity to identify and react to IMD outbreaks in real time or near real time. An example of this is the reaction of the UK public health authorities to a serogroup W IMD outbreak. From the early 2010s onwards, coinciding with the establishment of the MRF-MGL, there was a year-on-year increase in cases of serogroup W meningococcal disease, which data from the MRF-MGL showed to be W:cc11 (74). A similar increase had been observed a decade before (75), just after the successful introduction of the MCC vaccines, causing some alarm (76). The former increase had been associated with the global spread of a particular W:cc11 strain after the Hajj pilgrimage and had fortunately been transitory (77); however, the epidemiology of the cases in the 2010s was somewhat different, leading to the concern that this might be a different circumstance and that a larger epidemic might occur (74). A WGS study of a global collection of 750 diverse cc11 meningococci demonstrated that, although the cases in the UK were indeed related to the Hajj-associated isolates, they were much more closely related at the WGS level to W:cc11 meningococci that had caused large-scale epidemic outbreaks in South America (78). In combination with other epidemiological information, these data formed the basis of a decision to implement tetravalent A, C, W, Y conjugate vaccines into the UK immunization program for teenagers (79). Certainly, this was the first use of genomics to change national vaccination policy for meningococci and perhaps for any organism.

#### GENOME SEQUENCING VACCINE DESIGN AND ASSESSMENT

The availability of whole genome sequences (WGS) of bacterial pathogens also provided novel opportunities in the development of vaccines. In the case of the meningococcus, the first meningococcal WGSs, from bacterial isolates MC58 (80) and

modification from Brehony et al. (91) under CC BY.

Z2491 (81), both published in 2000, played a role in the development of two "serogroup B substitute" vaccines: Bexsero® (4CMenB, developed in Siena, Italy) (82); and Trumenba® (LP2086, developed in Pearl River, USA) (83). In the case of the Bexsero® vaccine, a "reverse vaccinology" (84) approach was adopted using the MC58 genome sequence as its starting point. In contrast to more conventional approaches, which took the interaction of a bacterial isolate as a starting point, reverse vaccinology started by predicting potential vaccine antigens (i.e., surface-exposed proteins) from a genome sequence, and then assessing these in animals. This identified three potential targets, fHbp, NadA, and NHBA, that were eventually included in the final Bexsero® formulation, along with the MenNZB OMV vaccine, made against the New Zealand outbreak strain (38). Interestingly, a more conventional vaccine development approach, which nevertheless used sequence information from the Z2491 meningococcal isolate to find the gene sequence from protein sequences, identified the fHbp protein (also known as LP2086) as an important vaccine candidate (83).

Given the known diversity of meningococcal protein antigens, assessment of the levels of diversity of these new vaccine components, especially the leading candidate fHbp (LP2086), formed a major part of the preclinical studies. Based on the analysis of deduced peptide sequences, the Pearl River group (studying "LP2086") proposed two subfamilies of the protein, A and B, whereas the Siena group (studying "GNA1870," later called fHbp), using a similar analysis of a different meningococcal collection, proposed three subvariants (1, 2, and 3, with subvariants 2 and 3 more closely related to each other) (85). As further sequencing of this antigen was performed, it became necessary to cross-reference and unify the sequence nomenclatures between the two typing schemes, and a single nomenclature was proposed (14) (**Figure 5**), along with a web-accessible database enabling the cross-referencing of the various nomenclature schemes (https://pubmlst.org/neisseria/fHbp/). At the time of writing (September 2018), a total of 1,157 peptide sequence variants of this protein were described on the database.

As with the capsular polysaccharide-conjugate vaccines, it was impractical to conduct phase III efficacy studies on these novel "serogroup B substitute" vaccines. Further, the correlates of protection were less well-established than those for the capsular polysaccharide vaccines, where bactericidal assays using blood samples from vaccinees were employed (3). This was a particular problem in assessing the breadth of coverage of these vaccines, given the diversity of meningococcal protein antigens. This prompted the development of indirect assessments of coverage: the Meningococcal Antigen Typing System (MATS) assay for Bexsero®, which incorporates sequence data from the porA gene (86); and the Meningococcal Antigen Surface Expression (MEASURE) Assay for Trumenba® (87). Both vaccines were licensed and used based on phase II clinical studies and studies using these assays (88, 89).

Pre- and post-introduction assessment of the coverage of these vaccines is an important component of assessing their likely and continued efficacy, especially for an organism as diverse and dynamic as the meningococcus. An efficient way to achieve this is by the extraction of antigen gene sequences from genome data collected as part of routine surveillance. The PubMLST website, which hosts data for the MRF-MGL and a number of other important global isolate collections, indexes all known genes and these can be flexibly grouped into "schemes," which are groups of genetic loci that are analyzed together for typing or functional purposes (13, 90). It was straightforward to combine the typing schemes for the various antigens in Bexsero® into a Bexsero® Vaccine Antigen Typing Scheme (BAST) (91), which enabled the assessment of the changing prevalence of the vaccine antigen variants in the UK and Ireland (91, 92) (**Figure 6**). Approaches such as this have great potential for supporting vaccine development, implementation, and monitoring into the future.

#### FUTURE PROSPECTS

As demonstrated by the examples outlined above, nucleotide sequence data, ranging from single gene fragments from individual bacterial specimens to whole completed genomes that are representative of populations, have many applications in vaccinology. These data are particularly useful in the rapid and cost-effective characterization of bacterial isolates. Combining such data with population and evolutionary analyses generates many informative inferences; however, whilst this complements data on the interaction of bacterial components with the host immune system, sequence analyses cannot wholly replace immunological studies. In an era where nucleotide sequences are low-cost commodities, the important advances of the future will depend on the interpretation and open-access dissemination of these data. In addition to novel statistical genetic techniques and integration with phenotypic data, the implementation of visualization tools is likely to be important in the further exploitation of this rich source of biological information.


# AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and has approved it for publication.

### FUNDING

During the earlier parts of this work (1988–1997), MM was employed by the UK National Health Service at the National Institute for Biological Standards and Control (NIBSC); he was funded for a sabbatical in the laboratory of Mark Achtman by the Alexander von Humboldt Stiftung (1996–7). From 1997 to 2015 he was a Wellcome Trust Senior Fellow, and from 2004 to the present he has been Professor of Molecular Epidemiology and a Fellow of Hertford College at the University of Oxford. He is grateful to the European Union (especially grants QLK2-CT-2001-01436 and FP7-278864-2), the Wellcome Trust (especially grants: 047072/Z/96/B, 047072/Z/96/C, 062057/Z/00/Z, 081494/Z/06/Z, 087622/Z/08/A, 086546/Z/08/Z, 091634/Z/10/Z, and 104992/Z/14/Z), the Meningitis Research Foundation, the UK Department of Health (especially contract PR-ST-0915-10015) and the Oxford Martin School for funding.

#### ACKNOWLEDGMENTS

I would like to express my thanks to the past and present members of the MaidenLab and the many collaborators who have contributed to my work over the years. Special thanks go to Ian Feavers of NIBSC, who was the crucial collaborator in establishing this program of work.



**Conflict of Interest Statement:** As an employee of the University of Oxford, MM has, over the past 20 years, undertaken contract research and consultancy for, and has been paid expenses and honoraria by, companies involved in vaccine development including GSK, Chiron, Novartis, Wyeth, and Pfizer.

Copyright © 2019 Maiden. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Comparison of Open-Source Reverse Vaccinology Programs for Bacterial Vaccine Antigen Discovery

#### Mattia Dalsass<sup>1,2</sup>, Alessandro Brozzi<sup>1</sup>, Duccio Medini<sup>1</sup> and Rino Rappuoli<sup>1</sup>\*

<sup>1</sup> GlaxoSmithKline, Siena, Italy, <sup>2</sup> Dipartimento di Scienze Cliniche e Biologiche, Università degli Studi di Torino, Turin, Italy

Reverse Vaccinology (RV) is a widely used approach to identify potential vaccine candidates (PVCs) by screening the proteome of a pathogen through computational analyses. Since its first application to the Group B meningococcus (MenB) vaccine in the early 1990s, several software programs have been developed implementing different flavors of the first RV protocol. However, there has been no comprehensive review to date of these different RV tools. We have compared six of these applications designed for bacterial vaccines (NERVE, Vaxign, VaxiJen, Jenner-predict, Bowman-Heinson, and VacSol) against a set of 11 pathogens for which a curated list of known bacterial protective antigens (BPAs) was available. We present results on: (1) the comparison of criteria and programs used for the selection of PVCs; (2) computational runtime; and (3) performance in terms of the fraction of the proteome identified as PVCs, and the fraction and enrichment of BPAs identified in the set of PVCs. This review demonstrates that none of the programs was able to recall 100% of the tested set of BPAs and that the output lists of proteins are in poor agreement, suggesting that the prioritization of vaccine candidates should not rely on the output of a single RV tool. Individually, the best balance between the fraction of a proteome predicted as good candidates and the recall of BPAs was achieved by the machine-learning approach proposed by Bowman (1) and enhanced by Heinson (2). Although it performed better than the other approaches, it has the disadvantages of limited accessibility to non-expert users and a strong dependence of the results on the composition of the a priori training dataset.
In conclusion, we believe that to significantly enhance the performance of future RV methods, further studies should focus on improving the accuracy of the existing protein annotation tools and should leverage machine-learning techniques applied to biological datasets expanded through the incorporation and curation of bacterial proteins characterized by negative experimental results.

Keywords: reverse vaccinology (RV) programs, antigen, bacterial pathogens, potential vaccine candidates (PVCs), bacterial protective antigens (BPAs)

# INTRODUCTION

Reverse Vaccinology (RV) is a genome-based approach first developed in the early 1990s by Rappuoli (3) to identify meningococcal protein vaccine candidates in Group B meningococcus (MenB). In its original conception, since antigens inducing a humoral antibody response are primarily located in the extracellular or outer membrane compartments, all the open reading frames extracted

#### Edited by:

Alexandre Barbosa Reis, Universidade Federal de Ouro Preto, Brazil

#### Reviewed by:

Rory Cristiane Fortes De Brito, Universidade Federal de Ouro Preto, Brazil Paola Massari, Tufts University School of Medicine, United States

> \*Correspondence: Rino Rappuoli rino.r.rappuoli@gsk.com

#### Specialty section:

This article was submitted to Vaccines and Molecular Therapeutics, a section of the journal Frontiers in Immunology

> Received: 18 October 2018 Accepted: 15 January 2019 Published: 14 February 2019

#### Citation:

Dalsass M, Brozzi A, Medini D and Rappuoli R (2019) Comparison of Open-Source Reverse Vaccinology Programs for Bacterial Vaccine Antigen Discovery. Front. Immunol. 10:113. doi: 10.3389/fimmu.2019.00113

from the genome sequence of MenB strain MC58 were screened to select proteins predicted to be surface exposed, secreted or lipoproteins.

The RV approach has revolutionized vaccine development by adopting computerized screening of protein sequences from the pathogen as the first step of the process, to select a subset of promising antigens, also known as potential vaccine candidates (PVCs) (**Figure 1A**).

RV offers two main advantages compared to traditional vaccine development approaches: (i) identification of candidate antigens without the need to grow the pathogen; and (ii) identification of any antigen regardless of whether it can be purified in quantities suitable for vaccine testing.

Proteins returned by RV methods are called PVCs (Potential Vaccine Candidates) throughout this review. Other names given to the selected proteins are VCs (Vaccine Candidates), VTs (Vaccine Targets), and PVCs (Protein Vaccine Candidates). PVCs identified by RV undergo in-vitro and in-vivo validation through experimental assays aimed at confirming their protective potential. Each pathogen has its specific experimental assays and it is hard to standardize a common set of experimental features; the most common forms of experimental evidence are protection outcomes in animal models against virulent bacterial challenge or results obtained from correlates of protection such as the human bactericidal assay (4). In the context of this review we refer to any candidate protein that gave positive results in confirmatory preclinical experimental assays as a BPA (bacterial protective antigen). In the literature, synonyms of BPA are protective antigen (PAg), known antigen (KA), and known protective antigen (KPA). Lists of BPAs for different bacteria or viruses can be found in databases like Violin (Protegen) (5) or by mining the literature. A comprehensive review of the main biological features characterizing BPAs deposited in Violin (Protegen) can be found in Ong et al. (6).

## The First RV Protocol

The first RV protocol started with the prediction of all open reading frames from the genome of MenB (strain MC58), 2,158 in total. These open reading frames were screened for homology to bacterial surface-associated proteins using the FASTA (7) and PSI-BLAST (8) programs. Proteins with no hits found (hypothetical proteins) were analyzed with the PSORT (9), SignalP (10), and TMPRED (11) programs to search for putative lipoproteins, secreted proteins, outer membrane proteins, or periplasmic proteins.

From the 2,158 proteins, 570 were selected as PVCs. Of these, 350 were successfully expressed in Escherichia coli and used to immunize mice. Sera from immunized animals were screened in a serum bactericidal assay (SBA), a correlate of protection against invasive meningococcal disease, and proteins with negative results were discarded. Among the 28 proteins able to induce bactericidal activity, 5 candidates were selected for the final formulation and, combined with outer membrane vesicles (OMVs), later approved under the commercial name Bexsero® (12).
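The attrition through this first protocol is easier to appreciate as proportions. The short Python sketch below uses only the counts quoted in the text and computes, for each stage, the fraction retained relative to the previous stage and relative to the full set of predicted open reading frames; the stage labels and the function name are chosen here for illustration.

```python
from typing import List, Tuple

# Counts taken from the description of the first RV protocol above.
STAGES: List[Tuple[str, int]] = [
    ("predicted open reading frames", 2158),
    ("selected as PVCs", 570),
    ("expressed in E. coli", 350),
    ("induced bactericidal activity", 28),
    ("included in final formulation", 5),
]


def funnel(stages: List[Tuple[str, int]]) -> List[Tuple[str, int, float, float]]:
    """Return (label, count, fraction of previous stage, fraction of first stage)."""
    total = stages[0][1]
    previous = total
    rows = []
    for label, count in stages:
        rows.append((label, count, count / previous, count / total))
        previous = count
    return rows


if __name__ == "__main__":
    for label, count, step, overall in funnel(STAGES):
        print(f"{label:32s} {count:5d}  {step:7.1%} of previous  {overall:7.2%} of all ORFs")
```

Read this way, the in silico selection step retains roughly a quarter of the proteome, whereas the experimental steps account for most of the subsequent reduction.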

### RV Programs Overview

In the following years the RV protocol was successfully applied to other bacterial pathogens. These pathogens include Chlamydia pneumonia (13), Streptococcus pneumoniae (14) in which open reading frames encoding putative surface proteins and with significant homology to virulence factors of other bacteria were selected and Porphyromonas gingivalis (15), in which PVCs were identified by searching for global homology to proteins of known surface exposure or virulence. In these cases, the selection criteria to identify PVCs were restricted to extracellular subcellular localization and to homology to virulence factors already known in other bacterial species. A review about these first applications might be found in Masignani et al. (16).

The first standalone RV program, distributed under the name NERVE (New Enhanced Reverse Vaccinology Environment), was published only in 2006 (17). Since then, several other pathogen-independent RV programs have been released.

Until now there has been no comprehensive review of the available open-source RV programs, and no systematic comparison on a benchmark dataset. In this review we compared 6 open-source standalone RV programs designed for bacterial pathogens: NERVE, VaxiJen (18), Vaxign (19), Bowman-Heinson (1, 2), Jenner-predict (20), and VacSol (21). We tested them on 11 different bacterial proteomes.

#### RV Programs Categories

RV packages fall into two types according to their algorithmic approach: decision-tree or "filtering" and machine-learning or "classifying." Both types take protein sequences as input and call each of them PVC or not-PVC.

#### **Decision-tree or filtering RV programs**

These are flowchart-like programs: the pathogen's protein sequences are passed through a series of filters until a subset is identified as PVCs. The filters act on protein features that can be directly measured, like the molecular weight, or predicted by a computational program, like the subcellular localization or the probability of being an adhesin. When a filter is applied to a numerical feature (e.g., number of predicted transmembrane domains), an a priori cut-off is used. Decision-tree RV programs differ from each other in the number of filters adopted. Examples of decision-tree RV tools are NERVE (17), Vaxign (19), Jenner-predict (20), and VacSol (21).
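A filtering cascade of this kind can be sketched in a few lines. The example below is a generic illustration, not the implementation of any of the reviewed programs: the `Protein` fields, the filter order, and the cut-offs are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Protein:
    name: str
    localization: str           # predicted subcellular localization
    tm_helices: int             # predicted number of transmembrane helices
    adhesin_probability: float  # predicted probability of being an adhesin


# Each filter is a label plus a predicate that returns True to keep a protein.
Filter = Tuple[str, Callable[[Protein], bool]]


def run_cascade(proteins: Sequence[Protein], filters: Sequence[Filter]) -> List[Protein]:
    """Pass proteins through the filters in order and return the survivors."""
    survivors = list(proteins)
    for label, keep in filters:
        survivors = [p for p in survivors if keep(p)]
        print(f"after '{label}': {len(survivors)} proteins left")
    return survivors


# Illustrative cut-offs only; each real program defines its own.
FILTERS: List[Filter] = [
    ("not cytoplasmic", lambda p: p.localization != "cytoplasmic"),
    ("at most 2 TM helices", lambda p: p.tm_helices <= 2),
    ("adhesin probability > 0.5", lambda p: p.adhesin_probability > 0.5),
]

# Hypothetical annotated proteins.
proteome = [
    Protein("P1", "outer membrane", 0, 0.81),
    Protein("P2", "cytoplasmic", 0, 0.90),
    Protein("P3", "extracellular", 5, 0.70),
    Protein("P4", "periplasmic", 1, 0.20),
]

pvcs = run_cascade(proteome, FILTERS)
print([p.name for p in pvcs])
```

Because every protein is either kept or dropped at each step, the output is an unranked subset, and a protein that narrowly misses one cut-off is lost regardless of its other features.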

**Abbreviations:** RV, reverse vaccinology; MenB, Neisseria meningitidis serogroup B, Meningococcus B; PVC, potential vaccine candidate; VC, vaccine candidate; VT, vaccine target; BPA, bacterial protective antigen; PAg, protective antigen; KA, known antigen; KPA, known protective antigen; NERVE, new enhanced reverse vaccinology environment; BLAST, basic local alignment search tool; PSI-BLAST, position-specific iterated basic local alignment search tool; SBA, serum bactericidal assay; DEG, database of essential genes; OMV, outer membrane vesicle; ACC, auto cross covariance; SVM, support vector machine; MHC, major histocompatibility complex; MHC I, major histocompatibility complex class I; MHC II, major histocompatibility complex class II; VFDB, virulence factor database; TP, true positive; TN, true negative; FP, false positive; FN, false negative; fHbp, meningococcal factor H binding protein; NadA, Neisseria adhesin A.

#### **Machine-learning or classifying RV programs**

These applications first aggregate the features measured or predicted on the pathogen's protein sequences into a matrix. Then, given a known set of training examples of PVCs and not-PVCs, an algorithm builds a model that assigns new input proteins to one of the two classes, usually in a probabilistic way. Machine-learning RV programs do not discard proteins, as decision-tree RV tools do, but rank the entire set of input proteins by their likelihood of being a PVC. This is very useful when preclinical confirmatory assays must be planned, since the experimenter can begin with the most promising candidates ranked in the top positions.
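The practical difference from filtering is that the output is an ordered list of the whole proteome. The sketch below assumes that a trained classifier has already produced a probability for each protein; the scores, the function name `rank_candidates`, and the batch size are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_candidates(
    probabilities: Dict[str, float], threshold: float = 0.5
) -> List[Tuple[str, float, bool]]:
    """Sort all proteins by predicted probability and flag those above the threshold."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [(name, prob, prob > threshold) for name, prob in ranked]


# Hypothetical classifier outputs for four proteins.
scores = {"P1": 0.91, "P2": 0.12, "P3": 0.64, "P4": 0.48}

ranking = rank_candidates(scores)
for name, prob, is_pvc in ranking:
    print(f"{name}  {prob:.2f}  {'PVC' if is_pvc else 'not-PVC'}")

# Nothing is discarded: an experimenter can start from the top of the list.
first_batch = [name for name, _, _ in ranking[:2]]
```

A protein just below the threshold (P4 here) remains visible in the ranking, which is not the case after a hard filter.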

Machine-learning RV tools are newer in the field and reflect the increasing attention that data analytics is paying to artificial intelligence methods. Machine-learning RV tools differ from each other in the type of classification algorithm they use, in the number of features they measure, and in the size and assortment of proteins that constitute the training set.

Examples of machine-learning RV tools are VaxiJen (18), Vacceed (22), which is designed for eukaryotic pathogens, and the method described in Bowman et al. (1) and revised by Heinson et al. (2).

#### Programs Interface

The interfaces to the RV programs fall into two categories, those that operate on the command line and those that have a graphical interface.

Command-line input allows for high-throughput analysis but has a high barrier to entry for non-technical users. Graphical interfaces, such as websites, provide point-and-click interfaces that non-technical users find easier to use initially; however, they are often limited to the analysis of a few samples at a time.

A synoptic summary of the types, year of release and interfaces of the six programs is provided in **Figure 1B**.

#### Software Description

In this section we describe, one by one, the six RV programs studied in this review. We refer the reader to each specific publication for further details. **Table 1** summarizes the criteria used by each of the six programs to identify PVCs and reports the main advantages and disadvantages encountered in their use.

#### NERVE (17)

NERVE (New Enhanced Reverse Vaccinology Environment) was the first standalone RV software. It is a decision-tree, command-line tool. Once installed on a Unix-like operating system (NERVE is implemented in the Perl programming language), the tool imports the sequences of the pathogen proteins and launches computational programs to predict five biological features:


NERVE parses the results of the five programs and stores the results in a MySQL database.

NERVE uses a priori cut-offs to select the PVCs. Based on tests done on 10 proteomes (Bacillus anthracis, Pseudomonas aeruginosa, Yersinia pestis, Streptococcus agalactiae V, III, Ia, Neisseria meningitidis B, Porphyromonas gingivalis, Borrelia burgdorferi, Chlamydia trachomatis D), the authors of NERVE suggest the following criterion to identify PVCs: any non-cytoplasmic protein with no more than 2 predicted transmembrane helices, a predicted probability of being an adhesin >0.46 or 0.38, and no sequence similarity to human proteins.
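The effect of the adhesin cut-off on the length of the PVC list can be illustrated as follows. This is a sketch of the criterion as stated above, not NERVE's own code: the record fields are assumed, the proteins are invented, and because the text does not say when each of the two quoted adhesin cut-offs (0.46 or 0.38) applies, the cut-off is simply exposed as a parameter.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Annotated:
    name: str
    cytoplasmic: bool           # predicted localization is cytoplasmic
    tm_helices: int             # predicted transmembrane helices
    adhesin_probability: float  # predicted probability of being an adhesin
    similar_to_human: bool      # sequence similarity to human proteins found


def select_pvcs(
    proteins: Sequence[Annotated], adhesin_cutoff: float, max_tm_helices: int = 2
) -> List[str]:
    """Names of proteins meeting all four conditions of the stated criterion."""
    return [
        p.name
        for p in proteins
        if not p.cytoplasmic
        and p.tm_helices <= max_tm_helices
        and p.adhesin_probability > adhesin_cutoff
        and not p.similar_to_human
    ]


# Hypothetical annotations.
data = [
    Annotated("A", False, 1, 0.60, False),
    Annotated("B", False, 2, 0.40, False),
    Annotated("C", False, 3, 0.90, False),  # too many helices
    Annotated("D", True, 0, 0.90, False),   # cytoplasmic
    Annotated("E", False, 0, 0.70, True),   # similar to a human protein
]

strict = select_pvcs(data, adhesin_cutoff=0.46)
relaxed = select_pvcs(data, adhesin_cutoff=0.38)
print(strict, relaxed)
```

Lowering the cut-off lengthens the list (protein B enters), which is the trade-off a user faces when tuning the thresholds.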

NERVE has the advantage of being very simple and intuitive; it also allows users to change the filtering cut-offs according to their preference for long or short lists of PVCs.

NERVE has not been updated since its first release: some Perl libraries have become obsolete, and non-negligible changes to the source code are needed to run it. Homology with human proteins is assessed by comparing each pathogen protein sequence, using the BLAST algorithm, against a dataset of potential MHC ligands derived from the MHCPEP database (26), which has not been updated since 1998.

#### VaxiJen (18)

Published soon after NERVE, VaxiJen is the first RV software to adopt a machine-learning strategy. VaxiJen proposes an alignment-independent method for antigen prediction based on auto cross covariance (ACC) transformation of protein sequences into uniform equal-length vectors. Unlike other RV programs, VaxiJen can predict not only bacterial but also viral and tumor antigens. For bacterial antigen prediction, VaxiJen applies the ACC transformation to a set of 100 known bacterial antigens that the authors derived by mining the literature; a protein was included in the set of known bacterial antigens if it (or part of it) was shown to induce a protective response in an appropriate animal model after immunization. A twin set of 100 non-antigens was constructed to mirror the antigen set by randomly selecting proteins from the same set of species without similarity to the 100 known antigens (a BLAST expectation value of 3.0 was used). Two-class discriminant analysis by partial least squares was applied to the merged set (200 proteins) to derive a prediction model that users can apply to their own protein dataset by uploading a file through a web interface.
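To show how an ACC transformation turns sequences of different lengths into vectors of equal length, the sketch below uses the general form of the transform: for descriptors j and k and lag l, the term is the sum over positions i of z(j, i) × z(k, i + l), divided by (n − l), where n is the sequence length. The descriptor values in `TOY_DESCRIPTORS` are invented for illustration, and the descriptor scales and maximum lag actually used by VaxiJen are not stated here, so both are assumptions.

```python
from typing import Dict, List, Tuple

# Invented two-value descriptors for four residues, for illustration only.
TOY_DESCRIPTORS: Dict[str, Tuple[float, ...]] = {
    "A": (0.1, -1.0),
    "C": (0.7, 0.4),
    "D": (-1.2, 0.9),
    "K": (1.5, -0.3),
}


def acc_transform(
    sequence: str, descriptors: Dict[str, Tuple[float, ...]], max_lag: int
) -> List[float]:
    """Auto cross covariance terms for every lag and every descriptor pair."""
    n = len(sequence)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    z = [descriptors[residue] for residue in sequence]
    d = len(z[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


short = acc_transform("ACDK", TOY_DESCRIPTORS, max_lag=2)
longer = acc_transform("ACDKKDCAACDK", TOY_DESCRIPTORS, max_lag=2)
print(len(short), len(longer))
```

The vector length depends only on the number of descriptors and the maximum lag (here 2 × 2 × 2 = 8), not on the sequence length, which is what makes the representation alignment-independent.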

VaxiJen is a web-interface program. The results page reports the antigen probability (as a fraction of unity) for each protein. The criterion to call a PVC is any protein with an antigen probability above a threshold (0.5 by default).

VaxiJen is the only tool currently allowing classification based solely on the physicochemical properties of protein sequences, without any related biological or functional information.

While VaxiJen is very easy to use and very fast, a major limitation is that, at least in its current release, the user cannot change the training dataset from which the prediction model is derived. A review of VaxiJen applications in recent years can be found in Zaharieva et al. (27).

#### Vaxign (19)

Vaxign is decision-tree software that works via web-interface. Vaxign is available in two forms: Vaxign Query that provides precomputed results for users to explore, and Dynamic Vaxign Analysis that allows dynamic execution and result visualization.

In Dynamic Vaxign Analysis, as in NERVE, different external computational programs are run on the input protein sequences to predict five biological properties:


The authors, analyzing 11 known protective antigens from four bacterial pathogens (N. meningitidis, H. pylori, B. anthracis, M. tuberculosis), suggest the following criterion to identify PVCs: any surface-exposed protein with no more than one transmembrane helix, a probability of being an adhesin >0.51, and no sequence similarity to any host protein (human and mouse).

Vaxign mostly resembles NERVE in terms of the protein features predicted, computational programs used and thresholds set to call PVCs, though there are differences:


#### Jenner-Predict (20)

It is decision-tree software published in 2013. Jenner-predict identifies PVCs by filtering upon:


Pfam domains include classes of adhesion, invasion, toxin, porins, colonization, virulence, flagellin, penicillin-binding, choline-binding, transferrin-binding, fibronectin-binding, and solute-binding.

The criterion to identify PVCs for Jenner-predict is: any non-cytosolic protein with <3 transmembrane helices and with at least one hit in the list of Pfam domains involved in host-pathogen interactions and pathogenesis. This final list of PVCs is then ranked according to the degree of conservation in different pathogenic and non-pathogenic strains, the presence of known epitope sequences (both B and T epitopes), and the degree of conservation with human proteins.
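The two-stage logic (filter first, then rank the survivors) can be sketched as below. This is an illustration under stated assumptions rather than Jenner-predict's implementation: the domain labels are placeholders for the curated Pfam list, the record fields are invented, and the ranking order (more conserved across pathogenic strains first, then more known epitopes, then lower similarity to human proteins) is our assumption, since the exact scoring is not given here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence

# Placeholder labels standing in for the curated list of Pfam domains.
HOST_PATHOGEN_DOMAINS = frozenset({"adhesion", "invasion", "toxin", "porin"})


@dataclass(frozen=True)
class Candidate:
    name: str
    cytosolic: bool
    tm_helices: int
    domain_classes: FrozenSet[str]
    pathogenic_conservation: float  # fraction of pathogenic strains, 0..1
    known_epitopes: int             # count of known B and T epitope matches
    human_similarity: float         # 0 (none) .. 1 (identical)


def passes_filter(c: Candidate) -> bool:
    """Non-cytosolic, fewer than 3 TM helices, at least one listed domain."""
    return (
        not c.cytosolic
        and c.tm_helices < 3
        and bool(c.domain_classes & HOST_PATHOGEN_DOMAINS)
    )


def rank_key(c: Candidate):
    # Assumed ordering; the real scoring scheme is not reproduced here.
    return (-c.pathogenic_conservation, -c.known_epitopes, c.human_similarity)


def filter_then_rank(candidates: Sequence[Candidate]) -> List[Candidate]:
    return sorted((c for c in candidates if passes_filter(c)), key=rank_key)


# Hypothetical candidates.
pool = [
    Candidate("X", False, 1, frozenset({"adhesion"}), 0.9, 2, 0.1),
    Candidate("Y", False, 0, frozenset({"toxin"}), 0.9, 5, 0.1),
    Candidate("Z", False, 4, frozenset({"porin"}), 1.0, 3, 0.0),   # too many helices
    Candidate("W", False, 0, frozenset({"kinase"}), 1.0, 9, 0.0),  # no listed domain
]

ranked_names = [c.name for c in filter_then_rank(pool)]
print(ranked_names)
```

Note that similarity to human proteins only affects the order of the list in this scheme; it never removes a candidate.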

The novelty of Jenner-predict is that it relaxes the adhesin-likeness criterion applied by NERVE and Vaxign to call PVCs. Jenner-predict does not use SPAAN to predict the probability that a candidate is an adhesin, but uses Pfam domains instead.

Unlike Vaxign, Jenner-predict uses sequence similarity to human proteins only as a score to rank PVCs. At the time of writing, Jenner-predict is unavailable to users through its web interface. We contacted the authors directly to ask for a local evaluation of the software on our benchmark dataset.

#### Bowman-Heinson (1, 2)

We named this method after the first authors who published a machine-learning RV method, initially in 2011 (1), and then enhanced the classifier, publishing the results in 2017 (2). Bowman et al. (1) merged ideas from the existing tools NERVE and VaxiJen, adopting from NERVE the use of a set of protein annotation tools and from VaxiJen the use of a machine-learning classifier.

The method uses a Support Vector Machine (SVM) classifier trained on a dataset of 200 bacterial protective antigens (BPAs) extracted from the literature. A bacterial protective antigen is a protein with evidence of protective potential in an appropriate animal model after immunization. A further 200 non-BPAs were randomly selected from the same proteomes, with no sequence similarity to the BPAs. This dataset was initially annotated with 525 features coming from 31 different annotation tools. After a feature selection step, the number of features was reduced to 10. This short-list of 10 includes:


The criterion to call a PVC is any candidate protein with an antigen probability value greater than an a priori threshold (0.5).

Even if the software is designed for bacterial PVCs, 8 out of 10 features are predicted by tools designed and tested for eukaryotic organisms, such as NetAcet (39) that predicts substrates of N-acetyl transferase A trained on yeast data with similar performances on mammalian substrates.

#### VacSol (21)

It is the most recent RV software to appear in the field of Reverse Vaccinology. It is decision-tree software that filters input protein candidates by:


The selection criterion for PVCs is: any non-host homologous, essential, virulent protein residing in the extracellular membrane with <2 transmembrane helices.

The final list of PVCs is then ranked according to the prediction of MHC class I and II binding regions and to B-cell epitope prediction.

#### BENCHMARK DATASETS

To compare the six RV tools, we selected 11 bacterial species for which we could retrieve a list of BPAs by combining information from the literature (reviews) with information publicly available in the Protegen database (5).

The list of the 11 species (both Gram positive and Gram negative) includes bacteria that were already reported in the publications of the RV programs: eight species reported in the NERVE or VacSol publications (Neisseria gonorrhoeae, Neisseria meningitidis, Staphylococcus aureus, Streptococcus pyogenes, Helicobacter pylori, Chlamydia pneumoniae, Campylobacter jejuni, Borrelia burgdorferi), two reported in the Jenner-predict or Vaxign publications (Escherichia coli, Streptococcus pneumoniae), and Treponema pallidum. For each species the list of BPAs and their references is reported in **Table S1**.

#### EVALUATION

For Bowman-Heinson, the original material was not made available by the authors within the timelines needed to submit this manuscript. Since the program's pipeline was unavailable, we decided to reproduce the analysis as closely as possible, following the description in the article (2).

TABLE 2 | Prototype of the gold-standard 2 × 2 table to measure RV performance.


For each bacterial species the proteome was downloaded from the UniProt database (43), version 2018\_05, and given as input to each RV program, which returned the list of PVCs as output. If not otherwise specified, default settings were used for each RV program.

#### Performances' Measures of RV Programs

In theory, the gold standard for measuring how well an RV program performs would be to purify all the pathogen's proteins, test each of them experimentally in the appropriate animal model through pathogen-specific laboratory assays, and finally compare predictions and experimental results as in **Table 2**.

From results arranged as in **Table 2** one could calculate sensitivity or recall (TP/(TP + FN)), specificity (TN/(TN + FP)), and other performance metrics. In a real-world scenario, however, filling in **Table 2** is almost unfeasible because of time and cost constraints, since entire bacterial proteomes consist of thousands of proteins. In this review we decided to focus on BPAs only and, accordingly, to measure the performance of RV methods by:


#### RESULTS

#### Comparison of the PVC Selection Criteria and Computational Tools

VaxiJen classifies PVCs using information extracted from the physicochemical properties of the amino acids composing bacterial proteins. The remaining five tools instead define PVCs from features predicted by external programs (for a list see **Table 3**).

Comparing the PVC selection criteria of these five RV programs, we observed that they share two common features:


For the prediction of extracellular localization, the RV programs mostly use Psort, while Bowman-Heinson implements TargetP.


TABLE 3 | Summary of the external computational programs used by the six programs to predict the protein features instrumental to filter or classify PVCs.

The major virulence characteristic that is searched for is adhesion. SPAAN is the software of choice to predict the probability of a protein being an adhesin and is used by NERVE and Vaxign. VacSol searches for PVCs in the virulence factor database VFDB, which contains a fair proportion of adhesins. Among the Pfam domains used by Jenner-predict, 96 domains are reported as related to adhesion. Lipoproteins have also been shown to play key roles in adhesion to host cells and translocation of virulence factors into host cells (44). Heinson uses the LipoP software, which predicts lipoproteins.

Contrary to what one might expect, not all the RV programs use sequence similarity to host proteins (either mouse or human) as a selection criterion. Jenner-predict, for instance, uses homology to human proteins only to rank the PVCs according to what its authors call their "vaccine potential." The machine-learning approach of Heinson does not include anything related to homology or similarity to host proteins in its list of 525 initial potential discriminative features.

Finally, HMMTOP is common to all four decision-tree programs: NERVE, Vaxign, Jenner-predict, and VacSol. It is used to predict the number of transmembrane domains, which is directly linked to the likelihood that a protein can be successfully purified.

#### Running Time

The time needed to predict PVCs is reported in **Table 4**. Time was measured on a set of 100 protein sequences with an average length of 360 amino acids. Tools like Vaxign and VaxiJen are very fast and process 100 proteins in a few seconds or minutes, whereas NERVE, Bowman-Heinson, and VacSol are slower and need between 15 and 60 min to analyze the same protein dataset on a MacBook Pro (2.6 GHz Intel Core i7, 16 GB RAM).

TABLE 4 | Summary of run times on a benchmark dataset of 100 proteins (average length 360 a.a.).

This difference arises because browser-based tools like Vaxign and VaxiJen run on servers set up by their developers, where software and hardware have been configured together. For locally installed tools such as NERVE, VacSol, and Bowman-Heinson, the running time depends on the user's hardware and may vary with the capabilities of the system.

In addition, tools like NERVE, VacSol, and Bowman-Heinson are not available as preconfigured virtual machines, so time must be dedicated to installing the software itself and all its dependencies. Vaxign and VaxiJen, available via browser, are easier to use: the user only needs to copy and paste the FASTA sequences of the proteins.

TABLE 5 | Fraction of PVCs predicted by each of the six programs (NERVE, VaxiJen, Vaxign, VacSol, Bowman-Heinson, and Jenner-predict) where pathogens are listed following the order of their proteome size.

#### Fraction of PVCs

The results are presented in **Table 5** where pathogens are listed following the order of their proteome size (decreasing order).

Among the six programs, VacSol proved to be the most conservative, predicting as PVCs on average only 3% of a bacterial proteome (min 0.7% for Chlamydia pneumoniae, max 5.3% for Streptococcus pyogenes). At the opposite end, VaxiJen is the most permissive, with on average 34.4% of a bacterial proteome predicted as PVCs (min 27% for Staphylococcus aureus, max 43.5% for Neisseria gonorrhoeae). A graphical summary is provided in **Figure 2.** As shown in the figure, based on the proteome fraction predicted as PVCs we could hierarchically cluster the six programs into three groups corresponding to high, medium, and low fractions of predicted PVCs.

VaxiJen is the software that predicts the greatest fractions of PVCs (always more than 25% of a proteome) and stands separately from the other tools. VacSol and Jenner-predict constitute the second group with low fractions of PVCs (always <10% of a proteome). In the middle are NERVE, Vaxign and Bowman-Heinson with similar medium fractions of PVCs predicted.

Analyzing the output of the six RV programs for each single protein, we observed heterogeneous agreement among the programs (**Figure 3**). To quantify the strength of each pairwise agreement among the six programs we used Cohen's kappa (45). If two programs are in complete agreement, kappa is equal to 1. If there is no agreement between two programs other than what would be expected by chance, kappa is equal to or even less than 0. The values of kappa for the pairwise comparisons between programs are given in **Table 6**.
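For two programs that each make a binary PVC / not-PVC call on the same proteins, kappa compares the observed agreement with the agreement expected by chance from each program's own rate of positive calls. A minimal sketch follows; the function name and the toy call vectors are ours.

```python
from typing import Sequence


def cohens_kappa(calls_a: Sequence[bool], calls_b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary raters over the same items."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("both call lists must be non-empty and of equal length")
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    rate_a = sum(calls_a) / n
    rate_b = sum(calls_b) / n
    # Chance agreement: both say PVC, or both say not-PVC, independently.
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Toy calls on four proteins (True = called PVC).
program_1 = [True, True, False, False]
program_2 = [True, False, True, False]

print(cohens_kappa(program_1, program_1))  # complete agreement
print(cohens_kappa(program_1, program_2))  # agreement at chance level
```

In the toy data the two programs agree on half of the proteins, which is exactly what chance predicts when each calls half of the proteins PVC, so kappa is 0.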

The programs are scarcely in agreement, with the only exception of NERVE and Vaxign, which show a high kappa value (0.769). VacSol appears to be the software whose list of PVCs has the least in common with the others (kappa ranges between −0.012 and 0.032).

In bold the maximum value of each column.

#### Fraction of BPAs Identified and Fold-Enrichment

For each software we measured the fraction of BPAs identified in the subset of PVCs, the recall and the fold-enrichment associated with p-value based on hypergeometric distribution as described in section Performances' Measures of RV Programs.

As reported in **Table 7**, the software with the highest fold-enrichment is Jenner-predict, which however has a recall of 44%. VaxiJen recalls the maximum absolute number of BPAs (76 BPAs in 9,357 PVCs) but has a low fold-enrichment (2.2). In comparison, Bowman-Heinson recalls 75 BPAs with 3,445 PVCs, therefore showing the best performance in terms of combined recall and fold-enrichment (5.9). Data for each single pathogen are provided in **Table S2**.
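The figures above can be recomputed from the counts reported in this section (100 BPAs among 27,247 proteins in total). The sketch below assumes that fold-enrichment is the fraction of BPAs among the PVCs divided by the fraction of BPAs in the whole benchmark, a definition that reproduces the reported 2.2 and 5.9, and that the p-value is the upper tail of the hypergeometric distribution; function names are ours.

```python
from math import comb

N_PROTEINS = 27247  # proteins in the 11 benchmark proteomes
N_BPAS = 100        # known protective antigens among them


def recall(bpas_in_pvcs: int, total_bpas: int = N_BPAS) -> float:
    return bpas_in_pvcs / total_bpas


def fold_enrichment(bpas_in_pvcs: int, n_pvcs: int,
                    total_bpas: int = N_BPAS, n_proteins: int = N_PROTEINS) -> float:
    """BPA fraction among PVCs relative to BPA fraction in the benchmark."""
    return (bpas_in_pvcs / n_pvcs) / (total_bpas / n_proteins)


def hypergeom_p_value(bpas_in_pvcs: int, n_pvcs: int,
                      total_bpas: int = N_BPAS, n_proteins: int = N_PROTEINS) -> float:
    """P(at least bpas_in_pvcs BPAs) if n_pvcs proteins were drawn at random."""
    upper = min(total_bpas, n_pvcs)
    tail = sum(
        comb(total_bpas, i) * comb(n_proteins - total_bpas, n_pvcs - i)
        for i in range(bpas_in_pvcs, upper + 1)
    )
    return tail / comb(n_proteins, n_pvcs)


for name, hits, n_pvcs in [("VaxiJen", 76, 9357), ("Bowman-Heinson", 75, 3445)]:
    print(name, recall(hits), round(fold_enrichment(hits, n_pvcs), 1),
          hypergeom_p_value(hits, n_pvcs))
```

Exact integer arithmetic with `math.comb` is slow for much larger proteomes; a statistics library would normally be used instead.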

# DISCUSSION

Reverse vaccinology represents a critical step toward the discovery and development of protein subunit vaccines.

From its conception in the early 2000s to date, several programs have been developed to perform Reverse Vaccinology. We reviewed six of them, all open-source and designed for bacterial pathogens.

We found two types of RV programs: those based on decision-tree or filtering and those based on machine-learning or classifying.

The first type (NERVE, Jenner-predict, Vaxign, and VacSol) has the advantage of using a predefined set of core features to predict PVCs, without requiring training on a preexisting list of good and bad candidates. Core features include extracellular localization, probability of being an adhesin, lack of similarity to host proteins, and a limited number of transmembrane domains.

We observed that on average 10–15% of a bacterial proteome matches these criteria, resulting in a list of hundreds of proteins to be potentially tested in preclinical laboratories.

Conversely, methods based on machine learning use training sets. VaxiJen uses as predictive features values calculated from the amino acid composition of the proteins and returns long lists of PVCs: on average one third of a bacterial proteome is called PVC. It is likely that changing the training set (composed of 200 proteins at the time of writing) would change the output lists of PVCs as well. The other machine-learning approach (based on a Support Vector Machine), developed by Bowman and enhanced by Heinson, uses features extracted from programs that predict subcellular localization, B and T cell responses, and post-translational modifications. Unlike VaxiJen, its output list of PVCs is contained (12% of a proteome on average) and the method shows a valuable enrichment in BPAs on our benchmark dataset.

TABLE 7 | Summary of the performance of the RV programs in terms of recall of BPAs and fold-enrichment.

In bold the maximum value of each column. Numbers refer to the total number of proteins (27,247) of the 11 pathogens. BPAs are 100 in total.

One advantage of the filtering RV programs is that the user has full control of the step-wise process leading to the selection of PVCs. The resulting PVCs are therefore easy to interpret and communicate. NERVE has not been updated since its release, but Vaxign is a valid alternative as it implements a very similar pipeline. The agreement between the two is indeed very good. VacSol is also a valid RV filtering program, but the number of resulting PVCs is so restricted that the likelihood of missing good candidates is not negligible.

Machine-learning methods are able to rank all the proteins of a pathogen by their likelihood of being a PVC. They can handle many more features simultaneously than filtering RV methods. However, these methods need an a priori training dataset of good and bad antigens. This is their Achilles' heel: while experimental evidence for good antigens can be found in the literature, the same is not always true for negative cases, i.e., candidate proteins that did not succeed in preclinical testing. The shortcut commonly used, artificially populating a set of bad antigens by randomly selecting proteins that were not tested in the laboratory but have scarce similarity to good antigens, is questionable. An example is provided by the two antigens fHbp and NadA present in the Bexsero® vaccine. Taking fHbp as a good antigen, the almost null sequence similarity between the two would lead one to consider NadA a bad candidate. The performance of RV methods would benefit if manually curated sets of candidate proteins with negative experimental outcomes were publicly available. A further limitation of machine-learning RV methods is the interpretation of the results, since it is not straightforward to map PVCs back to the feature space.

#### CONCLUSIONS

We have extensively reviewed, for the first time, the state-of-the-art of Reverse Vaccinology bioinformatic tools used in bacterial antigen prioritization, visualized their diversity, and examined their performances.

We found that, regardless of the number of predicted PVCs, none of the six programs was able to recall more than 76 BPAs out of the benchmark list of 100 compiled from eleven different bacterial species. The machine-learning method of Bowman-Heinson showed the best ratio between BPAs identified and number of PVCs predicted, recalling 75% of BPAs in a total of 3,445 PVCs. This is relevant in the field because, while reducing the number of laboratory tests, a method should at the same time ensure the identification of the vast majority of proteins with potential protective efficacy.

When we looked at the overall agreement in terms of PVC calls among the six programs, we found a low score, indicating that each program captures a specific profile for PVCs. Since the processing time is reasonable, we suggest exploring the results of at least one filtering and one classifying method. Finally, we observed that a distinguishing feature of the most cited and applied RV packages, VaxiJen and Vaxign, is their accessibility to end users through graphical user interfaces. We encourage researchers in this field to invest in the development of user-friendly interfaces as much as in the improvement of the predictive power of the algorithms.

#### AUTHOR CONTRIBUTIONS

All authors contributed to methods design and editing. AB and MD wrote and tested code, sourced data, performed data analysis, and drafted the manuscript. All authors read and approved the final manuscript.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2019.00113/full#supplementary-material

# REFERENCES


based on host-pathogen interactions. BMC Bioinformatics (2013) 14:211. doi: 10.1186/1471-2105-14-211


with functional gene ontology annotation. PLoS ONE (2014) 9:e99368. doi: 10.1371/journal.pone.0099368


45. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. (1960) 20:37–46. doi: 10.1177/001316446002000104

**Conflict of Interest Statement:** AB, DM, and RR were employees of GSK group of companies at the time of the study. MD is a Ph.D. student at the University of Turin and participates in a postgraduate studentship program at GSK.

Copyright © 2019 Dalsass, Brozzi, Medini and Rappuoli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Development of a Vaccine Against Meningococcus B Using Reverse Vaccinology

Vega Masignani <sup>1</sup> , Mariagrazia Pizza<sup>1</sup> and E. Richard Moxon<sup>2</sup> \*

*<sup>1</sup> GSK Vaccines, Siena, Italy, <sup>2</sup> Department of Pediatrics, Oxford University, Oxford, United Kingdom*

The discovery of vaccine antigens through whole genome sequencing (WGS) contrasts with the classical hypothesis-driven laboratory-based analysis of microbes to identify components that elicit protective immunity. This radical change in scientific direction and action in vaccine research is captured in the term *reverse vaccinology*. The complete genome sequence of an isolate of *Neisseria meningitidis* serogroup B (MenB) was systematically analyzed to identify proteins predicted to be secreted or exported to the outer membrane. This identified hundreds of genes coding for potential surface-exposed antigens. These were amplified, cloned in expression vectors and used to immunize mice. Antisera against 350 recombinant antigens were obtained and analyzed in a panel of immunological assays from which 28 were selected as potentially protective based on the antibody-dependent, complement-mediated serum bactericidal activity assay. Testing of these candidate vaccine antigens, using a large globally representative strain collection of Neisseria species isolated from cases of disease and carriage, indicated that no single component would be sufficient to induce broad coverage and that a "universal" vaccine should contain multiple antigens. The final choice of antigens to be included was based on cross-protective ability, assayed by serum bactericidal activity and maximum coverage of the extensive antigenic variability of MenB strains. The resulting multivalent vaccine formulation selected consisted of three recombinant antigens (Neisserial Heparin Binding Antigen or NHBA, Factor H binding protein or fHbp and Neisseria Adhesin A or NadA). To improve immunogenicity and potential strain coverage, an outer membrane vesicle component obtained from the epidemic New Zealand strain (OMVNz) was added to the formulation to create a four component vaccine, called 4CMenB.
A series of phase 2 and 3 clinical trials was conducted to evaluate safety and tolerability and to estimate the vaccine effectiveness of human immune responses at different ages and how these were affected by various factors including concomitant vaccine use and lot-to-lot consistency. 4CMenB was approved in Europe in 2013 and introduced in the National Immunization Program in the UK starting from September 2015, when the vaccine was offered to all newborns using a 2, 4, and 12 months schedule. The effectiveness against invasive MenB disease measured at 11 months after the study start and 5 months after the second vaccination was 83% and there have been no safety concerns.

Keywords: 4CMenB vaccine, reverse vaccinology, strain coverage, cross protection, antigenic variability, Neisseria meningitidis serogroup B (MenB)

#### Edited by:

*Lee Mark Wetzler, Boston University, United States*

#### Reviewed by:

*Andrew Gorringe, Public Health England, United Kingdom Scott D. Gray-Owen, University of Toronto, Canada*

\*Correspondence: *E. Richard Moxon richard.moxon@paediatrics.ox.ac.uk*

#### Specialty section:

*This article was submitted to Vaccines and Molecular Therapeutics, a section of the journal Frontiers in Immunology*

> Received: *06 December 2018* Accepted: *20 March 2019* Published: *16 April 2019*

#### Citation:

*Masignani V, Pizza M and Moxon ER (2019) The Development of a Vaccine Against Meningococcus B Using Reverse Vaccinology. Front. Immunol. 10:751. doi: 10.3389/fimmu.2019.00751*

# MENINGOCOCCUS B: THE LAST FRONTIER

The development of a meningococcal vaccine to protect against invasive disease caused by serogroup B strains of Neisseria meningitidis (MenB) represents a milestone in vaccinology. MenB is a major cause of sepsis and meningitis in North and South America, Canada, Europe, Australasia, and many other countries, but developing an effective vaccine was for many years an unsolved problem. The stumbling block was that, in contrast to all other variant capsular polysaccharides of the meningococcus for which effective conjugate vaccines were developed and licensed, the B polysaccharide does not induce an effective antibody response. A study of 50 healthy adults immunized with MenB polysaccharide showed that all but three of them failed to produce any antibodies (1) and even conjugation to tetanus toxoid failed to improve its immunogenicity (2). In contrast, in the 1960s, it was shown that adult military recruits immunized with plain MenC polysaccharide responded with copious amounts of antibody that protected against meningitis and sepsis (3). Further, a trial in Finland using unconjugated MenA polysaccharide also showed strong, protective immune responses even in young children (4). Why was the MenB such a poor immunogen? Scientists from Finland showed that the MenB capsular polysaccharide was identical to sugars found on the surface of many human cells especially, but not exclusively, in the brain during its pre-natal development (5, 6). It was concluded that inducing antibodies to MenB capsular polysaccharide ran the risk of damaging structures found on the surface of human cells and the authors proposed that the immune system had evolved tolerance to the B polysaccharide as a mechanism to avoid autoimmune pathology because of mimicry between components on the cell surface of human cells and surface structures of bacteria. 
The B polysaccharide is a homo-polymer of α (2→ 8) N-acetyl neuraminic acid, known also as polysialic acid (PSA), located on the surface of human cells. PSA has unusual and important biological properties. To allow intimate intercellular interactions, water must be excluded and PSA, richly hydrated, modulates the cell to cell signaling. Using antibodies or an enzyme that specifically destroys PSA, animal experiments have shown its key role in programming CNS development, including the migration of nerve cells, the connectivity between dendritic cells and the formation of junctions between muscle and nerves (7, 8).

Many scientists concluded that inducing antibodies to PSA in humans represented an unacceptable safety risk. A particularly alarming concern, which consolidated opposition to using the polysaccharide as a vaccine, was the risk to immunized women who become pregnant. Antibodies cross the placenta and reach the developing embryo, so antibodies resulting from immunization with B polysaccharide could disrupt CNS development in the unborn child, especially since the amount of PSA on neural tissues is known to be at its highest level during fetal development. But not all scientists were convinced that the evidence precluded using the B polysaccharide as a vaccine. Harold Jennings, one of the first to demonstrate that the B polysaccharide was inert as an immunogen, had the idea of modifying the B polysaccharide by introducing an N-propionyl side chain with the aim of increasing immunogenicity and precluding cross reactivity to human cells (9). This formulation elicited functional antibody responses in mice, but not in humans (10).

John Robbins was strongly supportive of this approach and even today remains steadfast in his opinion that using the B polysaccharide as a vaccine would not be harmful. He and his collaborators have documented that humans, through exposure to naturally occurring antigens, make antibodies to the polysaccharide, yet do not have an increased susceptibility to auto-immune disease (11). But, for the majority of scientists, an alternative approach to a vaccine, one that avoided the use of the potentially harmful B polysaccharide, was considered imperative. Crucially, vaccine manufacturers were strongly influenced by safety concerns and were unwilling to invest millions in research and development programs that risked being derailed when ethical approval was sought to carry out the mandatory clinical trials in humans.

Over many years, research on alternative approaches to develop a vaccine that protected against invasive diseases caused by strains of MenB was undertaken. Efforts were largely directed to the non-capsular antigens, proteins, or lipopolysaccharide (in meningococcus, often referred to as lipo-oligosaccharide or LOS). This change brought about a radical conceptual shift; referring to these non-capsule based vaccines as "group B" vaccines is a misnomer as they do not contain the defining feature of MenB bacteria (the polysaccharide capsule), and the vaccines will also potentially protect against other capsular groups (A, C, W, and Y strains).

The ability of the antibodies induced by each vaccine antigen to activate complement and induce bactericidal activity, measured in the serum bactericidal assay (SBA), was shown to be predictive of protection in humans (12, 13). However, although the SBA was well-established in the context of conjugate vaccines, its credentials as a correlate of protection in the context of protein antigens required further, independent validation. This came from the evidence of Goldschneider (12, 13) and experience with outer membrane vesicles (OMVs) [in particular those used in Norway and New Zealand (14, 15)], whose protective efficacy was shown to be largely mediated by bactericidal antibodies to the meningococcal surface protein, PorA. Although a number of surface-exposed candidate antigens were identified (16–18), none possessed sufficient capacity to elicit cross-reactive bactericidal activity against the diversity of meningococcal strains, the sine qua non for developing an effective vaccine. However, OMVs, treated with detergents to extract LOS and decrease endotoxin activity, were safe and effective in preventing group B meningococcal disease (19–22). A variety of "tailor-made" MenB OMV vaccines have been developed and licensed to control epidemics dominated by a single clone. OMV vaccines have been used in Norway (23), Cuba (24), Chile (25), and New Zealand (26). MeNZB, which was implemented in New Zealand, was associated with substantial reductions in invasive meningococcal disease caused by an outbreak clone that reached an incidence of 17.4/100,000 of the population overall and more than 200/100,000 among some indigenous populations (27). In published studies, efficacy of two doses given to children 4 years or older, or to young adults, ranged from 57 to 83% but with the limitation that the protective, bactericidal responses of infants were specific for the major outer membrane porin protein (PorA) (14).
Thus, the utility of OMV vaccines is limited to control clonal epidemics where disease is caused by strains expressing a PorA serosubtype matching that in the OMV vaccine. In an effort to broaden protection, OMV vaccines were prepared from several strains expressing distinct alleles of PorA, but the manufacturers of this multivalent formulation were not able to overcome a number of problems that included variable immunogenicity and consistency of formulation (28).

New technology was needed to overcome the impasse and this came about in 1995 when a team from The Institute for Genomic Research (TIGR) sequenced the complete genome of the human commensal-pathogen bacterium Haemophilus influenzae (29). The first completely assembled genome of a free-living organism was a revolution in biology; whole genome sequencing (WGS) transformed the scientific basis of epidemiology, diagnosis, and prevention of all diseases, including those caused by microbes. In the field of infectious diseases, WGS introduced a cost-effective method to acquire comprehensive information on pathogens and commensals, including those whose biology was elusive because they could not be cultivated in the laboratory. The implications of completing the first WGS were immediately apparent. The idea of using a genomics platform as a discovery tool to identify vaccine antigens was first explicitly published in 1997 (30) and the public health imperative to develop a vaccine to prevent invasive infections caused by MenB provided an ideal opportunity to exploit this concept. Scientists from Oxford University provided Venter and his team at TIGR with DNA from a MenB strain (MC58) isolated from a UK outbreak of meningococcal disease (31). Preliminary genome sequence data, initially a modest 2-fold genome coverage, validated the potential of the approach by identifying a novel antigen (17, 18). Meanwhile, Italian scientists from Sclavo, Siena (led by Rino Rappuoli) had for several years dedicated their research efforts toward the development of meningococcal vaccines. In 1998, a collaboration between Chiron Vaccines (which had acquired Sclavo), TIGR and Oxford University carried out a comprehensive evaluation of all potential meningococcal vaccine antigens in the MC58 MenB strain, as described below.
The WGS approach, highly sensitive but lacking the specificity to identify and prioritize antigens with respect to protective potential, was published in 2000 (32, 33). Antigen discovery through WGS contrasted with the hypothesis-driven, classical laboratory-based, bottom-up analysis of microbes to identify components that could elicit protective immunity. This radical change in scientific direction and action in vaccine research was subsequently captured in the term reverse vaccinology (34). Many of the antigens identified in this way were outer membrane proteins that had relatively low levels of surface expression, one reason why they had not been discovered before the use of WGS.

# FROM BIOINFORMATICS TO BIOLOGY

Although the concept of mining genome information was straightforward, the challenge of whether it could result in the development of a vaccine was not. Indeed, proper cellular localization is the key attribute of a bacterial protein to be considered as a potential vaccine candidate. While proteins located in the cytosol are generally not good immunological targets, surface-associated structures are potentially accessible to the immune system and therefore more likely to induce a functional immune response. Based on this assumption, an in silico bioinformatics approach was used to identify novel antigens for vaccine development. The genome was therefore screened systematically to identify proteins predicted to be secreted or exported to the outer membrane, localized in the periplasm or in the inner membrane. Furthermore, selection was also extended to proteins containing amino acid signatures predictive of a possible role in the adhesion to host factors, as well as other virulence mechanisms. This was challenging since the MenB genome consisted of more than 2,000 predicted genes, only a minority of which coded for surface expressed molecules of potential utility as vaccine antigens (33). Although today sophisticated software and dedicated suites of programs exist to accurately predict a protein's cellular localization and potential biological function, this was not the case 20 years ago. In the late 1990s, interrogation of sequence data was in its infancy, the utility of many of the algorithms was not validated and annotations were often misleading. Management and interpretation of about two million base pairs of meningococcal genome sequence data were prone to errors. For instance, prediction of the start codons was based on the identification of the first ATG occurring after a previously identified stop codon. 
Unfortunately, this did not take into account either the presence of a correctly spaced Shine-Dalgarno sequence, or the potential presence of less frequent start codons like TTG or GTG (coding for leucine or valine, respectively). For example, the annotation of GNA1870 (later renamed fHbp), now one of the most important meningococcal antigens, was incorrect as a result of automatic procedures [**Figure 1** and (35)]. The plethora of repetitive DNA elements in the genomes of meningococci was a deterrent to efficient assembly and unambiguous identification of genes because of frame shifts or sequencing errors. In the particular case of MC58 genome annotation, 65 open reading frames that contained stretches of repetitive DNA were identified. Of these, only 16 were previously known; the remainder were discovered through complete genome sequencing (36). The fact that some of the genes with DNA repeats encode surface-associated proteins poses a problem in terms of antigen selection, as phase variable genes, potentially generating escape variants, may not be ideal candidates for vaccine development. Despite these challenges, 18 months after the beginning of sequencing, 600 potential vaccine candidates were identified in silico. The highest proportion of these putative candidates was represented by integral membrane proteins (characterized by multiple hydrophobic domains), followed by periplasmic proteins, lipoproteins, and outer membrane and secreted proteins, the latter group representing <15% of the total. Interestingly, only half of the selected gene products displayed homologies to proteins of defined function, whereas the others had no clearly attributable functional role.
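The start codon problem described above can be illustrated with a short sketch. It contrasts the simple rule (take the first in-frame ATG) with a search that also accepts GTG and TTG and prefers a candidate preceded by a Shine-Dalgarno-like motif. The motif `GGAGG`, the 4 to 10 nucleotide spacing window, the function names, and the toy sequence are all assumptions chosen for illustration; this is not the annotation pipeline used for MC58.

```python
# Illustrative sketch only: motif, spacing window and names are assumptions.
START_CODONS = ("ATG", "GTG", "TTG")
SD_MOTIF = "GGAGG"        # simplified Shine-Dalgarno core (assumed)
SPACER_RANGE = (4, 10)    # nt between motif end and start codon (assumed)


def has_sd_upstream(seq, start, motif=SD_MOTIF, spacer_range=SPACER_RANGE):
    """Return True if the motif ends within spacer_range nt upstream of start."""
    lo, hi = spacer_range
    for spacer in range(lo, hi + 1):
        motif_end = start - spacer
        motif_start = motif_end - len(motif)
        if motif_start >= 0 and seq[motif_start:motif_end] == motif:
            return True
    return False


def candidate_starts(seq, frame_start, stop_pos):
    """List (position, codon, has_sd) for in-frame start codons before stop_pos."""
    found = []
    for i in range(frame_start, stop_pos - 2, 3):
        codon = seq[i:i + 3]
        if codon in START_CODONS:
            found.append((i, codon, has_sd_upstream(seq, i)))
    return found


def naive_start(seq, frame_start, stop_pos):
    """First in-frame ATG, as in the simple rule described in the text."""
    for i in range(frame_start, stop_pos - 2, 3):
        if seq[i:i + 3] == "ATG":
            return i
    return None


def sd_aware_start(seq, frame_start, stop_pos):
    """Prefer the first candidate with an upstream motif, else the first candidate."""
    candidates = candidate_starts(seq, frame_start, stop_pos)
    with_sd = [c for c in candidates if c[2]]
    chosen = with_sd[0] if with_sd else (candidates[0] if candidates else None)
    return chosen[0] if chosen else None


# Toy sequence: motif, 6-nt spacer, GTG start, then an internal ATG and a stop.
TOY_SEQ = "CCCA" + "GGAGG" + "CACACA" + "GTG" + "AAACCC" + "ATG" + "GCT" + "TAA"
TOY_STOP = TOY_SEQ.index("TAA")

if __name__ == "__main__":
    print("naive start:", naive_start(TOY_SEQ, 0, TOY_STOP))
    print("SD-aware start:", sd_aware_start(TOY_SEQ, 0, TOY_STOP))
```

On the toy sequence, the simple rule selects the internal ATG and truncates the reading frame, whereas the motif-aware search selects the upstream GTG. Real gene finders score ribosome binding sites probabilistically rather than by exact string match.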

The genes coding for the 600 potential surface-exposed antigens were amplified from the genome of strain MC58 and cloned in expression vectors to generate Histidine (His) or GST (Glutathione S-transferase) tagged proteins. For each antigen, the fusion form (His or GST tagged) that showed higher solubility was purified from E. coli and used to immunize mice. The antisera raised against each recombinant antigen were analyzed in a panel of immunological assays: Western blot to confirm that the antigen was expressed in meningococcus and at the predicted molecular weight; flow cytometry by FACS (Fluorescence Activated Cell Sorter) to evaluate accessibility of the antigen to antibody binding on the meningococcal surface; and SBA to assess the ability of the antibodies to bind the antigen and to promote complement mediated bacterial killing. In addition, some of the antigens were tested for their ability to confer protection in the infant rat or mouse septicemia models by active or passive immunizations. Following this approach, 350 of the 600 predicted surface-exposed proteins were successfully expressed in E. coli and purified as recombinant proteins. The rate of expression was mainly driven by the intrinsic features of the selected antigens, with those containing more than one predicted transmembrane domain being the most difficult to express. Of the 350 candidate antigens, 91 proved to be surface-exposed, and 28 were able to elicit a bactericidal response. The identification of 28 new bactericidal antigens represented a real breakthrough in the field, considering that in more than 50 years of research only a few bactericidal antigens had been characterized (32) (**Figure 2**).
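The attrition through this screening cascade can be made explicit with a few lines of arithmetic. The sketch below uses only the counts quoted above (600, 350, 91, and 28); the stage labels and the function name `funnel_rates` are introduced here for illustration.

```python
# Counts are taken from the text; labels and function name are illustrative.
CANDIDATE_FUNNEL = [
    ("predicted in silico", 600),
    ("expressed and purified in E. coli", 350),
    ("surface-exposed (FACS)", 91),
    ("bactericidal (SBA)", 28),
]


def funnel_rates(stages):
    """Return (label, count, fraction_of_previous, fraction_of_first) per stage."""
    first_count = stages[0][1]
    rows = []
    previous = None
    for label, count in stages:
        step_fraction = None if previous is None else count / previous
        rows.append((label, count, step_fraction, count / first_count))
        previous = count
    return rows


if __name__ == "__main__":
    for label, count, step, overall in funnel_rates(CANDIDATE_FUNNEL):
        step_text = "n/a" if step is None else f"{step:.1%}"
        print(f"{label:35s} {count:4d}  step: {step_text:>6s}  overall: {overall:.1%}")
```

By these figures, about 58.3% of the predicted candidates were expressed, 26.0% of those were surface-exposed, 30.8% of the surface-exposed proteins were bactericidal, and roughly 4.7% of the original 600 candidates survived all three filters.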

Selecting which of the most promising antigens should be included in the MenB vaccine required an approach informed by sequence analysis of individual surface expressed proteins to assess the extent of their variation within the natural population of meningococcal strains (associated with invasive disease and carriage) and by evaluation of the cross-bactericidal activity of antisera raised against each of the antigens. A collection of strains was assembled by scientists at University of Oxford so as to be representative of the Neisseria species, based on multiple serogroups of N. meningitidis. Also included were strains of Neisseria gonorrhoeae, Neisseria cinerea, and Neisseria lactamica to evaluate the sequence conservation of the top cross-protective antigens. This sequence conservation analysis revealed a substantial degree of variability in at least some of the candidates, suggesting that no single component would be sufficient to induce broad coverage and that a "universal" vaccine should contain multiple antigens (**Figure 2**).

# FORMULATING THE MULTICOMPONENT VACCINE AND FUNCTIONAL CHARACTERIZATION OF ITS COMPONENTS

The final choice of antigens to be included in this multivalent vaccine formulation was based on cross-protective ability, assayed by bactericidal activity and maximum coverage of the extensive antigenic variability of meningococcal B (MenB) strains. When initially identified, each candidate antigen was referred to as GNA (Genome derived Neisseria Antigen) followed by a number representing the position of the encoding gene in the genome. The three most promising antigens identified were GNA2132, GNA1870, and GNA1994. In subsequent studies, these three antigens were given names: NHBA, fHbp, and NadA, respectively, based on their functional activity (**Figure 3**).

NHBA: GNA2132 is a lipoprotein of around 420–490 amino acids and was shown to bind heparin and heparan-sulfate in vitro, through an Arginine-rich region. Because of this function, it has been named Neisserial heparin binding antigen (NHBA) (42). Binding to heparin affects survival of Neisseria in a human blood killing assay (42). The Arginine-rich region also plays a key role in adhesion to eukaryotic cells (43). The nhba gene is ubiquitous in all meningococcal serogroups and is also found in N. gonorrhoeae and other commensal Neisseria species. On the basis of the sequence variability, over 400 different peptides have been described and the relationship between sequence variability and cross-protection remains to be defined. The NHBA protein has an N-terminal region of approximately 250 residues predicted to be intrinsically disordered, and a highly conserved C-terminal domain (∼180 residues), with an 8-stranded antiparallel β-barrel folding (38, 39). NHBA undergoes proteolytic cleavage by meningococcal NalP protease and by eukaryotic proteases like human lactoferrin, kallikrein, and the C3 convertase (42, 44).

amplified by PCR and cloned into *Escherichia coli* expression vectors. Three hundred and fifty recombinant proteins were successfully produced, purified, and used to immunize mice. Recombinant protein candidates were then selected based on their surface expression (assessed by FACS), and ability to induce serum bactericidal antibodies (assessed by the SBA serum bactericidal assay) and conservation in a panel of Neisseria strains. The antigens selected by reverse vaccinology were finally prioritized, with NadA, fHbp, and NHBA as the three top antigens.

fHbp: GNA1870 is a lipoprotein of 253–266 amino acids able to bind human Factor H (FH), an inhibitor of the alternative complement pathway. Because of this activity it has been named factor H binding protein or fHbp (the same antigen discovered using a biochemical approach was named rLP2086) (35, 45). The binding of fHbp to FH enhances meningococcal serum resistance, allowing the bacterium to replicate in human blood. The three-dimensional structures of fHbp alone or in complex with domains 6 and 7 of human FH have been solved. Interestingly, the side chains of fHbp that interact with FH resemble the glycosaminoglycan binding region of FH on host cells (37). Therefore, Neisseria is able, through fHbp, to recruit FH by mimicking the host. Sequence diversity analysis allowed identification of three variants, named variants 1, 2, and 3 (or subfamily A and B), serologically distinct and with only a low cross-bactericidal activity between variants 2 and 3 strains. The amount of fHbp expressed by different MenB strains is controlled by the fHbp promoter and can vary by at least 15-fold (46). fHbp contains multiple bactericidal epitopes and the bactericidal activity of anti-fHbp antibodies varies according to the genetic diversity and level of expression of fHbp in the different strains (47).

NadA: GNA1994 is a trimeric autotransporter of 323–405 amino acids belonging to the Oca family (oligomeric coiled-coil adhesins). It mediates adhesion to and invasion of epithelial cells and for this reason it has been named NadA (Neisseria adhesin A) (48). The nadA gene is not present in all meningococcal strains, and its presence is mainly associated with the hyperinvasive sequence type 8 (ST-8), ST-11, ST-32, and ST-213 clonal complexes (cc) but is rarely present in ST-41/44 and ST-269 cc isolates (49). Six variants exist of which NadA1, NadA2, and NadA3 are highly immunogenic and induce cross-reactive SBA responses. NadA4 is associated with carriage strains. NadA5 is rare and found in only a few invasive isolates (50). NadA expression levels vary among isolates and expression is upregulated by niche-specific signals via the transcriptional regulator NadR, which binds the nadA promoter and represses transcription. DNA-binding activity of NadR is attenuated by 4-hydroxyphenylacetic acid (4-HPA), a natural molecule released in human saliva, thus leading to the de-repression of nadA in vivo (51). Because of this tight regulation, the role of NadA in vaccine coverage may be underestimated in vitro. NadA forms stable trimers on the bacterial surface and mediates binding to epithelial cells through interaction with protein receptor molecules differentially expressed by various epithelial cell lines. The three-dimensional structure reveals a novel TAA (trimeric auto-transporter adhesin) organization made mostly of a coiled-coil with protruding wing-like structures forming a head-like domain (40).

With the aim of maximizing strain coverage while facilitating large-scale manufacturing, fHbp, NHBA, and NadA were fused to additional candidate antigens, previously selected based on their ability to induce bactericidal activity and/or protection in animal models. More than 30 protein-protein fusions were generated and analyzed for their biochemical and immunological properties. Based on these analyses, GNA2132-GNA1030 and GNA2091-GNA1870 were the most stable and the most immunogenic in animal testing. Surprisingly, bactericidal activity induced by immunization with fHbp and NHBA was increased when each was fused to other antigens. In contrast, NadA was less immunogenic when fused to other antigens, probably because of the loss of its trimeric structure.

A vaccine consisting of three recombinant proteins, two protein-protein fusions plus a single antigen, named recombinant MenB vaccine (rMenB) was formulated with aluminum hydroxide and used to immunize mice. From a collection of 214 N. meningitidis clinical isolates (obtained from Europe, Canada, US, Australia, and New Zealand) to represent the global population diversity of invasive serogroup B isolates, bactericidal assays were performed on 85 strains using rabbit sera as exogenous complement source. The rMenB vaccine induced bactericidal antibodies against 78% of these strains. To improve immunogenicity and potential strain coverage of rMenB, an outer membrane vesicle component obtained from the epidemic New Zealand strain was added to the formulation to create a four component vaccine, called 4CMenB (52).

# FROM WGS VACCINE ANTIGEN DISCOVERY TO A LICENSED VACCINE

Each component of the MenB vaccine had to satisfy a plethora of demanding regulatory conditions with respect to safety and immunogenicity, each having complex cost implications with respect to their manufacture and formulation. Six years of preclinical research on toxicity, stability and immunogenicity were required before the two candidate MenB formulations, rMenB and 4CMenB, were approved for clinical trials that commenced in adults in 2004 (53). Clinical trials of these vaccines in infants began in 2006 (54, 55), bypassing the conventional pathway involving a step-wise decrease in the age of subjects: primary school to pre-school to toddler to infant. This accelerated program in part reflected an awareness from the experience with OMV based vaccines that the breadth of immune response to meningococcal antigens was likely to be age dependent. Therefore, there was an imperative to evaluate the breadth of the immune response (i.e., cross-reactivity with non-vaccine variants of the vaccine antigens) in the age group most at risk of disease early in the vaccine's clinical testing.

A major challenge for these clinical trials was how to determine the immunogenicity of the vaccine candidates. In contrast to the previously licensed meningococcal glycoconjugate vaccines for which target antigens (distinct polysaccharide capsules) were invariant structures, the outer membrane proteins contained in rMenB and 4CMenB are variable in both primary sequence and level of expression. As in pre-clinical studies, this required judicious selection of MenB isolates on which to perform bactericidal assays, such that these distinct strains were representative of the diversity of invasive disease MenB target antigens. A further constraint was the small amount of serum that could be obtained in clinical trials involving infants; this limited the number of assays that could be performed to evaluate immunogenicity.

Thus, while the initial phase 1 human studies tested the immune response against 15 strains, the early phase 2 infant studies tested post-immunization sera against a panel of 7 strains (54, 55). Four of these strains were reference strains chosen to demonstrate the immunogenicity of individual vaccine antigens (fHbp, NHBA, NadA, and PorA) (**Table 1**). Immunization with 4CMenB induced bactericidal antibodies against a greater proportion of meningococcal strains than did rMenB. These findings were the basis of the decision to select 4CMenB, rather than rMenB, for further clinical development. A series of clinical trials evaluated how the immune response to the 4CMenB vaccine antigens was influenced by factors such as age of administration, concomitant vaccine use and lot-to-lot consistency.

*H44/76 is the indicator strain for fHbp, 5/99 for NadA, NZ98/254 for PorA and M10713 for NHBA.*

4CMenB was approved in Europe in 2013 and introduced into the National Immunization Program in the UK from September 2015; the vaccine was offered to all newborns using a 2, 4, and 12 month schedule. The effectiveness against invasive meningococcal disease (IMD), measured at eleven months after the study and five months after the second vaccination, was 82.9% (95% CI: 24.1–95.2). The wide confidence limits reflect the challenges of interpreting the post-implementation data in the short term, given the relatively small numbers of cases and the temporal fluctuations in rates of disease that are typical of meningococcal disease epidemiology. Nonetheless, following implementation of 4CMenB vaccine, the number of cases in vaccine-eligible infants was reduced by 50% (95% CI 36–71; p = 0.0001), compared to the pre-vaccine period. Monitoring of the long term impact of 4CMenB vaccine implementation on disease burden, disease severity and safety will continue as part of the National Surveillance program (56). There were also extensive phase 2 and 3 studies to investigate the safety and tolerability of the vaccine, of particular importance given the previous experience of the reactogenicity of OMV vaccines. These clinical trials, involving approximately 7,400 children under 11 years of age prior to licensure in Europe (EMEA assessment report, November 2012: http://www.ema.europa.eu/docs/en_GB/document_library/EPAR_-_Public_assessment_report/human/002333/WC500137883.pdf), demonstrated that ∼60% of children receiving 4CMenB concomitantly with DTaP (Diphtheria, Tetanus, and acellular Pertussis) and pneumococcal conjugate vaccines experienced fever, compared to ∼30% when these vaccines were given without 4CMenB. In infants, local and systemic reactions appeared to be more frequent when 4CMenB was co-administered with other vaccines, but medical attendance after vaccination and fever-related serious adverse events (SAEs) were rare. The occurrence of febrile seizures was comparable to that reported from other combination vaccine studies. Two cases occurred within 24 h after the first and another two cases after the second vaccination with 4CMenB and routine vaccines. These cases were assessed as possibly associated with vaccination but were deemed mild and resolved spontaneously. Most other adverse events were common childhood illnesses or events consistent with solicited reactions and resolved at final follow up (57). After its introduction in the routine UK immunization program, there was an increase in presentations to Accident and Emergency and in hospital admissions for transient adverse events following immunization (58). In contrast, a suggestion of an association with Kawasaki disease in early clinical trials was not supported by post-implementation surveillance (59).

To obtain licensure by the European Medicines Agency and multiple other regulatory agencies internationally, a major challenge was how to estimate the protective potential of 4CMenB against invasive disease. Owing to the low incidence of meningococcal disease, classical efficacy studies were impractical. Thus, SBA using human complement was used to estimate vaccine functional immunity against invasive meningococcal disease. But, because the MenB strains that cause invasive meningococcal diseases are highly diverse with respect to the quantity and immunological cross-reactivity of the vaccine antigens expressed, estimating the effectiveness of the vaccine required performing SBA against large numbers of isolates, an undertaking that was judged to be impractical. Therefore, an innovative method was developed to assess coverage and predict effectiveness of the 4CMenB vaccine. This assay, called MATS (Meningococcal Antigen Typing System), correlated information on the quantity and quality of the antigens expressed by individual MenB strains and the potency of the immune response elicited by the vaccine based on bactericidal assays.

MATS is based on the assumption that a given MenB strain is susceptible to killing by vaccine-induced antibodies, provided that this strain expresses one or more surface proteins in sufficient amounts and adequately cross-reactive with a vaccine component (**Figure 4**). To develop the MATS assay, ELISA reactivity with antisera raised against fHbp, NHBA, and NadA expressed by each tested strain was compared to antigen specific reference MenB strains, a metric called relative potency (RP). Coverage of each individual strain is assumed if the RP is higher than an antigen-specific positive bactericidal threshold (PBT), defined for each antigen on the basis of bactericidal activity of infant sera against a panel of 57 serogroup B strains. PorA cross-reactivity is evaluated by exact sequence matching to PorA P1.4 vaccine serosubtype (60, 61). The MATS assay has been transferred to national reference laboratories in Europe, the US and Australia. Worldwide, MATS-predicted coverage afforded by 4CMenB has been estimated at 66% in Canada (62), 68% in Portugal (63), 69% in Spain (64), 74% in Czech Republic (65), 76% in Australia (66), 78% in 5 other European countries (60), 81% in Brazil (66), 84% in Poland (67), 89% in Greece (68), and 91% in the US (69). Finally, in England, Wales and Northern Ireland, the MATS estimate of coverage was measured at 73 and 67%, respectively, on different strain collections of 2007–2008 and 2014–2015, the latter representing the baseline before vaccine implementation (56) (**Figure 5**).
Indeed, several publications (70, 71) claim a potential underestimation of MATS predicted coverage estimates, for a series of reasons: (i) MATS provides an estimation of the contribution of each antigen independently, therefore the synergistic effect of antibodies recognizing different antigens is not measured; (ii) the NadA-mediated contribution to protection is underestimated as NadA expression is downregulated in the in vitro conditions in which MATS is performed, compared to expression of the antigen in vivo (51); (iii) the contribution of OMV to protection is limited to the presence of a matched PorA antigen, although it is commonly accepted that PorA-independent protection can be afforded by OMV against some strains.
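The MATS decision rule described above can be sketched as a simple predicate: a strain is predicted to be covered if the RP for at least one of fHbp, NHBA, or NadA exceeds that antigen's PBT, or if its PorA serosubtype matches P1.4. In the sketch below, the threshold values, the strain records, and all names (`Strain`, `covering_antigens`, `predicted_coverage`) are hypothetical placeholders for illustration; they are not the published PBT values or real typing data.

```python
from dataclasses import dataclass

# Placeholder thresholds for illustration only; NOT the published PBT values.
ILLUSTRATIVE_PBT = {"fHbp": 0.02, "NHBA": 0.30, "NadA": 0.01}
VACCINE_PORA = "P1.4"


@dataclass
class Strain:
    name: str
    relative_potency: dict  # antigen name -> RP versus the reference strain
    pora_subtype: str


def covering_antigens(strain, pbt=ILLUSTRATIVE_PBT, vaccine_pora=VACCINE_PORA):
    """Return the vaccine antigens predicted to cover this strain."""
    hits = [
        antigen
        for antigen, threshold in pbt.items()
        if strain.relative_potency.get(antigen, 0.0) > threshold
    ]
    if strain.pora_subtype == vaccine_pora:
        hits.append("PorA")
    return hits


def predicted_coverage(strains, pbt=ILLUSTRATIVE_PBT, vaccine_pora=VACCINE_PORA):
    """Fraction of strains covered by at least one antigen."""
    if not strains:
        return 0.0
    covered = sum(1 for s in strains if covering_antigens(s, pbt, vaccine_pora))
    return covered / len(strains)


# Hypothetical toy panel, not real isolates.
TOY_PANEL = [
    Strain("A", {"fHbp": 0.50, "NHBA": 0.10, "NadA": 0.0}, "P1.7"),
    Strain("B", {"fHbp": 0.01, "NHBA": 0.10, "NadA": 0.0}, "P1.4"),
    Strain("C", {"fHbp": 0.01, "NHBA": 0.10, "NadA": 0.0}, "P1.7"),
    Strain("D", {"fHbp": 0.00, "NHBA": 0.40, "NadA": 0.05}, "P1.9"),
]

if __name__ == "__main__":
    for strain in TOY_PANEL:
        print(strain.name, covering_antigens(strain))
    print("predicted coverage:", predicted_coverage(TOY_PANEL))
```

Because each antigen is evaluated independently against its own threshold, a rule of this form cannot credit a strain for the combined effect of several sub-threshold antigens, which is the first of the reasons for underestimation listed above.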

The underestimation of protection predicted by MATS was further supported by a study performed in the UK on a panel of circulating clinical strains where the MATS predicted coverage was 73%, while the SBA showed 88% strain coverage (70). Similar data were also generated on a panel of MenB strains from Spain, showing that isolates found negative in MATS were in fact killed by sera of adolescents and infants immunized with 4CMenB (64). The overall underestimation has more recently been confirmed by the preliminary "real-world" effectiveness of 82.9% based on the results of the routine infant immunization with 4CMenB in the UK (56).

In Canada, 4CMenB was licensed in 2013 for use in individuals 2 months to 18 years old. A mass vaccination campaign, targeting individuals aged 2 months to 20 years, was implemented in the Saguenay-Lac-Saint-Jean region of Québec in 2014 to control the high incidence rate of MenB disease. Following the campaign, the incidence in the region decreased, with no cases reported in the vaccinated individuals but with two cases occurring among the unvaccinated (72). In the US, 4CMenB was authorized in 2015 and recommended for use in 10–25 year olds as a two-dose vaccine. It has been used to control MenB outbreaks at university and college campuses in Oregon, New Jersey and California (73, 74). No cases of MenB disease have been reported so far in vaccinated individuals, suggesting that the vaccine is effective in this age group. Moreover, when the immune responses induced by 4CMenB during the outbreak in Princeton were measured, 33% of 4CMenB vaccinees showed no SBA against the outbreak strain, although no cases of meningococcal disease caused by N. meningitidis B were reported among vaccinated students (73).

# DISCUSSION

The licensure in 2013 of the four-component MenB vaccine (Bexsero) was the culmination of a scientific collaboration between university and industry-based scientists. The former provided cutting-edge genomic, genetic, and clinical trials expertise; the latter undertook the vital high-throughput, "brute force" evaluation of hundreds of candidate antigens discovered through genomics and the in-depth characterization of the functional and immunological properties of the selected vaccine antigens, and then stage-managed the pre-clinical and clinical testing required to obtain licensure. The facilitating technological breakthrough of WGS of bacterial pathogens came about through a former NIH academic, Craig Venter, who used his entrepreneurial vision to set up TIGR, the sequencing facility that made the MenB project possible. The 2017 Gairdner Award to Rino Rappuoli (https://www.aditecproject.eu/2017/05/04), who oversaw this academic-commercial partnership, was fitting recognition of his role in driving through the innovative application of genomics to antigen discovery, the first example of what has become known as "reverse vaccinology."

4CMenB represents a striking departure from the successful research and development platform that resulted in several highly safe and effective conjugate meningococcal vaccines (against meningococcal serogroups A, C, W, and Y strains) formulated through covalent chemical linkage of different serogroup capsular polysaccharides to proteins. Although the meningococcal capsular polysaccharides have strikingly distinct chemical compositions, each is an invariant structure whose target epitopes do not change over time or region. Diversity in the "carrier" proteins used to formulate conjugate vaccines is not problematic, provided that these variations do not interfere with their role in recruiting T-cell help. But for vaccines such as 4CMenB, where the antigens inducing protective immunity are proteins, the scenario is fundamentally different. The amino acid sequence of each of the protein antigens is highly variable, a consequence of their location on the bacterial surface, where exposure to immune responses drives selection and fixation of diversity in circulating strains of meningococci. The multivalent protein vaccine, 4CMenB, is not without precedent; the several acellular pertussis vaccines have formulations consisting of up to 5 proteins, although in retrospect there was inadequate appreciation of the complications of allelic variation of these vaccine proteins. Loss or gain of DNA has over many years impacted on the effectiveness of B. pertussis vaccines (75), but its population structure is clonal (76), so there is no recombination and antigenic variation accumulates only very gradually over time. In contrast, genetic variation in meningococci occurs predominantly through recombination, not intra-genomic mutations. Thus, within the natural population of meningococci, there is frequent horizontal transfer of DNA, mainly through DNA transformation, not only between distinct genotypes of N. meningitidis, but also from other sub-species of Neisseria and, rarely, other distinct bacterial species. For example, conserved homologs of the nhba gene have been found in commensal Neisseria species, such as N. lactamica, N. polysaccharea, and N. flavescens (77). This finding is relevant because of the potential "selective impact" that an NHBA-containing vaccine could have not only on encapsulated meningococcal strains, which are potentially pathogenic, but also on the commensal flora. This rampant recombination has major implications: to be an effective vaccine, 4CMenB must elicit antibodies that protect against an enormous diversity of circulating meningococcal strains in a microbial population that is also constantly evolving over time. A vaccine typing scheme was therefore required to characterize any carriage or invasive meningococcal isolate. This effort resulted in the identification of the MATS assay as a predictor of vaccine coverage. Since MATS can be applied only to cultivable strains, and considering that more than 50% of cases do not yield an isolate, genomics-driven predictors of coverage, such as BAST [Bexsero Antigen Sequence Typing (78)] or gMATS (genetic MATS), which are under development, will be instrumental in evaluating vaccine coverage more precisely.

The need for new lines of thinking emerged early in the pre-clinical phases of 4CMenB development. Given the rarity of IMD [0.5–1 case per 100,000 per annum in Europe and the Americas] (79), reliance on a surrogate of protection to select appropriate protein antigens was paramount. The acceptance of SBA as a gold-standard surrogate of protection against meningococcal surface proteins (80) by scientists and regulatory authorities was a major milestone. It meant that the pre-clinical and clinical studies could proceed to licensure without the need for the conventional phase 3 efficacy trials for which cases of invasive disease provide the key metric. Given the logistics, expense, and large numbers of subjects required to assess efficacy, it was considered unlikely that any such clinical trial could be carried out.
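The arithmetic behind this infeasibility can be made explicit. The following is a minimal sketch, assuming a constant incidence within the 0.5–1 per 100,000 per annum range quoted above; the function names and the target of 50 cases are hypothetical illustrations, not figures from the 4CMenB programme.

```python
def expected_cases(participants: int, incidence_per_100k: float, years: float = 1.0) -> float:
    """Expected number of IMD cases in an unvaccinated cohort at a constant incidence."""
    return participants * years * incidence_per_100k / 100_000


def participants_needed(target_cases: float, incidence_per_100k: float, years: float = 1.0) -> float:
    """Cohort size needed to expect `target_cases` cases over `years` of follow-up."""
    return target_cases * 100_000 / (incidence_per_100k * years)


if __name__ == "__main__":
    # A control arm of 100,000 subjects followed for one year.
    for incidence in (0.5, 1.0):
        print(incidence, expected_cases(100_000, incidence))
    # Hypothetical target: 50 cases in the control arm over two years of follow-up.
    print(participants_needed(50, 1.0, years=2.0))
```

Even at the upper end of the quoted incidence range, a control arm of 100,000 subjects would be expected to yield about one case per year, which illustrates why a surrogate of protection was needed.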

Indeed, overall, the complexity of 4CMenB and the pathway to licensure made unprecedented demands on both the vaccine development teams and the regulatory authorities. Dialogue and an iterative scientific interchange were essential to address all regulatory requirements and translate them into practice.

The pathway of reliance on phase 2 immunogenicity studies of 4CMenB, backed by SBA and the derivative innovation of the MATS assay, was enormously facilitated by the previous experience with the MenC conjugate vaccines, whose successful implementation in the routine UK immunization programme was a game changer (81). The way forward for 4CMenB has followed along similar lines, but has been immensely more complicated. For the reasons discussed above, estimates of effectiveness for the invariant meningococcal C polysaccharide vaccine antigen were far simpler than for the four variable protein antigens of 4CMenB. One major lesson emanating from experience with the conjugate vaccines in general, specifically exemplified by data on serogroup C meningococcal conjugates, has been the extent to which their success depends on indirect, or herd, immunity (82). Indirect protection works by curtailing transmission of meningococci, thereby decreasing the probability of new acquisitions and the risk of invasive disease. A UK study estimating the effect of meningococcal vaccines on herd protection against N. meningitidis in university students showed that both the 4CMenB and MenACWY vaccines reduced carriage only for a subset of Neisseria strains, 4–12 months after vaccination (83). To date, the impact of 4CMenB on carriage of meningococci remains uncertain. Further studies are required, including those that ascertain whether the vaccine has an impact on bacterial load.

The inclusion of 4CMenB in the UK routine infant immunization programme since October 2014 allows post-implementation surveillance that over many years will provide crucial information on its effectiveness and duration of protection. Because it is given routinely only to infants, 4CMenB is not expected to prevent cases of IMD in older children, adolescents or adults. In addition, estimates of protection based on WGS, hSBA, and MATS indicate that a proportion of strains lack a biologically relevant match to the antigens in the vaccine. As an example, the proportion of strains negative in hSBA in a UK strain panel was 12% (70). These in silico and in vitro predictions of vaccine effectiveness must be interpreted with caution, since these metrics for estimating protection have not been validated. Two fundamentals of determining vaccine effectiveness are accurate information on immunization uptake and a robust system for disease notification. These data enable the calculation of vaccine effectiveness, as the likelihood of a child with disease being immunized (i.e., a vaccine failure) or unimmunized can be compared to that in the general population. This so-called screening method (84) was used to establish the effectiveness of the UK MenC conjugate vaccines and the OMV vaccine in New
Zealand. It is crucial to have consensus on the definitions of what constitutes a case of meningococcal disease and an "immunized" child. To this end, organizations such as the European Centre for Disease Prevention and Control (ECDC) have provided definitions (85). More problematic is the definition of vaccine failure, which can be considered at either an individual or population level. At an individual level, not every case of MenB disease in an immunized child should be seen as a vaccine failure as the disease causing strain may not have expressed the vaccine target antigens. Thus, defining what criteria should be used to identify vaccine failures remains an exercise in pragmatism, dependent for validation on the accumulation of real-time data on rates of meningococcal disease, information that will require many years of surveillance using the screening method. Further complications include the changing incidence of IMD in the UK, a sharp decline in recent years (86) and the fact that in about half of all cases, no organism is isolated and confirmation is based on PCR (87), presenting major challenges to complete characterization of the target antigens of infecting meningococcal genotypes, in terms of expression and surface accessibility. 4CMenB is an exemplar of what can be truly considered a new era in vaccines.
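The screening method referred to in this section compares the proportion of cases that were vaccinated (PCV) with the proportion of the population that was vaccinated (PPV). A minimal sketch follows, using the formulation commonly given for this method, VE = 1 − [PCV / (1 − PCV)] × [(1 − PPV) / PPV]. The function name and the numerical inputs are hypothetical and are not surveillance data from the UK or New Zealand programmes.

```python
def screening_method_ve(pcv: float, ppv: float) -> float:
    """Vaccine effectiveness by the screening method.

    pcv: proportion of disease cases that were vaccinated (0 <= pcv < 1)
    ppv: proportion of the population that was vaccinated (0 < ppv <= 1)
    """
    if not 0.0 <= pcv < 1.0:
        raise ValueError("pcv must lie in [0, 1)")
    if not 0.0 < ppv <= 1.0:
        raise ValueError("ppv must lie in (0, 1]")
    odds_vaccinated_among_cases = pcv / (1.0 - pcv)
    inverse_odds_vaccinated_in_population = (1.0 - ppv) / ppv
    return 1.0 - odds_vaccinated_among_cases * inverse_odds_vaccinated_in_population


if __name__ == "__main__":
    # Hypothetical inputs: 90% uptake in the population, half of the cases vaccinated.
    print(round(screening_method_ve(pcv=0.5, ppv=0.9), 3))
    # If cases are vaccinated at the same rate as the population, the estimate is zero.
    print(round(screening_method_ve(pcv=0.9, ppv=0.9), 3))
```

The estimate is sensitive to how a "case" and an "immunized" child are defined, which is why the consensus definitions discussed above matter for the calculation.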

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### ACKNOWLEDGMENTS

The authors thank Matthew D. Snape for his critical review of the manuscript and contributions to the clinical section.




**Conflict of Interest Statement:** VM and MP are employees of the GSK group of companies. EM has a consultancy contract with GSK.

Copyright © 2019 Masignani, Pizza and Moxon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Comprehensive Evaluation of the Expressed CD8+ T Cell Epitope Space Using High-Throughput Epitope Mapping

Paul V. Lehmann<sup>1</sup> \*, Maneewan Suwansaard<sup>1</sup> , Ting Zhang<sup>1</sup> , Diana R. Roen<sup>1</sup> , Greg A. Kirchenbaum<sup>1</sup> , Alexey Y. Karulin<sup>1</sup> , Alexander Lehmann<sup>1</sup> and Pedro A. Reche<sup>2</sup>

<sup>1</sup> Cellular Technology Ltd., Shaker Heights, OH, United States, <sup>2</sup> Laboratorio de Inmunomedicina & Inmunoinformatica, Departamento de Immunologia & O2, Facultad de Medicina, Universidad Complutense de Madrid, Madrid, Spain

T cell immunity is traditionally assessed through functional recall assays, which detect the consequences of the T cells' antigen encounter, or via fluorescently labeled multimers that selectively bind peptide-specific T cell receptors. Using either approach, if the wrong antigen or peptide of a complex antigenic system, such as a virus, is used for immune monitoring, either false negative data will be obtained, or the magnitude of the antigen-specific T cell compartment will go largely underestimated. In this work, we show how selection of the "right" antigen or antigenic peptides is critical for successful T cell immune monitoring against human cytomegalovirus (HCMV). Specifically, we demonstrate that individual HCMV antigens, along with previously reported epitopes, frequently failed to detect CD8+ T cell immunity in test subjects. Through systematic assessment of T cell reactivity against individual nonamer peptides derived from the HCMVpp65 protein, our data clearly establish that (i) systematic testing against all potential epitopes encoded by the genome of the antigen of interest is required to reliably detect CD8+ T cell immunity, and (ii) genome-wide, large scale systematic testing of peptides has become feasible through high-throughput ELISPOT-based "brute force" epitope mapping.

**Keywords:** epitope, peptide, HCMV, EBV, MHC, HLA, T cell, ELISPOT

#### Edited by:

Rene De Waal Malefyt, Merck, United States

#### Reviewed by:

Katalin A. Wilkinson, Francis Crick Institute, United Kingdom; António Gil Castro, University of Minho, Portugal

#### \*Correspondence:

Paul V. Lehmann paul.lehmann@immunospot.com

#### Specialty section:

This article was submitted to T Cell Biology, a section of the journal Frontiers in Immunology

Received: 16 November 2018; Accepted: 11 March 2019; Published: 26 April 2019

#### Citation:

Lehmann PV, Suwansaard M, Zhang T, Roen DR, Kirchenbaum GA, Karulin AY, Lehmann A and Reche PA (2019) Comprehensive Evaluation of the Expressed CD8+ T Cell Epitope Space Using High-Throughput Epitope Mapping. Front. Immunol. 10:655. doi: 10.3389/fimmu.2019.00655

# INTRODUCTION

Unlike B cells, which confer humoral immunity through secretion of immunoglobulins, T cells exert their protective functions through direct interactions with antigen-bearing cells. Conventional T cells recognize, via their T cell receptor (TCR), processed peptide fragments derived from polypeptide antigens, hereafter referred to as epitopes. These epitopes are displayed for T cell recognition on dedicated antigen-presenting molecules that are encoded by the major histocompatibility complex (MHC), which in humans is termed human leukocyte antigen complex (HLA). The TCR binds a binary complex of the epitope aligned in the peptide-binding groove of the MHC molecule (1, 2) displayed on the surface of antigen-presenting cells (APC).

The TCRs of the CD8+ subpopulation of T cells (CD8+ T cells) recognize epitopes presented in the context of MHC Class I molecules (MHC-I), while the TCRs of CD4+ T cells recognize epitopes presented by Class II (MHC-II) molecules. During infection or vaccination, MHC molecules on APC will be loaded with peptide fragments derived from the individual proteins (antigens) that constitute the infectious agent (the antigen-system), and are transported to the APC's surface for T cell recognition (3). The specific peptide fragments from an antigen that are presented to T cells are primarily dictated by the MHC molecules encoded by an individual and, like the MHC molecules themselves, are subject to considerable inter-individual variability. MHC I and II molecules are polygenic, and moreover each locus is subject to extensive allelic polymorphism. Consequently, individuals in an outbred population will express a unique set of MHC molecules (4). As the polygenism and polymorphism of MHC primarily affect the peptide-binding groove, each MHC locus and allele essentially encodes a distinct peptide binding motif, i.e., peptide binding specificity (5). While human subjects can share a common HLA allele, there are few, if any, unrelated individuals on this planet who express the same constellation of HLA molecules. For this reason, it is extremely unlikely that the T cell systems of any two humans, when faced with the need to recognize a given infectious organism, would recognize an identical set of epitopes. The potential epitope space, that is, the array of peptides that meet MHC binding criteria in a given host, is therefore unique to that individual. The highly individualized MHC-restricted recognition of epitopes by T cells is thought to have evolved to protect the species against antigenic mimicry: if a pathogen succeeds in evading immune recognition in one individual, or in a subset of individuals sharing a certain MHC allele, it might endanger this subject or subpopulation, but not the species as a whole (6).

The T cell response elicited by an infection or immunization is dictated not only by the above inter-individual diversity of epitopes displayed for T cell recognition, but also by the inherent variability in the TCR repertoire possessed by different individuals. An individual's TCR repertoire is shaped by genetics, but also through selection during the development of self-tolerance, and by foreign antigens during environmental exposures or infections (7, 8). Due to such TCR repertoire variations, different individuals' T cells will not respond uniformly to the potential epitope space. For example, even if individuals share an HLA molecule, such as the HLA-A\*02:01 allele, which should present the same set of peptides, these individuals will not generate a uniform T cell response to these peptides (9). The array of epitopes actually recognized by T cells (constituting the expressed T cell repertoire) can therefore differ between individuals even if the potential epitope space (dictated by the MHC) is identical or overlapping between these individuals. The expressed T cell repertoire encompasses only a fraction of the expressed epitope space (10), but being dictated by the latter, will also be highly unique to each individual.

T cell immune monitoring aims at defining the expressed T cell repertoire, that is, the magnitude and quality of the antigen-specific T cells that have been generated in the course of the immune response. The magnitude of the expressed T cell repertoire is determined by the sum of all T cells that target all individual epitopes of the antigen system (e.g., virus) that have triggered a T cell response in the individual. Likewise, the quality of the elicited T cell response is defined by the effector functions of these T cells, such as the type of cytokine they secrete, or whether they are cytolytic. Given the multitude and variability of epitopes (peptides) of an antigenic system recognized in different individuals, a major challenge for T cell monitoring has been identification of the "relevant" epitopes (i.e., test peptides) for each test subject. In an attempt to address this, the mainstream approach has focused on one, or very few, epitopes restricted by an individual HLA allele, studying T cell recognition of peptides that have either been previously reported, or predicted, to be restricted by that allele due to its peptide binding motif (11). Owing to the high prevalence of the HLA-A\*02:01 allele in the Caucasian population, immune monitoring has frequently focused on peptides restricted to this particular HLA-A allele. Approximately half of the Caucasian population expresses the HLA-A\*02:01 allele (12), but it is less frequent in other populations.

Although narrowing immune monitoring to single HLA alleles is common, it is presently unclear how adequate this approach is. Any T cell immune monitoring effort that restricts itself to one or few epitopes presented by one HLA allele of a test subject inherently has the disadvantage of neglecting the remainder of HLA alleles expressed by the test subject. For example, selecting a single HLA Class I allele (one of up to six capable of presenting epitopes to CD8+ T cells in each test subject), and one test peptide capable of binding to the selected HLA molecule, can only be a valid approach to CD8+ T cell immune monitoring if CD8+ T cells indeed prevalently target that single epitope in the test individuals. Such immune dominance, however, might be the exception in humans, as we tend not only to respond to a multitude of epitopes of an antigen, but also with highly variable clonal sizes to the individual epitopes (9). Therefore, ideally, T cell immune monitoring should comprehensively include the entire potential epitope space, encompassing all peptides that could be restricted by all HLA alleles present in every test subject. This can be accomplished by using pools of such peptides, or by testing all peptides individually, in a "brute force" high-throughput approach (13–15). The former, while much simpler in scope, enables comprehensive assessment of the expressed antigen-specific T cell repertoire, but does not reveal the epitope specificity of the antigen-reactive T cells, whereas the latter also provides this valuable information.

The ability to perform high-throughput testing of hundreds, even tens of thousands of peptides, on individual test subjects has only recently become technologically feasible (16, 17). The hurdles that needed to be overcome included limitations in peripheral blood mononuclear cell (PBMC) numbers available from humans, ease of access to extensive custom peptide libraries, high-throughput T cell assay platforms, and automated data analysis including databasing. While such brute force epitope mapping seems to call for a major effort that can be undertaken only by industrial scale laboratories with essentially unlimited manpower and budgets, we have developed and report here high-throughput epitope mapping strategies that can be readily applied in even small academic laboratories operating on tight budgets. In this paper we outline how it can be implemented. Before doing so, using CD8+ T cell immunity to HCMV (and, in the **Supplementary Material**, to EBV and influenza virus) as a model, we illustrate how incomplete our understanding of an individual's epitope space is to date, and thus how far we still are from comprehensive T cell immune monitoring. We also show the feasibility of systematic high-throughput ("brute force") epitope scans as the necessary next step to fill this void.

# MATERIALS AND METHODS

Peripheral blood mononuclear cells (PBMC) from healthy human donors (HD) were obtained from CTL's ePBMC library (CTL, Shaker Heights, OH, USA). The PBMC had been collected by HemaCare Blood Donor Center (Van Nuys, CA) under HemaCare's IRB and sold to CTL identifying donors by code only while concealing the subjects' identity. All PBMC were from healthy adults who had not taken medication within a month of the blood draw that might influence their T cell response. In addition, tests were done on each donor at HemaCare's CLIA-certified laboratory to identify common infections: serological testing was done for Syphilis, Human cytomegalovirus (HCMV), Epstein Barr Virus (EBV), Hepatitis B, Hepatitis C, Human Immunodeficiency Virus, Human T-Lymphotropic Virus, and Trypanosoma cruzi. Subjects positive for any of these infections, except for HCMV and EBV, were disqualified from the ePBMC library. The CMV and EBV serological status is specified for donors and cohorts in the manuscript. The donors' age, sex, ethnicity and HLA-type are shown in **Supplementary Table 1**. HLA-typing was done by the HLA Typing Laboratory of the University of Oklahoma Health Sciences Center (Oklahoma City, OK). For all experiments, donor cohorts were selected according to the HCMV or EBV serostatus specified, and then tested in T cell assays with the T cell assay results shown. The cryopreserved cells were thawed following an optimized protocol (18), resulting in viability exceeding 90% for all samples. The PBMC were resuspended in CTL-Test™ Medium (from CTL). CTL-Test™ Medium is serum-free and has been developed for low background and high signal performance in ELISPOT assays. The number of PBMC plated in the different ELISPOT experiments varied between 2 and 4 × 10<sup>5</sup> PBMC per well, with the number specified in Results for each experiment. The ELISPOT data are expressed as spot forming cells per million input PBMC (SFC/1 × 10<sup>6</sup> PBMC).
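Because the number of PBMC plated per well differed between experiments, raw spot counts have to be normalized before they can be compared. The following minimal sketch shows that normalization to spots per million input PBMC; the function name and the example spot count are hypothetical.

```python
def spots_per_million(spots_in_well: float, cells_per_well: int) -> float:
    """Normalize a raw ELISPOT well count to spots per 1 x 10^6 input PBMC."""
    if cells_per_well <= 0:
        raise ValueError("cells_per_well must be positive")
    return spots_in_well * 1_000_000 / cells_per_well


if __name__ == "__main__":
    # The same raw count of 30 spots at the plating densities used in this study.
    for cells in (200_000, 300_000, 400_000):
        print(cells, spots_per_million(30, cells))
```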

#### Antigens and Peptides

The peptide pools representing the HCMV antigens listed in **Table 1**, and the EBV antigens specified in **Supplementary Table 2**, consisted of 15-mer peptides that covered the entire amino acid (aa) sequence of the respective proteins in steps (gaps) of 11 aa. Each of these peptide pools was purchased from JPT (Berlin, Germany) and was tested at a final concentration of 1 µg/mL in ELISPOT assays. The HCMV pp65<sub>495–503</sub> peptide is an immunodominant peptide of HCMV that is presented by the HLA-A\*02:01 allele (19). This HCMV pp65<sub>495–503</sub> peptide, along with additional HCMV, EBV, and influenza virus peptides specified in **Figure 3** and **Supplementary Figures 2** and **3**, respectively, were purchased from Panatecs (Heilbronn, Germany) at >95% purity according to the manufacturer's specification. Each of these purified, single peptides was tested at 1 µg/mL in ELISPOT assays. HCMV and EBV grade 2 antigens (UV-inactivated virions) were purchased from Microbix (Mississauga, Ontario, Canada) and tested at a final concentration of 30 µg/mL. CPI (from CTL) was used as a positive control in all experiments because, unlike CEF peptides, CPI elicits T cell recall responses in all healthy donors (20). CEF is a pool of 32 immunodominant nonamer peptides derived from CMV, EBV and the influenza virus that is commonly used as a positive control for CD8+ T cell recall (20). CPI is a combination of protein antigens derived from CMV, influenza and parainfluenza viruses, and was used at a final concentration of 6 µg/mL in ELISPOT assays.

The 553 nonamer peptides, spanning the entire HCMV pp65 sequence in steps of a single aa, were purchased from JPT as a FastTrack CD8 epitope library, and are hereafter referred to as pp65 9-mer peptides. Individual pp65 9-mer peptides were not subjected to further purification following their synthesis; however, individual peptides were verified and quantified by JPT using LC-MS. The average purity of the pp65 9-mer peptides was 56%, and the purity of individual 9-mer peptides that elicited CD8+ T cell recall responses is specified in **Table 2** for each respective peptide. The pp65 9-mer peptides were delivered as lyophilized powder, with ∼25 µg of a single 9-mer peptide present in a designated well of a 96-well plate, and were distributed across six 96-well plates. Individual peptides were initially dissolved using 50 µL DMSO, followed by addition of 200 µL of CTL-Test Medium, generating a "primary peptide stock solution" at 100 µg/mL with 20% v/v DMSO. From each of these wells, using a 96-well multichannel pipettor, a "secondary, 10X, peptide stock solution" was prepared, with peptides at 2 µg/mL. On the day of testing, 20 µL from each well was transferred "en bloc," with a 96-well multichannel pipettor, into pre-coated ImmunoSpot® assay plates containing 80 µL CTL-Test Medium. Finally, 100 µL of PBMC (containing 3 × 10<sup>5</sup> cells) in CTL-Test Medium was added "en bloc" to achieve a final peptide concentration of 0.2 µg/mL in the ELISPOT assay.
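The library size and the dilution series described above can be checked with a short calculation. The sketch below is illustrative only: the 561-residue string is a placeholder (the length implied by 553 nonamers at single-residue steps, since 553 + 9 − 1 = 561), not the actual pp65 sequence, and the function names are hypothetical.

```python
def overlapping_peptides(sequence: str, length: int = 9, step: int = 1) -> list[str]:
    """Return every window of `length` residues, advancing `step` residues each time."""
    last_start = len(sequence) - length
    return [sequence[i:i + length] for i in range(0, last_start + 1, step)]


def concentration_ug_per_ml(mass_ug: float, volume_ul: float) -> float:
    """Concentration in µg/mL of a mass (µg) dissolved in a volume (µL)."""
    return mass_ug / (volume_ul / 1000.0)


def diluted(conc_ug_per_ml: float, transfer_ul: float, total_ul: float) -> float:
    """Concentration after `transfer_ul` of stock ends up in a total of `total_ul`."""
    return conc_ug_per_ml * transfer_ul / total_ul


if __name__ == "__main__":
    # Placeholder sequence, NOT the real pp65 sequence: only its length matters here.
    placeholder = "A" * 561
    print(len(overlapping_peptides(placeholder)))      # number of 9-mers at step 1

    # Primary stock: ~25 µg in 50 µL DMSO + 200 µL medium.
    print(concentration_ug_per_ml(25, 50 + 200))       # µg/mL
    print(50 / (50 + 200))                             # DMSO volume fraction

    # Assay well: 20 µL of the 2 µg/mL secondary stock + 80 µL medium + 100 µL PBMC.
    print(diluted(2.0, 20, 20 + 80 + 100))             # final µg/mL
```

The same `overlapping_peptides` helper accepts other window lengths and step sizes, so it could be reused for a different library design.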

#### Human IFN-γ ELISPOT Assays

Single-color enzymatic ImmunoSpot® kits from CTL were used for the detection of IFN-γ-producing cells. Test procedures followed the manufacturer's recommendations. In brief, antigens/peptides, dissolved in CTL-Test Medium, were plated at the specified concentrations into capture antibody-precoated ELISPOT assay plates in a volume of 100 µL per well. The plates with the antigen were stored at 37°C in a CO<sub>2</sub> incubator for less than an hour until the PBMC were ready for plating. The PBMC were added in the numbers specified (between 200,000 and 400,000 cells/well) in 100 µL CTL-Test Medium and cultured with the antigens/peptides for 24 h at 37°C and 9% CO<sub>2</sub> in an incubator. After removal of the cells, addition of detection antibody, and enzymatic visualization of plate-bound cytokine, the plates were air-dried prior to scanning and counting of spot forming units (SFU). ELISPOT plates were analyzed using an ImmunoSpot® EpiScan™ Reader, by CTL. SFU were automatically calculated by the ImmunoSpot® Software for each stimulation condition using the Autogate™ function (15).

TABLE 1 | HCMV-seropositive and -seronegative healthy human donors' reactivity to 20 HCMV antigens represented by peptide pools.


Pools of 15-mer peptides were tested that span the polypeptide sequence of each specified HCMV protein in steps of 11 amino acids, with the number of peptides contained within each pool specified in parentheses. Media was used as negative control and CPI as the positive control. The results are expressed as SFU/ 1 × 10<sup>6</sup> PBMC.

<sup>a</sup>Immediate-Early protein 1 of HCMV; <sup>b</sup>Immediate-Early protein 2 of HCMV; <sup>c</sup>65 kDa phosphoprotein (pp65) of HCMV; <sup>d</sup>Uncharacterized protein UL28 of HCMV; <sup>e</sup>Large structural phosphoprotein UL32 of HCMV; <sup>f</sup>Uncharacterized protein UL36 of HCMV; <sup>g</sup>Uncharacterized protein UL40 of HCMV; <sup>h</sup>subpool of Deneddylase UL48 of HCMV; <sup>i</sup>subpool of Deneddylase UL48 of HCMV; <sup>j</sup>Envelope glycoprotein B (UL55) of HCMV; <sup>k</sup>71 kDa phosphoprotein (UL82) of HCMV; <sup>l</sup>Capsid-binding protein UL94 of HCMV; <sup>m</sup>Tegument protein UL99 of HCMV; <sup>n</sup>Protein UL103 of HCMV; <sup>o</sup>Orf UL151 of HCMV; <sup>p</sup>Orf UL153 of HCMV; <sup>q</sup>Unique short US3 glycoprotein of HCMV; <sup>r</sup>Tegument protein US24 of HCMV; <sup>s</sup>Uncharacterized protein HHRF4 (US29) of HCMV; <sup>t</sup>Uncharacterized protein HHRR7 (US32) of HCMV.

#### Statistical Analysis

Purified peptides and protein antigens were tested in triplicate wells. For these, a paired one-tailed Student's t-test was performed to identify positive responses relative to the medium control wells. As ELISPOT counts follow a Gaussian (normal) distribution among replicate wells, the use of parametric statistics was justified to identify positive and negative responses, respectively (21). A p < 0.05 was considered as the cutoff for positive responses induced by the purified peptides. The 553 individual peptides of the pp65 9-mer peptide library were tested in single wells. For these peptides, a response was scored as positive when the SFU count exceeded the mean SFU count of 18 replicate media control wells by more than 5 SD.
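The single-well criterion for the 9-mer library can be sketched as follows. This is an illustration under stated assumptions: the 5 SD criterion is read as "mean of the media control wells plus 5 SD", the sample standard deviation (n − 1 denominator) is used, and the control counts and peptide labels are invented for the example.

```python
from statistics import mean, stdev


def positivity_cutoff(control_counts: list[float], n_sd: float = 5.0) -> float:
    """Mean of the media control wells plus `n_sd` sample standard deviations."""
    return mean(control_counts) + n_sd * stdev(control_counts)


def positive_wells(test_counts: dict[str, float],
                   control_counts: list[float],
                   n_sd: float = 5.0) -> dict[str, float]:
    """Keep only the single-well peptide counts that exceed the cutoff."""
    cutoff = positivity_cutoff(control_counts, n_sd)
    return {peptide: count for peptide, count in test_counts.items() if count > cutoff}


if __name__ == "__main__":
    # Hypothetical SFU counts in 18 media control wells.
    controls = [0, 2] * 9
    # Hypothetical single-well counts for three library peptides.
    wells = {"peptide_001": 1, "peptide_002": 6, "peptide_003": 45}
    print(round(positivity_cutoff(controls), 2))
    print(positive_wells(wells, controls))
```

With low, tightly clustered background counts the cutoff stays close to the background mean, so the stringency of the criterion depends on how reproducible the media control wells are.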

#### HLA-Binding Predictions

We assessed peptide-HLA I presentation by predicting peptide-HLA I binding using HLA I allele-specific profile motif matrices (22–24). We considered that a given peptide binds to a specific HLA I molecule when its binding score ranks within the top 3% of the binding scores computed for 1,000 random 9-mer peptides (generated with the average amino acid composition of proteins in the SwissProt database).
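The rank-based binding criterion can be illustrated with a toy example. The sketch below is not the authors' prediction software: the scoring matrix is a hypothetical two-anchor motif invented for illustration, the random peptides are drawn with uniform amino acid frequencies rather than the SwissProt composition, and the test peptides are artificial.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy matrix: only positions 2 and 9 (indices 1 and 8) carry weight.
TOY_MATRIX = {1: {"L": 2.0, "M": 1.5}, 8: {"V": 2.0, "L": 1.5}}


def score(peptide: str, matrix: dict) -> float:
    """Sum the position-specific scores of a peptide; unlisted residues score 0."""
    return sum(matrix.get(i, {}).get(aa, 0.0) for i, aa in enumerate(peptide))


def top_fraction_cutoff(matrix: dict, n_random: int = 1000,
                        fraction: float = 0.03, seed: int = 0) -> float:
    """Score of the lowest-ranked random 9-mer that still falls in the top `fraction`."""
    rng = random.Random(seed)
    scores = sorted(
        (score("".join(rng.choices(AMINO_ACIDS, k=9)), matrix) for _ in range(n_random)),
        reverse=True,
    )
    return scores[max(1, int(n_random * fraction)) - 1]


def is_predicted_binder(peptide: str, matrix: dict, **kwargs) -> bool:
    """Call a peptide a binder if it scores at least as high as the top-fraction cutoff.

    Ties at the cutoff are counted as binders, so a coarse toy matrix can call
    slightly more than `fraction` of random peptides positive.
    """
    return score(peptide, matrix) >= top_fraction_cutoff(matrix, **kwargs)


if __name__ == "__main__":
    print(is_predicted_binder("ALAAAAAAV", TOY_MATRIX))  # both toy anchors matched
    print(is_predicted_binder("AAAAAAAAA", TOY_MATRIX))  # no toy anchor matched
```

Ranking against random peptides makes scores comparable across alleles whose matrices have different scales, which is one reason a percentile cutoff is used instead of a fixed score.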

# RESULTS

#### T Cells Target Multiple Antigens of HCMV

The genome of HCMV encodes multiple viral proteins, each of which could constitute a viable target antigen for T cell recognition. We therefore tested 20 such HCMV antigens, specified in **Table 1**, for their ability to recall T cell responses in healthy human donors. Peripheral blood mononuclear cells (PBMC) obtained from six HCMV-seropositive and six HCMV-seronegative human subjects were challenged for 24 h with the specified HCMV antigens to selectively stimulate the respective antigen-specific T cell populations to secrete IFN-γ. IFN-γ production was measured in a standard ELISPOT assay format in which the cytokine is captured on the membrane around the cells that secrete it, permitting the visualization and quantification of individual IFN-γ-secreting T cells as "spot forming units" (SFU). Thus, this assay measures, at a single-cell level, the number of T cells that engaged in IFN-γ production following antigen stimulation (25). The individual HCMV antigens used for stimulation consisted of 15 amino acid (aa) long peptides that collectively spanned the respective polypeptide sequences in steps of (skipping) 11 aa, hereafter referred to as peptide pools. Each peptide was present at ∼1 µg/mL within the respective peptide pools, and the number of peptides contained in each pool is specified in **Table 1**.

SFU counts per 1 × 10<sup>6</sup> PBMC are shown only for peptides that elicited recall responses, with peptides being identified by amino acid positions and sequence. Positive results were defined as antigen-triggered SFU counts that exceeded 5 SD of the medium control SFU counts (the latter calculated from a total of 18 medium control wells tested per donor on six plates). While all five donors shared the HLA-A\*02:01 allele, the other HLA class I alleles expressed are listed in the table, as they could also serve as restriction elements for the peptide-specific CD8+ T cells.

Stimulation of all six HCMV-seronegative donors with each of the twenty HCMV peptide pools failed to elicit an increased number of IFN-γ-producing T cells relative to PBMC cultured in media alone (**Table 1**). However, PBMC from each of these HCMV-seronegative donors responded robustly to a combination of cytomegalovirus (C), parainfluenza (P), and influenza (I) antigens, collectively referred to as CPI (20), which confirmed T cell functionality in the respective samples (**Table 1**). The inability to detect a recall response to the HCMV peptide pools in HCMV-seronegative donors, in the face of their CPI reactivity, establishes the exquisite specificity of the HCMV peptide pool-triggered recall responses.

Stimulation of all six HCMV-seropositive donors' PBMC, in contrast, revealed recall responses to several of these HCMV antigens (**Table 1**). T cells specific for IE-1, pp65, and UL55 were detected in all six HCMV-seropositive donors, but the magnitude of recall responses was variable between donors, and varied also within a donor, ranging from relatively low SFU counts (in the tens) to high counts (in the hundreds). As the peptide pools tested on all donors were the same, and these were tested in a single experiment, the variability of responses observed must lie in the T cell compartment itself. There was no apparent response hierarchy seen for IE-1, pp65, and UL55. The IE-2, UL28, UL32, UL36, UL82, UL94, UL103, UL153, and US3 peptide pools also elicited recall responses in at least half of these donors, and again there was no clear response hierarchy seen against these antigens. For example, Donor 64 exhibited a high frequency recall response to both the UL36 and UL55 peptide pools, with negligible responses against several other peptide pools. In contrast, the response against pp65, UL32, and US3 prevailed in Donor 99. In aggregate, IE-1, pp65, UL28, UL32, UL36, UL55, UL94, US3, and US24 peptide pools were preferentially recognized by effector T cells from these HCMV-seropositive donors. Importantly, these antigens were not consistently immunodominant, whereas IE-2, UL40, UL48-1/2, UL99, UL103, UL151, UL153, US29, and US32 were either not recognized, or recalled a low frequency of T cells in the test subjects.

Collectively, the data presented in **Table 1** demonstrate that for monitoring T cell immunity to HCMV one should not focus on a single viral antigen. Even if an antigen is immune dominant in a subset of donors, it might be subdominant or even negative in other donors who respond strongly to other antigens of the virus. Selecting one antigen over another may not only grossly underestimate the magnitude of the expressed T cell repertoire, but can also misrepresent it entirely. For example, pp65 was co-dominant with UL32 and US3 in Donor 99, whereas pp65 was co-dominant with IE-1 and UL55 in Donor 137. However, pp65 can also be a subdominant antigen. Such was the case for Donor 182, in whom the numbers of T cells responsive to IE-1, UL55, UL94, and US3 were each 10 times higher than the number targeting pp65. Therefore, if pp65 reactivity were used in this particular donor to assess the magnitude of the HCMV-specific T cell repertoire, one would detect only 1.6% of the HCMV antigen-specific T cells responsive to the epitope space covered by the 20 HCMV antigens used for testing. For HCMV, therefore, these data highlight the need to comprehensively test all antigens of the virus to reliably assess the magnitude of the anti-viral T cell response. This notion might apply to other viruses as well, since similar observations were made through testing T cell recognition of peptide libraries representing several EBV antigens (**Supplementary Table 2**).

#### CD8+ T Cell Recognition of HCMV Epitope pp65(495-503) in HLA-A∗02:01-Positive Test Subjects

T cell immune monitoring is frequently performed using single peptides. Specifically, the NLVPMVATV peptide, corresponding to amino acids 495–503 of HCMV's pp65 protein (pp65<sub>495–503</sub>), has been described as an immune dominant epitope of HCMV in HLA-A\*02:01-positive subjects (19). Having established that restricting immune monitoring to only the pp65 antigen might be insufficient, we next sought to establish how reliably restricting immune monitoring to a single pp65 peptide would reflect the entire pp65 epitope space. As one approach to address this question, we tested 32 HLA-A\*02:01-positive, HCMV-seropositive healthy human donors for reactivity against the single pp65<sub>495–503</sub> peptide and the pp65 peptide library in parallel using an IFN-γ ELISPOT assay (**Figure 1**; the raw data are shown in **Supplementary Table 3**, permitting the identification of individual donors). As controls, PBMC from sixteen HLA-A\*02:01-positive, HCMV-seronegative donors were tested in the same manner. Each of these control subjects' PBMC failed to respond to either antigen (**Figure 1A**), while exhibiting a strong recall response to CPI (data not shown). The absence of a recall response to the pp65<sub>495–503</sub> peptide, and the pp65 peptide pool, in HCMV-seronegative donors serves to further support the specificity of the recall response these antigens elicited in HCMV-seropositive individuals.

For the HLA-A\*02:01-positive, HCMV-seropositive donor cohort (n = 32), the number of T cells responsive to either the pp65<sub>495–503</sub> peptide or pp65 peptide pool was determined (**Figure 1B**). Since our objective was to address whether the pp65<sub>495–503</sub> peptide was indeed immunodominant within the epitope space of the pp65 antigen, we normalized the number of T cells activated by the pp65 peptide pool to 100% and expressed the number of pp65<sub>495–503</sub> peptide-reactive T cells as a percentage of this total (**Figure 1C**). Using 50% as an arbitrary cut-off for immunodominance, 15 of these 32 (47%) donors exhibited a dominant response to the pp65<sub>495–503</sub> peptide. Therefore, in approximately half of HCMV-seropositive HLA-A\*02:01 subjects, reactivity to the pp65<sub>495–503</sub> peptide accurately reflected the expressed T cell repertoire against the pp65 antigen. However, in the remaining 17 HCMV-seropositive HLA-A\*02:01 subjects (53%), reactivity against the pp65<sub>495–503</sub> peptide constituted <50% of the pp65 peptide pool-specific T cell repertoire. In these donors, it was therefore likely that additional epitopes covered by the pp65 peptide pool were also targeted, and that these might even outnumber the pp65<sub>495–503</sub>-specific T cells. Notably, in six of these HLA-A\*02:01-positive, HCMV-seropositive subjects (19%), the pp65<sub>495–503</sub> peptide failed to recall detectable numbers of T cells whilst responses against the pp65 peptide pool were detected. In such donors, immune monitoring with the single pp65<sub>495–503</sub> peptide, but not with the pp65 peptide pool, would yield false negative results since the expressed T cell repertoire targeted other regions of the pp65 protein. To illustrate this point further using raw data, in **Figure 2** we present representative well images of a donor for which the pp65<sub>495–503</sub> peptide was immune dominant, encompassing nearly the entire epitope space covered by the pp65 peptide pool (Donor 261 in Row A of **Figure 2**). In contrast, a donor in whom pp65<sub>495–503</sub>-specific CD8+ T cells cannot be detected in spite of this donor's reactivity to the pp65 peptide pool (Donor 183 in Row C of **Figure 2**), and a donor for which the pp65<sub>495–503</sub> epitope covers only a fraction of pp65's epitope space (Donor 213 in Row B of **Figure 2**) are also presented.
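The normalization described above (peptide pool response set to 100%, single-peptide response expressed as a percentage of it, 50% cut-off) can be sketched in a few lines of Python. The function names and the example SFU values are hypothetical, and treating exactly 50% as dominant is an assumption, since the text does not state how the boundary case was handled.

```python
def percent_of_pool(peptide_sfu, pool_sfu):
    """Express the single-peptide response as a percentage of the peptide pool response."""
    if pool_sfu <= 0:
        raise ValueError("the pool response must be positive to normalize against it")
    return 100.0 * peptide_sfu / pool_sfu

def classify_response(peptide_sfu, pool_sfu, cutoff=50.0):
    """Label a donor's single-peptide response relative to the pool response."""
    if peptide_sfu == 0:
        return "not detected"
    if percent_of_pool(peptide_sfu, pool_sfu) >= cutoff:
        return "dominant"
    return "subdominant"

def cohort_percentage(n_donors, cohort_size):
    """Share of the cohort, rounded to whole percent as reported in the text."""
    return round(100.0 * n_donors / cohort_size)

# Hypothetical SFU counts per 1e6 PBMC: (single peptide, peptide pool).
examples = {"donor_a": (900, 1000), "donor_b": (120, 800), "donor_c": (0, 500)}
for donor, (peptide, pool) in examples.items():
    print(donor, classify_response(peptide, pool))

# Cohort shares reported in the text: 15, 17 and 6 of 32 donors.
print(cohort_percentage(15, 32), cohort_percentage(17, 32), cohort_percentage(6, 32))
```

The last line reproduces the rounded cohort shares quoted in the paragraph (47%, 53%, and 19%).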

Overall, **Figures 1**, **2** establish that even in those rare cases in which immune dominance of a single peptide can occur, such as pp65<sub>495–503</sub> peptide recognition in HLA-A\*02:01-positive subjects, the expressed antigen-specific T cell repertoire can be frequently underestimated by this single peptide, even to the point of obtaining false negative data. Furthermore, immune monitoring of T cell responsiveness to a single antigen, or peptide thereof, would fail to capture the extent to which other antigens of the same antigen system were targeted (see **Table 1**).

For the above comparisons of T cell recognition of the pp65<sub>495–503</sub> peptide vs. the entire pp65 epitope space, it must be noted that the numbers of pp65<sub>495–503</sub> peptide-specific CD8+ T cells can be expected to be accurate, whereas the numbers obtained using the pp65 peptide pool are likely to underrepresent the entire expressed pp65 antigen-specific T cell repertoire. The pp65<sub>495–503</sub> peptide is HLA-A\*02:01-restricted and too short for presentation by HLA class II molecules, hence it should exclusively elicit CD8+ T cells. In contrast, the pp65 peptide pool, consisting of 15-mer peptides, recalls both CD4+ and CD8+ T cells (**Supplementary Figure 1**). Fifteen-mer peptides can trigger, but are not ideal for, CD8+ T cell activation as the peptide binding groove of HLA class I molecules is closed on both ends and therefore accommodates only peptides of 9–11 aa, and moreover is intolerant to frame shifts of the peptide binding motif (11). The 15-mer peptides therefore need to undergo further processing to generate peptides 9–10 aa in length that are suitable for binding HLA I molecules, and such processing can also destroy these peptides. Moreover, as the peptides in the pp65 pool walk the pp65 sequence in steps of 11 aa, considerable gaps in epitope coverage can be expected. The possibility that the 15-mer peptide pool insufficiently covers the entire CD8+ T cell epitope space of pp65 suggests that this pool detects only a fraction of the antigen-specific CD8+ T cell repertoire. Therefore, the difference between the single peptide data and the full antigen-specific CD8+ T cell repertoire is potentially even larger than suggested by the data shown in **Figure 1**. Thus, for comprehensive assessment of complete CD8+ T cell epitope coverage of an antigen, it would be ideal to test peptides that are 9 to 10 aa long and that cover the protein sequence in steps of single amino acids, an approach we have undertaken below.

#### CD8+ T Cell Recognition of HCMV Epitopes in HLA-B35-, B44-, B7-, and B18-Positive Subjects

Peptide pp65<sub>123–131</sub> (IPSINVHHY) is HLA-B35-restricted and has been described as an epitope frequently targeted by CD8+ T cells from HCMV-infected donors bearing this HLA allele (20). We identified seven HLA-B35-positive, HCMV-seropositive, and pp65 peptide pool-reactive donors and tested these donors' PBMC for T cell reactivity to this peptide. As shown in **Figure 3A** (the raw data are shown in **Supplementary Table 4**, permitting the identification of individual donors), only one of these seven donors' PBMC harbored pp65<sub>123–131</sub>-specific T cells in high numbers. In the other six donors tested, the frequency of pp65<sub>123–131</sub>-specific T cells was below the limit of detection for the assay, that is, <1 in 400,000 PBMC.

To further test whether single peptides are insufficient for accurate T cell immune monitoring, we assessed T cell reactivity against three additional pp65-derived peptides that are restricted by HLA-B alleles. The HLA-B44-restricted pp65<sub>511–525</sub> (EFFWDANDIY) peptide has previously been reported as a prevalent epitope targeted by HCMV-infected individuals expressing this HLA allele (26). We identified 15 PBMC donors who were HLA-B44-positive, HCMV-seropositive, and also reacted strongly to the pp65 peptide pool. Only one of these donors exhibited a strong recall response to the pp65<sub>511–525</sub> peptide (**Figure 3B**; the raw data are shown in **Supplementary Table 4**, permitting the identification of individual donors). Three additional HLA-B44-positive donors responded weakly to the pp65<sub>511–525</sub> peptide, while we failed to measure a detectable response in the remaining 11 donors (73%). The pp65<sub>417–426</sub> (TPRVTGGGAM) peptide is an HLA-B7-restricted epitope that has also been implicated as a prevalently recognized HCMV pp65 epitope (27). We had access to PBMC from six individuals who were HLA-B7-positive, HCMV-seropositive, and pp65 peptide pool-reactive. As shown in **Figure 3C** (the raw data are shown in **Supplementary Table 4**, permitting the identification of individual donors), only two of these donors (33%) possessed pp65<sub>417–426</sub>-specific T cells in high numbers, while in the remaining four donors, pp65<sub>417–426</sub>-specific T cells were undetectable in spite of their HCMV-positive status. Lastly, the pp65<sub>378–389</sub> (SDEEEAIVAYTL) peptide is HLA-B18-restricted and has been described as a frequently targeted epitope in HCMV-infected donors who express this HLA allele (19). We had access to three HLA-B18-positive donors who were also HCMV-seropositive and responded to the pp65 peptide pool: none of these donors displayed a recall response to the pp65<sub>378–389</sub> peptide (data not shown).

Extending the notion put forward for the pp65<sub>495–503</sub> epitope in **Figure 1**, the data presented in **Figure 3** offer further support using three additional peptides previously described as "immune dominant" epitopes: such peptides were not necessarily targeted by T cells, and thus would frequently provide false negative results, or underestimate the HCMV-specific T cell repertoire. Instead, T cells frequently recognized alternative epitopes from the pp65 antigen, as suggested by the single-peptide negative donors' responsiveness to the pp65 peptide pool. Furthermore, pp65 is only one of many HCMV antigens recognized by the expressed T cell repertoire, and is not immunodominant in all HCMV-seropositive subjects (**Table 1**). Immune monitoring that relies on single HCMV peptides is therefore likely to provide false negative results in a considerable fraction of test subjects, and when peptide-reactive T cells are detected, their numbers might not provide an accurate reflection of the overall magnitude of the antigen-specific T cell repertoire in the test subject. We have made similar observations using the EBV and influenza test systems (**Supplementary Figures 2**, **3**), suggesting that this notion might hold for T cell immunity to other viruses as well.

#### Brute Force Epitope Mapping of HCMV pp65 Antigen

The frequent discordance between the frequency of CD8+ T cells specific for the pp65<sub>495–503</sub> peptide vs. the pp65 peptide pool in HLA-A\*02:01-positive, HCMV-seropositive subjects (**Figure 1**), along with the notion that the recall response triggered by the pp65 15-mer peptide pool encompasses both a CD8+ and CD4+ T cell component (**Supplementary Figure 1**), suggested that pp65<sub>495–503</sub> might not be the sole immune dominant epitope in these subjects. In particular, we hypothesized that this was the case for donors in which the pp65<sub>495–503</sub> peptide-induced recall response was weak relative to that triggered by the pp65 peptide pool (e.g., Donors 213 and 183 in **Figure 2**). To directly address this hypothesis, we used a brute force epitope mapping approach to systematically identify all MHC class I-restricted T cell epitopes in the pp65 protein that were recognized. To this end, a series of 9-mer peptides was synthesized that span the entire sequence of the pp65 protein, progressing in steps of a single amino acid (**Supplementary Figure 4**). These individual peptides were plated at 0.2 µg/mL, one peptide sequence per well, with the 553 unique peptides spanning across six 96-well plates for each test subject. Each plate contained three medium-only control wells, and one well allocated for a CPI positive control. In the same experiment, each donor's PBMC (at 3 × 10<sup>5</sup> cells/well) were also tested for their recall response to the pp65 15-mer peptide pool and to the pp65<sub>495–503</sub> peptide (from a different synthesis, at >95% purity). Of note, the individual peptides of the 9-mer library were not subjected to further purification after their synthesis, and averaged 56% purity.

The results of the IFN-γ ELISPOT assays using the individual peptides of the pp65 9-mer series, along with controls, for stimulation of four HLA-A\*02:01-positive, HCMV-seropositive healthy donors (Donors 284, 300, 331 and 350) and a single HLA-A\*02:01-positive, HCMV-seronegative donor (Donor 285) are summarized in **Table 2**.
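How a single-step 9-mer library arises from a protein sequence can be shown with a short Python sketch. The function name `tile_peptides` is hypothetical, and the sequence below is a random placeholder, not the pp65 sequence; its length of 561 residues is an assumption chosen because a protein of that length yields exactly the 553 nonamers stated above (561 − 9 + 1 = 553).

```python
import random

def tile_peptides(sequence, length=9, step=1):
    """Return (start, end, peptide) tuples with 1-based, inclusive positions."""
    return [
        (i + 1, i + length, sequence[i:i + length])
        for i in range(0, len(sequence) - length + 1, step)
    ]

# Placeholder sequence (NOT pp65): 561 random residues, fixed seed for repeatability.
rng = random.Random(0)
placeholder_sequence = "".join(rng.choice("ACDEFGHIKLMNPQRSTVWY") for _ in range(561))

library = tile_peptides(placeholder_sequence)
print(len(library))   # number of 9-mers in the library
print(library[0])     # first peptide, positions 1-9
print(library[-1])    # last peptide, ending at the C-terminus
```

Because consecutive peptides overlap by 8 residues, every possible 9-residue window of the protein is represented exactly once, which is what makes the scan independent of HLA type.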

As seen in **Table 2**, Donor 300 responded vigorously to the 15-mer pp65 peptide pool (1,681 SFU/1 × 10<sup>6</sup> PBMC), and with a similar frequency to purified pp65<sub>495–503</sub> peptide (1,114 SFU/1 × 10<sup>6</sup> PBMC). The magnitude of the recall response to the corresponding (unpurified) pp65<sub>495–503</sub> peptide from the peptide series was 1,074 SFU/1 × 10<sup>6</sup> PBMC, which was essentially identical to that of the purified pp65<sub>495–503</sub> peptide. Since the recall response induced by the purified and unpurified pp65<sub>495–503</sub> peptides provided highly similar SFU counts for all donors tested, this result suggested that the 9-mer pp65 peptide series was well-suited for assessing CD8+ T cell activation despite the absence of further purification following synthesis. Unlike other donors, Donor 300 failed to exhibit a response to any of the other peptides in the 9-mer pp65 peptide series. Therefore, the expressed T cell repertoire of Donor 300 was indeed exclusively targeting the HLA-A\*02:01-restricted pp65<sub>495–503</sub> peptide.

For Donor 284, the pp65<sub>495–503</sub> peptide-elicited recall response was 175 SFU/1 × 10<sup>6</sup> PBMC for the purified peptide, and 60 SFU/300,000 PBMC (equivalent to 200 SFU/1 × 10<sup>6</sup> PBMC) for the corresponding 9-mer in the pp65 peptide series. Therefore, the response against the pp65<sub>495–503</sub> peptide constituted only ∼20% of the observed pp65 peptide pool response (1,004 SFU/1 × 10<sup>6</sup> PBMC). In agreement with this observation, Donor 284 demonstrated recall responses to several additional 9-mers in the pp65 peptide series. Specifically, the pp65<sub>116–125</sub> (238 SFU/1 × 10<sup>6</sup> PBMC), pp65<sub>203–211</sub> (394 SFU/1 × 10<sup>6</sup> PBMC), pp65<sub>324–332</sub> (1,145 SFU/1 × 10<sup>6</sup> PBMC) and pp65<sub>325–333</sub> (1,327 SFU/1 × 10<sup>6</sup> PBMC) peptides all stimulated IFN-γ secretion by this donor's PBMC. **Supplementary Figure 5** depicts raw data (well images) of the ELISPOT plate in which the adjacent pp65<sub>324–332</sub> and pp65<sub>325–333</sub> peptides triggered a recall response in this donor. Being representative of results obtained from all plates, these well images are shown to illustrate the clarity of the peptide-triggered signal over the negligible background noise in these epitope scanning assays. Since the pp65<sub>324–332</sub> and pp65<sub>325–333</sub> peptides are adjacent, the observed responses likely identify a single naturally processed CD8+ T cell epitope for this donor. Therefore, the peptide pp65<sub>495–503</sub> was only one of four codominant epitopes recognized by CD8+ T cells from Donor 284. Likewise, Donor 331 followed a similar pattern and recognized six distinct pp65 epitopes, of which five were identified by brute force epitope scanning and were recognized in a codominant fashion.

In Donor 350, the pp65<sub>495–503</sub> peptide-elicited recall response (40 SFU/1 × 10<sup>6</sup> PBMC for the purified peptide, and 38 SFU/1 × 10<sup>6</sup> PBMC for the corresponding nonamer from the 9-mer series) was only 2% of that triggered by the pp65 peptide pool (1,976 SFU/1 × 10<sup>6</sup> PBMC). For this particular donor, two adjacent peptides, pp65<sub>417–425</sub> and pp65<sub>418–426</sub>, that likely jointly reveal a single epitope, triggered a vigorous recall response with 1,496 and 513 SFU/1 × 10<sup>6</sup> PBMC, respectively. No other peptide, except for the weak recall response to pp65<sub>495–503</sub> itself, triggered SFU counts above the medium background (**Table 2**). Therefore, for Donor 350 the immune dominant epitope was pp65<sub>417–425</sub>/pp65<sub>418–426</sub>, with the pp65<sub>495–503</sub> peptide contributing only minimally to CD8+ T cell recognition of the pp65 antigen.
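The reasoning that two adjacent positive 9-mers likely reveal a single epitope can be expressed as a small grouping step. The following Python sketch is one possible way to do this, not the procedure used in the study: the function name is hypothetical, and "adjacent" is assumed to mean start positions that differ by exactly one residue. The SFU values are those quoted in the text for Donor 350 and for a subset of the Donor 284 responses.

```python
def merge_adjacent_hits(hits, peptide_length=9):
    """Group positive 9-mers whose start positions are consecutive.

    hits maps the 1-based start position of each positive peptide to its SFU
    count per 1e6 PBMC. Returns (region_start, region_end, peak_sfu) tuples.
    """
    regions = []
    previous_start = None
    for start in sorted(hits):
        end = start + peptide_length - 1
        if previous_start is not None and start - previous_start == 1:
            region_start, _, peak = regions[-1]
            regions[-1] = (region_start, end, max(peak, hits[start]))
        else:
            regions.append((start, end, hits[start]))
        previous_start = start
    return regions

# Values quoted in the text (SFU per 1e6 PBMC, 9-mer series).
donor_350 = {417: 1496, 418: 513, 495: 38}
donor_284_subset = {203: 394, 324: 1145, 325: 1327, 495: 200}

print(merge_adjacent_hits(donor_350))
print(merge_adjacent_hits(donor_284_subset))
```

For Donor 350 this collapses the 417–425 and 418–426 hits into one candidate region spanning residues 417–426, alongside the separate, weak 495–503 response.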

Unlike the above HCMV-seropositive donors, the HCMV-seronegative Donor 285 failed to mount a significant response to the pp65 peptide pool or pp65<sub>495–503</sub> peptide. This donor also did not yield a positive response to any of the peptides in the pp65 9-mer peptide series, supporting the specificity of responses induced by such peptides in Donors 284, 300, 331 and 350.

Overall, these brute force CD8+ epitope scanning experiments confirm the notion that immune monitoring that relies exclusively on a single "immune dominant" peptide can largely underestimate (Donors 284 and 331), or fail to detect (Donor 350) the antigen-specific CD8+ T cell repertoire. Only in one of four instances was the single peptide approach sufficient to accurately monitor T cell reactivity against a complex antigen (Donor 300). Therefore, to comprehensively assess the magnitude of antigen-specific CD8+ T cell immunity in a test subject population, it would be advisable to either test peptide pools or cover the entire epitope space using single peptides.

#### Predicted vs. Actually Recognized pp65 Epitopes

The epitope space evaluated for pp65 consisted of 553 peptides, 9 aa in length, spanning the entire length of the protein with overlaps of 8 aa residues. We selected 9-mer peptides since HLA I molecules have a preference for binding peptides of that size. We tested the response of 5 subjects to each of these peptides, identifying the high frequency responses shown in **Table 2**, plus 194 weaker responses, that is, SFU counts that exceeded the mean of the 18 medium control wells by 5 SD. We assessed how many of these responses could be anticipated by determining if peptides could be predicted to bind to any of the HLA I alleles expressed by the subjects (details in Material and Methods). Of note, we would have predicted only 20 of these responses, which constitute only ∼10% of experimental responses. This rate of response anticipation is likely an underestimation since we were limited by the availability of relevant profile motifs for predicting peptide-HLA I binding (detailed in Material and Methods). Thus, we could only predict peptide-HLA I binding for 8 out of 19 HLA I alleles expressed by the five test subjects detailed in **Table 2**. In any case, this rate for anticipating responses is within the reported epitope discovery rate (10), and it is clear that most experimental responses cannot be predicted. Likewise, we found that positive responses represent a minority of predicted responses (**Figure 4**). In other words, we predicted far more responses than we actually detected. The number of detected responses ranged from 11 out of 55 predicted for A\*02:01 to none out of 22 for B\*51:01. It is worth mentioning that the number of predicted responses is linked to both the number of peptides predicted to bind a given HLA I molecule and the number of subjects who expressed that HLA molecule. Thus, the number of predicted A\*02:01 responses is larger than that for all the other HLA I molecules because all the test subjects were typed positive for A\*02:01.
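The positivity criterion (SFU counts exceeding the mean of the 18 medium control wells by 5 SD) and the two rates quoted above can be made concrete with a brief Python sketch. The medium control counts are invented for illustration, the function names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state which SD estimator was used.

```python
import statistics

def positivity_cutoff(medium_counts, n_sd=5):
    """Mean of the medium control wells plus n_sd standard deviations."""
    return statistics.mean(medium_counts) + n_sd * statistics.stdev(medium_counts)

def is_positive(sfu_count, cutoff):
    """A well is called positive when its SFU count exceeds the cutoff."""
    return sfu_count > cutoff

def percentage(part, whole):
    return 100.0 * part / whole

# Hypothetical spot counts for 18 medium-only wells (three wells on each of six plates).
medium_wells = [0, 1, 0, 2, 1, 0, 1, 1, 0, 2, 1, 0, 1, 0, 1, 2, 0, 1]
cutoff = positivity_cutoff(medium_wells)
print(round(cutoff, 2), is_positive(12, cutoff), is_positive(3, cutoff))

# Rates quoted in the text.
print(round(percentage(20, 194), 1))  # predicted share of the detected weaker responses
print(round(percentage(11, 55), 1))   # detected share of predicted A*02:01 responses
```

The first rate measures how many observed responses prediction would have anticipated, while the second measures how many predictions were borne out experimentally; the text reports both as minorities.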

### DISCUSSION

Using T cell reactivity against human cytomegalovirus (HCMV), Epstein-Barr virus (EBV) and influenza as prototypical examples, we present data confirming that the epitope space of these viruses comprises numerous antigens. Moreover, these data highlight the variable hierarchy in T cell recognition of these target antigens amongst the subjects we tested (**Table 1** and **Supplementary Table 2**). The notion that T cells target multiple viral antigens without a clear pattern of immunodominance in the human population for an individual antigen might be a generalizable finding, which is likely to apply to many other viruses as well. Therefore, immune monitoring efforts for HCMV and EBV (and possibly most other viruses) that narrow in on a single viral antigen have a high likelihood of misrepresenting the high- or low-responder status of an individual, and even of providing false negative results. Our data call attention to the need for antigen genome-wide immune monitoring efforts.

Assessment of T cell reactivity against a single, well-defined HCMV pp65 peptide (pp65<sub>495–503</sub>) vs. a pool of pp65 peptides clearly demonstrated that the single peptide-induced recall response frequently underestimated the magnitude of the expressed T cell repertoire targeting the entire pp65 antigen, and in some instances failed to detect it entirely (**Figures 1**, **2**). Likewise, assessment of T cell reactivity against additional single peptides from the pp65 antigen vs. the pp65 peptide pool further reiterated this finding (**Figure 3**). Similar observations were also made for EBV (**Supplementary Figure 2**), and for influenza (**Supplementary Figure 3**). Therefore, relying on single, previously-defined epitopes seems to be insufficient to reliably quantify, or even detect, T cell immunity against these three viruses, and likely against other viruses as well.

Focusing more closely on HCMV-seropositive, HLA-A\*02:01-expressing test subjects who exhibited discordance between the number of T cells that responded to the "immunodominant" pp65<sub>495–503</sub> peptide and the pp65 peptide library, we tried to reconcile this discrepancy by testing whether such donors possessed CD8+ T cells that recognized alternative pp65 epitopes. We sought to test this hypothesis through "brute force" epitope scanning of the entire pp65 antigen, using individual 9-mer peptides that span the entire pp65 aa sequence. Indeed, each of these test subjects responded to additional pp65-derived 9-mer peptides and no single 9-mer peptide was universally recognized by each of these donors (**Table 2**). Taken together, these data further highlight the need to cover epitopes of an antigen comprehensively through usage of peptide libraries, rather than relying on a single or few previously defined "immune dominant" peptides.

One possible solution for performing comprehensive CD8+ T cell immune monitoring is to tailor peptide libraries to specific HLA class I alleles. Existing algorithms can be used to predict likely peptides that encompass the potential epitope space recognized by CD8+ T cells based on HLA binding criteria (28). However, considering each HLA class I allele expressed by individual test subjects, and a complex antigen such as pp65, this is an undertaking of considerable scope. There will be a multitude of predicted peptides that satisfy the imposed HLA binding criteria, and these peptides will in large part be different for each donor. As a much simpler alternative to such designed peptide libraries, one can take the "agnostic" brute force approach in which a peptide library is constructed that walks the entire amino acid sequence in steps of single amino acids. Using this latter method, one can conceivably cover the entire antigenic space of a protein, in any human, without the need for customization. This brute force approach can actually identify peptide responses where prediction fails, as only a minority of the responses can actually be predicted, and moreover, only a minority of predicted responses can actually be detected.

For a "brute force" approach, 9–11 aa peptides have been successfully used to detect epitope-specific CD8+ T cells (13, 14, 29). In these studies, similar to the results presented in this communication, frequently just one or two adjacent peptides elicited a CD8+ T cell recall response (**Table 2** and **Supplementary Figure 5**) (13, 14, 29). This natural law, confirmed here and dictated by the closed peptide-binding groove of MHC class I molecules, calls into question the use of peptide libraries consisting of peptides longer than 11 aa, and that walk the protein sequence in steps greater than a single aa. Thus, only comprehensive coverage of all potential epitopes can reveal the exact dimensions of the expressed CD8+ T cell repertoire.

Therefore, in theory, testing of peptide libraries consisting of 9–11 aa length with single amino acid steps would be the ideal strategy for "agnostic" CD8+ T cell immune monitoring. However, in practice, until recently such an approach would have been considered impractical and prohibitive. This is due to (a) the number of peptides needed for such an approach, (b) the number of primary lymphoid cells (e.g., PBMC) required, (c) the labor involved in such testing, (d) the scope of data analysis and (e) interpretation of the flood of data obtained. As we illustrated in this work through systematic testing of five donors' PBMC responding to 553 single peptides, all tested in one experiment, these limitations no longer apply.

The primary notion set forth in this communication is that genome-wide comprehensive testing of 9–10 amino acid long peptides is both necessary and feasible for CD8+ T cell immune monitoring: restricting it to a few known "immune dominant" peptides is likely to underestimate or even entirely miss the antigen-specific CD8+ T cell repertoire. Immune monitoring of CD8+ T cells critically depends on the precise use of peptides, while immune monitoring of CD4+ T cells can be done without involving peptides, using the entire protein. Our data also support the notion that large scale peptide scans have become feasible for systematic epitope discovery. Major steps have already been undertaken in this direction for CD4+ T cells (30, 31). Because HLA Class II molecules' peptide-binding grooves are open on both ends, they can both accommodate longer peptides and tolerate frame shifts of the peptides' HLA anchor residues. For this reason, one can expect CD4+ T cell determinant mapping to be largely comprehensive when the classic approach is taken using longer peptides (e.g., 15–25-mers) that cover the antigen sequence in larger steps (e.g., 5–15 amino acid increments). This classic approach for CD4+ T cells would likely miss many or even most CD8+ T cell epitopes, however (see **Table 1**, and **Supplementary Figure 5**). Systematically covering all possible nonamer peptides for mapping CD8+ T cell determinants multiplies the number of peptides to be tested, and thus the scope of testing. Our data suggest that, even for CD8+ T cells, comprehensive large-scale genome-wide epitope discovery is approaching feasibility.

Recent advances in high-throughput peptide synthesis technologies have enabled manufacturers to offer extensive, even full proteome-spanning peptide libraries at sufficient quality and low cost, including the customization of such libraries. Unpurified peptides are suitable for screening purposes because the HLA alleles and TCR present in the PBMC cultures can be expected to select the "right" peptide synthesis variants that primed the T cell responses in vivo. Confirming this notion, we found (see **Table 2**) that in all five donors tested the purified pp65<sub>495–503</sub> peptide recalled essentially identical numbers of CD8+ T cells as the corresponding unpurified peptide from the 9-mer pp65 peptide series. However, as crude peptides are variable in both their purity level and yield, in general, it is advisable to validate individual epitopes identified in such screening experiments using more stringently purified peptides. Thus, the availability and manufacture of extensive peptide libraries no longer precludes brute force T cell epitope mapping.

The greatest obstacle, and rate-limiting factor, for brute force epitope mapping is the number of PBMC that can be obtained from a single test subject. For testing 553 peptides at 3 × 10<sup>5</sup> PBMC/well, plus 23 control wells, we required 1.73 × 10<sup>8</sup> PBMC from each test subject. Up to 5 × 10<sup>8</sup> PBMC can be obtained by classic venipuncture. As an alternative, which we relied upon, one can readily obtain 2 × 10<sup>10</sup>, or more, PBMC from a human subject in a single leukapheresis draw while depleting that individual of only 1% of his/her white blood cells. With 2 × 10<sup>10</sup> PBMC, using ELISPOT, one has sufficient cells to test up to 6 × 10<sup>5</sup> individual peptides!
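The cell requirement quoted above follows directly from the number of wells and the cell input per well. The sketch below simply reproduces that arithmetic; the function name is hypothetical and the inputs are the values stated in the text.

```python
def pbmc_required(n_peptides: int, n_control_wells: int, cells_per_well: float) -> float:
    """Total PBMC needed when each peptide and each control occupies one well."""
    return (n_peptides + n_control_wells) * cells_per_well


# Values stated in the text: 553 peptides, 23 control wells, 3 x 10^5 PBMC/well.
required = pbmc_required(n_peptides=553, n_control_wells=23, cells_per_well=3e5)

print(f"Wells per donor: {553 + 23}")
print(f"PBMC per donor:  {required:.3g}")
```

The 576 wells per donor correspond to exactly six 96-well plates.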

When ELISPOT assays are performed in 96-well plates, between 0.1 and 1 × 10<sup>6</sup> PBMC are plated per well. In this cell density range, the SFU count (i.e., the number of antigen-specific T cells detected) is strictly linear to the number of APC plated (32). In the 96-well format, 1 × 10<sup>5</sup> PBMC per well is the lowest cell input that yields reliable data. Lower numbers of PBMC in the 96-well assay no longer form a confluent cell layer at the bottom of the well, and thus predictable T cell-APC interactions become disrupted. Recently, 384-well ELISPOT plates have become available. The membrane size on the bottom of a 384-well ELISPOT plate is one-third (not the expected one-fourth) that of the 96-well plate. The 384-well format permits miniaturization of ELISPOT assays to precisely one-third compared to a 96-well plate, whereby plating precisely one-third the numbers of PBMC yields precisely one-third of the SFU counts (32). Thus, epitope mapping experiments could theoretically be performed using 3 × 10<sup>4</sup> PBMC per well (per peptide). Based on these basic parameters, using 5 × 10<sup>7</sup> cells (corresponding to 50 mL of blood), one can test 500 individual peptides in a 96-well plate format. Using 2 × 10<sup>10</sup> PBMC, acquired through leukapheresis, 2 × 10<sup>5</sup> individual peptides can be tested in the 96-well format with 1 × 10<sup>5</sup> PBMC per well. The number of peptides that can be tested individually in a 384-well format follows by multiplying the above numbers by 3; that is, using the 384-well format, as many as 6 × 10<sup>5</sup> individual peptides can be tested against PBMC from a donor following a single leukapheresis. Thus, similar to the number of peptides, the number of PBMC required for large-scale brute force epitope mapping is not an insurmountable limitation.
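The capacity estimates in this paragraph can be reproduced with a few lines of Python. This is a minimal sketch with hypothetical function names; it assumes one peptide per well, ignores control wells, and applies the threefold miniaturization factor exactly as stated in the text.

```python
def max_peptides_96(total_pbmc: int, cells_per_well: int = 100_000) -> int:
    """Peptides testable in 96-well format at the lowest reliable cell input."""
    return total_pbmc // cells_per_well


def max_peptides_384(total_pbmc: int, cells_per_well_96: int = 100_000,
                     miniaturization: int = 3) -> int:
    """Peptides testable in 384-well format, taken as the 96-well number
    multiplied by the stated threefold miniaturization factor."""
    return miniaturization * max_peptides_96(total_pbmc, cells_per_well_96)


# Cell yields stated in the text.
BLOOD_DRAW_50_ML = 5 * 10**7
LEUKAPHERESIS = 2 * 10**10

for label, cells in [("50 mL blood", BLOOD_DRAW_50_ML), ("leukapheresis", LEUKAPHERESIS)]:
    print(f"{label}: {max_peptides_96(cells)} peptides (96-well), "
          f"{max_peptides_384(cells)} peptides (384-well)")
```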

Testing of large numbers of peptides in a T cell assay requires well-developed logistics. In our case, the peptides were delivered by the manufacturer as powder in the desired well format spread across six individual 96-well plates. The peptides were then dissolved "en block" using a 96-well multichannel pipettor. In this way, master peptide plates were conveniently created from which the peptides were then transferred, again "en block," into the actual assay plates. Using such a strategy, both the dilution and plating of peptides were fast and fail-proof since the peptide layouts of the master plates were preserved. The PBMC were also added using the 96-well multichannel pipettor to conclude the most labor-intensive component of the assay: the cell culture setup. After a 24 h in vitro incubation, during which peptide-specific T cells became activated and secreted IFN-γ following interaction with APC, the 96-well plates were washed to remove cellular material, and detection reagents were added to begin development of the assay, all done by 96-well pipetting.
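Preserving the master plate layout amounts to keeping a fixed mapping between a peptide's position in the series and its plate and well. The sketch below shows one such mapping; the row-major fill order (A1, A2, ..., H12) and the function name are assumptions for illustration, not a description of the layout actually used.

```python
import string

ROWS = string.ascii_uppercase[:8]  # rows A-H of a 96-well plate
N_COLS = 12                        # columns 1-12


def well_position(index: int) -> tuple:
    """Map a 0-based position in the peptide series to
    (1-based plate number, well label), filling each plate row by row."""
    plate, offset = divmod(index, len(ROWS) * N_COLS)
    row, col = divmod(offset, N_COLS)
    return plate + 1, f"{ROWS[row]}{col + 1}"


# 553 peptides plus 23 controls fill 576 wells, i.e., six 96-well plates.
print(well_position(0))    # first well of the first plate
print(well_position(96))   # first well of the second plate
print(well_position(575))  # last well of the sixth plate
```

Because the same function applies to master and assay plates, a positive well can be traced back to its peptide without any additional bookkeeping.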

Using the approach outlined above, the setup of the experiment summarized in **Table 2**, in which 553 individual peptides plus controls were tested individually on five test subjects, took three investigators ∼3 h of work to accomplish. Using this "en block" pipetting approach, theoretically, even the extreme of testing 6 × 10<sup>5</sup> peptides in 384-well format, requiring 1,563 plates, could reasonably be achieved with the assistance of pipetting robots.

As illustrated by the data presented herein, epitope recognition by T cells typically involves considerable interindividual variability, even within HLA-allele matched donor cohorts. Therefore, systematic study of the recognized epitope space of a complex antigen system, such as a virus, will likely require testing of sizable cohorts. Because of the ease of peptide and PBMC plating when the above logistics are followed, PBMC of several donors can be tested in a single experiment. Therefore, performing the T cell ELISPOT assay itself is also not a rate limiting factor for high-throughput, brute force epitope mapping.

The logistics of reading and analyzing ELISPOT data generated in high-throughput epitope mapping experiments involve linking peptide and test subject identities with the experimental data itself. This can be accomplished using the SpotMap™ software, which permits assignment of individual peptides and PBMC donors to specific wells of 96- or 384-well plates, including usage of unique, barcode-based plate identifiers. Using this approach, the specific plate layout can be carried through peptide synthesis, peptide transfer, plate reading, and data analysis. If done in this manner, acquisition and analysis of the raw data, including assignment of positive wells and identification of the specific peptide responsible for cytokine production, is streamlined and largely automatic. With the assistance of a plate stacker for the thirty 96-well plates required to generate the data presented in **Table 2**, the scanning, counting and analysis of the 14,400 wells (corresponding to 14,400 test conditions/data points) took 40 min of fully automatic reader time. With this analysis process progressing at a rate of 1 min per 96-well plate, even the "herculean" task of analyzing the results of testing 600,000 peptides in 1,563 384-well assay plates could be accomplished within 26 h of automated reader time.
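The plate count and reader time quoted for the extreme case can be checked as follows. This is a minimal sketch with hypothetical function names; it assumes one peptide per well and, as in the text, a reading rate of 1 min per plate.

```python
import math


def plates_needed(n_wells: int, wells_per_plate: int) -> int:
    """Number of plates required, rounding up to whole plates."""
    return math.ceil(n_wells / wells_per_plate)


def reader_hours(n_plates: int, minutes_per_plate: float = 1.0) -> float:
    """Automated reader time in hours at a fixed per-plate reading rate."""
    return n_plates * minutes_per_plate / 60


plates = plates_needed(n_wells=600_000, wells_per_plate=384)
hours = reader_hours(plates)

print(f"384-well plates: {plates}")
print(f"Reader time:     {hours:.1f} h")
```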

#### CONCLUSION

The data presented in this report not only affirm the need for experimental epitope verification, but also serve to make the point that such experiments have become technically feasible. Testing of 553 individual peptides, using five PBMC donors in a single experiment, as reported herein, represents a milestone accomplishment in that direction. We would suggest that, using the technology presented here, it should be possible to epitope map entire genomes, e.g., of viruses. Thereby, for the first time, it will become feasible to study expressed T cell repertoires recognizing entire antigenic systems. In the coming years, we anticipate that such high-resolution studies of individuals', and cohorts', epitope space should become a reality, and will fertilize the field of "epitomics." Similar to other recently-developed "omics" platforms, the abundance and speed with which new information will be acquired, while overwhelming to the human mind, is destined to shed unprecedented insight into T cell epitope recognition in the context of both health and disease, and eventually will permit the precise assessment of the antigen-specific T cell repertoire as required for accurate immune monitoring of CD8+ T cell immunity.

#### ETHICS STATEMENT

The white blood cells used in this study were collected under an IRB of Hemacare, Van Nuys, California. They were sold anonymously to CTL; that is, CTL and the authors have no knowledge of the identity of these PBMC donors. Under such conditions, the third party (i.e., CTL and the authors) does not need an IRB to use such cells for research purposes.

#### AUTHOR CONTRIBUTIONS

Experiments were designed by PL, AK, and PR. Experimental data were generated by MS, TZ, DR, GK, and AL, and peptide-binding analysis was performed by PR. PL, GK, and PR prepared the manuscript.

#### FUNDING

This study was funded by the R&D budget of Cellular Technology Limited.

#### ACKNOWLEDGMENTS

We thank Ruliang Li of Cellular Technology Limited for expert technical assistance.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2019.00655/full#supplementary-material




**Conflict of Interest Statement:** PV, MS, TZ, DR, GK, AK, and AL are employees of Cellular Technology Limited (CTL), a company that specializes in ELISPOT testing, producing high-throughput-suitable readers, test kits, and GLP-compliant contract research.

The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Lehmann, Suwansaard, Zhang, Roen, Kirchenbaum, Karulin, Lehmann and Reche. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Application of Modeling Approaches to Explore Vaccine Adjuvant Mode-of-Action

Paul R. Buckley<sup>1,2</sup>, Kieran Alden<sup>2</sup>, Margherita Coccia<sup>3</sup>, Aurélie Chalon<sup>3</sup>, Catherine Collignon<sup>3</sup>, Stéphane T. Temmerman<sup>3</sup>, Arnaud M. Didierlaurent<sup>3</sup>, Robbert van der Most<sup>3</sup>, Jon Timmis<sup>2,4</sup>, Claus A. Andersen<sup>3</sup>\* and Mark C. Coles<sup>1</sup>\*

*<sup>1</sup> Kennedy Institute of Rheumatology, University of Oxford, Oxford, United Kingdom, <sup>2</sup> Department of Electronic Engineering, University of York, York, United Kingdom, <sup>3</sup> GSK, Rixensart, Belgium, <sup>4</sup> Faculty of Technology, University of Sunderland, Sunderland, United Kingdom*

#### Edited by:

*Rino Rappuoli, GlaxoSmithKline, Italy*

#### Reviewed by:

*Thorsten Demberg, Marker Therapeutics, United States Michael Schotsaert, Icahn School of Medicine at Mount Sinai, United States*

#### \*Correspondence:

*Claus A. Andersen claus.a.andersen@gsk.com Mark C. Coles mark.coles@kennedy.ox.ac.uk*

#### Specialty section:

*This article was submitted to Vaccines and Molecular Therapeutics, a section of the journal Frontiers in Immunology*

> Received: *19 February 2019* Accepted: *27 August 2019* Published: *12 September 2019*

#### Citation:

*Buckley PR, Alden K, Coccia M, Chalon A, Collignon C, Temmerman ST, Didierlaurent AM, van der Most R, Timmis J, Andersen CA and Coles MC (2019) Application of Modeling Approaches to Explore Vaccine Adjuvant Mode-of-Action. Front. Immunol. 10:2150. doi: 10.3389/fimmu.2019.02150*

Novel adjuvant technologies have a key role in the development of next-generation vaccines, due to their capacity to modulate the duration, strength and quality of the immune response. The AS01 adjuvant is used in the malaria vaccine RTS,S/AS01 and in the licensed herpes-zoster vaccine (Shingrix) where the vaccine has proven its ability to generate protective responses with both robust humoral and T-cell responses. For many years, animal models have provided insights into adjuvant mode-of-action (MoA), generally through investigating individual genes or proteins. Furthermore, modeling and simulation techniques can be utilized to integrate a variety of different data types, ranging from serum biomarkers to large scale "omics" datasets. In this perspective we present a framework to create a holistic integration of pre-clinical datasets and immunological literature in order to develop an evidence-based hypothesis of AS01 adjuvant MoA, creating a unified view of multiple experiments. Furthermore, we highlight how holistic systems-knowledge can serve as a basis for the construction of models and simulations supporting exploration of key questions surrounding adjuvant MoA. Using the Systems-Biology-Graphical-Notation, a tool for graphical representation of biological processes, we have captured high-level cellular behaviors and interactions, and cytokine dynamics during the early immune response, which are substantiated by a series of diagrams detailing cellular dynamics. Through explicitly describing AS01 MoA we have built a consensus of understanding across multiple experiments, and so we present a framework to integrate modeling approaches into exploring adjuvant MoA, in order to guide experimental design, interpret results and inform rational design of vaccines.

Keywords: vaccines, adjuvants, mathematical modeling, computational biology, systems biology, mechanistic modeling, AS01

### INTRODUCTION

Adjuvants are immunostimulants that shape and enhance the immune response to antigens through mimicking key aspects of innate pathogen recognition, leading to robust long-term memory recall responses (1, 2). Many modern vaccine adjuvants activate pattern-recognition-receptors (PRRs) expressed on innate immune cells, including toll-like-receptors (TLRs) and NOD-like-receptors (NLRs) (3–5), although there is a breadth of potential innate activation mechanisms (1, 3–6). This capacity to enhance responses not only increases efficacy but can reduce the required quantity of antigen in vaccine formulations, enhancing supply in the case of pandemic infections (3). Thus, understanding how adjuvants modulate the immune response is key to providing a mechanism-based approach to rationally tailor vaccines. While non-adjuvanted vaccines are usually capable of inducing sufficient antibody responses, it is well established that adjuvants are capable of enhancing and altering the quality of humoral responses (7). Furthermore, for some diseases, including malaria, HIV, and TB, antibody responses alone are not considered to be sufficient to eliminate the pathogen (8). Thus, adjuvants which can generate both robust CD4+ T cell and strong neutralizing antibody responses are required (9).

AS01 is a liposome-based vaccine adjuvant containing two immunostimulants: monophosphoryl lipid A (MPL) and QS-21 (Antigenics LLC, a wholly owned subsidiary of Agenus Inc., a Delaware, USA corporation). MPL is a TLR4 agonist, and QS-21 is a saponin derived from the soap bark tree *Quillaja saponaria*. This formulation has been shown to enhance both antibody and T helper 1 (Th1) responses to antigens (8, 10). It is currently employed in two approved vaccines against malaria and herpes-zoster virus (9). AS01-adjuvanted vaccines have shown high efficacy in herpes-zoster phase III trials, where two doses resulted in >90% efficacy against herpes-zoster regardless of age, including in patients >70 years old (11, 12), showcasing an ability to overcome the age-related defect associated with vaccination. Furthermore, in a phase 2b clinical trial, AS01-adjuvanted vaccines provided 54% protection against active TB (13). TLR agonists like MPL are often utilized in modern adjuvants, as they modulate the type and duration of the immune response (14–17). However, adjuvants can be greatly improved by the inclusion of additional immunostimulants, as observed with AS01, where QS-21 is found to synergize with MPL. While MPL's mode-of-action (MoA) is well characterized, QS-21's MoA is not well understood, although it has been shown to colocalize with subcapsular-sinus macrophages (SCS-M), leading to inflammasome activation in a caspase-1 dependent manner (18–21). Caspase-1 activation can trigger pyroptosis and activation of damage-associated-molecular patterns (22), and can cleave pro-IL-1β and pro-IL-18 into bioactive pro-inflammatory IL-1β and IL-18 (23, 24). IL-1β is pleiotropic in function, whereas IL-18 has a specific role in innate IFNγ production (25).
AS01 adjuvanted vaccines induce an inflammatory response associated with transient production of innate IFNγ in the draining lymph nodes of mice, peaking at approximately 6 h post-injection (PI), and subsiding to baseline levels by 48 h PI (8). It has been shown that this early IFNγ production is required to promote a functional CD4+ T-cell response in mouse models (10). Furthermore, in humans, serum IFNγ is associated with clinical protection (22). However, some key questions remain regarding the MoA of AS01, such as the role of key early events in the adaptive response.

Genetically modified animal models have provided vital insight into the mechanistic processes underpinning vaccine efficacy. Over the past two decades, these models have been utilized to determine AS01 MoA. These pre-clinical models have permitted investigation of individual genes and proteins in a reductionist manner (26); however, not all mechanistic questions can be addressed in this way. Systems biology methodologies including machine learning, statistical, mathematical, and agent-based models can provide a holistic perspective on MoAs through data and knowledge integration (27–30). This can permit exploration of the relationships between different components in the biological system through simulation, where systems are not viewed purely as a sum of parts, but where additional phenomena can emerge as a result of integration. These methods can capture the complexity of the biological system, allowing exploration of individual or population dynamics, the role of localized microenvironments, vaccination dose and time (31), and can be used to guide and optimize experimental design (27, 30, 32–36). This permits exploring dose modulation, prioritizing research avenues and determining experimental endpoints that maximize the value of individual animal experiments. Different systems-based approaches are increasingly being applied to biomedical research problems, permitting development of novel mechanistic hypotheses and spatio-temporal analysis of the function of cytokines, chemokines, growth-factors, and cell-cell interactions that currently cannot be achieved in vivo (34–38).

Yet if systems-based modeling approaches are to add value to our understanding of the biological system, it is critical that the relationship between the biological understanding and how this knowledge is captured in silico is understood. In the realm of immunology, our previous work has adopted a principled approach to the development of such tools, focusing on developing confidence that the model is fit for its purpose: providing a platform for exploring and contributing to our understanding of real-world biological systems (30, 35, 39–43).

These concepts, however, have rarely been applied to exploring adjuvant MoAs, which are highly complex systems, spanning multiple organs, and levels of biological hierarchy. Thus, we present a framework in which we follow a principled modeling process (44) and collate knowledge surrounding AS01 MoA, which will then be used to construct simulations to explore key mechanisms of interest. We have captured our current consensus of AS01 MoA (see **Figure 1**) through interdisciplinary teamwork detailing the functionality of components underpinning how AS01 works, with focus on how the production of innate-IFNγ drives an adaptive response. This work, where a system of interest is identified, modeled, and scientific questions are elucidated, collectively comprises the "Discovery Phase" of the modeling process (44). The result is a "Domain Model" which is a model (i.e., an abstraction) of the key biological detail (44), which serves as a biological basis for simulation construction (**Figure 1E**). Domain models describe only the relevant biology, and do not describe concepts related to simulation construction, or how computer code is developed (44). Decisions on which modeling methods are utilized are taken subsequently during development of a separate model, named a "Platform Model," where mathematical and computational concepts are introduced, detailing how the Domain Model is to be implemented as a simulation (44). In this perspective, we focus on the presentation of a framework for creating a non-executable Domain Model that captures and brings together understanding of AS01 biology, and present specific exemplars describing key component functions. To embrace simplicity, focus was placed on capturing essential components and entities where ample evidence of involvement in AS01 MoA is available. We believe that the application of this framework will complement work to explore AS01 MoA, and that these concepts should be utilized more generally to further understand adjuvant MoAs and thus enhance vaccine efficacy.

### CAPTURING THE HIGH-LEVEL CONSENSUS OF AS01 MOA IN A CELLDESIGNER MODEL

Based on murine experimental data and a consensus understanding of AS01 knowledge we have developed a hypothesis of AS01's MoA. This was initiated with a focus on key questions in AS01 biology: "In preclinical models, after intramuscular injection of a vaccine adjuvanted with AS01, how does AS01 initiate an immune response? What are the key interactions that give rise to the generation of an antigen-specific CD4+ T helper cell response and subsequent antibody production? How does IFNγ regulate these different processes?" To address these questions, we identified and captured key cells and processes in the model. The highly visual characteristics of the Systems Biology Graphical Notation (SBGN) (46) applied in CellDesigner (47) (www.celldesigner.org) allow for improved communication, which is crucial when working in cross-disciplinary teams, permitting transparency, knowledge retention, and future reusability through linking AS01 data and wider immunological literature via diagrams of biological processes. The development process, which is described below, evolved a joint understanding between research teams, resulting in one shared model (**Figure S1**). This process permitted the confirmation of knowledge gaps, and the resulting consensus of AS01 MoA will inform the remaining development of the Domain Model.
The following four step process (**Figure 1A–C**) was used to capture AS01 MoA in a process diagram: **(1)** Development of a biological "cartoon" incorporating current knowledge of how the adjuvant generates an adaptive immune response in a specific host; **(2)** The "cartoon" was used to generate a formal CellDesigner model; **(3)** An iterative process of CellDesigner model development was followed, to incorporate key team ideas, capturing, refining, and extending key aspects of the biology; **(4)** The final model was scrutinized, based on team discussions, resulting in a collective understanding agreed by all parties, to generate a single combined model of the adjuvant's MoA.

### A DESCRIPTION OF THE AS01 CELLDESIGNER MODEL

While the CellDesigner model does describe some detail of the response after a booster dose of an AS01-adjuvanted vaccine (these differences are depicted in yellow), its main focus is the primary response. Thus, the following section describes the generation of a primary murine response to an antigen adjuvanted with AS01. As observed in vivo, in the CellDesigner model, after intramuscular injection of an AS01-adjuvanted vaccine (**Figure S1A**) the adjuvant components both activate local cells and drain into the draining lymph node (dLN) (**Figures S1C–E**), initiating the immune response (8). In the muscle (**Figure S1A**), the model captures the abstracted hypothesis that MPL and QS-21 activate a "muscle-resident immune cell," which, through chemokine secretion, recruits CCR2<sup>+</sup> LY6C<sup>hi</sup> monocytes from the blood into the injection site. These monocytes are capable of activation induced by MPL (48), antigen capture and migration into the dLN. This mechanism has been observed using lymphatic cannulation in an ovine (sheep) vaccination model with an AS01-adjuvanted vaccine (49). The potential for monocytes to infiltrate via high endothelial venules (HEVs) is not captured. Muscle-resident DCs (**Figure S1A**) are also capable of activation by MPL, capturing antigen, and undergoing a maturation process. To capture an abstraction of DC maturation, the key stages are distinguished in the model by an "immature DC," a "maturing DC," and a "mature DC." Immature DCs in the model express TLR4 and IFNγ receptor. Maturing DCs also express CCR7 (50), which mediates migration from tissue to the LN paracortex (51), as well as co-stimulatory molecules (CD80/86), IL-12 receptor, and vaccine peptide antigen-MHCII (pMHC) complexes. In addition to the migration of DCs and monocytes into the LN, adjuvant and antigen flow freely from the injection site through the afferent lymphatics (AL) (**Figure S1B**) into the dLN (8).
After arrival in the dLN, QS-21 co-localizes with CD169<sup>+</sup> SCS-M (**Figure S1D**) and can induce IL-18 secretion from these cells (18). In the paracortex (**Figure S1C**), secreted IL-18 delivers an activation signal to "innate IFNγ secretor cells" (ISC) (10, 25). This cell type is an abstraction to promote simplicity, encompassing Natural Killer cells, Natural Killer T-cells, innate-like CD8<sup>+</sup> T-cells, ILC1s, and gamma-delta T-cells, which have all been shown to contribute to the early, innate production of IFNγ after AS01 stimulation (10). IL-18 stimulation of ISCs is capable of promoting the production of IFNγ, augmented by IL-12 (10). During a secondary response, IFNγ levels are further augmented by IL-2 derived from antigen-specific CD4+ T cells, further promoting synergistic production of IFNγ. Furthermore, SCS-M (along with follicular dendritic cells) can capture free-flowing antigen and transfer it to B-cells in the follicle (52), contributing to their priming. The model also captures the capacity of activated monocytes to differentiate into DCs (8). IL-12 in the model is hypothesized to be secreted by a pool of DCs [including monocyte-derived-dendritic-cells (MoDCs)] and activated monocytes. At early time points, consistent with literature, IFNγ and IL-12 production is thought to promote differentiation of naïve cognate CD4+ T-cells toward a T Helper 1 (Th1) polarization (53). T helper cells are captured in the model, although there is an abstraction at this level for diagrammatic simplicity: the cell is a single entity, where no distinction is made between Th subsets. In the lower-level models, the appropriate distinctions between the phenotypes of these cells are captured. Here, the Th cell provides expansion, immunoglobulin switching and survival signals to B-cells, or secretes Th1-associated cytokines (TNF-α, IL-2, IFNγ).
The model also captures the key stages of antigen-specific B cell priming, activation, and differentiation into "antibody-secreting cells" or memory B cells, and the formation of germinal centers (**Figure S1E**). The blood compartment of the model (**Figure S1F**) captures the circulation of Th subsets after lymph node egress, and antibody circulation. The remit of the modeling exercise only requires the capture of the generation of T cell and antibody responses, and not the quality or functionality of the antibody response, nor the characteristics of the T-cell response in peripheral tissues, so these are not modeled (for inclusion and exclusion criteria, see **Datasheet 2**). Other inflammatory cytokines, such as IL-6 and IL-1β, are produced during a response to AS01; however, it is unclear how they contribute to the early immune response to AS01.

### AN EXEMPLAR CAPTURING AND DESCRIBING KEY COMPONENTS OF AS01 MOA

To construct a simulation, lower-level behaviors, function, and interactions of components (cells and cytokines) must also be captured, thus substantiating the model. With respect to the research questions, we aim to capture an appropriately detailed description of the biology, building on, and informed by the scope of the CellDesigner model. An adapted version of the unified modeling language (UML) was employed to develop "state-machine" and "activity" diagrams (**Figures 2A,B**) (34, 35). Thus, the Domain Model comprises the CellDesigner visualization and state and activity diagrams. State-machine diagrams describe the different states a component (entity) can exist in, and requirements that govern transitions between these states. Activity diagrams in this context describe activities and events in the system that emerge from interactions between cellular components. Care must be taken to capture the relevant biology, and where abstractions are made, these should be appropriate, and decisions behind inclusion and exclusion of functionality and rationale for abstractions should be clearly documented to maintain a trail of the considerations involved, providing transparency of the decisions behind model development in order to permit scrutiny. Our laboratories have developed a tool for this purpose and the reader is directed to Alden et al. (42) for a detailed explanation.

**FIGURE 2 |** … Arrows indicate transitions between the states, and information surrounded by square brackets describes a condition that must be satisfied for the transition to proceed. An expression preceded by a forward slash indicates further information with respect to a state transition or a state. Finally, dashed lines represent concurrent states and a diamond indicates that a decision is to be made, determined by a condition being satisfied. **(B)** A Dendritic Cell Activity Diagram: in this diagram, a black circle indicates the start of processes, and a double circle indicates the end of activities. The rectangles indicate activities and diamonds represent decisions. Horizontal black lines are forks and joins, which indicate, respectively, the start and end of activities that occur concurrently.

**Figure 2A** is a state-machine diagram for a murine DC. This captures an abstraction of a cell type, encompassing the different DC subsets in the model (including MoDCs). This diagram captures the key functionality, at the required level of complexity of the DC for our research context. This begins with the cell entering the system, resident in either the injection site or in the dLN. Initially, the cell is immature, and whilst immature, it is phagocytic, and doesn't retain antigen (54). If it captures antigen and/or is stimulated by a TLR4 agonist, it enters the "maturing" state, where the DC has varying phagocytic capacity dependent on levels of antigen it has acquired, increases expression of pMHCII complexes, CCR7, CD80/86, and secretes IL-12. The increased expression of CCR7 allows the cell to chemotactically migrate toward CCL19/21 gradients in both draining lymphatics and in the dLN paracortex. After the time required to undergo full maturation, and if there is sufficient upregulation of CCR7 and CD80/86, the cell can become fully mature. When a DC is fully mature, it is not phagocytic, expresses high levels of pMHCII complexes, CCR7, CD80/86, and can secrete IL-12. During all life stages, this cell dynamically expresses CD11c, and TLR4. Mature DCs can also present antigen to T cells, as the diagram also captures the ability for a DC to be isolated or in a complex with another cell (i.e., when presenting antigen), and to be undergoing random or chemotactic migration. The cell exits the system if it undergoes apoptosis.
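To make the state-machine description concrete, the transitions described above could be encoded along the following lines. This is a minimal sketch and not the authors' Domain or Platform Model: the class, attribute, and parameter names are hypothetical, the 24 h maturation time is a placeholder rather than a measured value, and phagocytic capacity is reduced to a yes/no flag even though the text describes it as varying during maturation.

```python
from enum import Enum, auto


class DCState(Enum):
    IMMATURE = auto()
    MATURING = auto()
    MATURE = auto()
    APOPTOTIC = auto()  # the cell has exited the system


class DendriticCell:
    """Toy state machine for a dendritic cell (all names are illustrative)."""

    def __init__(self, maturation_time_h: float = 24.0):
        self.state = DCState.IMMATURE
        self.maturation_time_h = maturation_time_h  # placeholder value
        self.hours_maturing = 0.0

    @property
    def phagocytic(self) -> bool:
        # Simplification: immature and maturing cells can take up antigen,
        # fully mature cells cannot.
        return self.state in (DCState.IMMATURE, DCState.MATURING)

    def step(self, dt_h: float, antigen_captured: bool = False,
             tlr4_stimulated: bool = False, ccr7_high: bool = False,
             cd80_86_high: bool = False) -> None:
        """Advance the cell by dt_h hours and apply the guarded transitions."""
        if self.state is DCState.IMMATURE:
            # [antigen captured and/or TLR4 agonist] -> maturing
            if antigen_captured or tlr4_stimulated:
                self.state = DCState.MATURING
        elif self.state is DCState.MATURING:
            self.hours_maturing += dt_h
            # [time elapsed and sufficient CCR7 and CD80/86] -> mature
            if (self.hours_maturing >= self.maturation_time_h
                    and ccr7_high and cd80_86_high):
                self.state = DCState.MATURE

    def undergo_apoptosis(self) -> None:
        self.state = DCState.APOPTOTIC
```

Each guard in `step` corresponds to a bracketed condition on a transition arrow in the diagram, which is what allows such a diagram to be translated into executable code at the Platform Model stage.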

Activity diagrams are developed to describe actions and interactions of cellular components. In **Figure 2B** an activity diagram is shown for a murine DC. Initially, the cell is either muscle or dLN resident, surveying the environment, and either undergoing random or chemotactic migration. Following exposure to antigen, and/or stimulation by a TLR-4 agonist, the DC can begin maturing, as described in the state-diagrams, and can eventually become fully mature. If a cell is resident in the muscle site, due to the upregulation of CCR7, it can begin migration into the lymphatics, toward the dLN. If the cell is resident in the dLN, the upregulation of CCR7 would direct it toward the CCL19/21 gradients (which functionally, would direct it to the T cell area) (50). If the DC comes into contact with a T cell, and it undergoes a cognate interaction, the cell can deliver co-stimulatory molecules promoting activation, and either undergo apoptosis or return to migration. Not shown in this perspective, this same approach has been applied to all cell types and processes captured in the AS01 MoA CellDesigner model (see **Datasheet 1** for a list of diagrams).

# MODELING AND SIMULATION AS A BASIS FOR EXPLORING ADJUVANT MOA

Following the process outlined in this perspective, simulation can be used to integrate knowledge and explore hypotheses underpinning biological systems. Domain Model diagrams can undergo iterative refinement driven by specific scientific questions, resulting in a domain model appropriate for addressing a given question. A Platform Model is then developed, and a simulation is subsequently implemented in computer code (44). Simulations are then calibrated to real-world data and usually undergo a process of validation. After construction, simulations can be examined using a variety of analysis techniques, such as sensitivity and robustness analysis, permitting an exploration of the effects of stipulated immunological behaviors on the system (41, 45). This can elucidate important MoAs, which can be explored and validated in vivo (29, 35), thus guiding experimental design. Furthermore, systems-based optimization techniques could be used to identify more efficient dosing schedules. For detailed reading on the entire modeling process described here, the reader is directed to Andrews et al. (44).
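As an illustration of the kind of sensitivity analysis mentioned above, the following Python sketch perturbs each parameter of a model one at a time and reports the normalized change in output. It is a sketch under stated assumptions: `toy_simulation` is a hypothetical stand-in for a calibrated simulation, the parameter names are invented for this example, and the cited work (41, 45) may use different, more sophisticated techniques.

```python
from typing import Callable, Dict


def toy_simulation(params: Dict[str, float]) -> float:
    """Hypothetical stand-in for a calibrated simulation.

    Returns a steady-state count of mature DCs for a simple
    recruitment/maturation/death balance. Not a model from the paper.
    """
    return (params["dc_recruitment_rate"]
            * params["maturation_probability"]
            / params["dc_death_rate"])


def one_at_a_time_sensitivity(
    model: Callable[[Dict[str, float]], float],
    baseline: Dict[str, float],
    rel_change: float = 0.1,
) -> Dict[str, float]:
    """Perturb each parameter in turn and return normalized sensitivities.

    Each value is (relative change in output) / (relative change in input),
    so a value of 1.0 means the output scales proportionally with the input.
    """
    base_out = model(baseline)
    sensitivities = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)  # copy so the baseline stays untouched
        perturbed[name] = value * (1.0 + rel_change)
        out = model(perturbed)
        sensitivities[name] = ((out - base_out) / base_out) / rel_change
    return sensitivities


if __name__ == "__main__":
    baseline = {
        "dc_recruitment_rate": 2.0,     # assumed units: cells per hour
        "maturation_probability": 0.5,  # assumed, dimensionless
        "dc_death_rate": 0.1,           # assumed units: per hour
    }
    for name, s in one_at_a_time_sensitivity(toy_simulation, baseline).items():
        print(f"{name}: {s:+.3f}")
```

One-at-a-time perturbation is the simplest option and ignores interactions between parameters. For stochastic agent-based simulations, each evaluation would normally be replaced by a summary over repeated runs.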

# CONCLUSION

We have presented a framework to capture and collate MoA knowledge and applied it to integrating and exploring AS01 MoA. This framework has informed the development of a Domain Model that captures high-level AS01 MoA using CellDesigner and is further substantiated by UML-like diagrams describing lower-level functionality. The CellDesigner visualization is an explicit description of the key, higher-level biology, with which researchers can illustrate and share their ideas and communicate knowledge and knowledge gaps. Building upon this, the UML-like diagrams capture detailed knowledge and hypotheses underpinning the system, bringing together AS01 understanding, immunological literature, and rational assumption to describe lower-level component behaviors. The resulting Domain Model not only brings together understanding about the biological system but, after appropriate refinement driven by a specific scientific question, can also serve as a biological basis for constructing simulations, permitting exploration of key research questions. We believe that these concepts will complement work on AS01 MoA and envision that these 3Rs-based approaches (https://www.nc3rs.org.uk/the-3rs), through viewing data holistically and complementing in vivo experimentation, can be applied more generally to improve the understanding of other adjuvant MoA, thus enhancing vaccine efficacy.

# AUTHOR CONTRIBUTIONS

PB, JT, KA, CC, AC, CA, MCC, AD, RM, and MC were involved in the conception and design of the study. CC, AC, CA, and MC acquired the data. CC, AC, CA, MCC, PB, JT, and KA analyzed the data. PB, AD, RM, ST, MC, JT, KA, CA, and MCC analyzed and interpreted the results. All authors were involved in drafting the manuscript or critically revising it for important intellectual content. All authors had full access to the data and approved the manuscript before it was submitted by the corresponding authors.

#### FUNDING

Research was funded through a Ph.D. studentship award from BBSRC.

#### ACKNOWLEDGMENTS

The authors thank Isabelle Carletti for support regarding the data and analysis, and Walthere Dewé, Tej Patel, Nabila Amanchar, Caroline Hervé (GSK), and Jason Cosgrove (Curie) for fruitful discussions. This work was partially funded by GlaxoSmithKline Biologicals SA. Shingrix is a trademark of the GSK group of companies.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2019.02150/full#supplementary-material

Figure S1 | The CellDesigner Model captures the following key compartments: the muscle injection site (A), the lymphatic vessels (B), the draining lymph node (dLN) [including paracortex (C), B cell follicles (D), germinal center (E)], the blood (F), and the bone marrow (G). This model captures high-level mechanisms that are hypothesized to give rise to the phenomena observed by experimentation in mice. This begins with the intramuscular injection (34) of the vaccine at time zero (A) and captures the events leading to the secretion and circulation of antibodies (E,F), and lymph node egression of effector memory T cells (9).

Datasheet 1 | Full list of Domain Model Diagrams: the full list of diagrams that were created for the AS01 MoA Domain Model.

Datasheet 2 | A list of all the Core AS01 MoA Domain Model Diagrams.


**Conflict of Interest Statement:** CC, AC, CA, RM, AD, ST, and MC are employees of the GSK group of companies. CA, RM, AD, MC, and ST report ownership of GSK shares and/or restricted GSK shares. MC was supported by a Marie Skłodowska-Curie Intra-European Fellowship (ref. "ADJSYN"). PB held a Ph.D. studentship and collaborated with GSK at the time of the study as part of his Ph.D. training. PB's work was partially funded by GlaxoSmithKline Biologicals SA through a post-graduate studentship.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Buckley, Alden, Coccia, Chalon, Collignon, Temmerman, Didierlaurent, van der Most, Timmis, Andersen and Coles. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.