The Capricious Nature of Bacterial Pathogens: Phasevarions and Vaccine Development

Infectious diseases are a leading cause of morbidity and mortality worldwide, and vaccines are one of the most successful and cost-effective tools for disease prevention. One of the key considerations for rational vaccine development is the selection of appropriate antigens. Antigens must induce a protective immune response, and this response should be directed to stably expressed antigens so the target microbe can always be recognized by the immune system. Antigens with variable expression, due to environmental signals or phase variation (i.e., high frequency, random switching of expression), are not ideal vaccine candidates because variable expression could lead to immune evasion. Phase variation is often mediated by the presence of highly mutagenic simple tandem DNA repeats, and genes containing such sequences can be easily identified, and their use as vaccine antigens reconsidered. Recent research has identified phase variably expressed DNA methyltransferases that act as global epigenetic regulators. These phase-variable regulons, known as phasevarions, are associated with altered virulence phenotypes and/or expression of vaccine candidates. As such, genes encoding candidate vaccine antigens that have no obvious mechanism of phase variation may be subject to indirect, epigenetic control as part of a phasevarion. Bioinformatic and experimental studies are required to elucidate the distribution and mechanism of action of these DNA methyltransferases, and most importantly, whether they mediate epigenetic regulation of potential and current vaccine candidates. This process is essential to define the stably expressed antigen target profile of bacterial pathogens and thereby facilitate efficient, rational selection of vaccine antigens.

INTRODUCTION
Infectious diseases are a leading cause of morbidity and mortality worldwide. An estimated 23% of all deaths and 52% of deaths in children under the age of 5 years are caused by pathogenic microorganisms (1,2). Over the past two centuries, many vaccines have been developed that aim to prime the host immune system and protect against disease. Consequently, the morbidity and mortality of many diseases, such as polio (3), have been significantly reduced, and some diseases, such as smallpox (4), have even been eradicated. Vaccination is often considered one of the greatest triumphs of medical science (5).
To date, vaccines are available against 26 pathogens, with at least a further 24 vaccines in the development pipeline (6). The manufacture and composition of these vaccines vary significantly (7): from killed whole-cell or virus vaccines [e.g., Salk's original polio vaccine (8)] and live attenuated vaccines [e.g., the measles, mumps, and rubella vaccine (9)], to "rationally designed" vaccines, which are subunit formulations specifically developed against selected cellular targets [e.g., the polysaccharide capsule-based pneumococcal conjugate vaccines (10) and the multivalent recombinant protein-based serogroup B meningococcal vaccine (11)]. The majority of available vaccines induce antibody-mediated protective immunity and target microorganisms and antigens that have little or no antigenic diversity or variability. Unfortunately, development of vaccines has been more difficult for pathogens that are antigenically diverse, as well as those that cannot be cultured in the laboratory, lack suitable animal models of infection, and/or are controlled by mucosal or T cell-dependent immune responses. There is an increasing need for rationally designed vaccines against these pathogens. Their development has been facilitated by improvements in molecular biology techniques (e.g., DNA sequencing and manipulation; protein and carbohydrate purification; and chemical conjugation methods for production of multivalent vaccines) and by increased understanding of pathogen biology, host-pathogen interactions, and the requirements for immunogenicity (e.g., immune correlates of protection, and the adjuvants required to elicit this protection) (12-15).
The era of "omics" and "big data" projects has unleashed a wealth of information for bacterial vaccine development, making it possible to rapidly select potential vaccine antigens from genome and proteome analyses (14-17). However, antigens with variable expression, due to environmental signals or phase variation (i.e., high frequency, random switching of expression), possess inbuilt immune evasion capacity and do not make ideal vaccine candidates. Phase variation is often mediated by the presence of highly mutagenic simple tandem DNA repeats [also known as simple sequence repeats (SSRs)], and genes with these sequence features need to be identified so that they can be discounted as vaccine antigens. However, recent research has identified phase variably expressed DNA methyltransferases that act as epigenetic regulators in many bacterial pathogens (18). These global epigenetic regulators control phase-variable regulons, called phasevarions, and can switch expression of candidate vaccine antigens that have previously been assumed to be stably expressed.
In this review, we provide an overview of key aspects that are important during antigen selection for pathogenic bacteria and focus on the impact of phasevarions on vaccine development.

KEY CONSIDERATIONS FOR VACCINE ANTIGEN SELECTION
For rationally designed subunit vaccines to succeed, the selection of appropriate vaccine antigens is critical. Key features of vaccine antigens include (1) immunogenicity (i.e., the ability to elicit an immune response), (2) the ability to induce protection (i.e., the ability of the elicited immune response to prevent proliferation and/or the induction of pathology by the pathogen), and (3) conservation (i.e., the presence of the antigen in, and its sequence similarity across, many or all strains of the pathogen). However, the stable expression of antigens during infection is also a critical factor in antigen selection that is often overlooked.
Several "omics" approaches are now routinely used to perform systems-based screening of potential antigens, such as genome-based reverse vaccinology, proteomics, transcriptomics, glycomics, and metabolomics (14-16, 19-21). These approaches allow high-throughput identification of the potential antigens of a pathogen. The subsequent analysis of antigen conservation is a relatively straightforward process and has been assisted by the increasing availability of genomes, driven by decreases in sequencing costs (22,23). Sequence availability has also made it possible to assess antigenic drift (change by accumulation of mutations) and shift (complete replacement of antigens), both of which must be taken into account to select stable and effective vaccine antigens (24,25).
Investigating whether the target antigen is actually expressed by the pathogen during infection in vivo is a more complex task, because expression is regulated by environmental signals and can also be influenced by stochastic mechanisms. The transcription and translation of cellular factors are often contingent on environmental signals (e.g., tissue tropism, pH, and temperature) and cellular conditions (e.g., cell cycle) (26-28). For example, for Escherichia coli and other enteric pathogens, entry to the site of infection induces the expression of a different antigen repertoire (29, 30) that is triggered by diverse environmental or host signals such as pH (31) and temperature (32). While methods exist that allow the identification of expressed RNA (transcriptome) or protein (proteome) content under selected conditions, the data collected often represent only a single physiological state that does not always reflect conditions found in the host. Accordingly, it is important to understand when and how cellular factors are expressed, to ensure that the target antigen is expressed during infection and in the same location (i.e., during mucosal or systemic infection) as the immune response elicited by the vaccine.

ANTIGEN EXPRESSION AND THE COMPLICATION OF PHASE VARIATION
Phase variation is defined as the high-frequency, reversible ON/OFF or graded switching of gene expression, which is mediated through either genetic [e.g., due to variations in the number of simple tandem DNA repeats, or genome rearrangements (33,34)] or epigenetic [e.g., via deoxyadenosine methylase (Dam) (35)] mechanisms at individual promoters. Many antigens in bacterial pathogens are phase variably expressed. For most phase-variable genes, switching occurs randomly during genome replication, and thus antigen expression is impossible to predict. Consequently, phase-variable components are not ideal vaccine targets since cells that have low, or no, expression of the target antigen may be able to evade the immune system (Figure 1A).
FIGURE 1 | Phase variation and immune evasion. (A) For a phase-variable outer-membrane protein, slipped strand mispairing and changes in DNA sequence repeats in the gene during genome replication lead to ON/OFF expression of the encoded protein (blue). Antibodies to this antigen will not be effective if the protein has phase varied OFF. It is typically easy to predict phase-variable expression of these proteins due to the presence of DNA repeats (simple sequence repeats) in the coding region of the gene. (B) In phasevarions, phase-variable expression of a DNA methyltransferase causes genome-wide changes in DNA methylation, and expression differences in multiple genes due to epigenetic regulation. If these genes encode antigenic proteins/vaccine candidates, then methylation-dependent loss of expression (red protein) or reduced expression (purple protein) can lead to immune evasion as antibodies lose efficacy. However, due to the epigenetic nature of the phase-variable regulation, it is difficult to predict which proteins will have altered expression.

Many phase-variable genes can be identified bioinformatically, as the two main phase variation mechanisms, slipped strand mispairing and genome inversions, are well understood (36). Genes that are variable by slipped strand mispairing can be identified by the presence of multiple, tandem DNA repeats in the upstream or coding region of a gene. Slipped strand mispairing in DNA repeats causes loss or gain of repeat units, leading to frameshift mutations (ON/OFF switching) if located in the coding region, or altered expression levels if located within a promoter or operator region. In the case of genome inversions and recombination-mediated mechanisms, phase-variable genes can be identified by the presence of various genetic markers such as recombinases, inverted sequence repeats, cryptic domains, and/or via genome comparisons for local reorganization (36,37).
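The repeat-based screen described above can be illustrated with a short Python sketch. It is not the method used in the cited studies: the function names, the copy-number thresholds, and the toy sequence are all assumptions chosen for illustration, and a real screen would also consider repeat location (promoter versus coding region) and annotation.

```python
import re

def find_tandem_repeats(seq, max_unit=6, min_copies=5, min_homopolymer=8):
    """Return sorted (start, unit, copies) tuples for simple tandem repeat tracts.

    Thresholds are illustrative assumptions, not published cut-offs.
    """
    seq = seq.upper()
    tracts = []
    for k in range(1, max_unit + 1):
        need = min_homopolymer if k == 1 else min_copies
        # A k-nt unit followed by at least (need - 1) further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, need - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are a shorter unit repeated (e.g. "AA", "ATAT");
            # those tracts are reported at the shorter unit length instead.
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            tracts.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(tracts)

def frame_shift(unit, copies_changed):
    """Reading-frame offset (0, 1 or 2) after gaining (+) or losing (-) copies."""
    return (len(unit) * copies_changed) % 3

# Hypothetical coding sequence carrying six copies of a tetranucleotide repeat.
gene = "ATGACA" + "AGTC" * 6 + "GGGTTTAAACCC"
for start, unit, copies in find_tandem_repeats(gene):
    print(start, unit, copies, "frame offset after +1 copy:", frame_shift(unit, 1))
```

Because a tetranucleotide unit is not a multiple of three, gaining or losing one copy moves the downstream sequence out of frame, whereas a change of three copies restores it. This is the ON/OFF behaviour that makes such tracts a useful in silico marker.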
Bioinformatic searches have been used successfully to identify numerous phase-variable genes in a variety of bacterial pathogens, such as Neisseria meningitidis (38-41), Neisseria gonorrhoeae (42), Campylobacter jejuni (43), Helicobacter pylori (44), and Haemophilus influenzae (45), and these genes are typically excluded from further screening of vaccine candidates. It is interesting to note that NadA, present in the meningococcal serogroup B vaccine (4CMenB, Bexsero), is phase variable. However, the variable expression of NadA is complex and was not easily identifiable in silico; the tandem repeats are distally located upstream of the nadA promoter and regulation involves both stochastic and classical mechanisms of gene regulation (46-48).
The DNA methyltransferase Dam is one of the best studied examples of epigenetic regulation in bacteria. While Dam itself is not phase variable or regulated, it is involved in phase variation of specific virulence genes in E. coli and Salmonella, such as pap (49,50) and agn43 (51,52). Dam is not believed to serve as a common transcriptional regulatory mechanism (35). Rather, competition between Dam and a particular DNA-binding regulatory protein provides opportunities for competitive stochastic switches that alter gene expression at specific target sites [reviewed in Ref. (35)].

EPIGENETIC REGULATION OF ANTIGENS VIA PHASE-VARIABLE DNA METHYLTRANSFERASES
Phase-variable DNA methyltransferases that act as global epigenetic regulators have been identified in a number of pathogenic bacteria and add another layer of complexity to the process of antigen selection. Phase variation of these DNA methyltransferases results in coordinated, differential methylation of the entire genome in DNA methyltransferase ON versus OFF variants. This leads to altered expression of a set of genes that is called a phasevarion, for phase-variable regulon (18, 53, 54) (Figure 1B). Phasevarions exert a pleiotropic effect and are associated with variable expression of proteins from diverse functional categories, such as metabolic processes, nutrient acquisition, stress responses, and virulence, as well as controlling the variable expression of vaccine candidates. Phasevarions have been characterized in numerous pathogenic bacterial species, including H. influenzae (54-56), the pathogenic Neisseria (57-59), H. pylori (60), C. jejuni (43,61), Moraxella catarrhalis (62,63), and Streptococcus pneumoniae (64) (see Tables 1 and 2).
Phasevarions present a critical challenge for vaccine development, in that the genes controlled by phase-variable DNA methyltransferases do not have easily identifiable markers to indicate their phase-variable expression; these markers are only associated with the DNA methyltransferase and not the genes it regulates. Consequently, the products of these genes may be considered as potential vaccine candidates because their expression is erroneously assumed to be stable. This could potentially result in less effective, or completely ineffective, vaccines (Figure 1B).
In type II and III restriction-modification (R-M) systems, the DNA methyltransferases are independent proteins that dictate the specificity of the methylation site, and phase variation is typically mediated by slipped strand mispairing of SSRs in the coding sequence of the DNA methyltransferase (mod) gene (Table 1). Changes in repeat number cause frameshift mutations and switching of Mod protein expression between "ON" (expressed) and "OFF" (not expressed) states (Figure 2C). The type III Mod proteins are the most extensively studied (Table 1), and multiple allelic variants exist for each system, as determined by sequence differences in the DNA recognition domain responsible for methyltransferase specificity (18,54,56,58,60,62,71). For example, 21 modA alleles (56, 58, 70), 6 modB alleles (18,58,74), and 7 modD alleles (57, 74) have been identified to date. Unlike the type I systems described above, switching between alleles by genome rearrangement within a strain has not been reported and only one allele is present in a given strain. However, horizontal transfer of allele DNA recognition domains occurs and is postulated to generate novel DNA methyltransferase alleles over time (70,75,76).
Consequently, when considering the impact of phasevarions on vaccine development, it is important to know which allele(s) are present in the bacterial species, as well as the distribution of these alleles, that is, whether certain alleles predominate among the pathogenic strains that require targeting by the vaccine. Previous studies have used PCR and Sanger sequencing methods to identify and determine alleles (55-58, 60, 62); however, the increasing ease and falling cost of whole-genome sequencing will enable the simple identification of phase-variable methyltransferases in broader, larger sample panels, as well as the identification of novel systems. For example, the availability of a large database of meningococcal genome sequences has recently been used to help survey the mod allele repertoire in over 1,600 isolates (74). A bigger challenge lies in defining the proteins regulated within each phasevarion, as this must be determined experimentally. This has previously been accomplished by custom transcriptomic microarray analysis (57,58,60), but is being supplanted by next generation sequencing techniques [namely RNAseq, as in Ref. (64)] and proteomic analyses [e.g., iTRAQ, as in Refs. (56,62)]. RNAseq allows the visualization of the full transcriptomic response to DNA methyltransferase phase variation, including differences in transcription of RNA genes (such as tRNAs) and non-coding RNAs (such as siRNAs and other regulatory RNAs). RNAseq will also provide valuable information about transcriptional start sites and upstream regulatory sequences for genes in the phasevarion, and possible transcription kinetics around methylation sites, enabling detailed mechanistic studies to be performed. In contrast, proteomic analyses will definitively identify the protein antigens differentially expressed by phasevarions under the conditions tested. The proteomic results may differ from the transcriptomic data, as RNA expression does not always correlate with protein translation, and so future studies should analyze expression data using multiple techniques in order to identify all members of each phasevarion. This will be invaluable for examining the actual changes in antigen levels and how these changes may affect vaccines.
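As a rough illustration of why more than one expression technique is useful, the sketch below sorts genes by whether a methyltransferase ON versus OFF difference is seen at the RNA level, the protein level, or both. The gene names, the log2 fold-change values, and the threshold of 1.0 are hypothetical; a real analysis would rely on replicate-based statistics rather than a fixed cut-off.

```python
def classify_candidates(rna_log2fc, protein_log2fc, threshold=1.0):
    """Group genes by which data type shows an ON-vs-OFF change >= threshold.

    Inputs map gene name -> log2(ON/OFF) fold change (assumed precomputed).
    """
    rna_hits = {g for g, fc in rna_log2fc.items() if abs(fc) >= threshold}
    protein_hits = {g for g, fc in protein_log2fc.items() if abs(fc) >= threshold}
    return {
        "both": sorted(rna_hits & protein_hits),
        "rna_only": sorted(rna_hits - protein_hits),
        "protein_only": sorted(protein_hits - rna_hits),
    }

# Hypothetical values for illustration only; these are not measured data.
rna = {"geneA": 2.1, "geneB": -1.4, "geneC": 0.2, "geneD": 1.2}
protein = {"geneA": 1.8, "geneB": -0.3, "geneC": 1.5}

for group, genes in classify_candidates(rna, protein).items():
    print(group, genes)
```

Genes that appear in only one group are the cases where transcript and protein data disagree, and these are the candidates that a single-technique study would either miss or overcall.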
The identification and analysis of genes controlled by phasevarions need to be carried out under conditions relevant to infection. This is because epigenetic regulation via DNA methylation is typically a multistep process, with DNA methylation affecting the action of regulatory proteins involved in transcription, rather than acting on transcriptional machinery itself [reviewed recently in Ref. (77)]. As such, conditions tested must be biologically relevant and allow these regulatory proteins to be active, in order to observe epigenetic regulation. This has been demonstrated by microarray analysis of the ModA11 phasevarion, where iron-limiting conditions were necessary to identify phasevarion members (mimicking iron limitation in the host, compared with standard laboratory culture conditions) (58). Unfortunately, the specific conditions that permit the full expression of the phasevarion can be difficult to determine, and bacteria should be grown under biologically relevant conditions, or if possible, collected directly from infection sites, such as from blood or mucosal surfaces. It is also critical that the whole, or a representative, bacterial population is isolated and analyzed during phasevarion studies. This will allow the natural ON/OFF status and ratio of phase-variable DNA methyltransferases in the in vivo bacterial population to be understood.
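One simple way to express the ON/OFF ratio of a population is sketched below, under stated assumptions: the repeat copy number in the mod gene has already been measured for each sampled cell, read, or fragment; switching is purely by frameshift; and one in-frame (ON) copy number is known experimentally. The function name and the example numbers are hypothetical.

```python
def on_fraction(repeat_counts, unit_len, on_copies):
    """Fraction of observations whose repeat tract keeps the gene in frame (ON).

    repeat_counts: observed copy numbers, one per sampled cell/read (assumed input).
    unit_len:      length of the repeat unit in nucleotides.
    on_copies:     a copy number known to give an in-frame (ON) gene.
    """
    if not repeat_counts:
        raise ValueError("no observations supplied")
    in_frame = sum(
        1 for n in repeat_counts if ((n - on_copies) * unit_len) % 3 == 0
    )
    return in_frame / len(repeat_counts)

# Hypothetical population: tetranucleotide repeat, ON at 20 copies.
observed = [20] * 6 + [21] * 3 + [19] * 1 + [23] * 2
print(f"ON fraction: {on_fraction(observed, unit_len=4, on_copies=20):.2f}")
```

Calculating this ratio on a whole or representative population, rather than on single colonies picked after laboratory passage, keeps the estimate closer to the in vivo state that the text argues should be measured.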

CONCLUDING REMARKS
The development of bacterial vaccines depends on the selection of appropriate antigens. Ideal vaccine antigens are conserved, immunogenic, and protective. They should also be consistently expressed at high enough levels during infection to be targeted by the immune system. Transient and arbitrary expression makes antigen targeting by the immune system difficult and could lead to immune evasion via escape of a subpopulation that does not express the antigen. For this reason, phase-variable antigens do not make ideal vaccine candidates.
Phase-variable regulators complicate the prediction of stably expressed antigens, as the regulated genes within a phasevarion lack overt markers that indicate potential random switching of expression. While phasevarions have been studied in a range of pathogenic bacteria, important questions remain regarding allele variability, distribution, and regulatory mechanisms. More detailed understanding of these factors will help to elucidate the full complement of phase-variable genes in human pathogens for which vaccine development has been problematic, and help facilitate robust antigen selection for rational vaccine design in the future.

AUTHOR CONTRIBUTIONS
All the authors contributed to drafting and revising the manuscript and approved the final manuscript.