Evidence for Contemporary Switching of the O-Antigen Gene Cluster between Shiga Toxin-Producing Escherichia coli Strains Colonizing Cattle

Shiga toxin-producing Escherichia coli (STEC) comprise a group of zoonotic enteric pathogens with ruminants, especially cattle, as the main reservoir. O-antigens are instrumental for host colonization and bacterial niche adaptation. They are highly immunogenic and, therefore, targeted by the adaptive immune system. The O-antigen is one of the most diverse bacterial cell constituents, varying not only between different bacterial species but also between individual isolates/strains within a single species. We recently identified STEC of the serotypes O156:H25 (n = 21) and O182:H25 (n = 15) that persistently colonized cattle and belonged to the MLST sequence types ST300 or ST688. These STs differ by a single nucleotide in purA only. The fitness- and virulence-associated genome regions and CRISPR/CAS (clustered regularly interspaced short palindromic repeats/CRISPR-associated sequence) arrays of these STEC O156:H25 and O182:H25 isolates were highly similar, and we identified identical genomic integration sites for the stx-converting bacteriophages and the core LEE, identical Shiga toxin-converting bacteriophage genes for stx1a, identical complete LEE loci, and identical sets of chemotaxis and flagellar genes. In contrast to this genomic similarity, the nucleotide sequences of the O-antigen gene cluster (O-AGC) regions between galF and gnd and of very few flanking genes differed fundamentally and were specific for the respective serotype. Sporadic aEPEC O156:H8 isolates (n = 5) were obtained in temporal and spatial proximity. While the O-AGC and the corresponding 5′ and 3′ flanking regions of these aEPEC isolates were identical to the respective region in the STEC O156:H25 isolates, the core genome, the virulence-associated genome regions, and the CRISPR/CAS elements differed profoundly. Our cumulative epidemiological and molecular data suggest a recent switch of the O-AGC between isolates, with O156:H8 strains having served as DNA donors.
Such O-antigen switches can affect the evaluation of a strain's pathogenic and virulence potential, suggesting that NGS methods might lead to a more reliable risk assessment.


INTRODUCTION
Shiga toxin-producing Escherichia coli (STEC) comprise a group of zoonotic enteric pathogens (Nataro and Kaper, 1998). The main reservoirs for STEC strains are ruminants, in particular cattle. In humans, STEC infection may result in diarrhea, frequently complicated by hemorrhagic colitis (HC) or by renal and neurological sequelae, including the hemolytic uremic syndrome (HUS; Griffin and Tauxe, 1991; Su and Brandt, 1995; Paton and Paton, 1998; Remuzzi and Ruggenenti, 1998). Factors contributing to the virulence of STEC strains causing human disease, also referred to as enterohemorrhagic E. coli (EHEC), include two major phage-encoded toxins, Shiga toxin 1 (Stx1) and 2 (Stx2), which can be produced and secreted by different strains individually or in combination. STEC may additionally possess virulence characteristics such as the ability to cause attaching-and-effacing (AE) lesions in the large intestine (McKee et al., 1995) and a large plasmid encoding an enterohemolysin (hlyA/ehxA), a catalase-peroxidase (katP), and an extracellular serine protease (espP; Schmidt et al., 1995; Brunder et al., 1996, 1997, 1999). Lipopolysaccharide (LPS), a major component of the outer membrane, represents the principal virulence factor of gram-negative bacteria (reviewed in Lerouge and Vanderleyden, 2002). It consists of three distinct regions: lipid A, the core oligosaccharide, and the O-specific polysaccharide (O-antigen; Lerouge and Vanderleyden, 2002; Samuel and Reeves, 2003). While lipid A is the main driver of inflammatory responses, O-antigens are instrumental for host colonization and bacterial niche adaptation (Reeves, 1995). O-antigens are highly immunogenic and, therefore, targeted by the adaptive immune system. Recognition by antibodies, for example, initiates the classical complement pathway, resulting in bacterial cell death or increased phagocytosis by cells of the host defense (Reeves, 1995).
Due to this strong selective pressure, the O-antigen is one of the most variable bacterial cell constituents, with variation in the types of sugars present, their arrangement within the O-unit, and the linkages between O-units (Wang et al., 2001; Bazaka et al., 2011). Variation exists not only between different bacterial species but also between individual clones within a single species (Penner and Aspinall, 1997; Stenutz et al., 2006; Lam et al., 2011; Liu et al., 2014). More than 180 O-antigens have been proposed so far for E. coli (Wang et al., 2001). This high variability is exploited in clinical and food microbiology and in epidemiology, where serotyping is applied for infection-chain tracing and for the risk assessment of STEC strains isolated from patients or from food intended for human consumption.
Three mechanisms for biosynthesis of the O-antigen appear to exist (Samuel and Reeves, 2003). The one most frequently employed by E. coli is the Wzy/Wzx-dependent pathway. The genes involved in O-antigen biosynthesis generally form an O-antigen gene cluster (O-AGC) in the chromosome. In E. coli, this cluster is flanked by the colanic acid biosynthesis gene cluster (wca genes) and the histidine biosynthesis (his) operon (Iguchi et al., 2015). The sequences of many O-AGCs have been used to determine the genetic basis of O-antigen evolution (Samuel and Reeves, 2003; Samuel et al., 2004). These analyses show that the O-AGC sections located between the galF and gnd genes have lower G+C contents (usually <40% in E. coli and Salmonella enterica) than the genome average of about 51% (Samuel et al., 2004). This atypical G+C content indicates that the O-AGC was acquired by interspecies horizontal gene transfer (HGT). Such an interspecies exchange was described for the O8 and O9 O-antigens of E. coli, which are identical to the O5 and O3 antigens of Klebsiella pneumoniae, respectively (Sugiyama et al., 1997). An intraspecies switch in Vibrio cholerae from serogroup O1 to the novel serogroup O139 was proposed to have been the initial event at the advent of a cholera epidemic in Asia (Mekalanos et al., 1997). The O139 strain arose by HGT from a strain closely related to the pandemic V. cholerae O1 El Tor (Bik et al., 1995). Likewise, the acquisition of the O157 O-AGC by an E. coli O55:H7 strain to generate the O157:H7 clone is considered critical for the evolution of this pandemic food-borne pathogen (Tarr et al., 2000; Wang et al., 2002). Besides having potential implications for the host range and virulence of clones of gram-negative bacteria, serotype switches interfere with serotype-based epidemiological approaches to unveil infection chains and may even affect the reliability of diagnostic workflows that rely on serotype-based risk assessment.
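The G+C comparison described above can be illustrated with a short Python sketch. It is a hypothetical helper for illustration, not the method of Samuel et al. (2004): the function names are invented, the 40% cut-off is taken from the text, and excluding ambiguous bases such as N from the denominator is an assumption.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a DNA sequence."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / unambiguous


def is_low_gc(region: str, threshold: float = 40.0) -> bool:
    """Flag a region (e.g., a galF-gnd segment) whose G+C content is below the threshold."""
    return gc_content(region) < threshold


if __name__ == "__main__":
    # Toy sequence for illustration only; it is not an O-AGC sequence.
    toy_region = "ATGAAATTTATAGCATTAAATGA"
    print(f"{gc_content(toy_region):.1f}% G+C, low: {is_low_gc(toy_region)}")
```

Applied to a real galF-gnd segment and to the whole chromosome, such a helper should show the contrast described above (<40% versus ~51%); a sliding-window variant would additionally localize the low-G+C stretches.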
Here, by integrating epidemiological and whole-genome sequence data, we present evidence that an O-AGC switch recently occurred between STEC and atypical enteropathogenic E. coli (aEPEC) clones of the serotypes O182:H25, O156:H25, and O156:H8 in a cattle herd (Geue et al., 2002; Barth et al., 2016).

Whole Genome Sequencing
Genomic DNA of the E. coli isolates was prepared from overnight cultures in Luria Bertani broth using the ZR fungal/bacterial DNA kit (Zymo Research Europe GmbH, Freiburg, Germany) according to the manufacturer's instructions. The DNA concentration was determined spectrophotometrically at 260 nm, and DNA fragmentation was assessed by 1% TBE agarose gel electrophoresis. All isolates were whole-genome sequenced by Illumina MiSeq 300-bp paired-end sequencing, and a coverage of >40× was obtained. The sequence read data were first subjected to quality control using the NGS QC Toolkit (Patel and Jain, 2012). Reads with at least 70% of bases having a Phred score of >20 were defined as high-quality reads. The resulting high-quality reads were assembled de novo into contiguous sequences (contigs and scaffolds) using CLC Genomics Workbench 8.0 (CLC bio, Aarhus, Denmark). One strain per serotype (O182:H25 [13E0725], O156:H25 [13E0780], O156:H8 [13E0767]) was additionally whole-genome sequenced on a PacBio RSII system (Pacific Biosciences, USA) by a commercial service provider (GATC Biotech, Konstanz, Germany) using PacBio single-molecule real-time (SMRT) technology. Subsequent de novo assembly with the HGAP3 protocol yielded a single polished contig with 200-fold average reference coverage. Mapping to ensure the closed circular conformation of the bacterial chromosome, sequence analyses, and annotation were carried out using the commercial software package Geneious (version 9.1.6, Biomatters Ltd., Auckland, New Zealand). Whole-genome alignments were performed by MAUVE analysis (version 2.3.1; Darling et al., 2004) as a plug-in in Geneious.
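The read-filtering criterion and the coverage figure above can be expressed in a few lines of Python. This is a hypothetical sketch rather than the NGS QC Toolkit's implementation: it assumes Phred+33 quality encoding (standard for current Illumina output), and the coverage estimate uses the common approximation of total sequenced bases divided by genome size. All function names and the example numbers are illustrative.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string; offset 33 assumes Phred+33 encoding."""
    return [ord(char) - offset for char in quality]


def is_high_quality(quality: str, min_phred: int = 20, min_percent: int = 70, offset: int = 33) -> bool:
    """True if at least min_percent of bases have a Phred score strictly greater than min_phred."""
    scores = phred_scores(quality, offset)
    if not scores:
        return False
    passing = sum(1 for q in scores if q > min_phred)
    # Integer comparison avoids floating-point edge cases at exactly 70%.
    return passing * 100 >= min_percent * len(scores)


def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Approximate mean sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    # Hypothetical numbers: 400,000 read pairs of 2 x 300 bp against a 5-Mb genome.
    print(mean_coverage(400_000 * 2 * 300, 5_000_000))
```

Comparing `mean_coverage` against 40 would then express the >40× criterion; note that reads removed by the quality filter no longer contribute to the coverage actually usable for assembly.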

Ethics Statement
No ethics statement was required. The isolates were obtained by non-invasive rectal swabs during a previously published longitudinal study (Geue et al., 2002). No animal experiments were carried out for this study.

RESULTS AND DISCUSSION
A previous study on STEC colonization in cattle herds identified specific STEC clones that could be isolated from herds over extended periods of time and were therefore considered to persistently colonize this animal reservoir (Geue et al., 2002; Barth et al., 2016). These strains expressed the flagellar antigen H25 but differed in their O-antigens (O156, O165, O182). To assess the underlying genetic basis, we performed whole-genome sequence analyses of these E. coli isolates. By MLST (Wirth et al., 2006), all O182:H25 and 15 of the O156:H25 isolates were assigned to ST300. The remaining 6 O156:H25 isolates were allocated to ST688. These STs differ from each other only by a single nucleotide in purA (Barth et al., 2016). By contrast, the O165:H25 isolates were classified as ST119, which is widely separated from ST300/ST688 (Barth et al., 2016).
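How closely two sequence types are related can be read off their allelic profiles. The Python sketch below counts the loci at which two profiles differ, using the seven housekeeping loci of the E. coli MLST scheme of Wirth et al. (2006). The allele numbers are hypothetical placeholders, not the real ST300, ST688, or ST327 profiles, and the helper names are invented for illustration.

```python
# The seven housekeeping loci of the Achtman (Warwick) E. coli MLST scheme.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")


def differing_loci(profile_a: dict, profile_b: dict) -> list:
    """Return the loci at which two allelic profiles carry different allele numbers."""
    return [locus for locus in LOCI if profile_a[locus] != profile_b[locus]]


def is_single_locus_variant(profile_a: dict, profile_b: dict) -> bool:
    """Two STs are single-locus variants (SLVs) if they differ at exactly one locus."""
    return len(differing_loci(profile_a, profile_b)) == 1


# Hypothetical allele numbers for illustration only (NOT the real ST profiles).
st_a = {"adk": 1, "fumC": 2, "gyrB": 3, "icd": 4, "mdh": 5, "purA": 6, "recA": 7}
st_b = {**st_a, "purA": 60}  # differs only at purA, like ST300 vs ST688
st_c = {"adk": 10, "fumC": 20, "gyrB": 30, "icd": 40, "mdh": 50, "purA": 60, "recA": 7}

if __name__ == "__main__":
    print(differing_loci(st_a, st_b), is_single_locus_variant(st_a, st_b))
    print(len(differing_loci(st_a, st_c)))
```

Because MLST compares allele numbers, a single nucleotide change in purA is enough to create a new allele and hence a new ST; the profile comparison alone does not reveal how many nucleotides separate two alleles.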
To further analyze the genomic similarity of the O156:H25 and O182:H25 isolates, we assessed the presence and relatedness of selected fitness- and virulence-associated genes. First, we compared 53 previously described genes associated with chemotaxis and flagella of H25 strains (Sperandio, 2001; Niba et al., 2007; Table S1 in Supplemental Material). The nucleotide and amino acid sequences were almost 100% identical: of the 45,429 nucleotides studied, only seven differed between the strains of the two serotypes. These differences were located in the flgD, flgI, flhA, fliG, fliZ, motA, and tar genes and resulted in a single amino acid exchange each in FlhA, FliG, and MotA. In contrast, H25 chemotaxis and flagellar genes from isolates belonging to other O serogroups (O51:H25, O153:H25, O165:H25, O172:H25, O177:H25, ONT:H25) exhibited larger genetic distances (between 97.2 and 99% identity; Figure 1). Additionally, a region spanning ∼60 kb, which includes the complete core region of the locus of enterocyte effacement (LEE) and the LEE insertion sites, was compared in all O156:H25 and O182:H25 isolates. The core regions were nearly identical (99.9-100%) in all isolates probed: fewer than 40 nucleotides differed in the ∼33,000 nucleotides considered. The main difference detected was a nine-nucleotide insertion in the espZ gene of 5 of the 15 O182:H25 isolates, which was absent from all O156:H25 isolates. Identical eae genes of subtype ζ were found in all O156:H25 and O182:H25 isolates, and the LEE was inserted at the same pheU/pheV tRNA site in all isolates. The sequences of the 5′ and 3′ regions flanking the integration sites were identical in all O156:H25 and O182:H25 isolates. The stx-converting bacteriophages found in all O156:H25 and O182:H25 isolates encode the Stx subtype Stx1a and possess identical gene sequences for the A and B subunits. The phage genomes were all integrated between the mlrA (yheY) and yheU genes. In comparison to E. coli MG1655 (accession no. U00096.2), IAI1 (accession no. NC_011741.1), and HUSEC2011 (accession no. HF572917.2), the first 160 nucleotides of the coding sequence of the mlrA gene, which is associated with the regulation of curli synthesis in E. coli and Salmonella enterica (Brown et al., 2001), were lacking in all O156:H25 and O182:H25 isolates. Regarding the clustered regularly interspaced short palindromic repeat (CRISPR) acquired immune system, which is used to determine the evolutionary divergence of E. coli isolates, especially of closely related strains (Touchon et al., 2011; Yin et al., 2013), we identified a set of identical CRISPR-associated sequence type E (CAS-E) genes adjacent to the CRISPR2.1 locus (nomenclature as described by Diez-Villasenor et al., 2010) between the cysH and iap genes in all 21 O156:H25 isolates. The STEC O182:H25 isolate 13E0725, which had been sequenced on the PacBio RSII, also contained a 100% identical CAS-E region. These CAS genes were also found in the other 14 O182:H25 isolates, but the quality of the Illumina sequence data was not sufficient to allow an unambiguous sequence assignment for the entire region. The 10 repeats and the 9 spacers of the CRISPR2.1 loci were 100% identical in all O156:H25 and O182:H25 isolates. An additional CRISPR2.2-3 array, lacking CAS genes, was detected between queE (ygcF) and ygcE in all O156:H25 and O182:H25 isolates. Here, too, the 7 repeats and the 6 spacers were 100% identical in all isolates. Besides the CRISPR2 loci, we also detected a CRISPR4.1-2 array situated between the genes encoding clpA and infA. This array also lacked CAS genes; the entire 684-bp sequence between the stop codons of infA and clpA contained two repeats flanking a single spacer element. Its sequences were identical in the 21 O156:H25 and the 15 O182:H25 isolates and also in the E. coli reference strain MG1655 (accession no. U00096.2).
Frontiers in Microbiology | www.frontiersin.org
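The repeat and spacer counts reported above (for example, 10 repeats flanking 9 spacers in CRISPR2.1) follow from the alternating repeat-spacer layout of a CRISPR array. The Python sketch below illustrates that bookkeeping and a spacer-by-spacer comparison between two isolates. It is a hypothetical helper, not the pipeline used in this study: it assumes the repeat sequence is known and matches exactly, and the toy repeat and spacers are invented.

```python
from typing import List, Tuple


def split_crispr_array(array_seq: str, repeat: str) -> Tuple[int, List[str]]:
    """Split a CRISPR array at exact copies of a known repeat.

    Returns the number of repeats and the ordered spacers between them.
    Sequence before the first and after the last repeat (leader/trailer) is discarded.
    Only exact matches are recognized; real arrays often contain degenerate repeats.
    """
    parts = array_seq.upper().split(repeat.upper())
    n_repeats = len(parts) - 1
    spacers = parts[1:-1]  # n repeats enclose n - 1 spacers
    return n_repeats, spacers


def same_spacer_content(array_a: str, array_b: str, repeat: str) -> bool:
    """True if two arrays carry the same spacers in the same order."""
    return split_crispr_array(array_a, repeat)[1] == split_crispr_array(array_b, repeat)[1]


# Toy data for illustration only: an invented 8-nt repeat and two invented spacers.
TOY_REPEAT = "GGGGCCCC"
TOY_ARRAY = "AAA" + TOY_REPEAT + "ATATAT" + TOY_REPEAT + "TATTAT" + TOY_REPEAT + "TTT"

if __name__ == "__main__":
    print(split_crispr_array(TOY_ARRAY, TOY_REPEAT))
```

Comparing spacer order, not only spacer sets, matters because new spacers are typically added at the leader end, so the spacer sequence records part of an array's history.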
When analyzing the O-AGC and the corresponding 5′ and 3′ flanking regions of the O156:H25 and O182:H25 isolates, it became apparent that the O-AGCs of both serotypes encode the Wzy/Wzx-dependent pathway. The G+C content of the O-AGC part located between the galF and gnd genes was <40% (35.8% for O156:H25 and 34.0% for O182:H25). Similarly low G+C values have been described previously by Samuel and coworkers for other E. coli and Salmonella enterica strains (Samuel et al., 2004). However, the nucleotide sequences of the regions between the galF and gnd genes were very different between the two serotypes (45.7% identity). The O-AGC of the O156:H25 isolates was 13,260 bp long, whereas the region between galF and gnd of the O182:H25 isolates was only 9,861 bp long. Both the number and the order of genes differed substantially between the O-AGCs (Figure 2). Furthermore, the galF genes and the next five genes upstream of galF differed noticeably in their nucleotide sequences (between 9 and 59 different nucleotides), although the differences at the amino acid level were considerably smaller (only 1-5 amino acid exchanges; Figure 2, Table 2). Further upstream, all corresponding genes of the O156:H25 and O182:H25 isolates were identical. Likewise, the gnd genes and the next five genes downstream of gnd differed substantially in their nucleotide and amino acid sequences (between 7 and 182 nucleotides and 2-51 amino acids). Further downstream, all corresponding genes were again identical in the O156:H25 and O182:H25 isolates.
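The observation that flanking genes can differ at many nucleotide positions but only a few amino acids reflects synonymous substitutions, which change a codon without changing the encoded residue. Below is a minimal Python sketch, assuming gap-free, in-frame coding sequences of equal length and the standard genetic code; the function names and toy sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    # Raises KeyError for ambiguous bases such as N.
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def count_differences(cds_a: str, cds_b: str) -> tuple:
    """Return (nucleotide differences, amino acid differences) for two gap-free aligned CDSs."""
    if len(cds_a) != len(cds_b):
        raise ValueError("sequences must be aligned, gap-free and of equal length")
    nt_diffs = sum(1 for x, y in zip(cds_a.upper(), cds_b.upper()) if x != y)
    aa_diffs = sum(1 for x, y in zip(translate(cds_a), translate(cds_b)) if x != y)
    return nt_diffs, aa_diffs


if __name__ == "__main__":
    # Hypothetical codons: CTG/CTC (Leu), AAA/AAG (Lys), GGT/GGC (Gly) are synonymous pairs.
    print(count_differences("CTGAAAGGT", "CTCAAGGGC"))
```

In this toy case, three nucleotide differences yield no amino acid change, mirroring on a small scale the gap between nucleotide and amino acid differences reported for the genes flanking galF.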
Taken together, the core and accessory genome analyses based on Illumina sequence data demonstrated a high degree of similarity between the genomes of the O156:H25 and the O182:H25 isolates. Only the genomic regions encoding the O-AGC between the galF and gnd genes and very few flanking genes still belonging to the O-AGC region varied substantially and were specific for the respective serotype. This suggested a possible switch of the O-AGC between isolates, a hypothesis substantiated by the epidemiological data, even though the original strain isolation did not specifically select for the O-AGC (Geue et al., 2002). Isolates of both serotypes were obtained from the same farms during identical sampling periods. On farm B, O156:H25 and O182:H25 isolates were even detected on the same day in the same group (March of the 2nd study year) and, on one occasion, on the same day in the same animal (August of the 2nd study year; Table 1).
To further substantiate these findings, one isolate per serotype was randomly picked for third-generation whole-genome sequencing on a PacBio RSII system. The chromosomes were closed into circular sequences and annotated. MAUVE analysis showed a very similar arrangement of homologous sequence blocks in both isolates (Figure 3; Darling et al., 2004). The genomes differed in size: the genome of O156:H25 isolate 13E0780 comprised 5,371,291 nucleotides, whereas that of O182:H25 isolate 13E0725 comprised only 5,112,484 nucleotides. This difference of about 259 kb was mainly due to the presence of an additional pathogenicity island with a type II secretion system and an additional phage-like sequence in the STEC isolate 13E0780.
Notably, five O156:H8 isolates were obtained during the same investigation period and on the same farms, B and D. In contrast to the O156:H25 isolates, these isolates were typed as ST327 by MLST analysis. This sequence type is very different from ST300/ST688 (six of seven alleles differ). The virulence-associated genome regions also differed substantially from those of the O156:H25/O182:H25 isolates. The O156:H8 isolates are atypical enteropathogenic E. coli (aEPEC; Hernandes et al., 2009), lacking both stx-converting bacteriophages and bfpA. The LEE locus also differed from that of the O156:H25/O182:H25 isolates: for example, eae genes encoding θ intimin were detected, and the LEE of all O156:H8 isolates was inserted at the ileX tRNA site. As for STEC O156:H25 and O182:H25, one randomly chosen aEPEC O156:H8 isolate (13E0767) was whole-genome sequenced using third-generation sequencing. Its sequence was aligned with those of both STEC isolates in a MAUVE analysis. The orientation and order of genome blocks differed markedly from those of O156:H25/O182:H25 (Figure 3).
With respect to the CRISPR/CAS systems of the five aEPEC O156:H8 isolates, most of the CAS gene array is missing at the CRISPR2.1 locus. These isolates carry a deletion spanning from the second nucleotide of the codon for Gly683 in the cas3 gene to the last nucleotide (nt 29) of a CRISPR2 repeat element. The truncated CAS3 protein lacks 169 residues at its C-terminus, and a further 48 C-terminal residues are altered by the resulting frameshift. The numbers of repeat and spacer elements also differed from those of the O156:H25/O182:H25 isolates: only seven repeats and six spacers were found. Compared to the O156:H25/O182:H25 isolates, all O156:H8 isolates carried a deletion of ca. 12,000 nt extending from queD up to the CRISPR2.2-3 array. A smaller number of six repeats and five spacers was found in this locus. We also detected a CRISPR4.1-2 array in the O156:H8 isolates. Its two repeats are sequence-identical to the O156:H25/O182:H25 repeats, but the spacer sequence differs.
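How a deletion whose length is not a multiple of three both alters and truncates the downstream protein can be shown with a toy example. The sequence below is hypothetical and unrelated to cas3; deleting a single nucleotide at the second position of a glycine codon only mimics the starting point of the deletion described above. The standard genetic code is assumed, and the helper names are invented.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_to_stop(cds: str) -> str:
    """Translate codon by codon from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` nucleotides starting at 0-based position `start`."""
    return seq[:start] + seq[start + length:]


def causes_frameshift(deletion_length: int) -> bool:
    """Deletions that are not a multiple of three shift the downstream reading frame."""
    return deletion_length % 3 != 0


# Hypothetical CDS encoding M-A-G-L-K-L followed by a TAA stop; the Gly codon GGT spans positions 6-8.
TOY_CDS = "ATGGCTGGTCTAAAACTGTAA"

if __name__ == "__main__":
    print(translate_to_stop(TOY_CDS), translate_to_stop(delete(TOY_CDS, 7, 1)))
```

In the toy mutant, the shifted frame changes the residue at the deletion site and quickly reaches a premature stop codon, the same two consequences (altered and missing C-terminal residues) described for CAS3.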
Analysis of the O-AGC and the corresponding 5′ and 3′ flanking regions of the O156:H8 isolates revealed that the region between galF and gnd of the O156:H8 isolates was virtually identical to the respective region in the STEC O156:H25 isolates (Figure 4). Only 7 of the 14,154 nucleotides, located in five different genes, differed between the two serotypes. Of these, two nucleotide exchanges, resulting in two amino acid exchanges, were found in the wzx gene. One nucleotide exchange each was detected in the glycosyltransferase genes wfeX and wfeY and in the manB gene; of these, only the exchange in wfeY resulted in an amino acid exchange. In contrast, the galF genes and all genes upstream of galF differed distinctly in their nucleotide and amino acid sequences (Figure 4, Table 3).
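The step from difference counts to percent identity, as used throughout this section, is simple arithmetic; the sketch below applies it to the two counts given in the text (7 of 45,429 and 7 of 14,154 nucleotides). The helper names are hypothetical, and treating alignment gaps as differences is one common convention among several, assumed here.

```python
def percent_identity(aligned_length: int, differences: int) -> float:
    """Percent identity from the number of aligned positions and of differing positions."""
    if aligned_length <= 0 or not 0 <= differences <= aligned_length:
        raise ValueError("need aligned_length > 0 and 0 <= differences <= aligned_length")
    return 100.0 * (aligned_length - differences) / aligned_length


def identity_from_alignment(seq_a: str, seq_b: str) -> float:
    """Percent identity of two equal-length aligned sequences; gap columns ('-') count as differences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Counts from the text: flagellar/chemotaxis genes (O156:H25 vs O182:H25)
    # and the galF-gnd region (O156:H8 vs O156:H25).
    print(f"{percent_identity(45_429, 7):.3f}%")
    print(f"{percent_identity(14_154, 7):.3f}%")
```

Both comparisons come out above 99.9% identity, which is why such regions are described as (virtually) identical despite a handful of single-nucleotide differences.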
The gnd genes and the next four genes downstream of gnd were identical in the O156:H8 and O156:H25 isolates. Further downstream in the genome, all genes differed substantially.
The results presented herein imply that specific persistent STEC isolates can replace their O-AGC and thereby change their phenotype. We postulate that STEC isolates that originally had the serotype O182:H25 exchanged their O-AGC to become STEC O156:H25, with sporadic O156:H8 isolates having served as potential DNA donors. An intraspecies switch in E. coli from O55:H7 to O157:H7 was previously described (Tarr et al., 2000; Wang et al., 2002). Another intraspecies gene exchange was demonstrated in V. cholerae (Blokesch and Schoolnik, 2007): natural transformation following the addition of genomic DNA from an O139 donor strain to a competent O1 strain growing as a biofilm on a chitin surface was sufficient to replace the O1-AGC with the entire O139-AGC in a single transformation event. Such O-antigen switches can play important roles in at least two steps of the infection process (Lerouge and Vanderleyden, 2002). They can affect colonization via altered adherence, and the recombinant strains also have different antigenic properties, which confers a selective advantage by allowing the strains to bypass or overcome host defense responses (Bik et al., 1995). Compared to O1 strains, V. cholerae O139 variants, for example, are resistant to an O1 lytic phage (Blokesch and Schoolnik, 2007), colonize a mouse model with 2-fold higher efficiency (Waldor et al., 1994), are more invasive and damage the mucosal and submucosal layers more aggressively (Amin et al., 2009), and cause disease in persons with preexisting immunity to V. cholerae O1 (reviewed in Ramamurthy et al., 2003). In light of the epidemiological data presented herein and previously (Geue et al., 2002; Barth et al., 2016), it is tempting to assume that such altered properties have also helped the STEC isolates studied to establish a more persistent lifestyle in the ruminant host.
The question arises how this O-AGC exchange occurred mechanistically. In V. cholerae, chitin-induced natural transformation can mediate the switch within a short period of time and at high frequency (Blokesch and Schoolnik, 2007). E. coli has not been shown to be naturally competent (see Sinha and Redfield, 2012, and references cited therein), although several reports describe uptake of plasmid DNA under specific conditions (Baur et al., 1996; Tsen et al., 2002; Etchuuya et al., 2011; Guo et al., 2015). E. coli has homologs of competence genes from Haemophilus influenzae, and they are expressed sufficiently to allow growth on DNA as the sole carbon and energy source (Finkel and Kolter, 2001). If the E. coli transcription factor Sxy, whose homolog is indispensable for competence development in H. influenzae, and the lambda Red recombinase system are artificially expressed, E. coli can take up foreign DNA and incorporate it into its genome (Sinha and Redfield, 2012). Natural conditions that induce the expression of sxy have not been identified so far. However, extraintestinal pathogenic E. coli isolates display higher recombination rates than commensal strains, and higher recombination frequencies are positively associated with the presence of virulence factors (Rodríguez-Beltrán et al., 2015), suggesting that other E. coli pathovars might also display increased recombination activity. It will be interesting to study whether this is the case for STEC isolates and whether conditions promoting host colonization (La Ragione et al., 2009; Barnett Foster, 2013; Pacheco and Sperandio, 2015) can contribute to the natural transformability of E. coli.
Another possible route for introducing foreign genetic material is generalized transduction by bacteriophages. In the EHEC strain EDL933, the stx2AB genes are located on prophage 933W, which is capable of transducing genetic markers in unmodified EHEC and E. coli K-12 strains (Marinus and Poteete, 2013). At ∼27.4 kb, the entire O-AGC-encoding region of the O156:H25 isolate that differs in sequence from the O182:H25 isolate is well within the maximum of 61 kb that can be transferred by this phage. Although the O156:H8 isolates described in this study were all negative for Stx phages (Barth et al., 2016), they were originally isolated as stx1- or stx2-positive colonies (Geue et al., 2002). The loss of stx genes can already occur during the first subcultivation step and appears to be more frequent in non-O157 strains (Joris et al., 2011). It is therefore possible that either a lost stx-converting phage or other phages encoded in the O156:H8 genomes were involved in generating transducing phages carrying the O-antigen region.
The O-antigen conversion proposed here for the two different serovars reinforces the importance of the O-antigen for host colonization and bacterial niche adaptation, adds another facet to the enormous genetic diversity and genomic plasticity of E. coli (Lukjancenko et al., 2010; Leimbach et al., 2013), and again emphasizes the role of HGT in pathogen evolution. It also points to a probably overlooked aspect of E. coli/EHEC/STEC pathogenicity. Many different STEC serovars have been linked with human disease (Werber et al., 2008). The proposed HGT-mediated seroconversion suggests that additional genomic characterization of these isolates could reveal that they belong to only a limited number of sequence types, each with its own set of specific virulence factors. Analysis of such "viro-STs" or "viro-clonal complexes" would aid the epidemiological analysis of disease outbreaks and of pathogen evolution. It would also help to identify virulence factors that are either common to all or specific to only one or a few "viro-ST groups", thereby improving our chances of finding and devising better strategies to combat STEC.

AUTHOR CONTRIBUTIONS
LG, CM, LW, and SB designed the research; LG, IE, DP, CB, and SB performed the research; LG, SB, CB, and TS analyzed data; LG, CB, and CM wrote the paper.

FUNDING
This work, including the efforts of SB, CM, and LG, was funded by the Deutsche Forschungsgemeinschaft (DFG) (GE2509/1-1).