RNA-Seq analysis of isolate- and growth phase-specific differences in the global transcriptomes of enteropathogenic Escherichia coli prototype isolates

Enteropathogenic Escherichia coli (EPEC) are a leading cause of diarrheal illness among infants in developing countries. E. coli isolates classified as typical EPEC are identified by the presence of the locus of enterocyte effacement (LEE) and the bundle-forming pilus (BFP), and absence of the Shiga-toxin genes, while the atypical EPEC also encode LEE but do not encode BFP or Shiga-toxin. Comparative genomic analyses have demonstrated that EPEC isolates belong to diverse evolutionary lineages and possess lineage- and isolate-specific genomic content. To investigate whether this genomic diversity results in significant differences in global gene expression, we used an RNA sequencing (RNA-Seq) approach to characterize the global transcriptomes of the prototype typical EPEC isolates E2348/69, B171, C581-05, and the prototype atypical EPEC isolate E110019. The global transcriptomes were characterized during laboratory growth in two different media and three different growth phases, as well as during adherence of the EPEC isolates to human cells using in vitro tissue culture assays. Comparison of the global transcriptomes during these conditions was used to identify isolate- and growth phase-specific differences in EPEC gene expression. These analyses resulted in the identification of genes that encode proteins involved in survival and metabolism that were coordinately expressed with virulence factors. These findings demonstrate there are isolate- and growth phase-specific differences in the global transcriptomes of EPEC prototype isolates, and highlight the utility of comparative transcriptomics for identifying additional factors that are directly or indirectly involved in EPEC pathogenesis.


Introduction
Enteropathogenic Escherichia coli (EPEC) have been associated with moderate to severe cases of diarrhea and are a leading cause of lethal diarrheal illness among young children in developing countries (Ochoa and Contreras, 2011;Kotloff et al., 2013). EPEC are identified by the presence of the locus of enterocyte effacement (LEE), which encodes a type III secretion system (T3SS) and the intimin adherence factor, which are involved in translocation of bacterial factors into host cells, and adherence to the surface of host cells. EPEC are also characterized by the absence of the Shiga-toxin genes, which is typically present in the enterohemorrhagic E. coli (EHEC) (Nataro and Kaper, 1998;Kaper et al., 2004). EPEC are further identified as typical EPEC by the presence of genes encoding the bundle-forming pilus (BFP), which is a type IV pilus typically carried by the EPEC adherence factor (EAF) plasmid. Meanwhile E. coli isolates that contain the LEE and do not carry the Shiga-toxin phage or the BFP genes are considered atypical EPEC (Nataro and Kaper, 1998). However, we have previously demonstrated that the atypical EPEC includes isolates with genomic similarity to typical EPEC and EHEC (Hazen et al., 2013).
The major components of EPEC pathogenesis that have been characterized to date are the T3SS encoded by the LEE region, additional T3SS effectors that are encoded by insertion elements or phages inserted within the genome, and the BFP (McDaniel et al., 1995;Nataro and Kaper, 1998;Kaper et al., 2004;Mellies et al., 2007;Nisa et al., 2013). Intimin, encoded by the eae gene, of the LEE region confers intimate attachment to host cells while the T3SS translocates effector proteins across the host cell membrane that result in the formation of the attaching and effacing lesions (Jerse et al., 1990;Donnenberg et al., 1993;McDaniel et al., 1995;McDaniel and Kaper, 1997;Garmendia et al., 2005). Once inside the host cell, the type III secreted effectors induce changes including rearrangement of the actin cytoskeleton to form pedestals that further facilitate adherence of EPEC to host cells Wong et al., 2011;Clements et al., 2012;Nisa et al., 2013). The transcriptional regulation of the LEE and BFP regions have been extensively studied (Gomez-Duarte and Puente et al., 1996;Mellies et al., 1999Mellies et al., , 2007Sperandio et al., 1999Sperandio et al., , 2000Elliott et al., 2000;Bustamante et al., 2001;Shin et al., 2001;Haack et al., 2003;Porter et al., 2004;Iyoda and Watanabe, 2005;Leverton and Kaper, 2005;Kendall et al., 2011). Regulation of EPEC virulence genes has been demonstrated to involve numerous transcriptional factors and to be influenced by environmental conditions and cell density Mellies et al., 1999;Sperandio et al., 1999;Elliott et al., 2000;Shin et al., 2001;Deng et al., 2004;Kaper et al., 2004;Garmendia et al., 2005;Bhatt et al., 2011;Wong et al., 2011).
Investigations of EPEC pathogenesis and virulence factor regulation have primarily used a select few prototype isolates (E2348/69, B171, E22, E110019) (Viljanen et al., 1990;Rasko et al., 2008;Iguchi et al., 2009). Each of the previously used prototype isolates was examined in detail for specific reasons; E2348/69 is an EPEC1 isolate as defined by multi-locus sequence typing (MLST) and has been used in human trials (Levine et al., , 1985Levine and Rennels, 1978), B171 is an EPEC2 isolate as defined by MLST and has been characterized for plasmid content and infection studies (Riley et al., 1987(Riley et al., , 1990Tobe et al., 1999), E22 is a rabbit adapted EPEC strain that has significant utility in the infection of animal models (Cheney et al., 1980) and E110019 is the prototype isolate of atypical EPEC that had a significantly different clinical presentation in a large Finnish outbreak that appeared to spread within households (Viljanen et al., 1990). The characterization of EPEC virulenceassociated genes has also typically involved investigating a limited number of genes at a time, which are usually within virulence gene regions, or are known global regulators in other bacteria Bustamante et al., 2001;Haack et al., 2003;Iyoda and Watanabe, 2005;Kendall et al., 2011). In contrast, global transcriptional analysis of bacterial pathogens has provided additional insight into genome-wide transcription during conditions that promote pathogenesis, as well as the identification of novel virulence associated factors (Bergholz et al., 2007;Camarena et al., 2010;Mandlik et al., 2011;Chugani et al., 2012;Sahl and Rasko, 2012). Microarrays have previously been used to identify global gene expression of EHEC in response to nutrient limitation, exposure to host cells, and during multiple growth phases (Bergholz et al., 2007;Jandu et al., 2009;Abu-Ali et al., 2010;Bingle et al., 2014). The regulons of the LEE-encoded regulator (Ler) in EHEC and EPEC prototype isolates was also recently described using microarray analyses (Bingle et al., 2014). Overall, these studies have not provided a global view of the transcription in EPEC.
A new method of investigating global transcriptomes is RNA sequencing (RNA-Seq), which is an unbiased high-throughput sequencing approach used to capture the global transcriptional response of an organism during particular conditions (Mortazavi et al., 2008;Wang et al., 2009;Martin and Wang, 2011;Ozsolak and Milos, 2011). This method allows the simultaneous analysis of all regions of the genome, unlike methods such as microarray or quantitative reverse-transcription PCR (qRT-PCR), which are limited to analyzing known genomic regions as targets. Also, RNA-Seq can be used to analyze isolates that have diverse or unknown genomic content, unlike microarray analysis, which requires that samples exhibit sequence similarity to known targets that were used to develop the probes of the microarray. To date, RNA-Seq has been used to characterize the global transcriptomes of multiple human disease-associated bacteria (Perkins et al., 2009;Camarena et al., 2010;Sharma et al., 2010;Mandlik et al., 2011;Chugani et al., 2012), including enterohemorrhagic E. coli (EHEC) (Landstorfer et al., 2014) and enterotoxigenic E. coli (ETEC) (Sahl and Rasko, 2012). These studies used RNA-Seq to demonstrate that distinct isolates of Pseudomonas aeruginosa have differences in their quorum sensing regulons (Chugani et al., 2012), or to identify genes that are likely acting as global regulators in ETEC (Sahl and Rasko, 2012). However, to our knowledge there have been no prior studies that have used RNA-Seq to describe the global transcriptional response of EPEC. Furthermore, it is not known how much variability exists among the global virulence regulons of EPEC that have emerged in divergent phylogenomic lineages of E. coli, each with unique genomic content.
To investigate whether the genomic diversity of four EPEC prototype isolates results in unique transcriptional profiles, an RNA-Seq approach was used to identify the global transcriptomes of these isolates during standard laboratory growth conditions that have been used to study EPEC pathogenesis (Kenny et al., 1997;Sperandio et al., 1999;Leverton and Kaper, 2005). Using a combination of RNA-Seq analysis, comparative transcriptomics and comparative genomics, the current studies demonstrate there are isolate-specific transcriptional responses for different growth conditions including three growth phases and two media types. Differential expression analysis RNA-Seq data from conditions that promote virulence factor expression compared to conditions during which virulence factors are not typically expressed was used to identify all genes that are simultaneously expressed in association with virulence factors. Many of these additional genes encode proteins involved in central metabolism or survival that may indirectly contribute to EPEC pathogenesis. Additional studies in this manuscript examine the bacterial interaction with host cells and identified a transcriptional pattern that is unique to each of the prototype isolates, as well as a core transcriptome under this condition. These studies highlight the diversity in the transcriptional program of these prototype isolates and provide a glimpse into the core transcriptome, which may provide information on genes and gene products that can be targeted as therapeutics and diagnostics for this group of pathogens. The findings of these studies demonstrate the utility of comparative transcriptomics for investigating EPEC. However, a similar approach could be undertaken for any genomically diverse pathogen or species and provide vital information into the transcriptional patterns of pathogens under any number of growth conditions.

Large Scale-BLAST Score Ratio (LS-BSR)
The four EPEC isolate (E2348/69, B171, C581-05, and E110019) genomes were compared with LS-BSR as previously described (Hazen et al., 2013;Sahl et al., 2014) using TBLASTN (Gertz et al., 2006). The predicted protein-encoding genes of each genome that had ≥90% nucleotide identity to each other were assigned to gene clusters using UCLUST (Edgar, 2010). Representative sequences of each gene cluster were then compared to each genome using TBLASTN (Gertz et al., 2006), and the TBLASTN scores were used to generate a BSR value indicating the detection of each gene cluster in each of the eight genomes. The BSR value was generated by dividing the score of a gene compared to a genome by the score of the gene compared to its own sequence. The representative nucleotide sequence of each gene cluster is provided in Supplemental Data Set 2.

RNA Isolation and Sequencing
The EPEC isolates were grown overnight in LB and were inoculated 1:100 into 50 ml of LB, or DMEM supplemented with 4.5 g/L glucose in a 250 ml flask. The flasks were grown at 37 • C with shaking (225 rpm) to a final optical density of approximately 0.2, 0.5, or 1.0. Two biological replicates were generated for each EPEC isolate and media type. The cells were concentrated by centrifugation at 3500 rpm for 5 min. Total RNA was isolated using the Ribopure bacteria kit (Ambion) and treated with the Ribopure DNase I to remove contaminating DNA. The samples were then treated with the Turbo DNA-free kit (Ambion) to ensure all contaminating DNA was removed, and the RNA was verified to be DNA free by quantitative PCR (qPCR) analysis using primers for rpoA, which encodes the alpha subunit of the RNA polymerase using primers listed in Supplemental  Table 1.
RNA-Seq of EPEC cells that have adhered to HeLa was performed as follows. HeLa cells were grown to a 90% confluence in a 6-well culture dish in DMEM supplemented with 4.5 g/L of glucose and 10% fetal bovine serum (FBS). The HeLa cells were washed with fresh DMEM supplemented with 4.5 g/L of glucose lacking FBS, and then 2 ml of the media was added before inoculating overnight cultures of each EPEC isolate 1:100 into the fresh media containing the HeLa cells. The EPEC isolates were allowed to adhere to the HeLa cells for 3 h at 37 • C with no shaking. The media containing non-adhered EPEC cells was removed and the HeLa and adhered EPEC cells were rinsed twice with fresh media, then all cells (HeLa and EPEC) were removed by scraping and were resuspended in 1 ml of fresh DMEM. The cells were concentrated via centrifugation at 3500 rpm for 1 min, and total RNA was isolated from both the HeLa and EPEC cells using the MasterPure kit (Epicenter). Contaminating DNA was removed using the Turbo DNA-free kit (Ambion). The polyadenylated HeLa mRNA was depleted from the samples using the GenElute mRNA Miniprep kit (Sigma), which binds poly(A) + labeled mRNA to a spin column. The poly(A) + depleted fraction as well as the eluted poly(A) + -enriched fraction were both saved at −80 • C for further processing. The poly(A) + -depleted fraction was treated again for contaminating DNA using the Turbo DNA-free kit, and was verified to be DNA-free using qPCR as described above. This DNA-free poly(A) + -depleted fraction was then submitted for library construction and 100 bp paired-end Illumina sequencing. All of the RNA samples were converted into cDNA libraries using the Ovation Prokaryotic RNA-Seq System (NuGen) that were sequenced on the Illumina HiSeq 2000 to generate 100 bp paired-end reads at the Institute for Genome Sciences Genome Resource Center. The Illumina sequencing reads generated for each RNA sample have been deposited in the short reads archive (SRA), and the accession numbers are listed in Supplemental Table 2.

RNA-Seq Analyses
The Illumina reads generated for each RNA sample were analyzed and compared using an Ergatis-based (Orvis et al., 2010) RNA-Seq analysis pipeline. The completed genome and annotation that is publicly available for EPEC isolate E2348/69 was used for the RNA-Seq analysis of this isolate. The contigs of the draft genome assemblies of B171, C581-05, and E110019 were ordered relative to the completed genome and plasmids of E2348/69 (Iguchi et al., 2009). The contigs of each genome were then combined into several large pseudomolecule sequences, and the protein-encoding regions and other features were predicted using Prodigal (Hyatt et al., 2010) using an in-house annotation pipeline (Galens et al., 2011). The reads for each isolate were aligned to their corresponding genome using Bowtie (Langmead et al., 2009) and the number of reads that aligned to the predicted coding regions were determined using HTSeq (Anders et al., 2014). The differential expression of each gene under two different conditions was determined using DESeq v. 1.5.24 (Anders and Huber, 2010). The reads aligned to each gene were normalized then averaged for each of the two biological replicates. The fold-change and the log 2 of the fold-change (LFC) were calculated for each of the comparisons. The expression data were then filtered for further analysis and genes that were determined to be transcriptionally altered met the following criteria: LFC ≥2, ≤−2, minimum normalized read count = 10, false discovery rate (FDR) ≤ 0.05 that was determined using DESeq v. 1.5.24 and R-2.15.2. The circular displays of the genes that exhibited significant differential expression were generated using Circos 0.65 (Krzywinski et al., 2009). Heatmaps of the significant LFC values for the LEE and BFP genes were constructed using MeV (Saeed et al., 2006).
The genes in these four isolates are represented by 7493 LS-BSR gene clusters, which were then utilized to examine the expression values for each of these LS-BSR clusters that were conserved among all of the isolates included in this analysis. The analysis was performed using in-house Perl scripts and heatmaps were generated using R statistical package v2.15.2 that in turn utilized the DESeq v1.10.1 library for normalization and the gplots v2.11.0 library for generating the heat maps. The expression values were normalized using the DESeq method (Anders and Huber, 2010). Only 3302 conserved clusters (i.e., represented in all strains) were used to compute the eigenvectors using principal component analysis methods. The first and second principal components were utilized in a scatter plot to visualize the clustering of the strains by gene content and gene expression. The normalized gene expression values were also used to compute the standard deviation for each LS-BSR cluster across all samples and 2000 out of 3302 LS-BSR clusters showing the greatest standard deviations of expression values were utilized to generate a heatmap of the samples using the LS-BSR cluster expressions.

Quantitative Reverse Transcriptase PCR (qRT-PCR)
The differences in expression during growth in DMEM compared to LB were determined for genes listed in Supplemental Table  1 using qRT-PCR. RNA that was used for RNA-Seq was reverse transcribed and primed with random hexamers to generate cDNA using the SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen). The cDNA was diluted 1:20 into nuclease free water (Ambion) before analysis using qPCR. The qPCR on the reverse transcribed RNA samples was performed using SYBR Green master mix (Life Technologies) with 10 µl reactions comprised of the following: 5 µl of 2X SYBR master mix, 1 µl of each of the 5 µM forward and reverse primers (Supplemental Table 1), 1 µl of nuclease free water (Ambion), and 2 µl of cDNA diluted 1:20. Triplicate reactions were performed for each cDNA template and primer combination. The reactions were cycled in a 384-well plate on the 7900HT Fast Real-Time PCR System (Applied Biosystems) using a two-step reaction with an initial incubation of 50 • C for 2 min, 95 • C for 10 min, then 40 cycles of 95 • C for 15 s and 60 • C for 1 min, followed by a dissociation stage. The cycle threshold (Ct) values were calculated using the Applied Biosystems software. The Ct values of the biological replicates were averaged and the standard deviation was calculated. The Ct values of target genes of each sample were normalized by subtracting from it the Ct value of the constitutively-expressed RNA polymerase alpha subunit, rpoA, resulting in the Ct value of a particular gene for each sample. The difference in expression of a target gene ( Ct) in the DMEM compared to the LB samples was then calculated by subtracting the Ct of the LB sample from the Ct of the DMEM sample. The fold difference of the expression of a particular gene in DMEM compared to LB was determined by calculating 2 − Ct . The difference in expression is represented in the figures as the log 2 of the fold-difference (2 − Ct ) for each gene in DMEM compared to LB. The error bars indicate the standard deviation of the Ct values.

Results and Discussion
Comparative Genomics of the Four EPEC Prototype Isolates The EPEC prototype isolates selected for global transcriptional analysis belong to four different EPEC phylogenomic lineages as defined in our recent study as well as being from the E. coli phylogroups B1 and B2 (Hazen et al., 2013). Three of these isolates (E2348/69, B171, E110019) have been frequently studied as EPEC prototype isolates in research investigating the virulence mechanisms of typical and atypical EPEC (Viljanen et al., 1990;Rasko et al., 2008;Iguchi et al., 2009). The fourth isolate, C581-05, was selected as a representative of the EPEC4 phylogenomic lineage, which is also in phylogroup B2 (Hazen et al., 2013). Comparison of the genomic content of the four EPEC prototype isolates (E2348/69, B171, C581-05, E110019) using large-scale BLAST score ratio (LS-BSR) demonstrated there are 3836 genes that are present in all four isolates with significant similarity (LS-BSR ≥ 0.8) (Figure 1). There were a greater number of genes shared between EPEC prototype isolates of the same phylogroup (C581-05 and E2348/69 in phylogroup B2, vs. B171 and E110019 in phylogroup B1) than for isolates of different phylogroups (for example E2348/69 of phylogroup B2, and B171 of phylogroup B1) (Figure 1). In each isolate there was genome content that was not exclusive, but divergent in one or more of FIGURE 1 | Comparison of the genomic content of four EPEC prototype isolates. Venn diagram illustrating the number of shared and unique genes identified among the four EPEC prototype isolates using LS-BSR (Sahl et al., 2014). The E. coli phylogroup (Tenaillon et al., 2010;Hazen et al., 2013) of each of the prototype isolates is indicated in parentheses below the isolate name. The number of core genes is indicated, which are genes that are present with significant similarity (LS-BSR ≥ 0.8) in all four of the EPEC isolates. The number of shared genes is indicated in the overlapping regions, which are the genes that have significant similarity (LS-BSR ≥ 0.8) in two or three of the EPEC isolates, but are divergent (LS-BSR < 0.8) or absent (BSR < 0.4) in the other EPEC isolate or isolates. The number of unique genes identified in each EPEC isolate genome are the genes that have LS-BSR ≥ 0.8 in the one genome and <0.8 in the other three genomes. The number of genes in parentheses represents the number of unique genes of each EPEC isolate, which have LS-BSR ≥ 0.8 in the one genome and <0.4 in the other three genomes.
the other EPEC isolates. The number of genes that were identified in a single EPEC isolate (LS-BSR ≥ 0.8) that were divergent (LS-BSR < 0.8 but ≥0.4) or absent (LS-BSR < 0.4) in all the other isolates, ranged from 401 to 481 (Figure 1). Each genome also contained isolate-specific content with the number of genes that were unique to a particular isolate and absent from all the other isolates (LS-BSR ≥ 0.8, and <0.4 in all other isolates), ranging from 294 to 339 (Figure 1). These genomic comparisons allowed the determination of the phylogroup-and isolate-specific gene content of the EPEC prototype isolates. These findings were then used to compare the global transcriptomes of these isolates during growth in laboratory conditions that stimulate virulence factor expression.

RNA-Seq Analysis of the Global Transcriptomes of EPEC Prototype Isolates
To investigate whether the unique genomic content of each of the EPEC prototype isolates resulted in phylogroup-or isolate-specific differences in the global transcriptomes an RNA-Seq approach was used to analyze each of the prototype isolates grown during standard laboratory conditions. The global transcriptomes were determined for each of the EPEC isolates during growth in rich media (LB), and minimal media (DMEM), which are laboratory conditions that have been previously used to investigate EPEC virulence factor expression. Known virulence factors of EPEC are typically expressed during growth in DMEM but not in LB Rosenshine et al., 1996;Leverton and Kaper, 2005). RNA-Seq was also used to investigate the differences in the global transcriptomes during three growth phases (early exponential, late exponential, and stationary phase) corresponding to the optical densities (OD 600 ): 0.2, 0.5, and 1.0, respectively (Supplemental Table 2, Table 1, Figure 3).
The total number of Illumina HiSeq reads generated for the 56 RNA samples analyzed was approximately 5.25 billion (Supplemental Table 2). The total number of reads generated for each sample ranged from approximately 73.4 to 153 million, and the percentage of reads that mapped to each corresponding genome sequence ranging from 31 to 83% for the LB and DMEM samples (Supplemental Table 2). The percentage of reads that mapped to each EPEC isolate genome from the HeLa adherence assay samples was decreased, ranging from 28 to 35% (Supplemental Table 2), as would be expected for a sample containing both pathogen and host.
Principal Component Analysis Reveals Isolateand Media-Specific Patterns of Gene Expression

Isolate-Specific Changes in Gene Expression
Differential expression analysis of the RNA-Seq samples further demonstrated there were identifiable isolate-specific differences in gene expression during growth in DMEM compared to LB (Table 1, Figure 3 tracks 1-3, Supplemental Data Sets 3-6). The total number of genes that exhibited significant differential expression during growth in DMEM compared to LB ranged from 36 to 835 depending on the growth phase and the EPEC isolate (Table 1). Overall, the EPEC isolates had a greater number of genes that were differentially-expressed in DMEM compared to LB during stationary phase (OD 600 = 1) than there were during exponential (OD 600 = 0.5) or early exponential growth (OD 600 = 0.2) (Table 1, Figure 3 tracks 1-3). However, E2348/69 had the opposite trend with fewer genes that had significant differential expression in DMEM compared to LB during exponential (OD 600 = 0.5) compared to early exponential growth (OD 600 = 0.2) ( Table 1). These findings demonstrated that each of the four EPEC prototype isolates had isolate-specific responses to growth in the nutrient rich LB broth compared to the nutrient limiting conditions of DMEM.

Growth Phase-Specific Changes in Gene Expression
The number of differentially-expressed genes also was altered for each of the EPEC isolates when comparing the different growth phases during growth in a single media type (Table 1, Figure 3 tracks 4-9). During growth in DMEM the number of genes that had significant differential expression was greatest for the stationary phase (OD 600 = 1) samples compared to the early exponential samples (OD 600 = 0.2), or for the late exponential (OD 600 = 0.5) compared to the early exponential (OD 600 = 0.2) samples than observed for the stationary phase (OD 600 = 1) compared to the late exponential (OD 600 = 0.5) ( Table 1). Overall, E2348/69 had far fewer genes that exhibited significant differential expression when comparing the three different growth phases during growth in DMEM (7-52 genes) than was observed for the other three EPEC isolates during growth in DMEM (range from 102 to 1588 genes) (Table 1, Figure 3 tracks 4-6). The reduced number of genes that exhibited significant differential expression compared to E2348/69 DMEM grown to an OD 600 = 1 may in part be attributed to the observed decrease in growth of this isolate. The number of differentiallyexpressed genes when comparing these two growth phases was also reduced when compared to the three other EPEC isolates under the same conditions (Table 1). This indicates that along with a reduced ability to grow for extended periods of time in the DMEM, E2348/69 also had a different transcriptional response to growth in DMEM than was observed for the other EPEC isolates.

Media-Specific Changes in Gene Expression
This data set also allowed the examination of the differences in gene expression in different media. The majority of the genes that had significant differential expression when comparing growth phases were not identified in the comparison of growth in differing media types (Table 2). Overall, there were fewer genes (range 0-501) that exhibited significant altered expression in both LB and DMEM when comparing between the different growth phases ( Table 2). This finding highlights the considerable differences in the global transcriptomes of EPEC prototype isolates during growth in nutrient-rich compared to nutrient-limiting media. The greatest number of genes (501) that were differentially-expressed when comparing the different growth phases in both LB and DMEM was identified in the EPEC4 isolate C581-05 ( Table 2). In most cases the genes that were differentially-expressed for the different growth phase comparisons of both LB and DMEM had similar trends of increased or decreased expression, and there were fewer genes that were increased during growth in one media type and decreased in the other media type ( Table 2). For example, there were only 22 genes of E2348/69 that exhibited significant differential expression in both DMEM and LB when comparing the early exponential (OD 600 = 0.2) and exponential (OD 600 = 0.5) growth phases ( Table 2). This limited gene set can be partially attributed to the low number of E2348/69 genes (52) that had significant differential expression in DMEM when comparing growth phases (Table 1). Interestingly, nearly all (20 out of 22) of these genes exhibited a different expression trend in LB compared to DMEM, which is in contrast to the other EPEC isolates, which had a greater number of genes with the same trend in expression ( Table 2). For each of the E2348/69 genes that had a different trend in expression in the DMEM and LB samples, the expression was increased in the LB samples and decreased in the DMEM samples. For most of the growth phase comparisons, the majority of the differentially-expressed genes were altered only in LB or DMEM and did not exhibit differential expression in both media types ( Table 2). The number of genes that were altered only in LB ranged from 151 to 873, while the number of genes that were altered only in DMEM ranged from 7 to 1144 ( Table 2). Among the genes that were differentially-expressed when comparing the different growth phases during growth in LB only, were genes involved in metabolism (Supplemental Data Sets 3-6). Meanwhile, genes that were differentially-expressed during different growth phases in DMEM included virulenceassociated genes, plasmid-associated genes involved in conjugal transfer, putative phage genes, lipoproteins, and global regulators such as the histone-like protein H-NS, which regulates expression of the LEE (Bustamante et al., 2001;Mellies et al., 2007;Bhatt et al., 2011) (Supplemental Data Sets 3-6). Overall, these studies highlight the transcriptional differences among these four prototype isolates in multiple growth phases and media types.

Comparison of the Global Virulence Regulons of EPEC Prototype Isolates During Growth in Laboratory Conditions
To investigate isolate-specific differences in the expression of known virulence factors, and to identify additional genes that are coordinately-expressed with the known virulence factors, we identified the differences in expression of all protein-encoding genes of the four EPEC prototype isolates during exponential growth in DMEM compared to LB during exponential growth (OD 600 = 0.5) ( Table 1, Supplemental Data Sets 3-6). The total number of genes that exhibited significant differential expression for E2348/69, B171, C581-05, and E110019 were 475, 521, 437, and 540 respectively (Table 1, Figure 4A). These genes include known virulence factors such as those within the LEE and BFP regions (Table 3). However, there were also a number of conserved genes involved in metabolism and stress responses including transporters, and additional isolate-specific genes that represent a potentially unique contribution to the virulence regulon of a particular EPEC isolate ( Table 3). A previous study demonstrated there were clade-specific differences in the expression levels of LEE genes encoded by different O157:H7 EHEC isolates during growth in DMEM (Abu-Ali et al., 2010).
To our knowledge there has been no extensive comparison to date of phylogroup-or lineage-specific differences in virulence factor expression among EPEC isolates. Comparison of the differentially-expressed genes of the four prototype isolates that exhibited similarity based on the LS-BSR analysis demonstrated there were isolate-specific differences in the global transcriptomes during exponential growth (OD 600 = 0.5) in DMEM compared to LB (Figure 4A). This highlights the need FIGURE 4 | Comparison of the virulence regulons of the EPEC prototype isolates. (A) Circular plot comparing the log 2 fold-change (LFC) values for genes that exhibited significant differential expression (DE) for the EPEC prototype isolates grown to an OD 600 = 0.5 in DMEM compared to LB. Genes that exhibited significant DE in DMEM compared to LB met the following criteria: LFC ≥2, ≤-2, minimum read count = 10, false discovery rate (FDR) ≤ 0.05. The number of genes that had significant DE is indicated in parentheses below each isolate name. The outermost track of each EPEC isolate displays the LFC values of each of the significant DE genes for that isolate, and the genes are ordered clockwise in the order they appear in the completed or draft genome. Each of the inner tracks displays the LFC values of each gene with significant DE from another EPEC prototype isolate. These significant DE genes have ≥90% nucleotide identity to the corresponding gene in the outermost track. (B) Venn diagram showing the number of genes differentially-expressed for each of the four EPEC prototype isolates analyzed in this study grown to an OD 600 = 0.5 in DMEM compared to LB. The E. coli phylogroup (Tenaillon et al., 2010;Hazen et al., 2013) of each of the prototype isolates is indicated in parentheses below the isolate name. The number of core genes is indicated that are highly conserved (LS-BSR ≥ 0.8) in all four of the EPEC isolates that also exhibited significant DE in all four of the EPEC isolates. The number of genes that were identified as highly-conserved (LS-BSR ≥ 0.8) that also exhibited significant DE in two or three EPEC isolates and divergent or absent (LS-BSR < 0.8) is also designated. The number of isolate-specific genes indicates those genes that were exclusive to each EPEC isolate (LS-BSR ≥ 0.4 and <0.4 in all other EPEC isolates) and also exhibited significant DE during growth to an OD 600 = 0.5 in DMEM compared to LB. to examine multiple isolates and not rely on a single prototype strain for any pathovar to describe an entire group of isolates. Overall, the differentially-expressed genes that were identified in more than one of the EPEC isolates exhibited similar trends of increased or decreased expression ( Figure 4A); however, in some instances there were genes identified in multiple EPEC isolates that were increased in some of the isolates and decreased in other isolates (Figure 4A, Table 3). These included some genes identified by LS-BSR as highly-conserved in all four of the EPEC isolates (Supplemental Table 3). Interestingly, several genes (napB, napG, and napH) of the nap operon, which encodes a periplasmic nitrate reductase (Brondijk et al., 2002), exhibited increased expression in the EPEC isolates of phylogroup B1 (B171 and E110019), but decreased expression in the phylogroup B2 EPEC isolates (E2348/69 and C581-05) (Supplemental Table  3). There were a total of 73 genes that were present in the genomes of all four of the EPEC isolates that were also differentially-expressed in all four EPEC isolates ( Figure 4B). Of these 73 genes, there were four that had different trends of expression in these isolates, 18 exhibited consistently increased expression, and 51 exhibited consistently decreased expression in the four EPEC isolates (Supplemental Table 3).
Although there were a number of similarities in the response of the four EPEC prototype isolates to growth in DMEM compared to LB, there were many more genes that exhibited differential expression in one, two or three of the isolates (Figure 4A). The number of genes that were identified by LS-BSR as being unique (LS-BSR ≥ 0.8 and <0.4 in all other isolates) to an EPEC isolate that were differentially-expressed during exponential growth in DMEM compared to LB ranged from only 5-21 genes (Table 1, Figure 4B). The number of genes that were identified in two or three of the EPEC isolates and also had significant differential expression was similarly low, ranging from 0 to 16 genes ( Figure 4B). Thus, the majority of the genes that exhibited differential expression in each of the EPEC isolates were genes that were highly-conserved in more than one isolate (Table 1). However, these genes did not exhibit significant differential expression in all four of the isolates (Figure 4B). This may in part be attributed to the stringent cut-offs used in this study for identifying genes with significant differential expression, and with less stringent cut-offs there may be more of these highly-conserved genes with low-level differences identified with significant differential expression in all four isolates.

Differential-Expression of the LEE and BFP Regions
The differential expression of known virulence factors present in all four of the EPEC isolates such as genes encoded by the LEE demonstrated there are isolate-specific differences in the timing and conditions during which the LEE genes are expressed ( Figure 5A, Table 3). The majority of the LEE-encoded genes had significantly increased expression during growth in DMEM compared to LB for the exponential growth (OD 600 = 0.5) and early stationary phase (OD 600 = 1) samples ( Figure 5A). This finding was true for all four of the EPEC isolates except the stationary phase comparison of E2348/69, which was previously  noted to have a different gene expression pattern (Figure 2). The increased expression of the intimin gene (eae) observed by RNA-Seq analysis was verified using qRT-PCR for these samples (Supplemental Figure 2). Interestingly, the trend of increased LEE gene expression in DMEM compared to LB was not observed for the C581-05 late exponential growth samples, which only exhibited significant expression for one gene, espG, in DMEM compared to LB during late exponential growth (OD 600 = 0.5) ( Figure 5A). In contrast, the LEE genes of C581-05 exhibited significant differences in expression during late exponential (OD 600 = 0.5) compared to early exponential (OD 600 = 0.2) growth in LB ( Figure 5A). This finding suggests that in contrast to other EPEC isolates such as B171, which did not exhibit significant increases in LEE gene expression during growth in LB or DMEM over time, the LEE genes of EPEC isolate C581-05 had increased expression over time during growth in LB ( Figure 5A). This potentially suggests an additional role for quorum sensing in the regulation of LEE in C581-05 during growth in nutrient-rich media (LB). This type of variation in gene expression has been previously observed in E. coli as being quorum sensing regulated Sircili et al., 2004) and it is possible that this is the case in these studies; however more detailed analysis of the contribution quorum sensing will require studies that are beyond the scope of the current manuscript. Another notable difference in the expression of LEE genes was identified for both B171 and E110019 of phylogroup B1, which exhibited decreased expression of LEE1 genes during stationary phase growth compared to exponential growth in LB ( Figure 5A). This was in contrast to the LEE genes of both E2348/69 and C581-05 of phylogroup B2, which did not exhibit a significant difference in expression of the LEE genes during these conditions ( Figure 5A). This further indicates there are phylogroup-specific differences in the transcriptional regulation of EPEC virulence factors. The genes that were identified with significant genetic conservation and increased gene expression during the virulenceassociated conditions in all EPEC isolates, represents the core EPEC virulence regulon. The only LEE-encoded gene that exhibited significant differential expression during exponential growth in DMEM compared to LB in all four of the EPEC isolates was espG, which had increased expression in all of the EPEC isolates (Table 3). EspG is a type III secreted effector that alters the cytoskeleton of the host cell by interfering with the function of microtubules and golgi (Hardwidge et al., 2005;Shaw et al., 2005;Tomson et al., 2005;Wong et al., 2011). The increased expression of espG in all isolates was verified using qRT-PCR (Supplemental Figure 2). In contrast, few of the non-LEE-encoded T3SS effectors had increased expression during exponential growth in DMEM compared to LB (Table 3), suggesting a lack of global coordination of the regulation of these virulence factors in all isolates.
The differential expression of genes of the BFP region also differed for the three typical EPEC isolates (E2348/69, B171, and C581-05), which encoded the BFP operon ( Figure 5B). The majority of the BFP genes exhibited significantly increased expression during exponential growth (OD 600 = 0.5) in DMEM compared to LB (Figure 5B), which is consistent with the findings of previous studies that EPEC virulence factors exhibit increased expression in DMEM Rosenshine et al., 1996;Leverton and Kaper, 2005). Interestingly, EPEC isolate C581-05 did not have significantly increased expression during these conditions, and instead exhibited decreased expression of the transcriptional regulator, perC ( Figure 5B). PerC has previously been demonstrated to activate transcription of LEE and BFP genes (Gomez-Duarte and Tobe et al., 1996;Bustamante et al., 2011). This finding further suggests that virulence factors such as the BFP genes are constitutively expressed in C581-05 during exponential growth irrespective of media in some isolates. In contrast, the majority of the BFP genes of C581-05 exhibited increased expression during entry into stationary phase in DMEM compared to LB, indicating that BFP expression exhibits a greater increase during the nutrient-limiting conditions over time ( Figure 5B).
FIGURE 5 | Comparison of the isolate-and growth phase-specific differences in the expression of genes in the LEE and BFP regions. A diagram and heatmap of the isolate-and growth phase-specific differences in expression of (A) genes encoded within the LEE region of E2348/69, B171, E110019, and C581-05, and (B) genes of the BFP operon encoded by the EAF plasmids of E2348/69, B171, and C581-05 (note E110019 naturally lacks the BFP encoding plasmid). Red indicates increased differential expression, green indicates decreased differential expression, and white indicates the difference in expression was not significant or a gene was not present in the EPEC isolate. Each row represents the differential expression (log 2 fold-change; LFC) for the following different sample comparisons: LB OD 600 = 0. The decreased expression of the plasmid genes (bfpA and perA) of C581-05 during exponential growth in DMEM compared to LB, and increased expression of these genes during stationary phase was confirmed by qRT-PCR (Supplemental Figure 2). Again, these data suggest a role for quorum sensing in this isolate and highlight the strain specificity of virulence factor regulation in different isolates.

Differential-Expression of EPEC Genes Not Previously Identified as Virulence Factors
Additional genes that exhibited significant differential expression during the virulence-inducing conditions included biotin synthesis genes, dipeptide transporters, and a gene encoding a putative colicin receptor protein (Table 3, Supplemental Table 3). The genes that were determined to have decreased expression in all four EPEC isolates included genes encoding transporters and many genes involved in metabolism ( Table 3, Supplemental Table 3). The genes that were identified with significant differential expression in only one of the EPEC isolates included known virulence-associated genes, phage genes, a putative adhesin, lipoproteins, lipopolysaccharide biosynthesis genes, and numerous predicted transcriptional regulators ( Table 3, Supplemental Table 3). For example, among the genes that were unique to the E2348/69 transcriptome was espC, which encodes a serine protease autotransporter that causes cytotoxicity to host cells and is translocated into the host cell by the type V secretion mechanism ( Table 3) (Navarro-Garcia et al., 2004;Vidal and Navarro-Garcia, 2008). There were also several E2348/69 genes involved in O-antigen biosynthesis (wzy, wzx, wbiO) that exhibited the greatest increases in expression in DMEM compared to LB ( Table 3, Supplemental Table 3). The capsular polysaccharide and O-antigen modification was recently demonstrated to promote pathogenesis of the uropathogenic E. coli (UPEC) by increasing their ability to survive in the gut (Sarkar et al., 2014). However, to our knowledge the role of the O-antigen gene expression during EPEC virulence has not been established.
Many of the non-virulence factor genes identified that are coregulated with the virulence regulons of the EPEC isolates are involved in general metabolism or survival and may enhance the ability of the EPEC isolates to colonize and survive in the human gastrointestinal tract ( Table 3). Some of the genes of the nap operon (napB and napH), which encodes a periplasmic nitrate reductase (Brondijk et al., 2002), were previously determined for the EHEC O157:H7 Sakai strain, to exhibit increased expression during exponential growth in minimal media (Bergholz et al., 2007). Additionally, several metabolic transcriptional regulators have recently been linked to pathogenesis of EHEC and EPEC. For example, a transcriptional regulator, FusR, of O157:H7 EHEC was identified that regulates EHEC metabolism and the expression of LEE and promotes intestinal colonization in response to the presence of the metabolic by-product, fucose, which was identified as being abundant in the mucus layer of the intestine (Pacheco et al., 2012). Also, Cra and KdpE, two other global transcriptional regulators of sugar production and potassium transport, have been demonstrated to also either directly or indirectly regulate expression of LEE genes of EHEC (Njoroge et al., 2012(Njoroge et al., , 2013. Another regulator of glycogen biosynthesis metabolism (Romeo et al., 1993), CsrA, was also previously determined to regulate expression of EPEC virulence factors including the LEE (Bhatt et al., 2009(Bhatt et al., , 2011. Overall, the transcriptional studies demonstrate the interconnectedness of general metabolism and virulence and only through global studies can we begin to understand these relationships.

Differences in the Global Transcriptomes of the Typical and Atypical EPEC Prototype Isolates
In addition to the phylogroup-and isolate-specific differences in gene expression, there were also differences when comparing the three typical EPEC isolates (E2348/69, B171, and C581-05) to the atypical EPEC isolate (E110019) (Supplemental Data Sets 3-6). The LS-BSR analysis of the EPEC isolate genomic content identified 169 genes that were highly-conserved (LS-BSR ≥ 0.8) among the typical EPEC isolate genomes that were divergent (LS-BSR < 0.8, ≥0.4) or absent (LS-BSR < 0.4) from the atypical EPEC isolate E110019 (Figure 1). Of these 169 genes, there were 20 that were also identified with significant differential expression during growth in DMEM compared to LB in two of the typical EPEC isolates and only two genes that were differentially-expressed in all three of the typical EPEC isolates (Supplemental Data Sets 3-5). The two genes that exhibited significant differential expression in all three of the typical EPEC isolates were perC and the lactose permease lacY (Supplemental Data Sets 3-5). Meanwhile, genes that were present and exhibited significant differential expression in E110019 that were absent from the typical EPEC isolates included a putative adhesin, a colicin immunity protein, a plasmid stability protein, a phageassociated gene, and numerous conserved hypothetical proteins ( Table 3, Supplemental Table 3). These findings suggest that the plasmid maintained in E110019 is also regulated under these conditions, but is not similar to the EAF plasmid in the tEPEC isolates.

Global Transcriptomes of EPEC During Adherence to Host Cells In Vitro
The mono culture studies described above highlight the variability of the EPEC transcriptome of isolates grown independently of external stimuli. To extend the studies above, the global transcriptome of each EPEC isolate during adherence to HeLa cells compared to growth density were examined ( Figure 6A). The total number of genes that exhibited significant differential expression in one or more of these comparisons differed considerably for each EPEC isolate (ranged from 899 to 2280), as indicated by the size of the plot regions and numbers designated in parentheses for each EPEC isolate (Figure 6A, Supplemental Data Sets 3-6).
Comparison of the genes that had significant differential expression during adherence of the EPEC isolates to HeLa cells compared to exponential growth (OD 600 = 0.5) in DMEM broth demonstrated that most of the altered genes belonged to core gene clusters ( Figure 6B). However, there were only 20 genes that were present and also were differentially expressed in all four of the EPEC isolates ( Figure 6B, Supplemental Table 4).
FIGURE 6 | Comparison of the global transcriptional response of the EPEC prototype isolates during adherence to HeLa cells compared to planktonic growth in DMEM broth. (A) Circular plot of log 2 fold-change (LFC) values of genes that exhibited significant differential expression (DE) for each of the EPEC prototype isolates during growth in broth culture or during adherence to HeLa cells during in vitro tissue culture assays. The tracks contain LFC values of the following DE comparisons: DMEM OD 600 = 0.5 vs. LB OD 600 = 0.5 (track 1), HeLa vs. DMEM OD 600 = 1.0 (track 2), HeLa vs. DMEM OD 600 = 0.5 (track 3), and HeLa vs. DMEM OD 600 = 0.2 (track 4). Red indicates increased DE, green indicates decreased DE, and white indicates the difference in expression was not significant or a gene was not present in the EPEC isolate. (B) Venn diagram showing the number of genes differentially-expressed for each of the four EPEC prototype isolates analyzed in this study grown to an OD 600 = 0.5. The E. coli phylogroup (Tenaillon et al., 2010;Hazen et al., 2013) of each of the prototype isolates is indicated in parentheses below the isolate name. The number of core genes is indicated that were highly conserved (LS-BSR ≥ 0.8) in all four of the EPEC isolates and also exhibited significant DE in all four of the EPEC isolates during adherence to HeLa cells compared to growth in DMEM broth culture to an OD 600 = 0.5. The number of genes that were identified with significant similarity (LS-BSR ≥ 0.8) and also exhibited significant DE in the different combinations of two or three EPEC isolates is also designated. The number of isolate-specific genes indicates those genes that were exclusive to each EPEC isolate (LS-BSR ≥ 0.4 and <0.4 in all other EPEC isolates) and also exhibited significant DE during adherence to HeLa cells compared to growth in DMEM broth culture to an OD 600 = 0.5. Furthermore, all but two of these genes had similar trends in terms of increased or decreased expression in all four of the EPEC isolates (Table 4). Among the genes that exhibited increased expression in DMEM compared to LB for all four of the prototype isolates was hscB, which has been previously described as a cochaperone involved in the formation of Fe-S cluster proteins of E. coli (Vickery et al., 1997;Hoff et al., 2000). One of the genes that exhibited different trends of expression in the four EPEC isolates was nirC ( Table 4, Supplemental Table 4), which encodes a protein involved in nitrite uptake in E. coli (Clegg et al., 2002). The nirC gene was also identified as a virulenceassociated factor that increased the survival of Salmonella in the presence of macrophages (Das et al., 2009). This finding demonstrates that there are phylogroup-specific responses of EPEC in the presence of host cells, suggesting EPEC isolates of different E. coli phylogroups have unique mechanisms for survival during infection, in addition to their differences in virulence factor content that have been previously described (Hazen et al., 2013).
The number of genes that were exclusive to one of the prototype isolates that also exhibited significant differential expression during adherence to HeLa compared to growth in DMEM media ranged from 18 to 56 genes ( Figure 6B). Interestingly, of the 56 genes that were unique to E2348/69 that had significant differential expression, 41 of these genes were previously identified within integrative elements and phages of the E2348/69 chromosome (Iguchi et al., 2009) (Supplemental  Table 4). Also, both of the EPEC isolates from phylogroup B2 (E2348/69 and C581-05) were identified with a greater number of genes that were differentially-expressed in HeLa compared to exponential growth (OD 600 = 0.5) in DMEM broth, than were differentially-expressed in DMEM compared to LB during exponential growth (Table 1). This is in contrast to the EPEC isolates from phylogroup B1 (B171 and E110019), which had similar numbers of genes that were differentially-expressed in the HeLa compared to DMEM, as there were in DMEM compared to LB (Table 1). This finding further indicates that EPEC prototype isolates exhibit phylogroup-specific differences and diverse transcriptional responses to conditions that facilitate pathogenesis.

Conclusions
In this study, we have demonstrated that RNA-Seq and comparative genomics are powerful tools that can be used together to investigate the global transcriptome of an E. coli pathogen that has diverse genomic content. This approach can be used to provide insight into all genes that are simultaneously expressed with virulence factors and may directly or indirectly be involved in EPEC pathogenesis. By investigating differences in the global transcriptomes of EPEC isolates belonging to different evolutionary lineages of E. coli, we have demonstrated that genes comprising the unique genomic content of these isolates are in fact differentially-expressed during conditions that promote virulence factor expression. Thus, these studies highlight the need to investigate multiple genomically-diverse isolates rather than a single prototype isolate when investigating virulence mechanisms of a pathovar. Identifying a global transcriptional response that is conserved among divergent EPEC isolates provides insight into the emergence of EPEC isolates in numerous E. coli lineages. The conserved transcriptional response of EPEC could be used to develop diagnostic tools to determine whether EPEC are contributing to an infection. This would be particularly useful for patients that carry multiple enteric pathogens, such as was demonstrated in the recent GEMS study (Kotloff et al., 2013) that identified the causative agents of diarrheal illness among young children and infants in numerous study sites in Africa and Asia.