Evolution and Diversity of Listeria monocytogenes from Clinical and Food Samples in Shanghai, China

Listeria monocytogenes is a significant foodborne pathogen causing severe systemic infections in humans with high mortality rates. The objectives of this work were to establish a phylogenetic framework of L. monocytogenes from China and to investigate sequence diversity among different serotypes. We selected 17 L. monocytogenes strains recovered from patients and foods in China representing serotypes 1/2a, 1/2b, and 1/2c. Draft genome sequences were determined using Illumina MiSeq technique and associated protocols. Open reading frames were assigned using prokaryotic genome annotation pipeline by NCBI. Twenty-four published genomes were included for comparative genomic and phylogenetic analysis. More than 154,000 single nucleotide polymorphisms (SNPs) were identified from multiple genome alignment and used to reconstruct maximum likelihood phylogenetic tree. The 41 genomes were differentiated into lineages I and II, which consisted of 4 and 11 subgroups, respectively. A clinical strain from China (SHL009) contained significant SNP differences compared to the rest genomes, whereas clinical strain SHL001 shared most recent common ancestor with strain SHL017 from food. Moreover, clinical strains SHL004 and SHL015 clustered together with two strains (08-5578 and 08-5923) recovered from an outbreak in Canada. Partial sequences of a plasmid found in the Canadian strain were also present in SHL004. We investigated the presence of various genes and gene clusters associated with virulence and subgroup-specific genes, including internalins, L. monocytogenes pathogenicity islands (LIPIs), L. monocytogenes genomic islands (LGIs), stress survival islet 1 (SSI-1), and clustered regularly interspaced short palindromic repeats (CRISPR)/cas system. A novel genomic island, denoted as LGI-2 was identified. Comparative sequence analysis revealed differences among the L. monocytogenes strains related to virulence, survival abilities, and attributes against foreign genetic elements. L. monocytogenes from China were genetically diverse. Strains from clinical specimens and food related closely suggesting foodborne transmission of human listeriosis.


INTRODUCTION
Listeria monocytogenes causes severe systemic infections with high mortality rates (Toledo-Arana et al., 2009;Kuenne et al., 2013) and has been responsible for numerous outbreaks in North America and Europe during recent years (Gilmour et al., 2010;Jackson et al., 2011;Smith et al., 2011;Laksanalamai et al., 2012;Schoder et al., 2014). L. monocytogenes is able to survive and grow under a wide range of temperature and pH conditions with a significant tolerance to salt (Gilmour et al., 2010). This ubiquitous pathogen imposes a risk with significant economic burden to public health (Toledo-Arana et al., 2009).
Clustered regularly interspaced short palindromic repeats (CRISPR)/cas system is considered a bacterial immune system against invading genetic fragments by targeting specific sequence including phages and plasmids (Touchon and Rocha, 2010). The presence of functional CRISPR/cas system has a negative correlation with resistance to antibiotics in Enterococci (Palmer and Gilmore, 2010). In addition, non-coding RNAs regulate gene expression through hybridizing with mRNA or binding to proteins to modulate their activities under different conditions (Kuenne et al., 2013). Both of these systems contribute to the pathogenicity, antimicrobial resistance, and metabolism of L. monocytogenes.
The objectives of the current study were to provide a phylogenetic framework of human and foodborne L. monocytogenes from China, and to reveal genomic sequence diversities of L. monocytogenes. The data should assist in a better understanding of the evolution and genetic diversity of L. monocytogenes.

Bacterial Strains
To determine the genetic diversity and to identify the genetic characteristics of L. monocytogenes, 12 clinical strains belonging to different PFGE profiles were selected among 50 strains from Shanghai, China (2004 (unpublished data) ( Table 1). Moreover, five strains of L. monocytogenes from food were included to investigate a possible connection of food to the clinical cases. The strains represented serotypes 1/2a (n = 8), 1/2b (n = 5), and 1/2c (n = 4).

Genomic Analysis
In addition to the 17 genomes, 24 publicly available L. monocytogenes genomes were selected to determine the evolutionary relationship among L. monocytogenes ( Table 2). Single nucleotide polymorphisms (SNPs) were identified based on core genome alignments using progressive Mauve (Darling et al., 2010) and customized in-house script. Clonal complexes were determined by multilocus sequence typing (MLST) analysis using seven housekeeping genes, including abcZ, bglA, cat, dapE, dat, Idh, and IhkA (Cantinelli et al., 2013). To reconstruct evolutionary relatedness among the genomes, we used Genetic Algorithm for Rapid Likelihood Inference (GARLI 2.0) to perform a maximum likelihood analysis (Zwickl, 2006) with 1000 bootstrap replicates and GTR+I+G nucleotide substitution model based on the SNPs we identified. Pairwise distance matrix with the number of nucleotide differences was calculated using MEGA 5.10 (Tamura et al., 2011) with 10,000 bootstrap replications. CRISPR arrays and cas gene clusters were identified using CRISPRFinder (Grissa et al., 2007). Stand-alone blast (blast 2.27+) (Altschul et al., 1990) was used to determine the presence/absence of 117 virulence related genes, 150 non-coding RNAs (Izar et al., 2011), and other gene clusters. Strain specific elements within subgroups were identified using cluster_smallmem command in the USEARCH package (Edgar, 2010) with 90% identities as threshold. The genome organization comparison for subgroup IIk was displayed using BRIG 0.95 with 90 and 70% as upper and lower identity threshold, respectively (Alikhan et al., 2011).

RESULTS
Our findings revealed the phylogeny of 41 L. monocytogenes strains from diverse sources and geographic locations, and provided insights into their sequence diversities. The size of draft genomes of the 17 strains from Shanghai, China (Table 1) ranged from 2.86 Mb (SHL013) to 3.12 Mb (SHL002) ( Table 2). SHL013 contained the lowest number of genes (n = 2821) whereas SHL002 had the highest number of genes (n = 3113). The average genome size in lineages I and II was 2.93 Mb and 2.98 Mb, respectively.
Lineage II strains displayed a divergent structure between subgroups (Table S1). Eight 1/2a strains from China were scattered in different subgroups with large SNPs differences. SHL013 recovered from an infant who died of listeriosis had more than 24,000 SNPs differences compared to other genomes except FSL J2-003. Another lethal strain, SHL009, also showed more than 20,000 SNPs differences compared to the rest genomes. Several foodborne and clinical strains were genetically closed. However, there was no epidemiological data available to make any foodborne illness connection. For example, the difference was 60 SNPs (±4 SNPs) between SHL001 (human) and SHL017 (bean), and only 11 SNPs (±2 SNPs) between SHL004 (human), and SHL015 (beef) ( Table S2). Serotype 1/2c strains showed a clonal structure and the differences between the 1/2c genomes were no more than 300 SNPs (Table S2).
Based on SNPs difference, several strains from China appeared to have a close evolutionary relationship to those from North America. SHL007 to SHL008 had no more than 120 SNPs difference compared to FSLJ1-175 and R2-502 from the United States (Table S2). Clinical strain SHL004 displayed 94 SNPs (±7 SNPs) and 93 SNPs (±7 SNPs) differences compared to strains 08-5578 and 08-5923, respectively, which were recovered from a large foodborne outbreak in Canada in 2008 (Table S1).

Phylogenetic and Comparative Genomic
Analyses between Strains from China (SHL004 and SHL015) and the Outbreak Strains (08-5578 and 08-5923) of Canada SHL004 and SHL015 were closely clustered with 08-5578 and 08-5923 in subgroup IIk (Figure 1). They belonged to CC8 that was identified as epidemic clone V (ECV). The SNPs differences between these genomes ranged from 11 to 97 SNPs (Table S2).
FIGURE 1 | Phylogenetic tree of 41 L. monocytogenes strains using whole genome sequencing data and distribution of important genetic elements.
More than 154,000 SNPs were used to construct the maximum likelihood phylogenetic tree for all compared genomes. The bootstrap value for each node was listed. The total number of present genetic elements examined was listed in the first column for each genome.

Distribution of Important Genetic Elements
Of 117 genes related to virulence, metabolism, and regulations determined (Table S3), 101 genes were present in all 41 genomes. We also found genes were present only in certain subgroups. For example, lmo0150 (hypothetical protein) was present in subgroups IIa, IIi, IIj, and IIk; lmo0471 (hypothetical protein) was found in subgroups IIi and IIj. Additionally, we identified strain specific genes within subgroups via comparing the genomes in the same subgroup ( Table S4). The distribution of other genetic elements follows.

Stress Survival Islet (SSI-1)
SSI-1 was present in both lineages (Ic, Id, IIe, IIg, IIi, IIj, and IIk; Figure 1). The common ancestor of subgroups Ic and Id may have acquired SSI-1. The same happened to subgroups IIi, IIj, and IIk. In contrast, it seems that the ancestors of subgroups IIe and IIg obtained SSI-1 independently. A gene encoding transcriptional regulator was identified upstream of SSI-1. There was no other inserted sequence found in those without SSI-1 (Hein et al., 2011).

Internalins
The number of internalins in the genomes examined ranged from 5 (F6854 and LO28) to 10 (SHL013, SHL015, and subgroups IId, IIe, IIg, IIh; Figure 1). A gene cluster encoded four internalins including inlG, inlH (inlC2), inlD, and inlE (5 ′ to 3 ′ ). InlH and inlE were present in all genomes whereas inlG and inlD were found in different subgroups (Figure 1). InlG was present in the lineage II subgroups except IIb and IIc. However, lineage I strains did not contain inlG. Gene inlD was present in both lineages but subgroups IIi and IIj. SHL009 and SHL013 carried all four genes. Additionally, the presence of inlC, inlF, inlI, and inlJ were inconsistently scattered between different subgroups (Figure 1). SHL015 possessed all four internalins; FIGURE 2 | Evolutionary model of subgroup IIk divided into Canadian and Chinese lineages. The last common ancestor of subgroup IIk carrying plasmid pLM5578 was divided into two lineages. Both horizontal gene transfer and mutation play important roles for the divergence, such as the loss of plasmid and phage B025, and the acquisition of LGI-1.
however, SHL004 contained only inlC, which appeared to be lost independently four times on the phylogeny (Figure 1).

Secretion System Proteins
Secretory pathway components of six identified secretion systems were present in all genomes except that certain genes were absent in some genomes ( Table S6). The tatA gene was found exclusively in lineage II whereas tatC was present in lineage II and subgroup Id. EsaC was found in subgroups Ia, Ib, Id, IIi, and IIj. SecE was identified in all strains but F6854. LspB was only identified in EGDe.
LGI-1 exclusively existed in 08-5578 and 08-5923 in subgroup IIk. An insertion encoding 50 genes was identified at the same locus of LGI-1 in subgroup Ic except R2-502, termed as LGI-2 (Table S8). LIG-2 originated from phage and did not contain virulence genes based on the current annotation.

ComK Prophage Junction Fragment
The comK phage insertions were present in 12 strains from China (Figure 1). These insertions carried divergent components with both indels and substitutions. Some strains in the same subgroup contained identical comK phage insertions, such as SHL001 and SHL017, SHL004, and SHL015. In contrast, subgroup Ib strains SHL007 and SHL012 carried different components in the comK junction fragment.

CRISPR/cas System
Some L. monocytogenes strains from the same subgroup contained identical CRISPR spacer arrays, such as subgroup Id (Table S9). EGDe (1/2a, subgroup IIi) contained the same spacers as 1/2c strains (subgroup IIj). Several strains from China contained identical spacers that were different from those of the rest strains in the same subgroup, such as subgroup Ic strains SHL002 and SHL008 ( Table S9). The number of spacers in different strains varied. There were 68 spacers in SHL001 and SHL017, but no spacer present in SHL009.

DISCUSSION
In the present study, we sequenced 17 L. monocytogenes strains recovered from humans and food in China and built a phylogenetic framework for L. monocytogenes using these genomes against publicly available data from diverse sources and locations. L. monocytogenes serotypes 1/2a and 1/2b displayed a divergent structure. In contrast, serotype 1/2c showed a clonal structure with small SNPs differences (Figure 1, Table S2). The strains from China contained extensive diversification, suggesting the evolutionary diversity and complex of L. monocytogenes in China.
Our findings provided detailed and comprehensive information on L. monocytogenes evolution and diversity. Four genomes belonging to CC8 were grouped together, SHL004 (human), SHL015 (beef), and two outbreak isolates 08-5578 and 08-5923 in Canada (Gilmour et al., 2010). Intriguingly, the isolation times for these two pair strains were close, September and August 2008, respectively. The proposed evolutionary model (Figure 2) suggests this L. monocytogenes clone complex may have circulated globally and the strains from different geographic locations have split into two subgroups.
Clinical strains of serotype 1/2b recovered from different years in Shanghai, China, (SHL002 in 2007 andSHL008 in 2012;SHL007 in 2011 andSHL012 in 2010) showed close genetic relatedness, suggesting these clones remain and circulate locally. SHL002 and SHL008 shared most recent common ancestor. SHL007 and SHL012 also originated from the same ancestor. Additionally, clinical strains of serotype 1/2a were grouped together with those from foods (SHL015 from beef and SHL017 from bean). Such an association indicated a possible foodborne transmission of listeriosis to humans.
The distribution of virulence factors indicated variations in virulence potentials and evolutionary histories in different subgroups. The mean numbers of virulence factors examined in subgroups Ic and IIg (Figure 1) were higher than other subgroups suggesting a link between these genetic factors and observed differences in pathogenicity, survival, and risk in causing diseases.
Internalins play essential roles in host-cell interactions of L. monocytogenes (Bierne et al., 2007) and the presence and distribution of internalins relate to differences in virulence potential (Rychli et al., 2014). InlC is essential in liver infection, cell-to-cell spread, and interactions with host cells (Leung et al., 2013). Since inlD is related to invasion activity (Seveau et al., 2007), serotype 1/2a and 1/2b strains are likely possess greater invasion ability than 1/2c strains. The distribution of internalins such as inlD and inlG suggests that formation of internalin clusters may be a multiple-step event and their acquisition could be related to divergence of these subgroups.
Genomic islands also play a vital role in survival and virulence of L. monocytogenes and their distribution appears to underpin different phenotypes observed in virulence among various subgroups. SSI-1, which was present in several subgroups (Ic, Id, IIe, IIg, IIi, IIj, and IIk), contributed to survival under low pH and high salt concentration (Ryan et al., 2010) and help L. monocytogenes to pass through the stomach and gut (Rychli et al., 2014). Moreover, LIPI-3 is likely to be acquired independently in subgroups Ia and Ic, known to encode hemolytic and cytotoxic factors as well as contributing to virulence of L. monocytogenes (Cotter et al., 2008). A total of 29 genomes including 12 from China contained comK phage junction fragment. It is noteworthy that the sequence variation in comK junction fragment may account for rapid adaptation and persistence in food processing plants and the contamination of RTE meats (Verghese et al., 2011).
The ncRNAs distribution also highlighted potential differences in L. monocytogenes pathogenesis . rli29, rli85, and rliC were present in EGDe (IIi) and 1/2c strains (IIj), all of which related to intracellular up-regulation in macrophage, although rliC was reported down-regulated in lineage III strains (Deng et al., 2010). These three ncRNAs are likely to be acquired by the common ancestor of subgroups IIi and IIj. Certain subgroups (IIc, IId, IIg, IIi, IIj, and IIk) carried rli50, documented previously as being involved in virulence in mice cells (Toledo-Arana et al., 2009;Mraheil et al., 2011).
In addition, strains in the same subgroups with small core SNP differences had different phenotypes in virulence due to possibly some strain specific genes (Table S4). SHL007 contained genes encoding hemolysins (locus tags: I615_15111 and I615_15116) compared to SHL012; SHL013 carried multidrug efflux protein (I622_01390), ABC transporter proteins (I622_03330 and I622_03805) and flagella proteins (I622_05059 and I622_05069) compared to FSL J2-003.
The absence of cas gene clusters was not uncommon in L. monocytogenes (Hain et al., 2012;Kuenne et al., 2013). This lack of enzyme encoding DNA in these strains would likely render the entire CRISPR system unfunctional, which could facilitate the strains in these subgroups in acquiring foreign genetic elements including antibiotic resistance genes (Palmer and Gilmore, 2010). As the difficulty of assembling cas genes region in our draft genome, the findings here may be inconsistent.
However, as we got draft genomes for all strains, the limitation for the current study includes sequence gap, sequence error, and could achieve better genome quality with alternative assembly methods.
In summary, L. monocytogenes strains from China displayed a divergent population structure. Links between clinical and food strains, and a possible connection of strains from China and those associated with outbreak in another country indicate that whole genome sequencing data provide valuable information in public health and basic research. The differences in the distribution of virulence factors among L. monocytogenes from various sources highlighted variations in pathogenicity and the importance of horizontal gene transfer in the evolution and divergence of L. monocytogenes. The great resolution and power of whole genome sequencing advances a better and deeper understanding of the origins, emergence, and relationship of the dangerous foodborne pathogens.

DATA DEPOSITION
This project has been deposited at the Nucleotide database at NCBI under the GenBank accession numbers listed in Table 2.
Table S1 | Pairwise distance matrix of SNP number differences with stand deviation for 11 subgroups of compared genomes.