Whole-genome Sequencing for Surveillance of Invasive Pneumococcal Diseases in Ontario, Canada: Rapid Prediction of Genotype, Antibiotic Resistance and Characterization of Emerging Serotype 22F

Background: Molecular typing is essential for inferring genetic relatedness between bacterial pathogens. In this study, we applied whole genome sequencing (WGS) for rapid prediction of sequence type and antibiotic resistance for invasive pneumococcal isolates. Methods: 240 isolates from adults (≥50 years old) in Ontario, Canada during 2009 to 2013 were subjected to WGS. Sequence type, antibiotic susceptibility and resistance were predicted directly from short reads. Emerging non-vaccine serotype 22F was further characterized by WGS. Results: Sequence type was successfully determined for 98.3% of isolates. The overall sensitivity and specificity for antibiotic resistance prediction were 95 and 100% respectively, compared to standard susceptibility testing methods. WGS-based phylogeny divided emerging 22F (ST433) strains into two distinct clades: clade A harboring a 23 kb-prophage and anti-phage PhD/Doc system and clade B with virulence-related proteases. Five isolates in clade A developed macrolide resistance via 5.1 kb mega element recombination (encoding mefE and msrD), while one isolate in clade B displayed quinolone resistance via a gyrA mutation. Conclusions: WGS is valuable for routine surveillance of pneumococcal clinical isolates and facilitates prediction of genotype and antibiotic resistance. The emergence of 22F in Ontario in the post-vaccine era and evidence of evolution and divergence of the 22F population warrants heightened pneumococcal molecular surveillance.


INTRODUCTION
Invasive pneumococcal diseases (IPD) such as septicaemia and meningitis caused by Streptococcus pneumoniae result in a heavy burden of disease among children and adults worldwide (Ausina et al., 1988;O'Brien et al., 2007). After widespread implementation in 2002 of the 7-valent pneumococcal conjugate vaccine (PCV7, covering serotypes 4, 6B, 9V, 14, 18C, 19F, and 23F) in Canadian children, the incidence of IPD due to PCV7 serotypes decreased significantly in all age groups within the Canadian population (Kellner et al., 2009). In 2010 the new PCV13 vaccine (PCV7 serotypes plus 1, 3, 5, 6A, 7F, and 19A) replaced the PCV7 vaccine for routine immunization of children (Desai et al., 2010). Shortly after implementation of PCV13 in Canada, the number of IPD cases due to vaccineassociated serotypes such as 19A and 7F declined (Deng et al., 2015). However, the concurrent increase of IPD due to nonvaccine serotypes 22F, 15A, 33F, and 35B post-introduction of PCV13 has become a concern to clinicians (Deng et al., 2015;Golden et al., 2015;Duvvuri et al., 2016). In addition to use in children, the PCV13 vaccine was also approved by Canadian health authorities in 2012 for use among adults 50 years and older for the prevention of IPD (An Advisory Committee Statement (ACS), 2013). Currently it is also recommended for immunocompromised adults aged ≥ 18 years with high risk of IPD (An Advisory Committee Statement (ACS), 2013; Tomczyk et al., 2014).
Streptococcus pneumoniae is naturally competent, so it can uptake DNA from environment and undergo horizontal gene transfer between strains. Some recombination events such as deletions, insertions and inversions are frequently observed in the pneumococcal genome. In nature, horizontal gene transfer facilitates bacterial evolution, providing S. pneumoniae with easy access to crucial resistance genes and/or virulence factors. A recent analysis of the whole genome sequences of 240 Pneumococcal Molecular Epidemiological Network (PMEN1) pneumococcal strains revealed an exceptionally high recombination/mutation ratio of 7.2, and an average of 72 single nucleotide polymorphisms (SNPs) per recombination site (Croucher et al., 2011).
Molecular typing of Streptococcus pneumoniae is important for delineating genetic structure of bacterial populations and useful for inferring evolutionary relationships between isolates. For years, pulse-field gel electrophoresis (PFGE) that is based on motility variations of restriction fragments was considered the "gold standard" for subtyping of bacterial strains (Tenover et al., 1995). However, this method is technically demanding and time-consuming, and is thus not widely used for epidemiological surveillance. Multi-locus sequence typing (MLST) that measures DNA sequence variations for multiple housekeeping genes gradually replaced PFGE as a popular typing tool in the 21st century (Maiden et al., 1998). However, MLST has its shortcomings, as it is time-consuming, labor intensive and has higher cost than PFGE (Sabat et al., 2013). Recently, next-generation whole genome sequencing (WGS) has become a powerful and attractive tool for rapid typing of bacterial isolates, such as for E. coli (Joensen et al., 2014) and group A streptococcus (Athey et al., 2014). In addition, WGS has been evaluated for rapid prediction of drug susceptibility and resistance in Mycobacterium tuberculosis (Walker et al., 2015) and Staphylococcus aureus isolates (Gordon et al., 2014). With the cost of WGS continuing to decline (to a price close to or lower than MLST), it has the potential of replacing currently used typing methods for clinical isolates in the near future.
To date, MLST remains the primary method for largescale typing of S. pneumoniae in the clinical laboratory. In this surveillance study, we employed high-throughput wholegenome sequencing for rapid prediction of MLST and antibiotic resistance in silico and further characterization of emerging nonvaccine serotypes of S. pneumoniae following the introduction of PCV13 in Ontario, Canada.

Clinical Pneumococcal Isolates and Antimicrobial Susceptibility Testing
Invasive S. pneumoniae isolates (1956 in total, approximately 400 isolates per year) were collected from older adults (≥50 years of age) in Ontario, Canada between 2009 and 2013. Submission of IPD isolates to Public Health Ontario Laboratories (PHOL) for further evaluation is encouraged but not mandatory. Geographic mapping of pneumococcal disease cases in Ontario older adults during 2009-2013 was done with geographic information system software QGIS (Version 2.8, http://www.qgis.org/). First 3 digits of postcode of each patient's residential address were used for the geographic mapping. Clinical isolates were recovered from sterile sites (including blood, cerebrospinal fluid and pleural fluid) and sent to the National Microbiology Laboratory (NML) in Winnipeg, Canada for serotyping using the Quellung reaction. Broth microdilution panels (Thermo Scientific, Waltham, MA, USA) were used for antimicrobial susceptibility testing for penicillin, erythromycin, clindamycin, fluoroquinolones, tetracycline, and cephalosporins. Minimum inhibitory concentrations (MICs) for each antibiotic were determined according to 2015 Clinical and Laboratory Standards Institute (CLSI) guidelines.

Strain Selection for Whole Genome Sequencing
We randomly selected 48 strains per year from 2009 to 2013 (240 isolates in total) from all available isolates in our S. pneumoniae database. The number of isolates selected per serotype was proportional to all isolates in that year, with the exclusion of rare serotypes that had a frequency <1.5%. Illumina multiplex sequencing strategy was used for sequencing the 240 isolates using high-throughput processing. Genomic DNA of clinical isolates was extracted using QIAamp DNA mini kits (Qiagen, Germany). A library of multiplexed samples (96 indexes for Hiseq and 12-24 indexes for Miseq) was prepared according to standard protocols using the Nextera XT kit. Genomic libraries of 192 samples were sequenced using Illumina Hiseq 2500 (100 bp paired end, Illumina, San Diego, USA), 44 samples were sequenced using Illumina Miseq (150 bp paired end), and 4 samples were not sequenced due to poor quality of library.

Prediction of MLST Type and Antibiotic Resistance from Short Reads
The MLST sequence type was determined from short reads for each isolate using SRST2 software (Inouye et al., 2012). Genetic determinants for resistance to antibiotics including chloramphenicol, macrolides, clindamycin and tetracycline were deduced from short reads using SRST2 software by comparison with a database of resistance genes (https://github.com/katholt/srst2). The predicted susceptibilities by whole-genome sequencing were compared with phenotypic drug susceptibility test results. When there was a mismatch between phenotype and genomic prediction, phenotypic drug susceptibility tests were repeated for those isolates. Two types of errors were defined for the susceptibility portion of this study: a very major error (VME) referred to a susceptible genotype with a resistant phenotype, and a major error (ME) was defined as a resistant genotype with a susceptible phenotype.

Genomic Characterization of Emerging Non-Vaccine Serotype 22F
Genomes for 22F isolates were assembled de novo using Velvet 1.2.07 (Zerbino and Birney, 2008) and VelvetOptimiser (mean coverage 208x). The draft genomes had an average of 333 contigs and mean N50 of 61885 bp. The draft genomes were annotated automatically with the RAST pipeline (rast.nmpdr.org). The draft genomes of 22F have been deposited at DDBJ/EMBL/GenBank under accession MLG(H-Z)00000000 and MLH(A-F)00000000. Raw Illumina sequences of 236 isolates have been deposited at SRA/NCBI under accession SRR3211687-729, SRR5011653-846. Core genomes and accessory genomes of 22F were predicted using Pan-Genomes Analysis Pipeline (PGAP) (Zhao et al., 2012) that utilizes BLAST command line tools (NCBI), InParanoid (O'Brien et al., 2005) and MultiParanoid (Alexeyenko et al., 2006) software. The core genes shared by all 22F isolates were retrieved and concatenated to make a "core" genome. MEGA5 (Tamura et al., 2011) was used to align these core genome sequences and to visualize single nucleotide polymorphism (SNP) variants. The genome sequence alignments were imported into BratNextGen (Marttinen et al., 2012) to infer regions of recombination and for clustering analysis (grouping strains into subclusters). Gaps and SNPs due to recombination were removed. The core genome alignment file of 22F isolates was uploaded to CIPRES server (www.phylo.org) for phylogenetic analysis. A phylogenetic tree was constructed with RAxML (Stamatakis, 2014) using a GTR model with a gamma correction for site rate variation. The online tool, Interactive Tree of Life (Letunic and Bork, 2011) was used for manipulation of phylogenetic trees. For comparative genomic analysis, the accessory genomes of different groups (e.g., drug-resistant and susceptible isolates) of 22F were compared in order to identify genetic features unique to any identified groups. Z-test was used to predict significance of difference between the proportions of strains in the two groups possessing a gene of interest. P < 0.01 was considered statistically significant.

Statistical Tests
Chi-square test was used to determine the statistical difference between the proportions of emerging serotypes (i.e., 22F) relative to total isolates in each year between 2009 and 2013 (5 proportions compared using 2 × 5 contingency table). P < 0.05 was considered significant and a simple Z-test was used to compare a pair of proportions (P < 0.05 is considered significant).

Emergence of Non-PCV13 Serotypes Post-Vaccination
Although, IPD isolates were voluntarily submitted to PHOL from regional health units, distribution map of clinical samples indicated a good representation of geographic regions in the province of Ontario ( Figure S1). Overall, during 2009-2013 IPD cases in older adults (≥50 years of age) due to PCV13 serotypes showed the trend of decline (chi-square test p < 0.01 between five ratios), while IPDs due to non-PCV13 serotypes had the trend of increase (chi-square test p < 0.01) (Figure 1). Non-PCV13 serotype 22F increased from 8.5% in 2009 to 14.5% in 2013 (p < 0.01 for chi-square test, and p = 0.0061 bewteen 2009 and 2013) (Figure 2). In 2013, serotype 22F surpassed 19A and 7F, and became the most common S. pneumoniae serotype identified among IPD isolates from Ontario adults.

Sequence Type Prediction In silico and Clonal Lineage Analysis
WGS was performed on a total of 236 IPD isolates (48 isolates per year, with four strains excluded due to extremely low concentrations of DNA following library preparation). MLST sequence type was successfully predicted from Illumina short reads directly for 232 strains, giving a success rate of 98.3% (232/236). The four isolates that failed MLST prediction had low sequence coverage (less than 10, ranging from 2 to 8). A  total of 90 different sequence types (ST) were identified and these STs could be further divided into 17 clonal groups and 39 singletons (individual STs) through eBURST analysis. The emerging post-PCV13 serotype 22F was exclusively associated with clonal complex CC433 that include ST433 (25/26, 96%) and ST4533 (1/26, 4%) a single allele variant of ST433. CC63, CC66 and CC100 were the major (>80%) CCs identified among 15A, 9N, and 33F, respectively ( Table 1).

In silico Antibiotic Resistance Prediction and Comparison with Phenotypic Resistance Testing
The presence of acquired genes (Table S1 in supplementary material) including cat, mefA, msrD, ermB, and tetM that are associated with phenotypic resistance was predicted directly from short reads for the IPD isolates sequenced. 226 (95.8%) isolates had complete concordance between genotype and phenotype for the antimicrobial agents tested. The remaining 10 isolates had a total of 11 discrepancies between genotype and phenotype (Table S2 in supplementary material). Repeated susceptibility testing for the isolates with genotype/phenotype mismatches reduced the number of discrepancies from 11 to 7, giving a revised concordance rate of 97%. There were five very major errors (0.55% VME rate) and two major errors (0.22% ME rate) for antibiotic resistance prediction of S. pneumoniae isolates. The overall sensitivity and specificity for predicting phenotypic resistance to the antimicrobials tested using genomic data were 95% (95% CI, 90 to 97%), and 100% (95% CI, 99 to 100%), respectively ( Table 2).

Phylogenetic Analysis of 22F Reveals Two Divergent Subpopulations
Of 26 serotype 22F isolates, 25 (96%) belonged to the same sequence type, ST433. Core genome alignment (1581 genes) of 22F-ST433 clinical isolates identified a total of 6308 SNPs. After removal of gaps and putative recombination regions predicted by BratNextgen, 3886 high-quality SNPs (2212 parsimony informative sites) were detected from 789,333 sites. A maximum likelihood phylogeny (un-rooted tree) was constructed. All 22F-ST433 isolates can be divided into two sub-clusters: clade A and clade B (Figure 3). In clade A, five isolates that confer resistance to macrolides (both erythromycin and azithromycin) are closely related to each other, forming a sub-cluster clade A1. All isolates in clade B are susceptible to macrolides and other antibiotics, except for isolate SP194, possessing resistance to levofloxacin (MIC 4 µg/mL) and moxifloxacin (MIC 2 µg/mL). Further comparison of quinolone resistance determinant regions (QRDR) between quinolone-resistant and sensitive strains found a non-synonymous mutation on gyrA gene (S81F) only.  An un-rooted maximum-likelihood tree based on the core genome with putative recombination sites excluded. Two clades are revealed by the tree: clade A and clade B. Clade A include two sub-clades: clade A1 (five isolates) and clade A2 (four isolates). All five isolates in clade A1 were resistant to macrolides (shown with red circle at the tip), and one isolate from clade B was resistant to fluoroquinolones (pink pentagon at the tip). * indicates100% bootstrap support. Each isolate at the tip was labeled as: Serotype-isolate ID-year of isolation.

Accessory Genome Comparison Discloses Clade-Specific Genes in 22F
Accessory genomes of clade A and clade B isolates of 22F were compared, and presence or absence of accessory genes in each group was determined. A total of 39 genes were unique to clade A (100% presence in this group, but absent in the other), while clade B had 8 clade-specific genes ( Table 3). Thirteen genes (combined length of 22,515 bp) unique to clade A were contiguous to each other (loci SP18.166056.587-610 and SP18.166056.1991-98) in the genome and encoded for a variety of bacteriophage proteins. In addition, all clade A strains possessed Type I restriction system and PhD/Doc toxin-antitoxin (TA) system in their genomes (PhD-Doc locus is surrounded by Type I restriction system and integrase). Five (55.6%) clade A isolates that form clade A1 subclade contained a gene cassette of 5163 bp (loci SP29.152236.85-92) that encode for macrolide resistance determinants mefE and msrD. The 5.1 kb gene cassette shared 96% similarity with mega (macrolide resistance genetic assembly)

DISCUSSION
In a previous study we used the Sanger-sequencing based MLST method for the molecular typing of S. pneumoniae isolates from Ontario children (Deng et al., 2015). However, MLST typing is time consuming and very labor intensive. For example, to type 100 pneumococcal isolates using seven gene loci, 700 PCR amplifications plus 1400 Sanger sequencing reactions (forward and reverse direction) are required. Additional repeated sequencing is often needed for ambiguous bases in a sequence due to a variety of reasons such as poor DNA quality and contaminations. Compared to MLST and PFGE, WGS has the highest discriminatory power and has great potential for largescale typing of bacterial isolates (Joensen et al., 2014). In this surveillance study, we used WGS for rapid determination of sequence type and antibiotic resistance for IPD isolates submitted to a public health laboratory. Illumina high-throughput genome sequencing combined with multiplexing allows rapid sequencing of hundreds of samples in a short period of time (1-5 days in rapid run mode). Our results suggest that MLST type could be accurately predicted from Illumina short reads directly for S. pneumoniae, provided genome sequences have fairly good sequencing depth (at least 10 or better). The overall sensitivity and specificity of genomic prediction of antibiotic resistance for S. pneumoniae were 95% (95% CI, 90 to 97%), and 100% (95% CI, 99 to 100%) respectively, compared to standard susceptibility testing methods. Ninety-seven percent of samples had complete concordance between genotype and phenotype for the antimicrobial agents tested, with low rates of VME (0.55%) and ME (0.22%). These results suggest that WGS prediction of antibiotic resistance mechanisms acquired through horizontal gene transfer events was as sensitive and specific as routine antimicrobial susceptibility assays for pneumococci. However, it is worth noting that only four antibiotics were tested in this study, and our observations in pneumococci may not apply to other bacterial pathogens. In addition, this WGS method is currently unable to reliably predict some resistance mechanisms due to point mutations on gene loci. For example, penicillin resistance among pneumococcal isolates resulting from mutations in penicillin-binding proteins (PBPs) or non-PBP genes cannot be predicted. Although, current phenotypic susceptibility assays are likely to be more reliable for routine testing, WGS methods may be useful for slow growing organisms, or organisms that cannot be cultured.
We have detected some VME and ME errors when comparing WGS methods with standard phenotypic susceptibility tests. Possible explanations for VMEs (genotype-susceptible but phenotype-resistant) may be sequencing error, alternative resistance mechanisms that are not reflected in the resistance gene database and novel resistance mechanisms not previously described. MEs (phenotypically susceptible isolates that contain resistance genes) may be the result of suppressed or delayed gene expression of resistance genes, since expression of a gene is a complex process that involves the interactions between repressors, promoters and other regulatory networks.
In our previous study we reported the increase of 22F prevalence among children (≤ 5 years of age) in Ontario in the post-PCV13 era (Deng et al., 2015). In the present study a significant increase in 22F-associated IPD cases was observed in older adults in Ontario (≥ 50 years old) after the introduction of PCV13. In 2013, 22F was the most common serotype detected in IPD among all ages in Ontario, Canada. The emergence of 22F in the post-vaccine era was also reported in other regions of Canada (Demczuk et al., 2013), and in other countries such as the USA (Kendall et al., 2016), UK (Pichon et al., 2013), Germany (van der et al., 2015), and Norway (Steens et al., 2015). An extensive review of serotype 22F in a MLST database found that it was broadly distributed in over 25 countries and the earliest report of IPD caused by 22F was in 1993 (MLST Databases, 2016). According to the public MLST database, 22F capsular type is mainly associated with the MLST-defined clonal complex CC433 (over 70%), ST1294 (3.3%) and CC698 (3.1%) (MLST Databases, 2016). Although, 24/25 (96%) Ontario 22F isolates mostly shared identical genotype (ST433), WGSbased phylogenetic analysis divided the 22F population into two distinct clusters (clade A and B in Figure 2) that were diverse in their genomic contents. The discovery of clade-specific gene sets by comparative genomics suggests the diversification of 22F-ST433 clone. Interestingly, all strains in clade A shared a 23kb-prophage-associated genomic island that clade B lacked. This chromosomal island comprises 30 genes including lysozyme genes downstream from the phage minor tail protein gene. To combat the bacteriophage invasion, clade A strains possessed anti-phage weaponry of Type-I restriction system that degrades phage DNA (Loenen et al., 2014) and PhD/Doc TA system which promotes cell death and limits phage replication within bacterial populations (Fineran et al., 2009). In addition, clade A1 (5 isolates) acquired macrolide resistance through incorporation of a 5.1 kb mega gene cassette that encodes for mefE and msrD resistance genes possibly via horizontal gene transfer. Antibiotic resistance obtained by 22F might provide selective advantage for them over other drug-susceptible competitors. In the current study 19.2% of 22F isolates carried this macrolide-resistant gene cassette; further dissemination of drug resistance among 22F should be closely monitored. 22F isolates in clade B possessed CAAX proteases that may be virulence-related (Firon et al., 2013) and their exact functions are unknown. One strain in clade B, isolated in 2013, demonstrated fluoroquinolone resistance through a de novo gyrA gene mutation (S81F). These gene differences between distinct clades suggest ongoing evolution and divergence of 22F clone through genetic recombination possibly due to rapid adaptation to environmental stress and selection pressure.
In conclusion, our study suggests that WGS is a valuable tool for routine molecular typing of pneumococcal isolates in the clinical laboratory, including antibiotic resistance prediction.
The emergence of 22F in Ontario, Canada and its population divergence warrants heightened surveillance. Before routine use of WGS within clinical laboratories, certain barriers need to be addressed such as sequencing quality assurance, the need for automated WGS data analysis tools, and a shortage of qualified personnel with the necessary skills for data interpretation.

AUTHOR CONTRIBUTIONS
NM prepared DNA library for genome sequencing. XD performed data analysis and prepared the manuscript. ST and TA performed bioinformatic analysis. MI performed geographic mapping of isolates. TM helped data analysis and manuscript writing. NF and JG designed and supervised this project.

FUNDING
This study was supported by Pfizer, Canada and Public Health Ontario.

ACKNOWLEDGMENTS
We thank Dax Toti at the Genome Sequencing Center at University of Toronto and Aimin Li at the Genome Core of Public Health Ontario (PHO) for genome sequencing. We also thank the staff at National Microbiology Laboratory of Canada for serotyping Streptococcus pneumoniae isolates.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb. 2016.02099/full#supplementary-material Figure S1 | Geographic distribution of invasive pneumonia disease cases among Ontario older adults (≥50 years of age) between 2009 and 2013.
Table S1 | List of acquired genes predicted in silico from genome sequences that correlates with phenotype resistance.