Strain Level Streptococcus Colonization Patterns during the First Year of Life

Pneumococcal pneumonia has decreased significantly since the implementation of the pneumococcal conjugate vaccine (PCV); nevertheless, in many developing countries pneumonia mortality in infants remains high. We have undertaken a study of the nasopharyngeal (NP) microbiome during the first year of life in infants from The Philippines and South Africa. The study entailed the determination of Streptococcus sp. carriage using a lytA qPCR assay, whole metagenomic sequencing, and in silico serotyping of Streptococcus pneumoniae, as well as 16S rRNA amplicon-based community profiling. The lytA carriage in both populations increased with infant age, and lytA+ samples ranged from 24 to 85% of the samples at each sampling time point. We next developed informatic tools for determining Streptococcus community composition and pneumococcal serotype from metagenomic sequences derived from a subset of longitudinal lytA-positive Streptococcus enrichment cultures from The Philippines (n = 26 infants, 50% vaccinated) and South Africa (n = 7 infants, 100% vaccinated). NP samples from infants were passaged in enrichment media, and metagenomic DNA was purified and sequenced. In silico capsular serotyping of these 51 metagenomic assemblies assigned known serotypes in 28 samples, and the co-occurrence of serotypes in 5 samples. Eighteen samples were not typeable using known serotypes but did encode capsule biosynthetic cluster genes similar to non-encapsulated reference sequences. In addition, we performed metagenomic assembly and 16S rRNA amplicon profiling to understand co-colonization dynamics of Streptococcus sp. and other NP genera, revealing the presence of multiple Streptococcus species as well as potential respiratory pathogens in healthy infants. A range of virulence and drug resistance elements were identified as circulating in the NP microbiomes of these infants. This study revealed the frequent co-occurrence of multiple S. pneumoniae strains along with Streptococcus sp. and other potential pathogens such as S. aureus in the NP microbiome of these infants. In addition, the in silico serotype analysis proved powerful in determining the serotypes in S. pneumoniae carriage, and may lead to developing better targeted vaccines to prevent invasive pneumococcal disease (IPD) in these countries. These findings suggest that NP colonization by S. pneumoniae during the first year of life is a dynamic process involving multiple serotypes and species.

INTRODUCTION
Invasive pneumococcal disease (IPD) caused by Streptococcus pneumoniae has decreased significantly after implementation of the pneumococcal conjugate vaccine (PCV) (Pilishvili et al., 2010;Tocheva et al., 2011). However, nasopharyngeal carriage of the pneumococcus in children <5 years old appears to continue at roughly 20-30% of the population in the US or Europe (Weatherholtz et al., 2010;Sharma et al., 2013;Fleming-Dutra et al., 2014;Lee et al., 2014). Carriage in low and middle income countries is higher with a pooled average of ∼65% (Adegbola et al., 2014) and up to 75% in South Africa (Nzenze et al., 2014). Results from epidemiologic surveys show that the incidence of capsular serotypes targeted by the vaccine (VT) has decreased, while non-VT serotypes have increased (Huang et al., 2005;Pelton et al., 2007;Sharma et al., 2013). In particular, evidence is emerging that the serotypes targeted in the current vaccines include a lower fraction of the serotypes causing IPD in young children particularly in Asia and Africa compared to the protection afforded young children by the vaccines in developed countries (Hausdorff et al., 2000).
Detection of S. pneumoniae in clinical samples has traditionally been performed using microbiological cultures (Reller et al., 2008) or more recently, by quantitative PCR targeting the autolysin (lytA) gene (Messmer et al., 2004;WHO and CDC, 2011). In addition to detection of the organism from clinical samples, it is important to characterize the capsular serotype, since it has been shown that VT isolates are more likely to cause invasive disease than non-VT isolates (Weatherholtz et al., 2010;Fleming-Dutra et al., 2014). Capsule type is determined by serology using standardized antisera (Reller et al., 2008) or by multiplex PCR approaches that are able to discriminate between 20 and 37 of the more than 90 known capsule types (Satzke et al., 2013). However, these methods are laborious and expensive, and they have the inherent shortcoming that they cannot easily detect several capsular types in a single sample (Satzke et al., 2013). Methods that use high-throughput DNA sequencing have been presented as alternatives for capsular typing (Leung et al., 2012;Ip et al., 2014). These methods have relied on using a PCR enrichment step where the capsule loci are preferentially amplified directly from clinical samples, and thus suffer from similar limitations as multiplex PCR strategies. A more recent typing scheme using reads from whole genome sequence (WGS) data was developed to assign an in silico serotype (Kapatai et al., 2016). Here, we expand on the WGS approach using whole-metagenome sequencing of Streptococcus-enriched cultures and simultaneous development of bioinformatics approaches that clearly identify the capsular type. Our study demonstrates that metagenomics methods for serotyping S. pneumoniae directly from infant samples provide the potential for determining capsule information, the presence of other NP colonizers, and for providing data relating to virulence and drug resistance carriage.

Study Design and Subjects
This study was performed in healthy infants whose mothers delivered at the Research Institute for Tropical Medicine associated clinic in Muntinlupa City, Philippines or Chris Hani Baragwanath Hospital in Johannesburg, South Africa between June 2012 and January 2013. All mothers attending the clinics during the recruitment periods at each location were invited to participate in the study and written consent was obtained from all who agreed to participate. The study was approved by the Ethics Committees at both clinical sites and at the J. Craig Venter Institute (JCVI). Children were recruited to participate for 12 months. All of the children in South Africa were vaccinated against pneumococcus using PCV-7 according to the national vaccination schedule (Madhi et al., 2012). The Philippines had not implemented a national vaccination program against pneumococcus, so half the children were randomly assigned to receive the PCV-10 vaccine (Rodenburg et al., 2010).

Nasopharyngeal Sample Collection and Enrichment Protocol
Sampling was performed according to each infant's scheduled visits: at birth (within 6 h), at the time of their first PCV vaccination (usually 6 weeks old), at the time of their second dose (usually at 14 weeks old), at the time of the last dose (40 weeks old), and at 12 months. Maternal samples were obtained at birth (only South Africa) and at 12 months (both sites). NP samples from infants and mothers were collected by pediatricians in the clinics using Copan ESwabs following manufacturer's instructions. After collection, samples were placed in 1 ml liquid Amies buffer and stored on ice until delivery to the clinical laboratory. A 200 µl aliquot of NP sample was transferred to 6 ml Supplemented Todd-Hewitt Broth (THB) containing 0.5% yeast extract and 17% rabbit (Philippines) or fetal bovine (South Africa) serum and 10 mg/ml colistin and incubated at 37 °C in 5% CO2 without shaking for 6 h. Cells were then centrifuged at 9,000 rpm for 10 min and frozen at −20 °C. Metagenomic DNA was extracted from this pellet using the Qiagen DNeasy Blood and Tissue kit (Qiagen) following manufacturer's instructions. Purified DNA was transferred to QIAsafe DNA tubes (Qiagen), allowed to dry uncovered for 10-12 h in a laminar flow hood, and shipped to JCVI at ambient temperature.

Definition of Carriage by lytA PCR
The presence of S. pneumoniae was assessed using a lytA qPCR as described (WHO and CDC, 2011) using primers F373: 5′-ACGCAATCTAGCAGATGAAGCA-3′ and R424: 5′-TCGTGCGTTTTAATTCCAGCT-3′. DNA was amplified using the following program: 95 °C for 10 min, followed by 95 °C for 15 s, 60 °C for 1 min using TaqMan Universal Master Mix on a Biorad CFX96 Real-Time PCR machine (RITM) or Applied Biosystems 7500 Real-Time PCR system (RMPRU). Samples were considered lytA-positive if the Ct value was below 35 (WHO and CDC, 2011).
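The carriage definition above reduces to a single threshold on the qPCR cycle threshold. A minimal sketch of that rule follows; the function names and the use of None to represent a well with no amplification are illustrative assumptions, not part of the published protocol.

```python
from typing import Iterable, Optional

CT_CUTOFF = 35.0  # threshold stated in the text (WHO and CDC, 2011)


def is_lyta_positive(ct: Optional[float], cutoff: float = CT_CUTOFF) -> bool:
    """Call a sample lytA-positive when it amplified with a Ct below the cutoff.

    Assumption: ct is None when no amplification was detected.
    """
    return ct is not None and ct < cutoff


def positive_fraction(ct_values: Iterable[Optional[float]]) -> float:
    """Fraction of samples called lytA-positive at one sampling time point."""
    calls = [is_lyta_positive(ct) for ct in ct_values]
    return sum(calls) / len(calls) if calls else 0.0
```

Applied per visit, positive_fraction would give the kind of per-time-point carriage proportion reported in the Results.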

Metagenomic DNA Sequencing
Only a subset of lytA-positive samples was selected for metagenomic sequencing, where infants were sampled at random with the goals to obtain lytA-positive samples for each representative age and following the pneumococcal population in a subset of infants for the duration of the study. Genomic DNA sequencing libraries were generated using standard library construction (Illumina), adding sample specific barcodes. Sequencing was performed by pooling 8-22 samples in a single 2 × 250 or 2 × 300 MiSeq run to obtain ∼35 million reads per run.

Metagenomic Assembly Pipeline
A pipeline to assemble reads and evaluate assembly content was developed as follows: (1) reads were adaptor and quality trimmed using trimmomatic (Bolger et al., 2014); (2) reads that mapped to the human reference genome GRCh38 (GCA_000001405.15) using bowtie2 version 2.2.7 (Langmead and Salzberg, 2012) with "sensitive" settings were removed; (3) filtered reads were then assembled with metaSPAdes version 3.7.1 (arXiv:1604.03071); and (4) BLAST-based evaluation of taxonomic and serotype content (details below) was conducted across metaSPAdes assembled contigs.
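The three read-processing steps above can be sketched as a function that builds the command lines without running them. This is an illustration under stated assumptions, not the authors' pipeline: the output file names, the Trimmomatic clipping and trimming parameters, and the availability of a `trimmomatic` wrapper on the PATH are all hypothetical, and only the bowtie2 "sensitive" preset is taken from the text.

```python
from pathlib import Path


def build_pipeline_commands(sample, reads_1, reads_2, human_index, adapters, outdir):
    """Return argv lists for (1) trimming, (2) human-read removal, (3) assembly."""
    out = Path(outdir)
    # Hypothetical output names: Trimmomatic PE writes paired (P) and unpaired (U) files.
    trim = {tag: str(out / f"{sample}_{tag}.fastq.gz") for tag in ("1P", "1U", "2P", "2U")}
    # bowtie2 replaces '%' with 1 or 2 when writing pairs that did not align concordantly.
    nonhuman = str(out / f"{sample}_nonhuman_%.fastq.gz")

    trim_cmd = [
        "trimmomatic", "PE", reads_1, reads_2,
        trim["1P"], trim["1U"], trim["2P"], trim["2U"],
        # Assumed settings; the source does not state the trimming parameters.
        f"ILLUMINACLIP:{adapters}:2:30:10", "SLIDINGWINDOW:4:20", "MINLEN:50",
    ]
    filter_cmd = [
        "bowtie2", "--sensitive", "-x", human_index,
        "-1", trim["1P"], "-2", trim["2P"],
        "--un-conc-gz", nonhuman,  # keep read pairs that do not map to the human genome
        "-S", "/dev/null",         # discard the human alignments themselves
    ]
    assemble_cmd = [
        "metaspades.py",
        "-1", nonhuman.replace("%", "1"),
        "-2", nonhuman.replace("%", "2"),
        "-o", str(out / f"{sample}_assembly"),
    ]
    return [trim_cmd, filter_cmd, assemble_cmd]
```

Each returned list could be passed to subprocess.run in order; keeping command construction separate from execution makes the step ordering easy to inspect.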

Assembly-Based and Read-Based Taxonomic Analysis
In order of execution, contigs larger than 200 bp from each metagenomic assembly were aligned against (1) a database of common Streptococcus genomes to identify intended host targets (alignments greater than 95% identity); and (2) the human reference GRCh38 to remove ancillary human contigs (alignments greater than 90% identity). Finally, the remaining set of contigs was aligned to the NCBI NT Bacterial Database (ref, link); BLASTN matches with >97% identity over 5% of the contig length were considered a match. The filtered BLASTN outputs from all samples were combined and then queried to identify the predominant taxa present in the enrichments by compiling all of the occurrences of a given reference genome across the samples. This genome list was then used to build a reference nucleotide database for read-mapping to more quantitatively assess the relative abundance of each taxon in the enrichment samples (Table S1). The database also included all finished S. pneumoniae genomes. Metagenomic reads were mapped using bowtie2 with very-sensitive settings such that reads could only map once to the reference taxonomic database. Counts of mapped reads to each genome were quantified and were used to assess the relative abundance in different samples.
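The final contig-matching rule and the read-count normalization can be sketched as follows. The tuple layout for BLASTN hits and the function names are assumptions made for illustration; treating "over 5% of the contig length" as an inclusive lower bound is likewise an interpretation of the text.

```python
from collections import Counter


def contig_matches(hits, contig_lengths, min_identity=97.0, min_fraction=0.05):
    """Keep (contig, subject) pairs passing the identity and length criteria.

    hits: iterable of (contig, subject, percent_identity, alignment_length).
    contig_lengths: dict mapping contig name to its length in bp.
    """
    kept = []
    for contig, subject, pident, aln_len in hits:
        long_enough = aln_len >= min_fraction * contig_lengths[contig]
        if pident > min_identity and long_enough:
            kept.append((contig, subject))
    return kept


def reference_occurrences(matches_by_sample):
    """Count in how many samples each reference genome was matched."""
    counts = Counter()
    for matches in matches_by_sample.values():
        counts.update({subject for _, subject in matches})
    return counts


def relative_abundance(read_counts):
    """Convert mapped-read counts per genome into fractions of all mapped reads."""
    total = sum(read_counts.values())
    if total == 0:
        return {genome: 0.0 for genome in read_counts}
    return {genome: n / total for genome, n in read_counts.items()}
```

Because each read maps only once to the reference database, the fractions returned by relative_abundance sum to one for any sample with mapped reads.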

16S rRNA Community Analysis of the Non-enriched NP Microbiome
To determine the pre-enrichment NP bacterial community composition, 16S rRNA amplicon profiling was performed on the initial sample before the enrichment step. Operational taxonomic units (OTUs) were generated de novo from raw Illumina sequence reads using an in-house analysis pipeline relying on the UPARSE (Edgar, 2013) and mothur (Schloss et al., 2009) open-source bioinformatics tools. Briefly, paired-end reads were trimmed of adapter sequences, barcodes, and primers prior to assembly, followed by discarding low quality reads and singletons. After a de-replication step and abundance determination, sequences were filtered for chimeras and clustered into OTUs. To assign taxonomy, we used the Wang classifier, bootstrapped using 100 iterations. We set mothur to report full taxonomies only for sequences where 80 or more of the 100 iterations were identical (cutoff = 80). Taxonomies were then assigned to the OTUs with mothur using the SSU Ref NR 99 version of the SILVA 16S ribosomal RNA database (Quast et al., 2013) as the reference. Tables with OTUs and the corresponding taxonomy assignments were generated and used in subsequent analyses. The resulting matrices were summarized by frequency across species-level resolution.
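The effect of the bootstrap cutoff can be illustrated with a small sketch. This is a simplified restatement of the rule described above, not mothur's implementation; the input layout (a lineage as a list of taxon and support pairs, ordered from domain downward) is an assumption.

```python
def apply_bootstrap_cutoff(lineage, cutoff=80):
    """Truncate a lineage at the first rank whose bootstrap support is below the cutoff.

    lineage: list of (taxon, support) pairs, where support is the number of
    the 100 bootstrap iterations that agreed on that taxon.
    """
    kept = []
    for taxon, support in lineage:
        if support < cutoff:
            break  # this rank and everything below it are left unassigned
        kept.append(taxon)
    return kept
```

For example, a read supported to family level at 91 of 100 iterations but to genus level at only 72 would be reported to family level.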

Assembly-Based in Silico Capsular and Multi-Locus Sequence Typing
The first step for establishing an in silico method for serotyping was to create a nucleotide database of serotype sequences. Serotypes were assumed to be predominantly driven by the capsule polysaccharide (cps) locus of the Streptococcus strains. Capsule sequence exemplars were retrieved for all known serotypes from Bentley et al. (2006) and Skov Sorensen et al. (2016). Assemblies were aligned to this reference serotype nucleotide database for in silico serotyping using BLASTn. Sequence alignments greater than 98% identity over 2,000 bp were kept, and top matches of the cumulative alignment length for each serotype were identified via manual curation because in some cases multiple top matches were identified when more than one serotype was present. This was evident in cases in which different contigs had top matches to different serotypes. If no match was identified, metagenomic assemblies were then queried with aliA (NP_357921.1) and dexB (NP_357904.1), the two conserved genes upstream and downstream of the cps cluster. The sequence region between these two flanking genes was then extracted from each metagenome assembly and evaluated by BLAST against the nucleotide non-redundant nt/nr database at NCBI to identify the match with the top total score. Multi-locus sequence typing (MLST) was performed on each metagenomic assembly in silico using LOCUST (Brinkac et al., 2017) using the S. pneumoniae MLST scheme at https://pubmlst.org/spneumoniae (Jolley and Maiden, 2010).
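The filtering and ranking step that precedes manual curation can be sketched as below. The hit tuple layout and function names are illustrative assumptions, and reading "over 2,000 bp" as an inclusive minimum alignment length is an interpretation; the sketch only prepares candidates and does not replace the manual curation described above.

```python
from collections import defaultdict


def _passes(pident, aln_len, min_identity, min_length):
    # Thresholds from the text: >98% identity over 2,000 bp.
    return pident > min_identity and aln_len >= min_length


def rank_serotypes(hits, min_identity=98.0, min_length=2000):
    """Rank serotypes by cumulative alignment length of retained hits.

    hits: iterable of (contig, serotype, percent_identity, alignment_length).
    """
    totals = defaultdict(int)
    for _, serotype, pident, aln_len in hits:
        if _passes(pident, aln_len, min_identity, min_length):
            totals[serotype] += aln_len
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


def top_serotype_per_contig(hits, min_identity=98.0, min_length=2000):
    """Best-matching serotype per contig; differing values hint at co-occurrence."""
    best = {}
    for contig, serotype, pident, aln_len in hits:
        if not _passes(pident, aln_len, min_identity, min_length):
            continue
        if contig not in best or aln_len > best[contig][1]:
            best[contig] = (serotype, aln_len)
    return {contig: serotype for contig, (serotype, _) in best.items()}
```

A sample whose contigs resolve to more than one distinct top serotype would be flagged for curation as a possible mixed-serotype sample.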

Virulence and Antibiotic Resistance Gene Analysis
Contigs from metagenomic enrichment analysis were compared using BLAST alignments against reference databases containing known antibiotic resistance determinants or virulence factors including S. pneumoniae-specific virulence genes (Zhou et al., 2007; Kadioglu et al., 2008; Liu and Pop, 2009; Mitchell and Mitchell, 2010; Chen et al., 2012; Blumental et al., 2015). BLAST results were filtered for hits that were greater than 90% identical over 80% of the reference length.

lytA-Positive Burden in South Africa and Philippine Infants
A total of 393 nasopharyngeal (NP) samples from 203 infants enrolled in our pediatric microbiome study were analyzed for lytA carriage as a proxy for S. pneumoniae colonization (Table 1). Most samples were collected immediately after birth, at 6, 14, and 40 weeks, and at 12 months, since these corresponded to the pediatric visits when the PCV vaccine was administered or were the end-point of the microbiome project. After culture enrichment, the proportion of lytA-positive samples (Ct < 35) increased consistently with infant age, ranging from as low as 23.7% at birth to consistently above 85% after 7 months, with very little difference in the lytA-positive rates between the Philippines and South Africa, irrespective of vaccination status. Carriage of lytA in mothers was ∼45% in South Africa and nearly 100% in the Philippines.
We obtained longitudinal time points for 93 subjects ranging from 2 to 7 samples per infant (average 3 samples). Thirty-three (35%) of those infants had lytA-positive samples every time they were sampled, including their earliest visit (Table 1). Of the remaining infants with longitudinal samples, 54 had negative lytA samples in their early visits and became lytA-positive over time, following the overall trend described above. The remaining six infants had negative lytA samples each time they were tested, though all but 2 of these samples corresponded to less than 2 months of age, again suggesting that the carriage and abundance of lytA-positive organisms is low at a very young age.

Metagenomic Sequencing and Analysis of Streptococcal Carriage
A total of 51 samples were selected for further characterization through metagenomic sequencing in order to identify the various strains colonizing the NP of infants in each country. Samples were selected to represent primarily infants who had the maximum number of longitudinal lytA-positive samples in order to determine the effect of vaccination on pneumococcal population dynamics. Twelve samples were obtained from seven South African infants and 39 samples from 25 Philippine infants. Roughly one-half of the samples belonged to longitudinal samplings (Table 2). In the majority of samples, the metagenomic assembly encoded multiple lytA genes with at least 80% nucleotide identity to the S. pneumoniae reference lytA sequence (NP_359346.1) (range: 1-4 copies, Table 2).

Population Structure of Nasopharyngeal Streptococcus Community
Our metagenomic approach to studying Streptococcus spp. colonizing the nasopharynx allowed a very detailed view of the various organisms that reside in that space. The taxonomic composition of the enriched NP microbiome based on percentage of mapped reads to various reference genomes indicated the predominance of S. pneumoniae in most samples (Figure 1).
TABLE 2 (notes) | RITM, the Philippines; RMPRU, South Africa. S. pneumoniae contigs and assembled length determined by BLAST match to S. pneumoniae reference genomes for each contig (see text for details). Multilocus sequence typing (MLST) sequence types (ST) were assigned using the PubMLST database. The number of lytA copies in each assembly is based on % identity thresholds relative to the S. pneumoniae R6 copy (NP_359346.1). Serotype assignment was generated from in silico typing of WGS metagenomic assemblies (see text for details). nt, non-typeable.
Streptococcus sp. 16S rRNA amplicon sequences comprised between <1 and 33% of reads (Figure 2, Table S3) from the initial NP microbiome sample. The community composition varied greatly in the relative abundance of different taxa, but the primary taxa were largely consistent, with Dolosigranulum, Haemophilus, Prevotella, and Moraxella being the most prevalent. Other taxa prevalent in fewer samples include Porphyromonas, Finegoldia, and Johnsonella.

Capsular Type Detection and Serotype Prediction
We applied in silico BLAST-based methods to ascertain the capsular type(s) present in these metagenomic samples. Using a criterion of >98% nucleotide identity, 33 samples were assigned a serotype. The most common serotype was 16f (four samples) while the following were encountered three times: 6a, 6b, 16f, 19c, 19f, and 23a (Figure 1, Table 2). The presence of more than one serotype was detected in five infants. Ten samples from The Philippines cohort had capsule types belonging to PCV10 vaccine types, and five of those samples originated from vaccinated infants (i.e., RITM009, RITM059, and RITM071), most of which occurred in infants >10 months of age. However, one VT-serotype sample originated from a 6-week-old infant (RITM043I2). One vaccinated South African infant carried a vaccine-type serotype (23f) at 6 weeks, the time point for the first PCV7 administration. For longitudinal samples originating from the same infant, only three infants had the same capsule type at more than one visit (RITM052:23A, RITM059:6b, and RMPRU011:16f).
The samples without in silico serotype matches were further interrogated to determine whether nontypeable Streptococcus capsule biosynthetic genes were present by examining sequence content between aliA and dexB, the conserved genes flanking the capsule biosynthetic cluster. The majority of extracted capsule sequences matched at 94-96% identity to several variants of the capsule locus detailed in Park et al. (2012) including the complete S. pneumoniae NT-110-58 genome (CP007593) (Hilty et al., 2014) (Table 2), as well as the complete genomes of S. mitis B6 (FN568063.1) and S. pseudopneumoniae IS7493 (CP002925.1) (Shahinas et al., 2011). Similar sequences (>94% similarity) were also present in the serotypeable samples (Figure 1), indicating that they are prevalent and co-exist with S. pneumoniae serotypes.
FIGURE 1 | Relative abundance of NP microbiome taxa from metagenomic analysis of enrichment cultures. Abundance is based on normalized read counts mapped to a reference database of Streptococcus species and other taxa detected in the NP enrichment assemblies (Table S1). The sample names that indicate infant and sampling time point are provided under the x-axis. Blue lines connecting sample names highlight longitudinal samples originating from the same infant. In silico serotype classification was assigned using a BLAST-based strategy by aligning metagenomic assemblies against a reference database of capsule biosynthetic loci (see Methods for details). Red-colored serotype text indicates a vaccine-type serotype while a red circle depicts which vaccine-type serotype samples came from vaccinated infants. A count of the number of contigs aligning to the non-encapsulated NT_110_58-like capsule locus is given (see text for details).

In Silico MLST Analysis
MLST types were definitively assigned from the metagenomes of twelve samples (Table 2, Table S4), two of which were from the same infant (RITM059) with the same MLST type (ST473). One additional sample (RITM060I10) represented a novel sequence type composed of previously classified alleles. South African sequence types matched other STs from South Africa in the PubMLST isolate database, while Philippine samples comprised sequence types from more diverse locations.

Virulence Factor Genes
We predicted metagenomes from S. pneumoniae samples to encode core virulence factors lytA, ply (pneumolysin), nanA (neuraminidase A), hyl (hyaluronidase), pspC (pneumococcal surface protein C), and pavA (pneumococcal adhesion and virulence A) (Hiller et al., 2007). All samples encoded at least one S. pneumoniae virulence factor when compared to reference databases (Table S5; Zhou et al., 2007; Chen et al., 2012). The five representative virulence factors examined here were present in >50% of the metagenomes, where ply was present in almost all samples (98%). Several samples encoded more than one sequence-distinguishable copy of hyl, ply, and pavA. One sample that contained both S. aureus and S. pyogenes encoded a total of 62 virulence factors, including both staphylococcal and streptococcal toxins and complement evasion factors (Tables S5, S6). The majority of Staphylococcus-containing samples had more than 10 virulence factors including haemolysins and toxins, indicating the presence of fully virulent S. aureus (Powers and Wardenburg, 2014).

Antibiotic Resistance Markers
Twelve samples contained antibiotic resistance genetic determinants (Table S7): nine samples from the Philippines and three from South Africa. Seven samples encoded only one antibiotic resistance marker and two samples encoded 6 or more. Metagenome assemblies from two samples encoded the bla(TEM-1) gene, which is the most common β-lactamase in Gram-negative bacteria (Muhammad et al., 2014). The gene was encoded in contigs with relatively low read coverage and was highly similar to Neisseria plasmids (Muhammad et al., 2014). Both TEM-1 samples were obtained from Philippine infants, one from a 6-week visit (RITM077), and the other from the 12-month visit (RITM022). One sample from a 6-week-old infant (RMPRU023I2) encoded the methicillin-resistance gene, mecA. The mecA gene was surrounded by sequences homologous to the transposon involved in mecA mobilization (Katayama et al., 2001), suggesting it was encoded by a mobile element.

DISCUSSION
In this study, we report the use of targeted culture enrichment and metagenomic sequencing to study the dynamics of Streptococcus carriage in the infant nasopharynx in the Philippines and South Africa. A total of 393 samples from 203 infants were analyzed, where the majority of early samples were lytA-negative which is consistent with other studies and with colonization occurring later in life (>4 months) (Coles et al., 2001;Ercibengoa et al., 2012;Turner et al., 2012).
Broth enrichment culture has been demonstrated to be a powerful approach to increasing the sensitivity for detecting the carriage of S. pneumoniae in the upper respiratory tract. When methods are compared on the same samples, the carrier fraction of the samples and the serotype diversity are maximal for the broth enrichment culture (da Gloria Carvalho et al., 2010). Metagenomic sequencing of the entire enrichment culture allowed us to see the range of bacteria that were selected by the enrichment culture protocol. The assembly data suggested that streptococcal enrichment was successful, with Streptococcus sp. increasing from an average of 2% of the 16S rRNA reads in the pre-enriched NP community to an average of 93% of post-enrichment mapped reads. All samples had more than one Streptococcus sp. present including S. pseudopneumoniae and S. mitis. The detection of multiple lytA sequences of varying nucleotide similarity supports the idea that the NP community is colonized by a complex assemblage of Streptococcus organisms. This observation highlights the potential for genetic exchange among closely related Streptococcus sp., as recombination is a well-characterized mechanism for generating genetic diversity within the species (Hanage et al., 2009; Chaguza et al., 2015, 2016). Among the genera identified by 16S rRNA gene analysis in the non-enriched primary sample were common NP microbiome taxa including Dolosigranulum, Haemophilus, Moraxella, and Prevotella (Bogaert et al., 2011; Perez-Losada et al., 2017). Some studies have suggested that the presence of Corynebacterium and Dolosigranulum is protective against S. pneumoniae colonization (de Steenhuijsen Piters and Bogaert, 2016), but the limited sample size and generally low prevalence of S. pneumoniae in this 16S rRNA data preclude much inference about the relationship. Other taxa enriched in the metagenomic analysis include Staphylococcus, Gemella, and Neisseria, indicating that the enrichment protocol shifted the community composition substantially.
The use of lytA for detecting pneumococcus in community acquired pneumonia cases has been documented and is frequently employed as a rapid assay (Abdeldaim et al., 2010). In this study where the subjects were largely free of respiratory infections, the lytA assay detected the presence of S. pneumoniae as a member of the commensal microbiome but also detected other lytA containing streptococcal species in the commensal NP microbiome. Recent screening assays have in fact documented that lytA is not a specific diagnostic gene for S. pneumoniae (Simoes et al., 2016). Undoubtedly the use of a second pneumococcus selective gene would greatly improve the specificity of the assay for use as a rapid pneumococcus diagnostic tool for respiratory infections.
Although the presence of Streptococcus spp. in the nasopharynx of these infant subjects was both common and frequent, it was relatively uncommon for a child to have consistent colonization by the same S. pneumoniae strain. There were only three instances of the same capsular type in samples obtained over 3 months apart. Studies of serotype switching have been focused on such switching events in the context of PCV vaccination (for example see Hanage et al., 2011) but not in such young children. Serotypes related to the vaccine (PCV10 in the Philippines and PCV7 in South Africa) were observed in 11 samples, seven of which came from vaccinated infants. However, two of these samples originated from infants on their first scheduled vaccine administration, while the other five samples came from infants >10 months of age. This highlights the need for further examination of vaccine success in these populations. Multiple samples also had more than one serotype present concurrently, and many encoded both typeable and non-typeable capsule loci. This is consistent with previous studies using different methods, and again highlights the potential for genetic exchange between Streptococcus strains (Kamng'ona et al., 2015). In silico MLST typing indicates that many samples were not typeable, but for those that were, only one infant had the same sequence type more than once ( Table 2). The remaining samples could not be specifically assigned to a single MLST type either because the assembly did not resolve all the loci necessary for typing especially in those cases with co-occurring S. pneumoniae, or because loci had no matches compared to known MLST types.
The 16S rRNA NP longitudinal sampling demonstrated consequential variation between successive samples for the NP community composition in our infants during their first year of life. It is likely that the serotype variation we are observing is a consequence of the inherent instability of the NP microbiome during this early stage of life (Jebaraj et al., 1999;Hohwy et al., 2001;Turner et al., 2011;Ercibengoa et al., 2012). Another striking observation on the NP microbiomes in these infants is the prevalence of potentially pathogenic species acting as commensal members of the young infant NP microbiome. We have noted the presence of pathogenic bacteria in the respiratory tract microbiome of lung transplant patients in the absence of an infection, and often when these patients did present with a pneumonia, the pathogen was earlier detectable as a prior member of the commensal population before the onset of disease (Shankar et al., 2015). In this context it is not surprising that we detected the presence of at least one S. pneumoniae virulence factor in all of the metagenomic enrichment culture samples, with the majority of Staphylococcus-containing samples exhibiting more than 10 virulence factors. Furthermore, our detection of antibiotic resistance genes and mobile elements that can be easily transferred between strains suggests that the infant NP serves as a reservoir for antibiotic resistant potential. These observations are consistent with a hypothesis that in these young infants, potentially pathogenic bacteria are common members of the commensal microbiome and that bacterial respiratory disease does not simply result from the presence of a bacterial respiratory pathogen but is the result of a more complex interaction between the host immune system status and the respiratory tract microbiome. However, the mechanisms behind the activation and phenotypic manifestation of virulence in the early NP microbiome remain unclear.

CONCLUSIONS
The in silico serotyping approach described here may contribute to serotype analysis of strains isolated from infants, which could yield better data on the residual serotypes that constitute the reservoir for future pneumococcal infections after the introduction of targeted vaccines to prevent IPD in infants in these countries. In addition, the study revealed the frequent presence of bacterial pathogens in the NP microbiome of these infants, with genomes encoding an abundance of virulence and antibiotic resistance elements. Evidence is emerging that the serotypes targeted in the current vaccines are not as protective for young children in developing countries. The serotyping tool reported here may also contribute to serotype analysis of strains isolated from infants with IPD, which could support the development of better targeted vaccines to prevent IPD in infants in these countries.

ETHICS STATEMENT
The study was approved by the Ethics Committees at both clinical sites and at the J. Craig Venter Institute (JCVI). For the South African cohort, approval was issued by the University of Witwatersrand, Johannesburg Human Research Ethics Committee on 2/24/2012 and reviewed with approval on 8/6/2013. The J. Craig Venter Institute Institutional Review Board approval was issued on 2/4/2012. For the Philippine cohort, approval was issued on 2/28/2012 by the Research Institute for Tropical Medicine Institutional Review Board, assigned number 2012-002. The J. Craig Venter Institute Institutional Review Board approval was issued on 4/4/2012.

AVAILABILITY OF DATA
The WGS data supporting the conclusions of this article are available in GenBank under accession number PRJNA311705 (http://www.ncbi.nlm.nih.gov/bioproject/PRJNA311705/). Other datasets supporting the conclusions can be found within the article and its additional files.

AUTHOR CONTRIBUTIONS
LL and MW were the major contributors to study design, performed the analysis, and crafted the manuscript. JM, AG, EB, DH, and JS participated in software tool design and data analysis, and performed the statistical analysis and interpretation of the data. StM managed the materials and data exchanges and interactions among the clinical sites and JCVI, organized the metadata, and participated in editing the manuscript. ES, AM, BB, SN, SK, ML, and ShM contributed to study design and to sample and data collection. SK and PA participated in laboratory testing. GS participated in software tool design and data analysis, and critically read the manuscript. KK was instrumental in developing the collaborative interactions with the project's South Africa clinical site and contributed to the coordination of the project with the Philippine clinical site. He provided guidance on the serotyping study design and performed a critical review of the manuscript prior to submission. KN and WN participated in the study design, coordinated the project across the three collaborating sites, and participated in editing the manuscript.

FUNDING
This work was supported by grant OPP1017579 from the Bill and Melinda Gates Foundation.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2017.01661/full#supplementary-material

Table S1 | Accession information for reference Streptococcus species and other abundant taxa used to construct nucleotide database for metagenomic read mapping.

Table S2 | Taxonomic composition of post-enrichment NP microbiome based on read mapping counts to reference database comprised of most abundant taxa in BLAST-based analysis of metagenomic assemblies.