The sil Locus in Streptococcus Anginosus Group: Interspecies Competition and a Hotspot of Genetic Diversity

The Streptococcus Invasion Locus (Sil) was first described in Streptococcus pyogenes and Streptococcus pneumoniae, where it has been implicated in virulence. The two-component peptide signaling system consists of the SilA response regulator and SilB histidine kinase along with the SilCR signaling peptide and SilD/E export/processing proteins. The presence of an associated bacteriocin region suggests this system may play a role in competitive interactions with other microbes. Comparative analysis of 42 Streptococcus Anginosus/Milleri Group (SAG) genomes reveals this to be a hot spot for genomic variability. A cluster of bacteriocin/immunity genes is found adjacent to the sil system in most SAG isolates (typically 6–10 per strain). In addition, there were two distinct SilCR peptides identified in this group, denoted here as SilCRSAG-A and SilCRSAG-B, with corresponding alleles in silB. Our analysis of the 42 sil loci showed that SilCRSAG-A is only found in Streptococcus intermedius while all three species can carry SilCRSAG-B. In S. intermedius B196, a putative SilA operator is located upstream of bacteriocin gene clusters, implicating the sil system in regulation of microbe–microbe interactions at mucosal surfaces where the group resides. We demonstrate that S. intermedius B196 responds to its cognate SilCRSAG-A, and, less effectively, to SilCRSAG-B released by other Anginosus group members, to produce putative bacteriocins and inhibit the growth of a sensitive strain of S. constellatus.

Host colonization involves competition with resident microorganisms. One mechanism by which streptococci inhibit closely related bacteria is through short peptides called bacteriocins (Dawid et al., 2007). Variable bacteriocin and putative bacteriocin genes are found adjacent to the Streptococcus Invasion locus (sil) in Group A Streptococci (GAS), and these are predicted to be under the regulatory control of SilA (Belotserkovsky et al., 2009). The sil system was first identified in GAS, where a transposon insertion in the sil system attenuated virulence in a murine model (Hidalgo-Grass et al., 2002). The core signaling system has been characterized and contains the cell-cell signaling peptide SilCR, a peptide processing/export system (SilD/E), and a two-component sensing system (SilA/B). Upon sensing the pheromone peptide SilCR, the histidine kinase SilB phosphorylates the response regulator SilA, upregulating genes involved in SilCR production. These include the SilCR peptide itself and the ABC transporters SilD and SilE. SilC is encoded on the antisense strand of silCR and its expression represses SilA-inducible genes in GAS (Hidalgo-Grass et al., 2002; Eran et al., 2007). SilA/B transcriptional regulation is uncharacterized, but induction of the sil system is dependent on their presence and expression (Eran et al., 2007). The system also includes a putative CAAX protease, thought to be involved in immunity against or maturation of bacteriocins.
The sil system has been implicated in virulence in GAS (Hidalgo-Grass et al., 2002; Salim et al., 2008). Disruption of silC leads to attenuation in a murine model (Hidalgo-Grass et al., 2002). As SilC represses SilCR induction, the lack of the pheromone peptide appears to favor pathogenesis. However, interspecies communication occurring between GAS and Group G streptococci (GGS) via SilCR peptide positively regulates SilCR and bacteriocins (Belotserkovsky et al., 2009). Thus, regulation of this system in streptococci is complex and can impact pathogenesis in multiple ways.
The sil system has not been characterized in SAG. Here, we describe the sil system in 42 SAG genomes, outlining species-specific variation and identification of novel putative bacteriocins. We note this locus as a hotspot for genetic variability with most strains featuring a large cluster of putative bacteriocin/immunity genes. We demonstrate that the inhibitory activity of SilCR-dependent putative bacteriocins can give S. intermedius a competitive advantage.

Assembly and Strain Cluster Assignment
The in-house draft genomes were all sequenced using NexteraXT libraries and Illumina MiSeq with either 150 or 250 bp paired-end reads. The sequences were assembled using our in-house genome assembly pipeline [Surette Lab Assembly Pipeline (SLAP)]. SLAP preprocesses the reads by trimming adapters with Cutadapt (Martin, 2011) and performing quality trimming and error correction with sga preprocess and sga correct (Simpson and Durbin, 2012). It then estimates an optimal k-mer size using an in-house adaptation of the program Kmergenie (Chikhi and Medvedev, 2014), and calls four separate assemblers [MaSuRCA (Zimin et al., 2013), IDBA (Peng et al., 2011), SPAdes (Bankevich et al., 2012), and Velvet (Zerbino and Birney, 2008)] to produce four candidate assemblies. The assemblies are then scaffolded using SSPACE (Boetzer et al., 2011) and quality metrics are provided to the user via FastQC (Andrews, 2010) for read quality and QUAST (Gurevich et al., 2013) for assembly quality. For our purposes, the best assembly was chosen for each strain based on N50 value and total assembly length. All in-house code is available from the authors upon request.
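The selection step above can be illustrated with a minimal sketch of the N50 statistic. The ranking rule (highest N50, ties broken by total length), the function names, and the contig lengths are assumptions made for illustration; they are not taken from SLAP, whose exact criteria are not described here.

```python
def n50(contig_lengths):
    """Return the N50: the contig length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def pick_best_assembly(assemblies):
    """Pick the candidate with the highest N50, breaking ties by total length.

    `assemblies` maps an assembler name to a list of contig lengths.
    This ranking rule is a hypothetical stand-in for the manual choice.
    """
    return max(
        assemblies,
        key=lambda name: (n50(assemblies[name]), sum(assemblies[name])),
    )


# Hypothetical contig lengths for two candidate assemblies of one strain.
candidates = {
    "spades": [500, 400, 300, 200, 100],
    "velvet": [300, 300, 300, 300, 300],
}
best = pick_best_assembly(candidates)
```

In practice both values would be read from the QUAST report rather than recomputed.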
We assigned our samples to strain clusters following Jensen et al. (2013). We searched all of our strains for each of the seven housekeeping genes used in that study. The SAG accession numbers and tuf (KY275134-KY275175) are as indicated. All genes were present in all samples with the exception of sodA, which was absent from C1051. These genes were aligned with the trimmed sequences used by Jensen et al. (2013), trimmed to the same length, and a phylogenetic tree was constructed in MEGA 7.0.20 using default parameters with the following exceptions: we used a minimum-evolution model with 500 bootstrap replicates and pairwise deletion for missing data.

Identifying the Putative sil Region in SAG
The putative sil system was identified in the published genome of S. intermedius B196 (Accession number: NC_022246.1) based on its structural similarity to the GAS and GGS sil locus. We used megablast (Altschul et al., 1997) to find each gene from this putative sil locus in the 44 additional SAG genomes. We found the genes together on a single contig in 42 genomes (aside from S. intermedius B196) and split across multiple contigs in two genomes (M1 and C424, which we excluded from our analysis). A list of the 42 strains used for our analysis is provided in Supplementary Table S1. The 42 putative sil regions were annotated from the BLAST hits using in-house Python and Biopython scripts (van Rossum and Drake, 2001; Cock et al., 2009) with manual adjustments, and the annotations were visualized and compared using Geneious (Kearse et al., 2012).

Phylogenetic Trees
A phylogenetic tree of the sil locus was generated using MrBayes (Huelsenbeck and Ronquist, 2001; Altekar et al., 2004). Each gene (silA through silD) was aligned individually using MUSCLE (Edgar, 2004) and the alignments were concatenated (with gaps in place of the nucleotides for missing genes), providing a single alignment of all genes. The genes were each assigned their own partition in the MrBayes model, with a 4-by-4 General Time Reversible evolutionary model (Tavare, 1986) and invariant/gamma rates distribution (Yang, 1994) for each partition. Since many sequences were missing individual genes, the presence or absence of each gene was encoded as binary "standard" data in a separate partition (Lewis, 2001). The analysis was run until the average standard deviation of split frequencies was less than 0.01 and the consensus tree was visualized using the Interactive Tree of Life (iTOL) (Letunic and Bork, 2016).
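The concatenation and presence/absence encoding described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the strain labels, and the toy alignments are hypothetical, and writing the result out as a partitioned NEXUS file for MrBayes is left aside.

```python
def concatenate_alignments(gene_alignments, strains):
    """Concatenate per-gene alignments, padding missing genes with gaps.

    gene_alignments: list of (gene_name, {strain: aligned_sequence}) in
                     concatenation order; every sequence within one gene
                     must have the same aligned length.
    Returns two dicts keyed by strain: the concatenated nucleotide string
    and a binary presence/absence string with one character per gene.
    """
    concatenated = {strain: [] for strain in strains}
    presence = {strain: [] for strain in strains}
    for gene, alignment in gene_alignments:
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        width = widths.pop()
        for strain in strains:
            if strain in alignment:
                concatenated[strain].append(alignment[strain])
                presence[strain].append("1")
            else:
                # Missing gene: fill the whole block with gap characters.
                concatenated[strain].append("-" * width)
                presence[strain].append("0")
    return (
        {s: "".join(parts) for s, parts in concatenated.items()},
        {s: "".join(bits) for s, bits in presence.items()},
    )


# Toy example: hypothetical strain "Y" lacks the second gene.
toy_genes = [
    ("silA", {"X": "ATG-", "Y": "ATGC"}),
    ("silB", {"X": "GGA"}),
]
toy_concat, toy_presence = concatenate_alignments(toy_genes, ["X", "Y"])
```

Keeping the per-gene block widths also gives the partition boundaries needed when each gene receives its own model.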

Annotating the Bacteriocin Region
The putative bacteriocin region adjacent to the sil region was annotated using the online version of antiSMASH (Blin et al., 2013, 2014), accessed in November 2013. Only the results of the antiSMASH-internal GLIMMER (Delcher et al., 1999) annotation were used. In order to determine how variable the accessory regions are, the sequences from the GLIMMER hits were translated and preliminary clustering was undertaken using OrthoMCL (Fischer et al., 2011). This produced 16 putative orthologous groups, here referred to as ORF1 through ORF16. To identify any pseudogenes or related ORFs not discovered by GLIMMER, we manually divided each cluster into smaller groups based on the MUSCLE alignment of the cluster (if multiple groups were apparent) and generated a consensus sequence for each group. We searched naively for each of these consensus sequences in all of the bacteriocin regions (tolerant to 20% mismatch at the amino acid level).
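A naive mismatch-tolerant search of this kind might look like the sketch below. It assumes an ungapped sliding-window comparison against an already translated protein sequence; the function name and the example sequences are hypothetical, and the search actually used may have handled translation frames and gaps differently.

```python
def find_tolerant_matches(query, target, max_mismatch_percent=20):
    """Slide `query` along `target` and report ungapped windows whose
    number of mismatched residues is within the allowed percentage.

    Returns a list of (start_index, mismatch_count) tuples.
    """
    window = len(query)
    # Integer arithmetic avoids floating-point rounding at the threshold.
    allowed = (max_mismatch_percent * window) // 100
    hits = []
    for start in range(len(target) - window + 1):
        candidate = target[start:start + window]
        mismatches = sum(1 for a, b in zip(query, candidate) if a != b)
        if mismatches <= allowed:
            hits.append((start, mismatches))
    return hits


# Hypothetical consensus peptide and translated region for illustration.
consensus = "MKTLLNEQLE"
region = "AA" + "MKTLLNEQLE" + "AA" + "MKSLLNDQLE" + "AA"
region_hits = find_tolerant_matches(consensus, region)
```

Because the comparison is ungapped, a pseudogene carrying an indel would only be recovered up to the frameshift, so candidate hits would still need manual inspection.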
We generated a tree for each of the 16 ORF clusters produced by OrthoMCL using FastTree (Price et al., 2009). The location of each putative accessory gene in each strain was marked and patterns of synteny were manually examined.
We searched for each ORF in the non-redundant protein sequences (nr) database with BLASTX (Altschul et al., 1997) to identify putative functions. In addition, each ORF was searched against the BACTIBASE database of bacteriocins (Hammami et al., 2007, 2010) to identify hits with existing bacteriocins.

Bacterial Culturing Conditions
We cultured S. intermedius B196 and S. constellatus M505 on Todd Hewitt agar with 0.5% yeast extract (THY) and incubated at 37 °C in a 5% CO2 incubator for 3 days. We inoculated THY broth with colonies and incubated either at 5% CO2 or anaerobically (5% CO2, 5% H2, 90% N2) at 37 °C overnight. We used broth cultures to conduct further experiments. THY supplemented with 75 µg/mL spectinomycin (THY-spec) was used to grow our knockout strain (see below). We used Escherichia coli Top10 chemically competent cells (Life Technologies) during cloning. E. coli carrying desired knockout constructs were grown on Luria-Bertani agar with 100 µg/mL spectinomycin (LB-spec).

Identification of SilA Binding Sites in S. intermedius B196
Binding of response regulator SilA to direct repeats in GAS and GGS affects expression of sil locus components as well as putative bacteriocins (Hidalgo-Grass et al., 2002). The SilA binding site in GAS (ACCATTCATG-11bp-ACCTTTTAAG) (Belotserkovsky et al., 2009) was used as a query to identify putative sites in S. intermedius B196 using the motif search in Geneious (Kearse et al., 2012). The ClustalW 2.1 alignment of these sites (Goujon et al., 2010) was used to generate an S. intermedius B196 consensus sequence, visualized using WebLogo (Crooks et al., 2004).
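The motif query can be approximated outside Geneious with a short scan for the two 10 bp repeats separated by an 11 bp spacer. This is a sketch under stated assumptions: the mismatch tolerance of two per repeat is arbitrary (the settings used in Geneious are not given), only the forward strand is scanned, and the function names and test sequence are hypothetical.

```python
REPEAT_1 = "ACCATTCATG"  # first direct repeat of the GAS SilA site
REPEAT_2 = "ACCTTTTAAG"  # second direct repeat of the GAS SilA site
SPACER = 11              # spacer length in bp between the two repeats


def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sila_like_sites(sequence, max_mismatches_per_repeat=2):
    """Scan the forward strand for repeat-spacer-repeat sites.

    Returns a list of (start, mismatches_in_repeat1, mismatches_in_repeat2).
    The mismatch tolerance is an illustrative assumption.
    """
    sequence = sequence.upper()
    second_offset = len(REPEAT_1) + SPACER
    site_length = second_offset + len(REPEAT_2)
    hits = []
    for start in range(len(sequence) - site_length + 1):
        first = sequence[start:start + len(REPEAT_1)]
        second = sequence[start + second_offset:start + site_length]
        m1 = hamming(first, REPEAT_1)
        m2 = hamming(second, REPEAT_2)
        if m1 <= max_mismatches_per_repeat and m2 <= max_mismatches_per_repeat:
            hits.append((start, m1, m2))
    return hits


# Hypothetical promoter fragment with a perfect site starting at index 4.
fragment = "TTTT" + REPEAT_1 + "GATTACAGATT" + REPEAT_2 + "TTTT"
sites = find_sila_like_sites(fragment)
```

Scanning the reverse complement in the same way would cover sites on the opposite strand.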

SilCR Knockout Construction
We constructed a deletion mutant of silCR in S. intermedius B196. First, we cloned the spectinomycin resistance marker from pDL278 (Dunny et al., 1991) along with its promoter into pUC19 with primers specF and specR as shown in Table 1. We then cloned the upstream and downstream regions of silCR in S. intermedius on either side of the spectinomycin resistance marker in pUC19 using primers SilCRupF, SilCRupR, SilCRdownF, and SilCRdownR (Table 1). We amplified the cassette (silCR upstream: specR: silCR downstream) using PCR with primers SilCRupF and SilCRdownR and purified it. Our laboratory has found that S. intermedius B196 is naturally competent and can be transformed using the competence-stimulating peptide, ComC (DSRIRMGFDFSKLFGK) (data not shown). An overnight THY broth culture was diluted 1000-fold in 500 µL THY and incubated for 2 h at 37 °C in 5% CO2 before adding competence peptide (10 ng) and purified cassette DNA (500 ng). The reaction was incubated in normal growth conditions for an hour before plating the transformation reaction on THY-spec. We incubated the plates for 2 days anaerobically at 37 °C and screened colonies using PCR and sequencing to verify the deletion.

Bacteriocin Activity Assays
To assess bacteriocin activity, we adapted top-agar overlay experiments (Kormin et al., 2001; Maricic and Dawid, 2014). S. intermedius B196 (the bacteriocin producer) and the mutant strain were grown in an overnight anaerobic broth culture as previously described. A volume of 4 µL of the overnight culture was spotted on THY agar and incubated in the anaerobic chamber at 37 °C for 2 days. In some cases, synthetic SilCR peptide was spotted along with the culture. SilCR peptide was synthesized by RS Synthesis, LLC (Louisville). The amount of peptide added was dependent on the experiment. S. constellatus M505, which lacks the bacteriocin cluster and caax gene, was used as a bacteriocin-sensitive strain. This was grown anaerobically in THY for 24 h. Top agar (THY) was prepared using 1.5% agar as this higher percentage improved the visualization of the zone of clearing. A 100 µL volume of overnight broth culture of S. constellatus M505 was added to 5 mL molten top agar and inverted three times before pouring onto the agar plates with S. intermedius spots. Plates were incubated for 1-2 days at 5% CO2 at 37 °C.

Relative Real Time PCR
To assess SilCR-dependent expression of putative bacteriocins, we designed primers for specific genes in the sil locus and reference genes (molecular chaperone dnaK and recombinase recA) (Table 1). Biological replicates of S. intermedius B196 and the mutant were grown and spotted on THY agar as described, with and without synthetic SilCR peptide. After 16 h of growth in the anaerobic chamber at 37 °C, bacterial spots were resuspended in RNAlater Solution (Ambion). Bacterial RNA was purified using enzymatic lysis, TRIzol (Invitrogen), and the RNeasy mini kit (Qiagen) as described in Fei et al. (2016). To eliminate DNA contamination, samples were treated with the RNase-free DNase set (Qiagen) and subsequently purified using the RNeasy mini columns (Qiagen). RNA was normalized using readings from the Nanodrop prior to cDNA preparation using the SuperScript® III first-strand cDNA synthesis system for RT-PCR (Invitrogen) as per manufacturer's instructions. SsoFast™ EvaGreen® Supermix (Bio-Rad) was used for real-time PCR as per manufacturer's instructions. Real-time PCR was conducted on the CFX96 Touch™ Real-time PCR detection system (Bio-Rad). Thermal cycling conditions included an initial 95 °C for 30 s and 39 cycles of 95 °C for 5 s and 55 °C for 5 s. The gene expression level of sample groups was normalized to wild type S. intermedius B196 without SilCR peptide and calculated using the ΔΔCT method in CFX manager (Bio-Rad).
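Relative expression against reference genes and a control sample is commonly computed with the comparative threshold-cycle (2^-ΔΔCT) approach. The sketch below is a generic illustration of that calculation, not a reproduction of the CFX Manager implementation: it assumes 100% amplification efficiency, averages the threshold cycles of the reference genes, and uses made-up CT values and hypothetical function names.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """CT of the target gene minus the mean CT of the reference genes.

    Averaging CT values corresponds to a geometric mean of the underlying
    reference quantities.
    """
    return target_ct - mean(reference_cts)


def fold_change(sample_target_ct, sample_reference_cts,
                control_target_ct, control_reference_cts):
    """Relative expression of the sample versus the control (2^-ddCT).

    Assumes every primer pair doubles product each cycle.
    """
    ddct = (delta_ct(sample_target_ct, sample_reference_cts)
            - delta_ct(control_target_ct, control_reference_cts))
    return 2 ** (-ddct)


# Made-up CT values: target gene in a treated sample versus the untreated
# wild type control, each with two reference genes (e.g., dnaK and recA).
example_fold = fold_change(20.0, [18.0, 18.0], 25.0, [18.0, 18.0])
```

With these illustrative numbers the target crosses threshold five cycles earlier in the sample than in the control after normalization, which corresponds to a 32-fold increase.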

Identification of the sil System in S. intermedius B196
We have previously sequenced and annotated the genome for the clinically isolated strain S. intermedius B196 (Olson et al., 2013). We identified a region with structural similarity to the sil region previously identified in GAS and GGS. This region contains five of the six genes included in the GAS/GGS sil locus (silA, silB, silCR, silD, and silE), with the same relative positioning and orientation (Figure 1A). The locus is bounded by a putative D-Ala-D-Ala carboxypeptidase (SIR_RS15545) at one end and sodA at the other. Adjacent to the carboxypeptidase is a conserved hypothetical protein (SIR_RS15550) that has a thioredoxin-like domain. This is followed by 13 small ORFs containing putative bacteriocins, followed by a gene with homology to CAAX proteases. Downstream of the CAAX protease are the sil genes silA, silB, silCR, silD, and silE. We did not find evidence for a counterpart to silC in S. intermedius B196 based on nucleotide similarity to GAS silC. While there is an open reading frame (ORF) on the antisense strand of silCR, there is no evidence that it is transcribed or plays any role in sil regulation. More research is required to determine whether this is silC.
In other streptococci, the sil system and putative bacteriocins are transcriptionally regulated by response regulator SilA and the SilCR pheromone peptide (Hidalgo-Grass et al., 2002). The SilA binding site in GAS (ACCATTCATG-11bp-ACCTTTTAAG) was used to find putative SilA binding sites in the S. intermedius B196 sil locus, as highlighted in Figure 1A. Direct repeats were found upstream of silCR, silE, caax, and SIR_RS15550, and in the accessory region upstream of ORF1, ORF15, and ORF16. The predicted operator was conserved upstream of all of these genes except the CAAX protease gene, which had inconsistencies in the second repeat (Figure 1B).
A schematic for the hypothetical regulation of the sil system in SAG is shown in Figure 1C. The predicted regulatory network functions as follows. SilCR is produced by SAG species and exported. Sensing of extracellular SilCR peptide induces SilB to autophosphorylate and phosphorylate response regulator SilA. SilA, in turn, induces expression of the seven genes indicated in Figure 1B. Induction of these genes is predicted to amplify the response via SilCR dependent autoregulation and induce production of bacteriocins which can inhibit closely related bacteria.

Bacteriocin Activity in S. intermedius B196 is Regulated by SilCR
We investigated the role of the sil system in inter-species competition using S. intermedius B196 as a model. A putative SilA binding site was found upstream of putative bacteriocins ORF1, ORF15, and ORF16 (Figure 1). To investigate whether SilCR regulates bacteriocin expression in S. intermedius B196, we constructed a silCR deletion mutant. We assayed bacteriocin activity by spotting S. intermedius B196 and its mutant on THY agar and, after growth, applying a top agar overlay of a sensitive strain, S. constellatus M505. Strain M505 was chosen because it lacks the sil accessory region (see below) and we therefore predicted it would lack immunity genes and be sensitive to SilCR-dependent bacteriocins produced by B196. When M505 was overlaid in top agar over B196, a clear zone of growth inhibition was observed (Figure 2A). This activity was lost in the B196 silCR mutant (S. intermedius B196 ΔsilCR; Figure 2A). Exogenous addition of the synthetic SilCR peptide from B196 (SilCR SAG-A) restored the wild type phenotype, demonstrating that SilCR induces production of an inhibitor of strain M505.
In order to identify whether deletion of silCR affected gene expression of the putative bacteriocins ORF1 and ORF16, we conducted relative real-time PCR on the wild type and mutant strains (Figure 2B). We used spotted B196 and B196 ΔsilCR with and without exogenously added SilCR SAG-A for our analysis. Deletion of silCR downregulated expression of silE and the putative bacteriocins ORF1 and ORF16 in the samples tested. Addition of SilCR SAG-A to the knockout induced expression of ORF1, ORF16, and silE by 61-, 43-, and 138-fold, respectively, demonstrating that SilCR regulates expression of the putative bacteriocins ORF1 and ORF16, which are putatively involved in inhibition of S. constellatus M505.
FIGURE 2 | Bacteriocin-mediated competition in S. intermedius is controlled by the SilCR peptide. The gene for the pheromone peptide SilCR was deleted in S. intermedius B196 (S. intermedius B196 ΔsilCR). (A) Competition in the wild type and mutant was ascertained using an overlay of sensitive strain S. constellatus M505. The two identified SAG SilCR peptides were exogenously added to wild type and mutant and the diameter of the inhibition zone measured as shown. (B) Relative reverse transcription PCR on B196 and B196 ΔsilCR for the experiment shown in (A) without the M505 overlay (* p < 0.005). (C) A concentration-dependent induction of bacteriocin activity by SilCR peptides A and B in S. intermedius B196. Note the B196 colony diameter is ∼6.5 mm. SilCR SAG-A induced zones are significantly larger than those with SilCR SAG-B (* p < 0.005).

Comparative Genomic Analysis of the sil Locus Separates SAG Strains into Two silCR Peptide Groups
We investigated the distribution of the sil system in SAG. Forty-four SAG strains were used in our analysis: a combination of in-house sequenced genomes and those available online. Only genomes containing the sil genes on a single contig were analyzed (42 of the 44 genomes; as listed in Supplementary Table S1). We used the multilocus sequence analysis (MLSA) described by Jensen et al. (2013) to assign our strains into seven designated clusters for SAG. Our MLSA tree reproduced the clusters found by Jensen et al. (2013) with our new strains neatly nested within six of the seven clusters. Strain cluster assignments are included in Figure 3 and the MLSA phylogeny is shown in Supplementary Figure S3.
Initial investigation of the sil sequences divided SAG strains into two groups. In 24 of the 42 genomes, both silB and silCR appeared truncated: silB at the 5′ end and silCR at the 3′ end. Further examination, however, found that this 5′ end of silB was well-conserved in 22 of the 24 strains. In addition, results from a BLAST search of this region showed high identity with histidine kinases, which is consistent with the putative function of SilB. Indeed, the gene had been annotated as such in at least one of the published genomes in our data set (protein accession YP_008494978 from genome C232), implying that we had found a variation of silB. These 22 strains also had a well-conserved region following their truncated silCR. A stop codon located 150 bp from the start of silCR was used to define the variation of putative silCR. Thus, we identified two distinct SilCR peptides in SAG, SilCR SAG-A and SilCR SAG-B (the previously truncated version), with predicted mature amino acid sequences GWLEDLLKHFSGYNSLTKGDSNNTLG and GWLEDLFSPYLKKYKLGKLGQPDLG, respectively. Notably, each SilCR peptide allele associated exclusively with a corresponding variant of histidine kinase SilB. This suggests that the response by a strain could be dependent on the kinase variant. The conserved 5′ end of the SilCR propeptide is hypothetically cleaved at the double glycine residues to produce the mature SilCR peptide.
FIGURE 3 | sil locus heterogeneity in the Streptococcus Anginosus Group (SAG). A tree was constructed based on sil genes in 42 strains. Six locus arrangements were found (shown in locus arrangement legend with the locus annotation). Hashes (//) in the annotation indicate a putative bacteriocin accessory region. Two SilCR peptides were identified, shown in the peptide legend. SAG species text is color coded (S. intermedius in orange, S. anginosus in gray, and S. constellatus in green). The accessory region ORFs are shown with the tree. Some ORFs have several versions, indicated by the letter inside the ORF. Genes that have acquired mutations leading to pseudogenes in some strains are indicated in the figure.
To investigate whether S. intermedius B196 could respond to the alternative peptide, SilCR SAG-B was added to the knockout (Figure 2A). Despite high peptide concentrations, SilCR SAG-B induced a smaller inhibitory zone in M505 than did SilCR SAG-A (8 mm vs. 11 mm). To further test the peptide-dependent bacteriocin induction of strain B196, several concentrations of each peptide were used. We found that higher concentrations of SilCR SAG-B were required to produce inhibitory zones comparable to those produced by SilCR SAG-A (Figure 2C). Thus, while strain B196 can sense and respond to both pheromone SilCR peptides produced by SAG strains, its inhibitor production is peptide- and concentration-dependent, requiring a high concentration of the non-cognate signaling peptide to elicit a response.
To investigate the phylogeny of the sil locus in the 42 SAG genomes, the gene presence, sequence, and orientation were used to construct a tree (Figure 3). The tree of sil genes corresponded well with SAG species. Six different organizations of the sil locus were noted in SAG, as depicted in the locus arrangement legend in Figure 3. The type of SilCR peptide the strain produced (SilCR SAG-A vs. SilCR SAG-B) is also indicated in the figure. Strains encoding SilCR SAG-A have low diversity overall and are monophyletically nested within S. intermedius, while all three species can encode SilCR SAG-B. In 32 of the 42 sil loci, we identified the canonical arrangement of the sil locus (locus arrangement A in Figure 3). This arrangement is found in strains carrying SilCR SAG-A and SilCR SAG-B and is the arrangement seen in S. intermedius B196 (Figure 1A). The additional sil locus arrangement groups are characterized by the loss of genes and, when SilCR is present, include only SilCR SAG-B. Locus group B is defined by an intact sil region that is altered next to silE, with sodA absent, and includes a single S. anginosus strain. The other arrangements of the sil region (C-D) are represented by deletions leading to loss of core sil genes. Group C consists of a single S. constellatus strain and is defined by missing silE and sodA. Group D is represented by one S. constellatus strain and one S. intermedius strain and is characterized by missing a number of sil locus components including the putative bacteriocin region, CAAX protease, and silA response regulator. Group E is composed of only S. anginosus strains, and is characterized by the lack of all sil genes and accessory proteins except silD, silE, and sodA, with an inversion in silD and silE relative to sodA. Group F includes a single S. anginosus strain and is identical to Group E but missing silE. Overall, the S. anginosus strains had the most variation in composition of the sil locus, with all locus groups represented except Groups C and D.
We identified a few strains where one or more sil genes had undergone inactivating mutations. A number of these pseudogenes appear to be the result of a few single mutation events and are consistent with the sil tree (Figure 3). In strains SK54, ATCC27335, and JTH08 (sister taxa in our sil gene phylogeny), silCR has a single-nucleotide deletion at base 12. In strains C1366 and C1384 (also sister taxa), silCR has a different single-nucleotide deletion at base 14. The sil system is otherwise intact in these strains. In strains C188, C232, and C818 (three of six sisters in an unresolved node in our phylogeny), the secretion protein silD has a single-nucleotide deletion at base 1314. These strains also have otherwise-intact sil regions. In addition to these, a small number of individual strains do not share sil gene inactivation events with other strains in our data set. Strain M410 has a partial sil region (containing silE, silD, and sodA only) and a single nucleotide deletion at base 972 in silE. Strain CCUG39159 has an intact sil region and an SNP that introduces a stop codon at base 1342. Strain M47 appears to have undergone a larger genome rearrangement event, with a 70 Kb region intervening between partial silD and silE genes. Strains M423, 1_2_62CV, and C270 have different frameshift mutation events in non-sil genes (SIR_RS15550 for the first, CAAX for the latter two). Finally, strain SK1060 appears to be degenerate in several places; the SIR_RS15550 homolog contains a single nucleotide insertion at base 95, while CAAX, silA, silB, and silD contain deletions at bases 558, 18, 136, and 1107 respectively.

Heterogeneity in the sil Accessory Bacteriocin Region Implies Strain Specific Competition
The Streptococcus sil system controls expression of competitive bacteriocins in GAS (Eran et al., 2007;Belotserkovsky et al., 2009;Armstrong et al., 2016). In S. intermedius B196, we observe that bacteriocin activity is dependent on this signaling system. We discovered a highly variable accessory region in all strains carrying sil locus groups A-C (34 of the 42 genomes). We investigated the hypothetical ORFs in this region for putative bacteriocin and immunity genes. Sixteen distinct orthologous groups were identified using OrthoMCL. A detailed similarity comparison within each orthologous group is included in the Supplementary Similarity Files. The amino acid identity heatmap for each ORF is shown in the Supplementary Figures. In general, while some ORFs are highly conserved within SAG, others are not. For example, ORF13 is conserved with 97% or higher amino acid identity across 28 strains while ORF2 is clearly divided into two groups with one group having 44% amino acid identity to the other. Using this classification scheme, it was apparent that some strains had two copies of predicted ORFs (e.g., ORF4 in S. anginosus 1_2_62CV; Figure 3). It is important to note that the two copies are not identical and, given their patterns of synteny, it is not likely that they are functionally equivalent.
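Pairwise amino acid identity of the kind reported for ORF13 and ORF2 can be computed from aligned sequences as in the sketch below. The convention used (columns with a gap in either sequence are excluded from the denominator) is an assumption, as the convention behind the reported percentages is not stated; the function name and sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned protein sequences.

    Columns in which either sequence has a gap ('-') are skipped, so the
    denominator is the number of columns where both have a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == "-" or residue_b == "-":
            continue
        compared += 1
        if residue_a == residue_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments: 4 identical residues over 5 shared columns.
identity = percent_identity("MKT-LE", "MKSALE")
```

Applying the function to every pair of sequences within an orthologous group yields the matrix that an identity heatmap displays.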
We assigned putative functions to the 16 ORFs identified based on similarity to hits in the non-redundant protein sequence database using BLASTX (NCBI). These are summarized in Table 2. In total, six putative bacteriocins were identified (ORF1, ORF2, ORF6, ORF8, ORF15, and ORF16). Of the six bacteriocins, two were highly similar to existing bacteriocins (ORF2 is similar to bovicin 255 and ORF15 is similar to ThmA bacteriocin). The non-bacteriocin ORFs may have chaperone or immunity functions. ORFs 4 and 7 had high similarity with a bacteriocin secretion protein and a bacteriocin immunity gene, respectively. ORF3 has a putative role in replication. ORF14 shares some nucleotide identity with a tRNA (cytidine/uridine-2′-O-)-methyltransferase. It is unknown whether ORF14 serves a similar function in SAG. Six of the putative ORFs did not have predicted functions (ORFs 5, 9, 10, 11, 12, and 13). Characterization of these genes is required to gain insight into their role. However, some speculation based on the patterns of co-occurrence with other genes is possible. Several of the strains have acquired mutations in some of these accessory genes that have led to pseudogenes. These include two predicted bacteriocin genes (ORF1 and ORF2), several genes with no assigned function (ORF3, ORF9, ORF10, and ORF12), and one gene assigned to bacteriocin secretion function (ORF4). No pseudogenes were detected in any of the predicted immunity genes. There were some clear patterns of synteny in the accessory regions of the sil loci (Figure 3). These became more apparent after taking the subclusters of each ORF into consideration. We describe the syntenic groupings of ORFs as "ORF sets" (OS) because the operon structures have not been established for these strains. In combination with our putative functional assignments, we attempted to characterize these ORF sets (Supplementary Figure S1) in order to group bacteriocins with co-occurring genes that may be functionally associated with them.
All ORF sets except OS IX contain at least one putative bacteriocin (Supplementary Figure S1). Three ORFs occurred only once each across all strains and are labeled "other" (Figure 3). Putative bacteriocin ORF1A is exclusively found adjacent to ORF5A, while ORF1B is separated from ORF5B by bacteriocin ORF2 and its associated genes. The two versions of putative bacteriocin ORF2 were found in two distinct clusters, OS III and OS V. ORF2 is always associated with a version of the putative secretion protein, ORF4. It is also always adjacent to either the putative immunity gene ORF7 or the unknown ORF10, suggesting an immunity function for the latter. Putative bacteriocin ORF6 is always found adjacent to ORF3, implying ORF3 may confer immunity against ORF6. A version of ORF3 is also found associated with ORF2 in OS V. It may be that ORF3, rather than ORF7, confers immunity to ORF2A. Putative bacteriocin ORF8 was found adjacent to ORF12 in most strains (although ORF12 has been mutated to a pseudogene in some of these strains). ORF12 was not present in strains lacking ORF8 and may therefore be only required in the presence of ORF8. Uncharacterized ORF9 is associated with two putative bacteriocins (ORF15 and 16) in two separate ORF sets (OS I and OS II). In each case, the genes are also associated with a hypothetical protein of unknown function (ORFs 11 and 13, respectively). None of the strains carried all six putative bacteriocins, implying that each strain may be susceptible to at least one bacteriocin. A number of S. intermedius strains had five predicted bacteriocins (Figure 3) implying that this species has high competitive potential. S. constellatus C1367 and S. anginosus C423 also had five predicted bacteriocins.
The arrangements of ORF sets within the accessory regions correspond imperfectly with the sil tree, indicating that substantial recombination or horizontal gene transfer occurs within this region. OS I is found in a majority of strains. The six strains lacking OS I form a monophyletic group with low genetic distance in the sil region, indicating a recent loss event. Areas of apparent recombination are highlighted with pink boxes in Figure 3. OS I occurs at one end of the bacteriocin accessory region in the strains where it is present. OS II is found in five genomes, in two positions relative to other ORF sets. OS IV typically appears at the downstream end of accessory regions where it is found (except in strains C1369 and B196), while OS III may be at the downstream end or may be upstream of OS VI. OS V occurs upstream of either OS II or OS III. In either case it is directly downstream of OS I.

DISCUSSION
In GAS and GGS, the Streptococcus sil system senses and responds to pheromone peptide SilCR with induction of endogenous SilCR production and expression of bacteriocins. In some streptococci, expression of silCR is inhibited by expression of silC, which is encoded on the antisense strand at the 3′ end of silCR. In those organisms, it has been shown that silC expression can repress SilCR-activated promoters (Hidalgo-Grass et al., 2002; Eran et al., 2007). No silC ORF conserved across all silCR+ strains of SAG was found, and more research is required to determine whether SAG carries a silC variant.
The prevalence of the sil locus within GAS is low, with only 4 out of the 19 fully sequenced genomes of GAS carrying it. Remnants of the locus have been left behind in strains that do not carry it (Kizy and Neely, 2009; Michael-Gayego et al., 2013; Jimenez and Federle, 2014). The prevalence is higher in GGS, with all sequenced strains carrying a functional SilCR gene (Belotserkovsky et al., 2009; Michael-Gayego et al., 2013). Our analysis of SAG showed that the majority of the strains sequenced carry the locus. Locus arrangement Groups A and B in Figure 3 show that all components of the locus are present in most SAG strains, although individual genes do appear to be degenerate in some strains. The other groups included few strains and, as in GAS, showed loss of some sil genes. Further study is needed to determine whether the lack of a fully functional sil system affects a strain's competitive fitness or ability to colonize or invade.
In SAG, considerably higher genetic heterogeneity was seen in the bacteriocin accessory region than in the sil gene region. S. anginosus displayed the most variation in the presence of sil genes, implying that the locus may not be under stabilizing selection and may not be required for competition in this species. Conversely, most S. intermedius and S. constellatus strains were included in locus group A, suggesting that the sil locus may be more competitively necessary in these species, at least in the clinical context from which these strains were isolated.
The bacteriocins identified in the SAG sil locus have not been characterized or directly associated with intra- or interspecies competition; however, the identification of six putative bacteriocins within the sil locus in SAG is a new finding. Bacteriocins can mediate competition with closely related bacteria and can provide a competitive advantage during in vivo colonization experiments (Dawid et al., 2007). ORF2 is a homolog of bovicin 255, which has been shown to inhibit growth of select streptococci (Whitford et al., 2001). ORF15 is homologous to thermophilin 13 bacteriocin (ThmA), which is produced by S. thermophilus and can inhibit a broad range of Gram-positive bacteria including spore formers (Marciset et al., 1997). In addition to these genes, four more putative bacteriocins were identified and remain to be characterized. The species selectivity of these putative bacteriocins in SAG is currently being investigated.
Interspecies SilCR induction of SilA-responsive genes has been described previously between GAS and GGS (Belotserkovsky et al., 2009). Inter- and intra-species competition in streptococci can occur in a polymicrobial environment such as the oral cavity. The hypothetical mature SilCR SAG-A and SilCR SAG-B peptides are 26 and 25 amino acids long, respectively, while the mature GAS peptide is 17 amino acids long. Both SAG immature peptides have the same six amino acids at their 5′ end but vary at their 3′ end. The SAG SilCR peptides are not very similar to the GAS and GGS SilCR peptides in either sequence or length, and it remains to be determined whether these species' sil systems are cross-reactive.
Our data demonstrate that S. intermedius B196 can sense and respond to SilCR SAG-B, which can be carried by all three SAG species; however, a much higher concentration of the foreign peptide SilCR SAG-B than of the B196-native version (SilCR SAG-A) was required to rescue the phenotype in B196 silCR. This implies a divergent co-evolution of the silCR and silB genes in the SAG-A clade, with selection for self-detection.
SilCR expression has been shown to attenuate virulence in a mouse model of necrotizing fasciitis (Hidalgo-Grass et al., 2002, 2004). However, SilA has also been shown to promote expression of virulence-associated genes including streptolysin S, iron transporter SiaA and serine protease ScpC (Salim et al., 2008). It is unknown whether the sil system in SAG affects its pathogenicity. This system is unusual in that it can contribute to both infection in the host and bacterial competition depending on the environment. Given the host-specificity of SAG, and the commensal and pathogenic roles it can play, further analysis of this system could deepen our understanding of SAG competition and its associated clinical diseases.

AUTHOR CONTRIBUTIONS
All authors listed have made substantial, direct and intellectual contribution to the work and approved it for publication.

ACKNOWLEDGMENT
We would like to thank members of the Bowdish and Surette labs for helpful discussion.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2016.02156/full#supplementary-material