Streptococcus pyogenes Causing Skin and Soft Tissue Infections Are Enriched in the Recently Emerged emm89 Clade 3 and Are Not Associated With Abrogation of CovRS

Although skin and soft tissue infections (SSTI) are the most common focal infections associated with invasive disease caused by Streptococcus pyogenes (Lancefield Group A streptococci - GAS), there is scarce information on the characteristics of isolates recovered from SSTI in temperate-climate regions. In this study, 320 GAS isolated from SSTI in Portugal were characterized by multiple typing methods and tested for antimicrobial susceptibility and SpeB activity. The covRS and ropB genes of isolates with no detectable SpeB activity were sequenced. The antimicrobial susceptibility profile was similar to that of previously characterized isolates from invasive infections (iGAS), presenting a decreasing trend in macrolide resistance. However, the clonal composition of SSTI between 2005 and 2009 was significantly different from that of contemporary iGAS. Overall, iGAS were associated with emm1 and emm3, while SSTI were associated with emm89, the dominant emm type among SSTI (19%). Within emm89, SSTI were only significantly associated with isolates lacking the hasABC locus, suggesting that the recently emerged emm89 clade 3 may have an increased potential to cause SSTI. Reflecting these associations between emm type and disease presentation, there were also differences in the distribution of emm clusters, sequence types, and superantigen gene profiles between SSTI and iGAS. According to the predicted ability of each emm cluster to interact with host proteins, iGAS were associated with the ability to bind fibrinogen and albumin, whereas SSTI isolates were associated with the ability to bind C4BP, IgA, and IgG. SpeB activity was absent in 79 isolates (25%), in line with the proportion previously observed among iGAS. Null covS and ropB alleles (predicted to eliminate protein function) were detected in 10 (3%) and 12 (4%) isolates, corresponding to an underrepresentation of mutations impairing CovRS function in SSTI relative to iGAS. 
Overall, these results indicate that the isolates responsible for SSTI are genetically distinct from those recovered from normally sterile sites, supporting a role for mutations impairing CovRS activity specifically in invasive infection and suggesting that this role relies on a differential regulation of other virulence factors besides SpeB.


INTRODUCTION
Streptococcus pyogenes (group A streptococcus, GAS) is responsible for a variety of human infections ranging from mild and frequent diseases, such as pharyngitis and cutaneous infections, to more severe and rare invasive infections including sepsis, necrotizing fasciitis and streptococcal toxic shock syndrome (Walker et al., 2014). In 2005, the estimated global incidence of invasive GAS infections (iGAS) was 663,000 new cases, resulting in 163,000 deaths, while at least 111 million children under 15 years suffered from pyoderma, mostly in developing countries, and pharyngitis incidence was estimated at over 616 million cases (Carapetis et al., 2005). Even though mild infections are usually self-limited, they may play a crucial role in transmission. Furthermore, the nasopharyngeal mucosa and the skin can also be asymptomatically colonized representing primary reservoirs of GAS (Cunningham, 2000).
The gold-standard typing methodology of GAS is emm typing, which relies on the variability of the amino acid sequence of the N-terminal portion of S. pyogenes major virulence factor: the M protein (McMillan et al., 2013). The sequence of the 5′ variable region of the emm gene encoding the M protein determines the emm type, of which there are more than 250 distinct variants (https://www.cdc.gov/streplab/m-proteingenetyping.html). However, emm typing is based on the sequence of only approximately 10-15% of the complete emm gene. Recently, a new classification was proposed based on emm clusters established by phylogenetic analysis of the entire sequence of the emm gene of 175 different emm types (Sanderson-Smith et al., 2014). Each cluster contains isolates with closely related M proteins that share binding motifs to host proteins and other structural properties. Since isolates with the same emm type encode nearly identical M proteins, the emm cluster can be inferred from the emm type.
Associations between certain emm types and specific disease presentations have been established. Of particular importance is the association of iGAS with a contemporary emm1 clone, frequently designated as M1T1, which has persisted for decades as the major cause of invasive disease in most developed countries (O'Loughlin et al., 2007;Aziz and Kotb, 2008;Luca-Harari et al., 2009;Friães et al., 2012). Recently, the emergence of a specific emm89 clade (clade 3) that rapidly outcompeted the previously circulating emm89 strains was reported in multiple countries and associated with an increase in the prevalence of emm89 among GAS infections (Friães et al., 2015a;Turner et al., 2015;Zhu et al., 2015). Isolates from clade 3 are characterized by the absence of the hasABC locus encoding the hyaluronic acid capsule of GAS, and by a variant nga-ifs-slo locus, similar to the one present in M1T1 strains, which is associated with increased expression of NAD-glycohydrolase (NADase) and streptolysin O (SLO) (Turner et al., 2015;Zhu et al., 2015).
Despite the success of emm typing, some studies have suggested it is not enough to identify GAS clones and that it must be complemented with other typing methods such as multilocus sequence typing (MLST), superantigen (SAg) gene profiling (Carriço et al., 2006;Friães et al., 2013a), and, more recently, whole genome sequencing (Carriço et al., 2013). However, variability in key virulence factors and regulators within clones defined by these typing methods may have important consequences for the virulence of a particular isolate.
A mouse model of skin and soft tissue infection (SSTI) showed that mutations in the covRS two-component system were a key step for the transition from a localized to a systemic infection (Sumby et al., 2006). Consistently, several studies reported covRS mutations in isolates recovered from human infections (Engleberg et al., 2001;Hasegawa et al., 2010;Ikebe et al., 2010;Lin et al., 2014;Friães et al., 2015b). In these isolates, the downregulation of the expression of SpeB, a potent extracellular cysteine protease, is thought to be fundamental to the switch to a hypervirulent phenotype (Aziz et al., 2004;Kansal et al., 2010). Transcription of speB is also under direct control of RopB (Carroll and Musser, 2011), and naturally occurring mutations in ropB were also shown to impair SpeB production (Hollands et al., 2008). We reported previously that mutations resulting in the truncation of CovS, which presumably impaired its function, were significantly overrepresented among iGAS compared to pharyngitis in Portugal. However, these were only present in 10% of invasive isolates, not explaining why most isolates caused invasive infections. Additionally, among all studied isolates, which included invasive and pharyngeal isolates, 20% had no detectable SpeB activity, but no significant association was detected between the presence or absence of SpeB activity and the type of infection (Friães et al., 2015b).
In order to cause a wide spectrum of disease, GAS has to be able to adapt to different environments in the host, and despite decades of research there is still no consensus regarding which molecular or phenotypic properties are responsible for an enhanced invasive potential of certain lineages. Although the nasopharyngeal mucosa is usually considered the main source of isolates causing iGAS in developed countries (Fiorentino et al., 1997), SSTI are commonly reported as the predominant foci associated with invasive disease (Lamagni et al., 2008). This raises the possibility that strains adapted to infect the skin have an increased ability to invade and survive in deeper tissues. However, data on the characteristics of GAS isolates responsible for SSTI are scarce, especially in developed, temperate climate regions. Most of the studies from these regions report the characteristics of SSTI isolates together with GAS from other non-invasive sites (mostly pharyngeal swabs) when comparing invasive and non-invasive disease (Descheemaeker et al., 2000;Ekelund et al., 2005;Rivera et al., 2006). A few others specify the molecular characteristics of the SSTI isolates subset, but are limited in the number of isolates or are restricted to short time periods (Kittang et al., 2008;Mijač et al., 2010;Vähäkuopus et al., 2012;Tamayo et al., 2014). In this study we characterized 320 isolates from SSTI recovered in Portugal during 2003-2009 for their susceptibility to a panel of antimicrobials, emm type, SAg profile, and MLST. The genes conferring resistance to selected antimicrobials were also investigated. In addition, all 320 isolates were tested for SpeB activity and in those without detectable activity we sequenced the covRS and ropB genes to document any mutations.
The SSTI isolates from 2005 to 2009 presented substantial differences relative to a collection of previously partially characterized iGAS isolates recovered in the same period in Portugal (Friães et al., 2007, 2013b). The prevalence of mutations impairing CovRS function was found to be lower when compared with that previously reported among iGAS isolates in Portugal (Friães et al., 2015b).

Bacterial Isolates
For this study, 24 hospital laboratories distributed throughout Portugal were asked to send us, on a voluntary basis, all GAS isolated from SSTI between January 2003 and December 2009. The study was approved by the Institutional Review Board of the Centro Académico de Medicina de Lisboa. These were considered surveillance activities and were exempt from informed consent. All methods were performed in accordance with the relevant guidelines and regulations. The data and isolates were de-identified so that these were irretrievably unlinked to an identifiable person. Participation was low in the first 2 years (n = 17 in 2003, n = 8 in 2004), but subsequently increased (average n = 59/year, range 48-77) (Dataset 1 available at http://dx.doi.org/10.6084/m9.figshare.6736313). Overall, 320 non-duplicate GAS isolates from SSTI were recovered and included in the study: 306 from skin and soft tissue exudates (pus) and 14 from skin and soft tissue biopsies. Identification of isolates was performed by colony morphology, β-hemolysis on blood agar, and the presence of the characteristic Lancefield group A antigen (Oxoid, Basingstoke, UK). In addition, 247 non-duplicate iGAS isolates recovered in the same hospitals during 2005-2009, which had been partially characterized previously (Friães et al., 2007, 2013b), were also included (Dataset 2 available at http://dx.doi.org/10.6084/m9.figshare.7016663). Strain SF370 was obtained from Colección Española de Cultivos Tipo (CECT5109). Strains were grown at 37°C in Todd Hewitt broth (THB) (BD, Sparks, MD, USA) or in Tryptone Soy Agar (Oxoid, Basingstoke, UK) supplemented with 5% defibrinated sheep blood.

Antimicrobial Susceptibility Testing and Genetic Determinants
Susceptibility tests were performed by disk diffusion according to the guidelines of the Clinical and Laboratory Standards Institute (CLSI) (Clinical Laboratory Standards Institute, 2017) for penicillin, vancomycin, erythromycin, levofloxacin, tetracycline, chloramphenicol, clindamycin and linezolid (Oxoid, Basingstoke, UK). E-test strips (BioMérieux, Marcy l'Etoile, France) were used for MIC determination in cases of intermediate susceptibility and to confirm resistance when ≤5 isolates were resistant to a particular antimicrobial. Determination of macrolide resistant phenotype was performed as previously described (Melo-Cristino and Fernandes, 1999). A multiplex PCR reaction for erm(B), erm(A), and mef genes was used on macrolide resistant isolates to identify the resistance conferring genes (Figueira-Coelho et al., 2004). The mef positive isolates were further analyzed in order to distinguish between mef (A) and mef (E) (Silva-Costa et al., 2008). The tetracycline resistance genotype was determined for resistant isolates by a multiplex PCR for the genes tet(K), tet(L), tet(M) and tet(O) (Trzcinski et al., 2000).

Molecular Typing
The emm typing was performed according to the protocols and recommendations of the Centers for Disease Control and Prevention (CDC) (https://www.cdc.gov/streplab/protocolemm-type.html).
The sequence of the covRS and ropB loci of all SSTI isolates that presented no proteolytic activity (see below) was determined as previously described (Friães et al., 2015b). For each isolate, the sequences were assembled and compared with the corresponding regions of the genome of strain SF370 (GenBank AE004092), considered as the reference wild-type alleles. Isolates were considered to carry null alleles if the changes found were predicted to result in absence of a functional protein due to nonsense mutations or frameshifts. All new covRS and ropB sequences identified in this study were deposited in GenBank (accession numbers MH537795-MH537849).
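The null-allele criterion described above can be illustrated with a short sketch. This is not the analysis pipeline used in the study: it assumes that each allele is available as a complete coding sequence beginning at the start codon, and it recognizes frameshifts only through a length difference that is not a multiple of three. The function names and the toy sequences in the usage note are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base varies fastest).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence; incomplete trailing codons are ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def classify_allele(ref_cds: str, allele_cds: str) -> str:
    """Classify an allele relative to a reference coding sequence (simplified)."""
    ref_cds, allele_cds = ref_cds.upper(), allele_cds.upper()
    if allele_cds == ref_cds:
        return "wild-type"
    # Assumption: a net length change not divisible by 3 shifts the reading frame.
    if (len(allele_cds) - len(ref_cds)) % 3 != 0:
        return "null (frameshift)"
    protein = translate(allele_cds)
    first_stop = protein.find("*")
    # A stop codon before the final codon truncates the protein.
    if first_stop != -1 and first_stop < len(protein) - 1:
        return "null (nonsense)"
    if len(allele_cds) != len(ref_cds):
        return "in-frame indel"  # effect on function unclear; not called null
    if protein == translate(ref_cds):
        return "synonymous"
    return "missense"
```

For example, with the toy reference `ATGAAACGTTAA`, the allele `ATGTAACGTTAA` is classified as `null (nonsense)` and `ATGAACGTTAA` as `null (frameshift)`, whereas an in-frame insertion or a missense change is reported separately and is not counted as a null allele.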

Determination of Proteolytic Activity and SpeB Expression
All isolates were screened for detectable SpeB activity using a plate assay, as previously described (Friães et al., 2015b). Briefly, single GAS colonies were stab-inoculated into fresh plates of medium containing 0.5-strength Columbia broth, 3% w/v skim milk (BD, Sparks, MD, USA), and 1% w/v agar (Oxoid, Basingstoke, UK). A strain was considered to show proteolytic activity when it presented a translucent zone of size similar to the one of strain SF370 after 24 h incubation at 37°C, in three independent assays. Conversely, a strain was considered not to show proteolytic activity if it did not produce a translucent halo in any of the three assays. For strains in which the results of the three assays were inconclusive or not consistent, detection of SpeB by Western blot was performed (Friães et al., 2015b) and this was considered the final result.
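The decision rule for calling SpeB activity can be written down compactly. The sketch below only encodes the logic stated in this section; the function name, the use of `None` for an inconclusive plate assay, and the returned labels are illustrative assumptions.

```python
from typing import Optional, Sequence


def speb_call(plate_assays: Sequence[Optional[bool]],
              western_blot: Optional[bool] = None) -> str:
    """Combine three plate assays (True = halo similar to SF370, False = no halo,
    None = inconclusive) with an optional Western blot result."""
    if len(plate_assays) != 3:
        raise ValueError("exactly three independent plate assays are expected")
    if all(result is True for result in plate_assays):
        return "active"
    if all(result is False for result in plate_assays):
        return "inactive"
    # Inconclusive or inconsistent plate results: the Western blot decides.
    if western_blot is None:
        return "unresolved: Western blot required"
    return "active" if western_blot else "inactive"
```

Under this rule, `speb_call([True, False, True], western_blot=False)` yields `inactive`, because the Western blot overrides inconsistent plate results.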

Statistical Analysis
Simpson's index of diversity (SID) with respective 95% confidence intervals (CI95%) was used for the analysis of the typing methodologies and to evaluate the allelic diversity of the covR, covS, and ropB genes found in this study (Carriço et al., 2006). Two-tailed Fisher's exact test and odds ratios were used to identify significant pairwise associations. Overall differences in the distribution of typing characteristics between infection types were evaluated by the χ2-test. The Cochran-Armitage test was used to evaluate trends. Only characteristics grouping ≥10 isolates were considered in statistical tests, with all other isolates grouped together in a single group. The p-values for multiple tests were corrected using the false-discovery rate (FDR) linear procedure (Benjamini and Hochberg, 1995). A p-value < 0.05 was considered significant for all tests.
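As an illustration of the diversity measure, the sketch below computes SID as 1 − Σ n_j(n_j − 1) / [N(N − 1)], where n_j is the number of isolates of type j and N the total. The confidence interval uses a large-sample variance approximation, (4/N)[Σ π_j³ − (Σ π_j²)²] with π_j = n_j/N, and SID ± 2 standard deviations; this choice of interval is an assumption, since the text does not state which formula was applied. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Tuple


def simpson_diversity(types: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Return (SID, lower, upper) for a list of type assignments, one per isolate."""
    counts = list(Counter(types).values())
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two isolates are required")
    sid = 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))
    # Assumed large-sample approximation of the variance of SID.
    freqs = [c / n for c in counts]
    sum_sq = sum(f ** 2 for f in freqs)
    sum_cu = sum(f ** 3 for f in freqs)
    variance = max(0.0, (4.0 / n) * (sum_cu - sum_sq ** 2))
    half_width = 2.0 * math.sqrt(variance)
    return sid, max(0.0, sid - half_width), min(1.0, sid + half_width)
```

The function takes one label per isolate (for example an emm type or ST), so the same call can be reused for each typing method.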
Macrolide resistance was detected in 33 SSTI isolates (10%) of 12 different emm types (Dataset 1), with a significant decreasing trend during the years of the study (p < 0.001) (Supplementary Figure 1). The majority of these isolates presented the cMLSB phenotype (constitutive resistance to both erythromycin and clindamycin) and carried the erm(B) gene (n = 22). The remaining macrolide resistant isolates (n = 11) exhibited the M phenotype (resistance to erythromycin and susceptibility to clindamycin) and carried the mef(A) gene, except for one isolate that was positive for the mef(E) gene.
Tetracycline resistance was detected in 47 SSTI isolates (15%) comprising 25 emm types (Dataset 1) with no significant temporal trend (Supplementary Figure 1). The majority of these isolates (n = 42) presented only the tet(M) gene, while three isolates carried both tet(M) and tet(L) and two isolates presented solely the tet(O) gene. Resistance to tetracycline and macrolides was simultaneously detected in 12 isolates, of which 9 presented the cMLSB phenotype.
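Temporal trends in resistance such as those reported above are evaluated with the Cochran-Armitage test. A minimal sketch of the standard two-sided version with equally spaced scores follows; the function name is hypothetical, and the yearly counts in the usage note are invented for illustration and are not the study data.

```python
import math
from typing import Optional, Sequence, Tuple


def cochran_armitage(events: Sequence[int], totals: Sequence[int],
                     scores: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Return (z, two-sided p) for a linear trend in proportions across ordered groups."""
    if scores is None:
        scores = list(range(len(totals)))  # equally spaced scores, e.g. consecutive years
    n = sum(totals)
    p_bar = sum(events) / n
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, events, totals))
    spread = (sum(m * s * s for s, m in zip(scores, totals))
              - sum(m * s for s, m in zip(scores, totals)) ** 2 / n)
    variance = p_bar * (1.0 - p_bar) * spread
    if variance <= 0:
        return 0.0, 1.0  # no variation: trend cannot be assessed
    z = t / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation
    return z, p_value
```

With hypothetical counts of 8, 5, and 2 resistant isolates out of 10 in three consecutive years, `cochran_armitage([8, 5, 2], [10, 10, 10])` gives a negative z, indicating a decreasing trend.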
The emm types were distributed into 15 emm clusters (Table 1 and Figure 1). The majority of SSTI isolates belong to emm clusters from clade X (n = 189, 59%).
The characteristics of SSTI isolates were compared with those from contemporary iGAS isolates. Given the reduced number of SSTI isolates in the first 2 years of the study, this comparison was performed only for the SSTI and iGAS isolates recovered from 2005 to 2009 (Dataset 1 and Dataset 2). Significant differences were detected in the overall distribution of emm types (p < 0.001), as well as in the prevalence of specific emm types (Supplementary Figure 2), with the SSTI subset presenting a higher SID than the iGAS subset (Table 1, p = 0.014). In agreement, 24 emm types accounting for 36 isolates were identified exclusively among SSTI, while seven emm types accounting for seven isolates were found only in iGAS. While emm1 and emm3 were significantly overrepresented among iGAS (p < 0.001 and p = 0.002, respectively), emm89 was significantly overrepresented among SSTI (p = 0.006). However, when stratifying the emm89 isolates as to the presence or absence of the hasABC locus (Friães et al., 2015a) (Dataset 1), only the isolates lacking the locus were associated with SSTI (p = 0.028). All these differences were still significant after FDR correction. When exploring possible changes in time from 2005 to 2009, there was a significant increase of emm89 among SSTI (p = 0.012), while among iGAS the increase of emm89 was not statistically supported after FDR (Dataset 1 and Dataset 2). In both infection types, this was underpinned by an increase in the proportion of the acapsular emm89 isolates (p < 0.001), while there was no significant change in time of emm89 isolates with the hasABC locus in either infection type.
The differences in emm types were reflected in the distribution of emm clusters among SSTI and iGAS isolates (Figure 1), although the SIDs were not significantly different (Table 1, p = 0.910). While iGAS isolates were significantly associated with emm clusters A-C3 (comprising almost exclusively emm1 isolates) and A-C5 (comprising exclusively emm3 isolates) (p < 0.001 and p = 0.002, respectively), SSTI isolates were significantly associated with emm cluster E4 (dominated by emm89) (p = 0.006).
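Pairwise associations of this kind (a given emm type or cluster versus infection type) rest on a 2×2 table analyzed with a two-tailed Fisher's exact test and an odds ratio. The sketch below is a plain-Python illustration that sums the probabilities of all tables no more likely than the observed one; the function name is hypothetical, and no counts from the study are reproduced.

```python
import math
from typing import Tuple


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]].

    Rows could be, e.g., SSTI and iGAS; columns, presence and absence of a type.
    Returns (sample odds ratio, two-sided p-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = math.comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-7)  # tolerance for floating-point ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return odds_ratio, min(p_value, 1.0)
```

An odds ratio above 1 with a small p-value would indicate overrepresentation of the characteristic in the first row; the resulting p-values would then be corrected for multiple testing.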
The emm cluster typing system can be used for predicting the ability of the strains to bind different host proteins based on the binding properties of the respective M proteins (Supplementary Table 1) (Sanderson-Smith et al., 2014). When comparing SSTI and iGAS isolates, and excluding the clusters with uncertain binding for each host protein, the ability to bind C4BP was associated with SSTI isolates (p < 0.001). This association reflects the high prevalence of clusters E3 and E6 among SSTI, even though the dominant cluster among SSTI (cluster E4) was classified as uncertain in the ability to bind C4BP. SSTI isolates were also associated with the ability to bind IgA and IgG (p = 0.002 and p = 0.033, respectively), whereas the invasive isolates were associated with the ability to bind fibrinogen and albumin (p < 0.001 and p = 0.004, respectively), all significant after FDR.
In the period of 2005-2009, the overall distribution of STs differed significantly between SSTI and iGAS (p < 0.001), with a higher diversity among SSTI (Table 1, p = 0.002). ST28 and ST15 were associated with iGAS (p < 0.001 and p = 0.007, respectively) (Supplementary Figure 3), in agreement with the association of the corresponding dominant emm types with iGAS (emm1 for ST28 and emm3 for ST15). These differences were supported after FDR correction.
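The FDR correction referred to throughout the results follows the linear step-up procedure of Benjamini and Hochberg (1995). A minimal sketch of how adjusted p-values can be obtained is shown below; the function name is hypothetical and the input values in the usage note are arbitrary.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For instance, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` returns values of approximately 0.02, 0.04, 0.04, and 0.02; an association is retained when its adjusted p-value is below 0.05.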
No significant differences were detected when comparing the ST diversity of isolates with the same emm type recovered from each type of infection (Table 2).

SAg Profiling
Overall, chromosomally encoded smeZ and speG genes were the most frequently detected among SSTI isolates, being present in 307 (96%) and 301 (94%) isolates, respectively (Table 3). PCR amplification of speB and speF was not possible for one isolate, and the absence of both genes was confirmed by Southern blot (data not shown). A total of 50 distinct SAg profiles were identified among the SSTI isolates (Tables 1, 3).

SpeB Protease Activity and Sequence of covRS and ropB
We previously showed that the total proteolytic activity could be used as a proxy for SpeB activity and that all isolates with null alleles, either in covS or in ropB, lacked SpeB activity (Friães et al., 2015b). Among the 320 SSTI isolates, 79 (25%) had no detectable SpeB activity (Dataset 1). These isolates represented 22 different Among the isolates with no SpeB activity, 15 distinct covR alleles were identified, resulting in only two different amino acid sequences, while 34 and 47 different alleles were detected for covS and ropB, corresponding to 22 and 37 distinct amino acid sequences, respectively (Figure 2). In two isolates no PCR product was amplified using the ropB specific primers, indicating a possible deletion involving the ropB gene. In one of these isolates, the absence of PCR amplification of the speB and speF genes, both located in the same region as ropB, supports the occurrence of a large deletion encompassing the entire ropB gene, as well as speB and speF. Deletions spanning different lengths in this locus have been previously identified in GAS isolates from human infections (Friães et al., 2013a).
Nucleotide changes predicted to prevent the expression of a functional protein due to nonsense mutations or frameshifts were considered to result in null alleles. No null alleles were detected in covR, while null covS alleles were detected in 10 isolates. Since none of the SpeB-positive isolates is expected to have null covS alleles, these would correspond to 3% of all isolates recovered from SSTI. These null covS alleles were present in isolates of six different emm types (emm1, emm11, emm89, and emm118, each n = 2; emm9 and emm223, each n = 1) and seven STs (ST28, ST403, and ST565, each n = 2; ST75, ST101, ST408, and ST828, each n = 1), and there were no associations with specific emm types or STs. Two isolates presented in-frame indels (45 bp deletion and 3 bp insertion), whose effect on protein function is not clear. The consequences of these in-frame indels in CovS protein function could not be inferred based on the absence of SpeB, since both isolates also carry null ropB alleles which could explain the downregulation of SpeB. Therefore these were not considered null alleles.
Frontiers in Microbiology | www.frontiersin.org
Changes predicted to result in null alleles, including nonsense mutations and indels that generate frameshifts, are represented in red. One exception is represented by "*", corresponding to an isolate with an early deletion of 4 bp in ropB which is more probable to result in the alteration of the start codon rather than a premature stop codon and therefore was not considered as a null mutation.

DISCUSSION
Antimicrobial resistance among SSTI isolates was not significantly different from that found among contemporary iGAS and pharyngitis isolates, with the decreasing resistance to macrolides in SSTI mirroring declines in resistance previously described among isolates causing these other infections (Friães et al., 2013b;Silva-Costa et al., 2015). The studied SSTI isolates presented a very high genetic diversity, which was higher than the diversity of iGAS isolates in the period of 2005-2009. Still, two lineages accounted for more than one third of all SSTI isolates, namely emm89 and emm1. The molecular epidemiology of SSTI GAS isolates varies considerably among the different published studies, but these often refer to time periods different from the one studied here and include different infection types, with some including only isolates from impetigo, while others, like the present study, include all SSTI isolates. In Finland, a clear dominance of emm77 (40%) was found among pus isolates from 2008, followed by emm types 1, 28, and 89 (13%) (Vähäkuopus et al., 2012). In Northern Spain, emm77 was the second most prevalent type (17%) among dermal infections during 2005-2011, but emm89 largely dominated (31%) (Tamayo et al., 2014). In Norway, skin isolates recovered between 2005 and 2006 were dominated by emm types 28, 12, and 87 (Kittang et al., 2008), and in Serbia emm58 was significantly associated with SSTI, comprising 12 of the 52 SSTI isolates recovered in 2001 (Mijač et al., 2010). In Beijing, among 52 impetigo GAS isolated between 2005 and 2008, emm12 and emm1 were highly prevalent (54 and 37%, respectively) (Liang et al., 2005). In Japan, emm28 was the most prevalent (23%) among 53 abscess isolates during 2003 (Wajima et al., 2008), while in Taiwan a high prevalence of emm11 and emm106 was reported among isolates from different types of SSTI (Chiang-Ni et al., 2011;Lin et al., 2011).
A very high diversity was observed among isolates recovered from pyoderma in India and in the Aboriginal Australian communities, presenting emm types infrequently isolated in developed, temperate climate regions (McDonald et al., 2007;Kumar et al., 2012).
In our study, although most of the emm types were identified among both iGAS and SSTI isolates, there were clear differences in the overall emm and clonal distribution between the two infections. The emm type 1 associated with iGAS was also previously found to be overrepresented among iGAS relative to pharyngitis (Friães et al., 2012), confirming the enhanced capacity of this lineage to cause invasive disease. The emm1 isolates have remained a major cause of iGAS in Portugal from 2000 to 2009 (Friães et al., 2007, 2013b), in line with the worldwide dissemination of the M1T1 clone (O'Loughlin et al., 2007;Aziz and Kotb, 2008;Luca-Harari et al., 2009). The association between emm3 and iGAS is not unexpected, since emm3 has been reported as a major cause of invasive disease in Portugal and other countries, although the comparison with pharyngitis isolates did not identify this emm type (Beres et al., 2004;Luca-Harari et al., 2009;Friães et al., 2013b).
The association between emm89 and SSTI has been previously reported in Northern Spain, but with no information regarding the specific clades involved (Tamayo et al., 2014). In Finland, emm89 was not particularly associated with SSTI, although it was among the most frequent emm types in isolates recovered from pus (Vähäkuopus et al., 2012). However, the Finnish study refers to 2008, prior to the emergence of emm89 clade 3 in Finland, at least among GAS isolated from blood (Latronico et al., 2016). In Portugal, only emm89 isolates lacking the hasABC locus, presumed to belong to the recently emerged clade 3 (Friães et al., 2015a;Zhu et al., 2015), were significantly associated with SSTI. This observation is in line with the previously reported increase in the prevalence of emm89 among SSTI in Portugal, but not in iGAS or pharyngitis, associated with the emergence of this acapsular clade (Friães et al., 2015a). In agreement, between 2005 and 2009 the increase of emm89 isolates lacking the capsule locus translated into a significant increase of emm89 isolates only among SSTI, but not among iGAS. Our data thus raise the possibility that the genome remodeling underlying the emergence of emm89 clade 3 may have led to a particular propensity to cause SSTI, although this clade also quickly replaced the previously circulating emm89 clades in all infection types (Friães et al., 2015a). This association with SSTI could be due to an increased capacity to colonize the skin, as well as to an improved ability to overcome the major host defense mechanisms present in skin and soft tissue or to produce cytotoxic effects at these sites. Increased transmissibility and persistence have been suggested for clade 3-associated strains based on enhanced capacity to adhere to uncoated plastic (Turner et al., 2015), while the increased expression of NADase and SLO by clade 3 has been associated with virulence in a mouse model of necrotizing fasciitis (Zhu et al., 2015).
However, further investigation is needed to clarify the role of these phenotypes in the specific association of emm89 clade 3 with SSTI.
These associations between emm type and disease presentation were reflected in the different prevalence of the respective STs and SAg profiles between iGAS and SSTI, in agreement with the high congruence previously observed between these three typing methods (Carriço et al., 2006;Friães et al., 2013a). The observed association of emm clusters A-C3 and A-C5 with iGAS, as well as that of cluster E4 with SSTI, also reflects the association of the respective emm types with the types of infection.
According to the predicted ability of the different M proteins to interact with host factors (Sanderson-Smith et al., 2014), invasive isolates would be associated with the ability to bind fibrinogen and albumin. The ability to bind fibrinogen was proposed as a mechanism to decrease complement deposition resulting from the activation of the classical pathway (Carlsson et al., 2005). A similar function is believed to be performed by C4BP (Carlsson et al., 2003), whose binding was associated with SSTI isolates due to the high prevalence of emm clusters E3 and E6 in this type of infection. The reasons why complement inhibition could be achieved through fibrinogen in iGAS and C4BP recruitment in SSTI remain elusive. Most emm clusters include proteins able to bind albumin, with the exception of cluster E4 (Sanderson-Smith et al., 2014). The association of cluster E4 with SSTI resulted in the overrepresentation of albumin binding among iGAS. Although albumin binding was shown to mask epitopes in the C-repeated region of the M protein (Sandin et al., 2006), deletion of this region did not impair virulence in a mouse intraperitoneal infection model (Waldemarsson et al., 2009), not providing clues as to why this could be important in the context of iGAS. The ability to bind to human IgA and IgG was associated with SSTI isolates. Immunoglobulin binding was shown to hinder opsonophagocytosis, even in the absence of a specific immune response (Carlsson et al., 2003). One can imagine such defense mechanism could be useful to bacteria in the context of both SSTI and iGAS and the reasons for the observed differences deserve further scrutiny. Only 5% of the SSTI isolates were predicted to bind plasminogen, although this was recently shown to facilitate keratinocyte invasion (Siemens et al., 2011) and could thus be beneficial in the context of SSTI.
The acquisition of mutations in the two-component regulatory system CovRS is regarded as an important mechanism promoting the transition to an invasive phenotype among GAS isolates. Current data suggests that upon infection with a wild type strain, covRS mutants arise at the focal infection site, from which mixed populations are recovered, and are subsequently selected for during transition to deeper, normally sterile sites (Sumby et al., 2006;Mayfield et al., 2014). In our previous work, we observed that null covS alleles were significantly overrepresented among iGAS (10%) relative to pharyngeal isolates (2%) (Friães et al., 2015b). Despite the low number of isolates carrying covS null alleles in all types of infection, the data reported here reveals an underrepresentation of covS null alleles in SSTI (3%) relative to iGAS isolates (p = 0.009), but no significant difference relative to pharyngeal isolates. Since our isolates were obtained from a single colony from each patient, there was a sampling of possibly mixed populations, which could introduce a bias or at least underestimate the ongoing selection of covRS impaired variants. Still we do not believe this compromises our conclusions since the colonies were randomly picked without particular care to select any of the variants. Our results therefore indicate that mutations impairing CovRS are a hallmark of iGAS isolates, although they still represent a minority of iGAS (Friães et al., 2015b). The lower prevalence of covS null alleles among pharyngeal and SSTI isolates supports the importance of a functional CovRS in the initial stages of non-invasive infection, both in skin and in the upper respiratory tract (Hollands et al., 2010;Alam et al., 2013). As reported for pharyngeal and invasive isolates (Friães et al., 2015b), covRS and ropB mutations occurred in SSTI isolates of diverse lineages and were not a particular characteristic of any specific clone. 
In contrast to covS, the proportion of ropB null alleles among SSTI isolates (4%) was similar to that found among both pharyngeal and invasive isolates (3%), indicating that mutations abrogating RopB activity occur in a low fraction of GAS isolates, regardless of the type of infection they cause.
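As an illustration of how proportions such as those above (for example, covS null alleles in SSTI versus iGAS) can be compared, the following is a minimal sketch of a two-sided Fisher's exact test. It assumes that an exact test on a 2x2 table is appropriate; this section does not state which test produced the reported p-values. The function name `fisher_exact_two_sided` is ours, and the iGAS counts in the usage lines are hypothetical placeholders, since only the iGAS percentage (10%) is given here, so the printed value is not expected to reproduce p = 0.009.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. SSTI, iGAS); columns are outcomes
    (e.g. null allele present, null allele absent).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        # with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# SSTI counts are taken from the text: 10 null covS alleles among 320 isolates.
ssti_null, ssti_total = 10, 320
# HYPOTHETICAL iGAS counts, chosen only to give 10%; the real denominator
# is not reported in this section.
igas_null, igas_total = 30, 300

p_value = fisher_exact_two_sided(
    ssti_null, ssti_total - ssti_null,
    igas_null, igas_total - igas_null,
)
print(f"two-sided p = {p_value:.4g}")
```

When many such comparisons are made (emm types, sequence types, SAg profiles), a correction for multiple testing would normally be applied on top of the individual p-values; that step is not shown here.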
The approach of considering as null alleles only those with premature stop codons can potentially underestimate the number of isolates with non-functional regulators, since some amino acid substitutions may also inactivate protein function. However, this conservative approach was based on the difficulty of confidently predicting the functional effect of missense mutations in covRS and ropB. Several studies have established an association between specific amino acid substitutions and an altered phenotype, such as reduced/absent SpeB or enhanced expression of hyaluronic acid capsule (mucoid colonies), SLO or NADase (Sumby et al., 2006; Hasegawa et al., 2010; Ikebe et al., 2010; Olsen et al., 2012; Tatsuno et al., 2013). However, without functional studies comparing the phenotypes of the isolates with missense mutations and those of isogenic strains carrying the wild-type regulator genes, it is not possible to establish causality between the missense mutations and the complete loss of CovRS or RopB function. For a few mutations this correlation was further supported by the reversion of the phenotype after complementation with wild-type covRS genes (Engleberg et al., 2001; Masuno et al., 2014), but those mutations were not identified in our study. On the other hand, it has been shown that amino acid substitutions in CovS may result in a partial loss of function (Tatsuno et al., 2013) and that the reduction in the kinase or phosphatase activity of CovS has different effects on the transcriptomes of strains from distinct lineages (Horstmann et al., 2015). Likewise, some ropB missense mutations that were identified in multiple strains were associated with different levels of SpeB expression and activity (Olsen et al., 2012).
The abrogation of SpeB activity is usually regarded as one of the most important features of CovRS mutants contributing to the invasive phenotype, since this abrogation spares multiple virulence factors involved in invasive disease pathogenesis and evasion of host immunity (Carroll and Musser, 2011). The proportion of isolates lacking SpeB activity in our SSTI collection (25%) was similar to that found previously among iGAS (24%) and higher, although not significantly so, than that found among pharyngitis isolates in Portugal (16%, p = 0.081) (Friães et al., 2015b). Only one ST was associated with a lack of SpeB expression (ST382), and both this ST and the emm type it expresses (emm6) were evenly distributed between SSTI and iGAS. These results suggest that, in addition to SpeB downregulation, the differential regulation of other virulence genes induced by impairment of CovRS is also critical for an enhanced invasive capacity of the isolates. In agreement with this suggestion, a mouse model of SSTI showed that covRS mutants produced larger necrotic regions than speB mutants (Engleberg et al., 2004).
A limitation of this study is the lack of information regarding the specific infection caused by the isolates, as well as the clinical evolution and outcome of each infection. SSTI include a wide range of superficial and deep infections which vary greatly in severity. It was therefore not possible to evaluate possible correlations between the clones identified in SSTI and the severity of the respective infections. This could bias the comparison between SSTI and iGAS, leading to a possible underestimation of the differences between the two types of infection. However, even in these conditions we still identified significant differences between the two subsets of isolates regarding both clonal composition and presence of null covS alleles, which are in agreement with our previous observations from the comparison between pharyngeal and iGAS isolates (Friães et al., 2012, 2015b).
GAS isolates causing SSTI in Portugal are a genetically diverse population. Still, the 30-valent M protein-based vaccine, which was shown to evoke cross-opsonic antibodies against nonvaccine serotypes (Dale et al., 2011), could potentially cover up to 95% of SSTI in Portugal. Although SSTI are the main primary foci associated with invasive disease (Lamagni et al., 2008), SSTI isolates, like pharyngeal isolates (Friães et al., 2012), present a clearly different clonal composition from contemporary iGAS isolates. It is less clear how the differences in the presumed interaction with host proteins and exotoxin profiles brought about by those clonal differences could explain the distinct disease presentations. Despite the different prevalence of multiple emm types between SSTI and iGAS, within each emm type the same MLST-defined lineages and the same SAg profiles could be found in both infection types. This indicates that any intra-emm type genetic differences between the two populations must be explored at a more detailed level, such as by whole-genome sequencing.
Our current data therefore indicates that among the GAS clones causing infection in the Portuguese population, some have an increased capacity to invade deeper tissues and cause severe infections, while others seem to be particularly successful at establishing SSTI or pharyngeal infections. The proportion of SSTI isolates with no detectable SpeB activity was modest and similar to that found previously in Portugal among iGAS and pharyngitis isolates, indicating that the selective pressure to eliminate SpeB is also not a primary factor in SSTI. However, we confirmed an association between mutations abrogating CovRS function and iGAS, suggesting that increased expression of other virulence factors, such as the hyaluronic acid capsule, streptolysins or NADase, rather than SpeB downregulation, may be under selection in the context of iGAS.

DATA AVAILABILITY STATEMENT
The dataset generated and analyzed in this study can be found in FigShare

AUTHOR CONTRIBUTIONS
CP performed the experiments. PGSSI collected data. AF, CP, and MR analyzed and interpreted the data. AF, CP, JM-C, and MR were involved in the conception and design of the study, as well as in drafting the manuscript. AF, CP, JM-C, MR, and PGSSI were involved in revising the paper critically for important intellectual content.