Genomic Epidemiology and Global Population Structure of Exfoliative Toxin A-Producing Staphylococcus aureus Strains Associated With Staphylococcal Scalded Skin Syndrome

Staphylococci producing exfoliative toxins are the causative agents of staphylococcal scalded skin syndrome (SSSS). Exfoliative toxin A (ETA) is encoded by eta, which is harbored on a temperate bacteriophage ΦETA. A recent increase in the incidence of SSSS in North America has been observed; yet it is largely unknown whether this is the result of host range expansion of ΦETA or migration and emergence of established lineages. Here, we detail an outbreak investigation of SSSS in a neonatal intensive care unit, for which we applied whole-genome sequencing (WGS) and phylogenetic analysis of Staphylococcus aureus isolates collected from cases and screening of healthcare workers. We identified the causative strain as a methicillin-susceptible S. aureus (MSSA) sequence type 582 (ST582) possessing ΦETA. To then elucidate the global distribution of ΦETA among staphylococci, we used a recently developed tool to query extant bacterial WGS data for biosamples containing eta, which yielded 436 genomes collected between 1994 and 2019 from 32 countries. Applying population genomic analysis, we resolved the global distribution of S. aureus with lysogenized ΦETA and assessed antibiotic resistance determinants as well as the diversity of ΦETA. The population is highly structured with eight dominant sequence clusters (SCs) that generally aligned with S. aureus ST clonal complexes. The most prevalent STs included ST109 (24.3%), ST15 (13.1%), ST121 (10.1%), and ST582 (7.1%). Among strains with available data, there was an even distribution of isolates from carriage and disease. Only the SC containing ST121 had significantly more isolates collected from disease (69%, n = 46) than carriage (31%, n = 21). Further, we identified 10.6% (46/436) of strains as methicillin-resistant S. aureus (MRSA) based on the presence of mecA and the SCCmec element. Assessment of ΦETA diversity based on nucleotide identity revealed 27 phylogroups, and prophage gene content further resolved 62 clusters. ΦETA was relatively stable within lineages, yet prophage variation is geographically structured. This suggests that the reported increase in incidence is associated with migration and expansion of existing lineages, not the movement of ΦETA to new genomic backgrounds. This revised global view reveals that ΦETA is diverse and is widely distributed on multiple genomic backgrounds whose distribution varies geographically.

Staphylococci producing exfoliative toxins are the causative agents of staphylococcal scalded skin syndrome (SSSS). Exfoliative toxin A (ETA) is encoded by eta, which is harbored on a temperate bacteriophage ETA. A recent increase in the incidence of SSSS in North America has been observed; yet it is largely unknown whether this is the result of host range expansion of ETA or migration and emergence of established lineages. Here, we detail an outbreak investigation of SSSS in a neonatal intensive care unit, for which we applied whole-genome sequencing (WGS) and phylogenetic analysis of Staphylococcus aureus isolates collected from cases and screening of healthcare workers. We identified the causative strain as a methicillin-susceptible S. aureus (MSSA) sequence type 582 (ST582) possessing ETA. To then elucidate the global distribution of ETA among staphylococci, we used a recently developed tool to query extant bacterial WGS data for biosamples containing eta, which yielded 436 genomes collected between 1994 and 2019 from 32 countries. Applying population genomic analysis, we resolved the global distribution of S. aureus with lysogenized ETA and assessed antibiotic resistance determinants as well as the diversity of ETA. The population is highly structured with eight dominant sequence clusters (SCs) that generally aligned with S. aureus ST clonal complexes. The most prevalent STs included ST109 (24.3%), ST15 (13.1%), ST121 (10.1%), and ST582 (7.1%). Among strains with available data, there was an even distribution of isolates from carriage and disease. Only the SC containing ST121 had significantly more isolates collected from disease (69%, n = 46) than carriage (31%, n = 21). Further, we identified 10.6% (46/436) of strains as methicillin-resistant S. aureus (MRSA) based on the presence of mecA and the SCCmec element. Assessment of ETA diversity based on nucleotide identity revealed

INTRODUCTION
Outbreak investigations for bacterial pathogens in the healthcare setting have been transformed in recent years with the advent of pathogen whole-genome sequencing (WGS). Several investigations of Staphylococcus aureus outbreaks in neonatal intensive care units (NICUs) have been reported (Köser et al., 2012;Nübel et al., 2013;Azarian et al., 2015Azarian et al., , 2016. By interrogating the genomes of emerging pathogens, researchers are able to assess their clonal distribution (i.e., population structure) and migration as well as the genomic determinants of virulence and antimicrobial resistance. Since the first genomes of S. aureus were published in 2001 (Kuroda et al., 2001), studies have described the evolution and dissemination of epidemiologically important strains (Kuroda et al., 2001;Gonzalez et al., 2006). The application of genomics was subsequently expanded to track significant changes in the epidemiology of virulent lineages in hospital outbreaks and on larger geographic and temporal scales (Harris et al., 2010;Alam et al., 2015).
Genomic studies of S. aureus have historically focused on methicillin-resistant S. aureus (MRSA) due to challenges in treatment and poorer clinical outcomes (Dasenbrook et al., 2010;Harkins et al., 2018;Kanjilal et al., 2018). Yet methicillinsusceptible S. aureus (MSSA) strains continue to cause significant morbidity and mortality (Jackson et al., 2019). In particular, some S. aureus strains produce epidermolytic staphylococcal exfoliative toxins (ETs), extracellular proteins that cause separation of the epidermal layer of the skin. These strains cause a serious but rare condition known as staphylococcal scalded skin syndrome (SSSS), predominantly in children (Ladhani et al., 1999). Two ETs have been identified among staphylococci: ETA, encoded by the chromosomally located gene eta, and ETB, encoded by gene etb that is found on a large plasmid (Yamaguchi et al., 2000). ETA is harbored by a temperate bacteriophage, ETA, that is able to lysogenize susceptible strains (Yamaguchi et al., 2000).
The first clinical descriptions of SSSS were in the mid-19th century (Ladhani et al., 1999), and due to its distinctive presentation, pediatric outbreaks are frequently reported (Anthony et al., 1972;Florman and Holzman, 1980;El Helali et al., 2005). Molecular data, on the other hand, are limited (Murono et al., 1988;Yamaguchi et al., 2002;Doudoulakakis et al., 2017). As the genes coding for ETA and ETB are found on mobile genetic elements (MGEs), their distribution in the S. aureus population should seemingly be diverse. Yet ETproducing strains are often restricted to a few clonal complexes (CCs) and are principally oxacillin susceptible (Braunstein et al., 2014). Recently, an increased incidence of SSSS in the United States has been reported (Staiman et al., 2018;Hultén et al., 2019). A study by Hultén et al. (2019) reported an increase in cases of SSSS at a children's hospital from 2.3/10,000 admissions in 2008 to 52.6/10,000 admissions in 2017. The majority of causative isolates identified were MSSA strains belonging to CC 121, a lineage previously associated with SSSS (Botka et al., 2017;Doudoulakakis et al., 2017). The genomic epidemiology of ET-positive strains, particularly regarding the distribution of ET among staphylococci, remains unexplored.
Here, we describe an outbreak investigation of MSSAassociated SSSS in a NICU. We identified the epidemic strain as sequence type (ST) 528 possessing eta. Using Illumina shortread and nanopore long-read sequencing data, we generated and published a high-quality reference genome for the outbreak clone (Cella et al., 2021). Using published WGS data in the European Nucleotide Archive (ENA), we identified all deposited S. aureus genomes possessing eta, which we then analyzed to determine the distribution of ETA as well as the diversity of ETA phages.

Outbreak Investigation and Healthcare Worker Screening
Ongoing surveillance for healthcare-associated infections identified a putative cluster of temporally related SSSS infections among neonates hospitalized in the NICU of a tertiary care medical center. Microbiological analysis identified the causative agent as MSSA susceptible to all tested antimicrobials. Due to the comparatively low incidence of SSSS, an epidemiological association was suspected, and an investigation was initiated. In addition to the implementation of environmental control measures, we carried out MSSA carriage screening of healthcare workers who had contact with the incident cases. Clinical isolates from case patients (MSSA_SSSS_01-MSSA_SSSS_04) and carriage isolates from healthcare workers underwent WGS and phylogenetic analysis.

Whole-Genome Sequencing and Phylogenetic Analysis
gDNA extraction was carried out using Qiagen DNeasy blood and tissue kit (Qiagen, Germantown, MD, United States) according to the manufacturer's instruction; and gDNA quantification was carried out using a Qubit. WGS short-read libraries were constructed using the Illumina Nextera Flex library prep kit, which were subsequently sequenced on an Illumina HiSeq using 250 cycle V3 chemistry flow cell to produce 2 × 250 paired-end reads with a target idealized coverage of 50×. After sequencing, per sample idealized coverage was calculated, and quality was assessed using FastQC. Raw reads were quality filtered with Trimmomatic v.0.39 using the following settings: SLIDINGWINDOW:10:20 MINLEN:31 TRAILING:20 (Chen et al., 2014). Multilocus sequence typing (MLST) was inferred using srst2 using Illumina data based on the S. aureus MLST database 1 (Inouye et al., 2014). De novo genome assembly of WGS data was performed using Unicycler v0.4.8 (Wick et al., 2017) with default options and annotated with Prokka v1.14.6 using the MSSA_SSSS_01 (NZ_CP061349.1) reference genome annotation to preferentially annotate coding sequences (Seemann, 2014). For a single isolate collected from the index case (MSSA_SSSS_01), we generated a high-quality complete genome using longread data from the Oxford Nanopore MinION. The methods for assembly, polishing, and annotation of MSSA_SSSS_01 are described in detail elsewhere (Cella et al., 2021). Pangenome analysis was performed using Roary v.3.12 (Page et al., 2015), and single-nucleotide polymorphism (SNP) alignment was extracted from the core genome (i.e., loci present in >99% of the sample) using snp-sites v2.4.0. 2 A maximum likelihood (ML) phylogeny was inferred with IQTREE v1.6.8 using the ASC_GTRGAMMA substitution model with 100 bootstrap replicates (Nguyen et al., 2015). For isolates phylogenetically clustering with the SSSS cases, WGS data were mapped to MSSA_SSSS_01 using Snippy v4.6.0, 3 and a SNP alignment was used to infer an ML phylogeny as described above. To infer transmission, the pairwise distances among highly related strains were assessed, and the ML phylogeny was interrogated.

Identification of ETA + Global Staphylococcus aureus Strains and Investigation of Population Structure
To investigate the global distribution of eta, we used Bitsliced Genomic Signature Index (BIGSI) to query the entire global ENA repository of bacterial WGS datasets for biosamples containing eta (NCBI Reference Sequence: NP_510960.1) (Yamaguchi et al., 2000;Bradley et al., 2019). Biosamples with BLAST identity over 90% were downloaded, assembled, and annotated as described above. All associated metadata were downloaded from public WGS data repository. To determine the year and country of isolation and source (carriage or disease), we conducted an extensive review of the literature using all available accession numbers, sample names, and aliases. Metadata tables were obtained from Supplementary Material of published studies. If published metadata were not available, study authors were contacted. Samples were deduplicated based on metadata (e.g., only one isolate was selected if multiple isolates were collected from a single study participant). In addition, samples were excluded if we could not verify the presence of eta in the de novo assembly and if the assembly failed quality check (contig >331, assembly length >3,100,000 or <2,650,000). For all remaining isolates, we performed pangenome and ML phylogenetic analysis and assigned MLST as described above. SNPs in the core genome were then used to assess population structure with fastBAPS (Tonkin-Hill et al., 2019). Based on the population structure and MLST analyses, dominant lineages were identified. For these lineages, we repeated the pangenome analysis and inferred a core genome phylogeny as described above. Abricate with default settings was used to detect the genotypic antibiotic resistance determinants. 4 For MRSA isolates, staphylococcal cassette chromosome mec (SCCmec) type was determined by assessing ccr and mec gene complexes from assemblies using SCCmec finder tool 5 (International Working Group on the Classification of Staphylococcal Cassette Chromosome Elements (IWG-SCC), 2009). MLST, SCCmec, and antibiotic resistance results were mapped on the core genome phylogeny using ggTree implemented in Rstudio running R v 3.6.0.

Dating of the Outbreak Lineage
Global strains belonging to the same lineage as the outbreak strain (represented by MSSA_SSSS_01, n = 30) were dated using Bayesian coalescent analysis. First, WGS data were mapped to MSSA_SSSS_01, using Snippy v4.6.0 as described above, to produce a reference-based alignment. Providing the alignment to Gubbins v2.4.1, we identified SNPs introduced through recombination, which were then censored as they interfere with phylogenetic inference. An ML phylogeny was subsequently inferred from the recombination-censored alignment (Croucher et al., 2014). The ML phylogeny and collection year of each sample were used as input for BactDating v1.0, a tool used to perform Bayesian dating of a bacterial phylogenetic tree (Didelot et al., 2018). We assessed temporal signal (i.e., evidence of clocklike evolution), using the roottotip function, which regresses the year of collection to the tree distance. BactDating was then run with a Markov chain Monte Carlo (MCMC) chain length of 2,000,000 to estimate the molecular clock and rate of coalescence. The resulting evolutionary rate was assessed, and the time-scaled tree was visualized.

ETA Diversity and Population Structure
We then sought to investigate the population structure of the eta-containing phages found in our dataset. Phage-containing contigs were searched using eta and int genes from the eta phage in MSSA_SSSS_01 as targets (NZ_CP061349.1, locus tags IC588_RS04295 and IC588_RS03960), aligned based on the target sequences, annotated using contig-puller v2.0, 6 and trimmed in Geneious Prime R 2020.1.2 (Biomatters Ltd., Auckland, New Zealand) at the attachment site sequences, reported in Yamaguchi et al. (2000). Genome assemblies in which both attachment sites were identified and recovered on a single contig were designated complete, and these sequences were analyzed using Phaster to determine the closest published phage genome (Arndt et al., 2016). Synteny of the phage integration sites was visually inspected, and int gene homology was assessed using fastablasta v0.5 7 with a 95% nucleotide identity threshold to the MSSA_SSSS_01 int gene as a reference (NZ_CP061349.1, locustag IC588_RS03960). To recover putative phage sequences for the remaining genome assemblies with fragmented assemblies, we performed ortholog clustering of phage genes. The gene presence/absence output from Roary pangenome analysis (described above, but re-run including all genome assemblies and the complete ETA) was filtered with the goal of identifying gene clusters that were (i) unique to ETA, (ii) present as a single copy, and (iii) full length/complete (i.e., not pseudo-genes). This was accomplished by excluding any clusters that were not represented in the complete phage sequences, clusters with multi-copy or truncated genes, and clusters inconsistently identified in the whole genome vs. phage sequence (i.e., the cluster was only detected in the whole genome assembly but absent in the complete ETA sequence for an isolate; the gene cluster being phage associated, but not unique to ETA). The dataset was then filtered so that each isolate was only represented once. A presence/absence matrix was constructed with the 177 gene clusters remaining post-filtering, and a midpoint rooted UPGMA phylogenetic tree was inferred using SplitsTree4 v4.14.6.
To define groups and further resolve the diversity and population structure of ETA, complete phage genomes (n = 139) were aligned using Clustal Omega v1.2.4 (Sievers and Higgins, 2018) to estimate pairwise nucleotide distances. Phages were clustered based on a 95% nucleotide identity threshold as suggested in Adriaenssens and Brister (2017) and assigned "G00" numbers. Pairwise nucleotide distances were visualized using a heatmap. This clustering approach was used to determine a cutoff for grouping in the phage gene presence/absence matrix, to define phage population structure for all genomes. Phages were clustered if they shared the same presence/absence profile ± 2 gene clusters, and then assigned phylogroup "P00" numbers.

Outbreak Investigation
Analysis of MSSA isolates from four patients hospitalized with SSSS identified that all belonged to ST582 (CC15) and possessed the ETA prophage that harbors eta. Index strain MSSA_SSSS_01 was fully resolved using hybrid genome assembly approach, and expression of eta was confirmed using qRT-PCR as described elsewhere (Cella et al., 2021). Assessment of pairwise genetic distances suggested that all were related by recent transmission events (Figure 1C), and epidemiological analysis of length of stay in the unit showed that two of the four patients had overlapping lengths of stay ( Supplementary  Figure 1). Nares screening of 89 healthcare workers who had contact with the case patients revealed 41 asymptomatic carriers of MSSA. Phylogenetic analysis showed a diverse population of carried MSSA strains dominated by ST30 and 7 https://github.com/kwongj/fastablasta ST5 (Figure 1). Most significantly, isolates from SSSS cases clustered with an isolate obtained from a single healthcare worker who had contact with the SSSS case patients during their hospitalization. All five isolates, when considered together, were highly genetically related, suggesting that the healthcare worker is a part of the transmission chain. However, the directionality of transmission was difficult to resolve. As a result of this finding, the healthcare worker was decolonized using the following protocol: oral doxycycline (100 mg) and rifampin (600 mg) twice daily for 7 days, intranasal mupirocin (2% ointment) and chlorhexidine gluconate 0.12% oral rinse twice daily for 5 days, and daily chlorhexidine baths. Screening of multiple body sites performed 7 and 28 days after initiating the regimen confirmed decolonization. Enhanced environmental cleaning was also implemented in the unit. Subsequently, no additional SSSS cases were identified.

Global Distribution of Strains Possessing ETA
Querying extant WGS data identified 434 genomes whose de novo assembly contained eta and passed quality parameters; two additional ST582 samples identified in the outbreak investigation detailed above were included. After review and abstraction of all associated metadata, 290 genomes were complete, having information on year and country of collection as well as source (Supplementary File 1). The sample spanned collection between 1994 and 2019 from 32 countries, classified into seven regions. The greatest proportion of the sample was collected from the United Kingdom (n = 126), the United States (n = 79), and Belgium (n = 55) (Figure 2). Analysis of population structure identified eight dominant lineages [sequence clusters (SCs)] accounting for 91.7% of the overall population. The most prevalent MLST types included ST109 (24.3%), ST15 (13.1%), ST121 (10.1%), and ST582 (7.1%). ST15 and ST582 belong to the same CC (CC15); together, they accounted for 20.2% of the total population (Figure 2). Among 297 strains with available data, there was almost an equal distribution of isolates from asymptomatic carriage (47%) and disease (53%). Only SC2, which contains ST121, appeared to have more isolates collected from disease (69%, n = 46) than carriage (31%, n = 21). Phylogenetic analysis of each SC showed clear spatial structure even within STs (Supplementary Figures 2, 3)  436 S. aureus isolates, the macrolide resistance determinant erm was the most common; 163 isolates possessed ermA (n = 120), ermB (n = 1), or ermC (n = 42) followed by tetracycline efflux pump tetK (n = 31) or tetM (n = 3).

Dating of ST582
Because the isolates identified during our outbreak investigation belonged to ST582, a relatively infrequent ST among clinical S. aureus isolates, we sought to further elucidate the emergence of the lineage, which is most closely related to ST15-CC15, designated SC8 in our analysis of population structure (Figure 2). Analysis of recombination identified that the average number of polymorphisms introduced by recombination compared with mutation (r/m) was 6.7. Root-to-tip analysis showed significant temporal signal, allowing for coalescent analysis (Supplementary Figure 4). Dating of this lineage identified an evolution rate of 4.39 [95% highest posterior density (HPD) 1.28-9.46] SNPs/genome/year and a date of the most recent common ancestor of 1931.8 (95% HPD 1785.1-1990.8), which places the split of ST15 and ST582 prior to that date.

ETA Diversity and Population Structure
We were able to extract 139 complete ETA sequences from 436 genome assemblies. Visual inspection of the integration sites identified high conservation of int gene (all sharing >98% nucleotide homology to MSSA_SSSS_01 int gene), the attL and attR sequences, and integration location. This suggests a conserved integrase type, and although variable in gene content, it is a conserved integration location. Assessment of ETA diversity based on nucleotide identity of complete phage sequences identified 27 groups designated as G01-G27, and gene content from the presence/absence matrix further resolved the structure into 62 phylogenetically congruent clusters, designated phylogroups as P01-P62 (Figure 3). While we were able to resolve phages into well-defined clusters, appreciable diversity was observed in both gene content ( Figure 3B) and nucleotide identity ( Figure 3C).
In comparison with the population structure of the strains harboring them, ETA variants were relatively stable within lineages (Supplementary Figure 5). However, variation in phage phylogroup and cluster within lineages suggests repeated infection of susceptible strains by diverse ETA phages as well as the impact of recombination to phage gene content (Supplementary Figures 2, 3). Characterization of lineages, phage population structure, and published ETA phage genomes elucidated varied evolutionary history in phage acquisition even among closely related clades (Supplementary Table 1). Notably, ST121 strains belonging to SC2 all possessed a highly related (same phage cluster) ETA matching phiETA3 (NC_008799), while the closely related ST123 showed greater diversity in phage variation. SC4, largely composed of ST109 and closely related MLSTs, largely possessed a B166-like ETA. SC8, which contained our outbreak isolate, was the most segregated with ST15 isolates possessing ETA phages matching phiETA3 and ST582 strains B166. Lastly, MRSA strains belonging to ST913 (SC7), which was limited to Western Europe, possessed a novel ETA phage related to SA97 (NC_029010) that was not found in any other lineage (Figure 2A).

DISCUSSION
Staphylococcal scalded skin syndrome is one of the most welldefined S. aureus clinical syndromes in that it is clearly linked to a known genotype-the presence of genes encoding ETs. A recent increase in SSSS incidence has been reported from North . The proportion of isolates belonging to major SC is indicated by the pie chart. The United Kingdom is presented separately from the rest of Europe due to the number of sampled strains and the difference in population structure. The size of the pie chart is scaled by the proportion of the total number of strains sampled from each region. Mapping was performed using R package maps v3.3.0.
America and abroad, suggesting either the recent emergence of lineages carrying ETs or an increase in horizontal transmission of the MGE-harboring genes encoding ETs. Applying genomic epidemiology to an outbreak investigation in a NICU, we identified the etiologic strain as an ETA + MSSA belonging to ST582, which is less commonly reported in the literature than ST121 as a cause of SSSS (Conceição et al., 2011;Rao et al., 2015;Hultén et al., 2019). We investigated the global distribution of ETA + S. aureus strains, finding both the diversity of lineages carrying ETA and the diversity of ETA temperate bacteriophages (referred to here as ETA) themselves to be significantly greater than previously appreciated. Furthermore, we dispel the notion that ETA is limited to MSSA by identifying at least three lineages that are composed largely of SCCmec + and mecA + strains with lysogenized ETA. These findings revise our understanding of the population structure of ETA + S. aureus and present a novel method of tracking genotypes of interest. Genomic epidemiology is increasingly being applied to outbreak investigation in healthcare settings, and the recent increase in accessibility of sequencing technology and bioinformatics expertise has improved the utility of these data for real-time investigation. Here, temporally related SSSS cases in the NICU initiated an investigation, which involved screening healthcare workers caring for hospitalized infants. MSSA carriage among healthcare workers was in line with estimates from the general population. Most significantly, screening identified a colonized worker with a highly genetically related strain, indicative of recent transmission. While we cannot confirm the directionality of transmission (i.e., whether the healthcare worker was the source or acquired it from the incident outbreak case), the colonized healthcare worker explains the acquisition of patient 4 who had a non-overlapping stay with the other cases. This is further supported by the close genetic distance (1 SNP) separating the healthcare worker isolate and patient 4's isolate. Decolonization of the healthcare worker as well as implementation of enhanced environmental cleaning resolved the outbreak, and no further cases to date have been identified.
A traditional approach in microbial population genomics is to identify a lineage of interest and track its emergence and spread. Often, these lineages possess an epidemiologically important phenotype such as increased virulence or antibiotic resistance conferred by genomic determinants carried on MGE such as ETA (Malachowa and Deleo, 2010). Here, instead, we reverse this approach by identifying the genetic element of interest, eta, and querying extant WGS data using a newly developed tool to elucidate the global distribution of S. aureus strains possessing ETA (Bradley et al., 2019). The majority of studies from which we obtained data were not focused on SSSS or related clinical syndromes (Nübel et al., 2013;Roisin et al., 2016;Moradigaravand et al., 2017;Mason et al., 2018); therefore, we believe this approach optimizes the use of published WGS data.
The temperate bacteriophages ETA, categorized as Sa1int phages, were previously thought to have low diversity and a relatively narrow host range, localized to few lineages, and limited to strains lacking SCCmec (Yamaguchi et al., 2000;Yoshizawa et al., 2000;Goerke et al., 2009). Further, while previous studies have shown that ETA is inducible and phage particles can convert susceptible strains (Yamaguchi et al., 2000;Yoshizawa et al., 2000), host specificity and the factors governing susceptibility remained obscure. Our revised global view reveals that ETA is diverse in terms of nucleotide identity and gene content and is widely distributed in multiple genomic backgrounds whose distribution varies geographically. The population is highly structured with eight dominant SCs that generally aligned with S. aureus CCs. ETA was observed to be relatively stable within lineages; however, it is apparent from the phage clustering analysis that recombination has generated considerable diversity within the prophage region and that these variations may be geographically specific. Interestingly, we observed relatively few singleton taxa divergent from main phylogenetic clades, suggesting that novel acquisitions of ETA are infrequent. Yet whether this results from the underlying variation in S. aureus population structure across geographies, low induction of ETA harboring strains, or limitations in host range requires further investigation.
Three SCs-SC4, SC8, and SC2-comprise greater than 74% of the total population of eta + strains. The most prevalent lineage, CC9 (SC4 in Figure 2), which includes ST9 and ST109 and has been previously associated with skin disorders and livestock-associated MRSA strains, represents nearly half of eta + isolates from North America and Europe (Růžičková et al., 2012;Botka et al., 2015). Following SC4, CC15 (SC8 in Figure 2), which includes ST582 and ST15, is the second most prevalent. This SC comprised two clades that possess considerable difference in their geographic distribution and phage content, with ST582 largely limited to Europe and North America. SC8 is globally distributed. ST15 and ST582 have been previously recognized to share a recent common ancestor, with ST582 possessing a 310-kb chromosomal replacement not found in ST15 (Didelot and Wilson, 2015). Furthermore, ST15 and ST582 each has a distinct ETA variant, suggesting separate acquisitions following their divergence, most likely at least 90 years ago based on dating of ST582. Finally, SC2, which comprised ST121 and ST123, is the third most prevalent lineage globally. While this lineage has previously had limited distribution in North America, recent reports suggest that it is now emerging there (Conceição et al., 2011;Hultén et al., 2019). SC2 is the only eta + lineage in our study for which our data suggest that it may be more commonly found in disease than carriage. This is supported by previous reports of ST121 as a hypervirulent lineage (Rao et al., 2015); however, an expanded sample of ETA-ST121 strains and additional analysis would be required to confirm these limited observations.
One promising finding is that antibiotic resistance is relatively limited among eta + S. aureus strains. The exception was the identification of sporadic clusters of MRSA, most often belonging to ST88, ST8, and ST913. MRSA strains possessing ETA were thought to be rare, as it was suggested that crossimmunity may inhibit co-carriage of SCCmec and ETA. Here, we show that it is more common than previously appreciated. The MRSA strains analyzed here were obtained from multiple studies, which is likely why their prominence was not appreciated until these data were aggregated. One lingering question is whether these strains produce ETA. From available metadata, we were able to determine that almost half of MRSA isolates were collected from cases of disease and were phenotypically resistant to oxacillin and β-lactam antibiotics. However, we are unable to confirm production of ETA nor infer production from clinical syndrome due to limited published information. Overall, the relationship between SCCmec and ETA requires further investigations. Further, relatively unknown lineages like ST913, which is found predominantly in the United Kingdom and Germany and possesses a novel ETA phage, will require further investigation and monitoring to determine the extent of its distribution and its epidemiological trajectories.
We have presented a comprehensive analysis of the diversity and distribution of ETA among S. aureus, an endeavor motivated by an outbreak investigation of SSSS and recent reports of increased incidence in North America. We illustrate the coevolution of ETA with its host S. aureus, which has resulted in considerable prophage variation that is geographically structured. Our finding that ETA is relatively stable within lineages suggests that the recent increase in incidence is associated with migration and clonal expansion of existing lineages, not the movement of ETA to new genomic backgrounds. Our study is not without limitations. Our sample and understanding of the distribution of ETA are reliant on the sampling of the included studies, which is biased by accessibility to WGS platforms and a disproportionate representation of antibiotic-resistant lineages of S. aureus in sequenced datasets. Further, as we are reliant on genomic data, we are unable to confirm phenotypic antibiotic susceptibility as well as the production of ETA, as it is known that some strains harboring ETA do not produce the toxin. Also, while we found that ETA is more widely distributed among S. aureus strains than previously understood, it is relatively limited in comparison with the extant population structure of S. aureus. Future studies should focus on understanding susceptibility of S. aureus to ETA lysogenization by comparing ETA-strains from the same STs. A full accounting of other MGE, including prophages and genomic elements, present in eta + strains would also help toward that goal.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the University of Pennsylvania Institutional Review Board. Written informed consent from the participants' legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
TA, MD, and DP contributed to conception and design of the study. MD, DP, MS, and CS performed data collection. MJ and EC contributed to whole genome sequencing data generation and analysis. TA wrote the first draft of the manuscript. SB and MD wrote sections of the manuscript. All authors contributed to the analysis and manuscript revision, read, and approved the submitted version.

FUNDING
TA, EC, and MJ are funded in part by NIH NIAID award K22 AI141582 to TA. MD and MS are funded in part by NIH NIAID award R01AI139188.