Identification of Three Clf-Sdr Subfamily Proteins in Staphylococcus warneri, and Comparative Genomics Analysis of a Locus Encoding CWA Proteins in Staphylococcus Species

Coagulase-negative Staphylococcus warneri is an opportunistic pathogen that is capable of causing several infections, especially in patients with indwelling medical devices. Here, we determined the complete genome sequence of a clinical S. warneri strain isolated from the blood culture of a 1-year-old nursling patient with acute upper respiratory infection. Genome-wide phylogenetic analysis confirmed the phylogenetic relationships between S. warneri and other Staphylococcus species. Using comparative genomics, we identified three cell wall-anchored (CWA) proteins at the same locus (sdr), named SdrJ, SdrK, and SdrL, on the chromosome sequences of different S. warneri strains. Structural predictions showed that SdrJ/K/L have structural features characteristic of Sdr proteins but exceptionally contained an unusual N-terminal repeat region. However, the C-terminal repetitive (R) region of SdrJ contains a significantly larger proportion of alanine (142/338, 42.01%) than the previously reported SdrI (37.00%). Investigation of the genetic organization revealed that the sdrJ/K/L genes were always followed by one or two glycosyltransferase genes, gtfA and gtfB and were present in an ∼56 kb region bordered by a pair of 8 bp identical direct repeats, named Sw-Sdr. This region was further found to be located on a 160-kb region subtended by a pair of 160-bp direct repeats along with other virulence genes and resistance genes. Sw-Sdr contained a putative integrase that was probably a remnant of a functional integrase. Evidence suggests that Sw-Sdr is improbably an efficient pathogenicity island. A large-scale investigation of Staphylococcus genomes showed that sdr loci were a potential hotspot of insertion sequences (ISs), which could lead to intraspecific diversity at these loci. Our work expanded the repository of Staphylococcus Sdr proteins, and for the first time, we established the connection between sdr loci and phylogenetic relationships and compared the sdr loci in different Staphylococcus species, which provided large insights into the genetic environment of CWA genes in Staphylococcus.

Coagulase-negative Staphylococcus warneri is an opportunistic pathogen that is capable of causing several infections, especially in patients with indwelling medical devices. Here, we determined the complete genome sequence of a clinical S. warneri strain isolated from the blood culture of a 1-year-old nursling patient with acute upper respiratory infection. Genome-wide phylogenetic analysis confirmed the phylogenetic relationships between S. warneri and other Staphylococcus species. Using comparative genomics, we identified three cell wall-anchored (CWA) proteins at the same locus (sdr), named SdrJ, SdrK, and SdrL, on the chromosome sequences of different S. warneri strains. Structural predictions showed that SdrJ/K/L have structural features characteristic of Sdr proteins but exceptionally contained an unusual N-terminal repeat region. However, the C-terminal repetitive (R) region of SdrJ contains a significantly larger proportion of alanine (142/338, 42.01%) than the previously reported SdrI (37.00%). Investigation of the genetic organization revealed that the sdrJ/K/L genes were always followed by one or two glycosyltransferase genes, gtfA and gtfB and were present in an ∼56 kb region bordered by a pair of 8 bp identical direct repeats, named Sw-Sdr. This region was further found to be located on a 160-kb region subtended by a pair of 160-bp direct repeats along with other virulence genes and resistance genes. Sw-Sdr contained a putative integrase that was probably a remnant of a functional integrase.

INTRODUCTION
Staphylococci continue to be the leading cause of many human infections. Along with Staphylococcus aureus and other coagulase-positive staphylococcal (CoPS) strains, whose pathogenicity and mechanisms of virulence are well elucidated, the coagulase negative species has recently garnered more attention. Albeit less virulent than coagulase-positive isolates (they express a restricted number of virulence factors), nowadays they are the most prevalent microorganisms within the genus, particularly in hospital settings, where iatrogenic procedures have improved patient care, but have also promoted body invasion by such pathogens.
Staphylococcus warneri, along with Staphylococcus epidermidis, is one of the predominant coagulase-negative staphylococcal (CoNS) bacteria inhabiting the skin in healthy humans (Nagase et al., 2002). In comparison with S. aureus and other CoPS strains that can cause a range of illnesses, from minor skin infections to life-threatening diseases, S. warneri prefers to cause infections if the patients have underlying conditions such as indwelling foreign bodies and/or immunosuppression. As a common saprophyte of human epithelia, S. warneri is also frequently isolated from saliva, dental plaque, and nasal swabs, where it represents the third most prevalent CoNS species after S. epidermidis and S. hominis (Ohara-Nemoto et al., 2008). Over the past three decades, with the progressive refinement of identification techniques, S. warneri has been reported as a new pathogenic species frequently isolated from bacteremia, sepsis with multiple abscesses, orthopedic infections, vertebral osteomyelitis, and ventricular shunt infections of immunocompromised patients (Bryan et al., 1987;Kamath et al., 1992;Campoccia et al., 2010;Martínez-Lage et al., 2010). In addition, the overuse of antibiotics has led to a rapid increase in the occurrence of multiresistant strains of many bacteria, including S. warneri. Currently, the importance of S. warneri as a modern-day pathogen is growing, as it has established itself as a successful nosocomial pathogen.
Pathogenic gram-positive cocci express a plethora of virulence factors, including cell wall-anchored (CWA) proteins involved in microbial attachment to host tissues, which are considered the first critical step in the establishment of most bacterial infections (Foster and Höök, 1998). S. aureus can express up to 24 different CWA proteins, whereas CoNS species such as S. epidermidis and S. lugdunensis express a smaller number of CWA proteins (Bowden et al., 2005;Heilbronner et al., 2011). Timothy et al., proposed to classify Staphylococcus CWA proteins into four groups, namely, microbial surface component recognizing adhesive matrix molecule (MSCRAMM) family, near-iron transporter (NEAT) motif family, three-helical bundle family and G5-E repeat family, on the basis of the presence of motifs that have been defined by structure-function analysis . However, to the best of our knowledge, there are still numerous CWA proteins in Staphylococcus that cannot be assigned to any groups. The most prevalent group in Staphylococcus is the MSCRAMM family, which is defined by tandemly linked IgG-like folded domains (N2N3) that can engage in ligand binding by the dock, lock and latch (DLL) mechanism (Ponnuraj et al., 2003;Foster et al., 2014;Foster, 2019). The Clf-Sdr protein, with a repeat (R) region composed of serineaspartate dipeptide repeats at the C-terminus, defines a subfamily of MSCRAMMs. The Clf-Sdr subfamily in S. aureus includes clumping factor A/B (ClfA/B) and SdrC/D/E , while in S. epidermidis it includes SdrG/F (Bowden et al., 2005). The N2N3 domains of these Sdr proteins share low sequence identities, which allow them to have different functions and target different ligands or the same ligands at different sites (Foster and Höök, 1998;Wang et al., 2013). For example, ClfA and SdrG have been shown to only bind to fibrinogen molecules at different sites, while ClfB is reported to adhere to multiple peptides carrying the glycine-serine-rich (GSR) motif, such as fibrinogen α (Fgα), cytokeratin 10 (CK10) peptides and dermokines (Ganesh et al., 2011;Xiang et al., 2012). Unfortunately, the MSCRAMMs in CoNS, except for S. epidermidis, draw less attention to researchers; for example, MSCRAMMs from S. warneri have not been characterized yet.
Cell wall-anchored proteins can be encoded on mobile genetic elements (MGEs), such as biofilm-associated protein (Bap) (Ubeda et al., 2003), S. epidermidis surface protein J (SesJ) (Arora et al., 2020) and S. saprophyticus surface protein F (SssF) (King et al., 2012). The presence of virulence factors in MGEs can influence the virulence potential of a specific strain. Therefore, it is critical to study the genetic organization of virulence factors in emerging pathogenic bacteria to understand the mechanisms employed by these bacteria to cause infections. Sdr proteins, as an interesting subfamily of MSCRAMMs, have attracted much attention from researchers, but most studies have focused on their interaction with host proteins. Based on previous studies, we found that some of these Sdr proteins were obviously a part of accessory genes, such as SdrD/E from S. aureus (Liu et al., 2015). However, what leads to such a situation remains elusive.
Here, we preliminarily characterized three different Sdr proteins at the same locus in S. warneri strains and performed a comprehensive comparative genomics analysis of the locus in different Staphylococcus species.

MATERIALS AND METHODS
Bacterial Strains, Genome Sequencing, and Genome Assembly Staphylococcus warneri WS479 was isolated from the blood culture of a 1-year-old nursling patient with acute upper respiratory infection in an affiliated hospital of Wenzhou Medical University, Zhejiang, China. The strain was identified by a Vitek-60 microorganism autoanalysis system (BioMerieux Corporate, Craponne, France).
The AxyPrep Bacterial Genomic DNA Miniprep Kit (Axygen Scientific, Union City, CA, United States) was used to extract the genomic DNA of S. warneri WS479. Library preparation, MinION sequencing and Illumina sequencing were performed at Shanghai Sunny Biotechnology Co., Ltd. The MinION long reads were initially assembled by Canu v1.8 (Koren et al., 2017), and then two FASTQ sequence files generated using the Illumina HiSeq 2500 platform were mapped onto the primary assembly to control assembly quality and to correct possible misidentified bases by using Bwa (Li and Durbin, 2009).
The genomes used to determine phylogenetic relationships and perform comparative genomics analysis were downloaded from the NCBI genome database. For those genomes used to investigate the sdr locus, the complete genome sequences were included preferentially, and the draft genome sequences that incompletely harbored the sdr locus were excluded. All the genomes used in this study were confirmed by calculating average nucleotide identity (ANI) (Supplementary Figure 5) using FastANI (Jain et al., 2018).

Antimicrobial Susceptibility Testing and Cloning Experiments
The minimum inhibitory concentrations (MICs) were determined using the agar dilution method following the guidelines of the Clinical and Laboratory Standards Institute (CLSI, 2019). Susceptibility patterns were interpreted according to the CLSI breakpoint criteria for Staphylococcus. MIC was defined as the lowest concentration producing no visible bacterial growth. Enterococcus faecalis ATCC 29212 and Escherichia coli ATCC 25922 were used as reference strains for quality control. The resistance gene sequences (aadD2 and blaZ) along with their promoter regions were PCR-amplified using the primers 5 -GCTCTAGAGCTTTCTATTATTGCAATGTGGAATTG-3 and 5 -CGGGATCCCGTCAAAATGGTATGCGTTTTGACACA-3 for aadD2 and 5 -CGGGATCCCGATTTAGCCATTTTGACA CCTTCTTT-3 and 5 -CCAAGCTTGGTTAAAATTCCTTCAT TACACTCTTGGCG-3 for blaZ, with each having a pair of flanking restriction endonuclease adapters (XbaI and BamHI for aadD2 and BamHI and HindIII for blaZ). The PCR products of aadD2 and blaZ were then eluted from agarose gel, digested with the corresponding restriction endonucleases, and ligated into the pAM401 and pUCP24 vectors, respectively. The recombinant plasmid (pAM401-aadD2) was transformed into E. faecalis JH2-2 by the electroporation method, and the transformants were grown on brain heart infusion agar plates supplemented with chloramphenicol (16 µg/mL). The recombinant plasmid (pUCP24-blaZ) was transformed into E. coli DH5α cells via the calcium chloride method, and the transformants were grown on Luria-Bertani agar plates supplemented with gentamicin (40 µg/mL). The recombinant plasmids were verified by both restriction endonuclease digestion and Sanger sequencing (Shanghai Sunny Biotechnology Co., Ltd., Shanghai, China).

Construction of a Phylogenetic Tree
To determine the phylogenetic relationship between S. warneri and other Staphylococcus species based on genomic data, we extracted and concatenated whole genome-wide singlecopy orthologous protein sequences that were detected by OrthoFinder version 2.3.8 (Emms and Kelly, 2019) from 74 completely sequenced Staphylococcus genomes and a Bacillus subtilis genome (Bacillus subtilis 168, served as an outgroup). Multi-FASTA alignment was performed using MAFFT v7.407 (Katoh and Standley, 2013). Then, Gblocks (Castresana, 2000) was used to select conserved blocks of the resulting alignment. RAxML version 8.2.12 (Stamatakis, 2014) was used to infer the phylogeny by the maximum likelihood algorithm (ML) under the substitution matrix WAG which was chosen by ProtTest version 3.4 (Darriba et al., 2011).

Sequence Analysis and Molecular Modeling
Identification of domains and other sequence features that occurred within proteins was carried out using InterProScan (Jones et al., 2014), SignalP 1 and TMHMM 2 . The repeat domains were identified using T-REKS 3 . Furthermore, several positively charged residues and LPXTG motifs at the C-terminus of CWA proteins were checked and confirmed manually. Comparison of the identities among proteins was performed using BLASTN and BLASTP (Altschul et al., 1990). Multiple sequence alignment was carried out using ClustalW 4 . The homology models of the N2N3 domains of SdrJ and SdrL were generated using the SWISS-MODEL repository using the crystal structure of the N2N3 domains of ClfB and SdrC (PDB IDs: 3AU0 and 6LXH) as the template, respectively. The homology model of the N2N3 domains of SdrK was ultimately established using MODELER (Webb and Sali, 2014) based on the crystal structures of SdrE and SdrG. Subsequently, the quality of the final models was assessed by a Ramachandran plot using the RAMPAGE server and quality model assessment with the protein structure analysis (ProSA) server (Wiederstein and Sippl, 2007;Kaufmann et al., 2011).

Protein Purification and Fluorescence Experiments
The DNA sequences of rSdrL178-748 (S. warneri WB224) and rSdrD54-680 (S. aureus TCH60) were synthesized with a pair of flanking restriction endonuclease adapters (NdeI for the forward primer and XhoI for the reverse primer) at Shanghai Tsingke Biotechnology Co., Ltd. Using pCold II as a cloning vector and E. coli BL21 as the recipient, transformants (pCold II-rSdr/BL21) were selected on LB agar plates containing 100 µg/mL ampicillin. The overnight culture of the recombinant strain (pCold II-rSdr/BL21) was diluted 100-fold in 100 ml of LB medium, incubated for 2-3 h at 37 • C with orbital shaking at 250 rpm until the OD 600 reached 0.6, and incubated with 1 mM isopropyl-β-d-thiogalactopyranoside (IPTG) (Sigma Chemicals Co., St. Louis, MO, United States) for 20 h at 16 • C (Choi and Geletu, 2018). The recombinant protein was purified by affinity chromatography using BeyoGold His-tag Purification Resin (Beyotime, Shanghai, China) according to the manufacturer's instructions. The histidine tag was removed by Enterokinase (GenScript, Nanjing, China) for 16 h at 16 • C.

RESULTS AND DISCUSSION
General Features of the S. warneri WS479 Genome The whole genome of S. warneri WS479 contained one single circular chromosome and two plasmids designated pWS-31 and pWS-25 (Supplementary Table 1  Each B-repeat contains three conserved stretches, namely, C1, C2, and C3, which are framed in green boxes. The consensus sequence for a canonical EF-hand is taken from that of Kretsinger (1987). "*" means ≥80% conserved and "!" means 100% conserved.
The in vitro susceptibility testing of S. warneri WS479 exhibited resistance to a number of antibiotics (Supplementary  Table 2), including erythromycin and clarithromycin (macrolides) and amikacin and azithromycin (aminoglycosides), according to CLSI breakpoint criteria for Staphylococcus (Supplementary Table 2). Moreover, the minimum inhibitory concentration (MIC) of roxithromycin against S. warneri WS479 was 256 µg/mL, which was significantly higher than the resistance breakpoint for erythromycin (>8 µg/mL), though no interpretation criteria for the antimicrobial was available. Moreover, three antibiotic resistance genes (blaZ, aadD2 and msrA) involved in drug resistance against three antibiotics (β-lactams, aminoglycosides and macrolides, respectively) were predicated on the plasmid pWS-31, and two of them (blaZ Phylogenetic Relationship Between S. warneri and Other Staphylococcus Species Lamers et al. (2012) established a new classification using 16S rRNA, tuf, rpoB and dnaJ genes to classify the Staphylococcus species into 15 cluster groups. S. warneri together with S. pasteuri were defined as the "warneri cluster group." To confirm the phylogenetic relationship between S. warneri and other Staphylococcus species, we constructed a maximum likelihood phylogenetic tree based on concatenating deduced amino acid sequences of 622 single-copy ortholog sequences from 74 completely sequenced Staphylococcus genomes (including S. warneri WS479) and a Bacillus subtilis genome (Bacillus subtilis 168, served as an outgroup) available in NCBI. The results confirmed that S. warneri together with S. pasteuri fell into a single clade with perfect supporting bootstrap values (100%) (Figure 1). Furthermore, the "warneri cluster group" was clustered closest to the "epidermidis cluster group, " which was composed of S. epidermidis, S. capitis, and S. caprae, indicating similar biological traits among these species.

SdrJ/K/L Are Novel Clf-Sdr Subfamily Proteins
When searching for the putative CWA proteins in S. warneri genomes, we discovered three genes encoding proteins sharing similar structures with the Clf-Sdr family, which we have called SdrJ/K/L, according to the established nomenclature. The deduced full-length SdrJ, SdrK and SdrL proteins in strains NCTC11044, FDAARGOS 754 and 16A are 1,035, 1,000, and 1,323 amino acids (aa) (Figure 2A) with predicted mature molecular masses of 100.56, 97.93, and 133.32 kDa, respectively. SdrJ/K/L are multidomain proteins that contain, starting from the N-terminus, a 46-amino acid long signal sequence, N-terminal repeats (NTRs), an A-region, 0-4 B-repeats, an SD (AD) repetitive (R) region and a typical cell wall anchoring sequence, such as an LPXTG motif, a hydrophobic membrane spanning region and a short cytoplasmic positively charged tail (Figure 2A). The A-region in these three proteins shares 35.90 to 45.36% sequence identity (Supplementary Table 3), suggesting that they have different functions. Surface proteins contain an N-terminal signal peptide that promotes their translocation across the bacterial membrane through the Sec pathway (Dramsi et al., 2008). The signal peptides of Sdr proteins are comprised a string of 17-18 hydrophobic aa (H-region), a variable-length N-region and an 8-10 aa C-region ( Figure 2B). Multiple sequence alignments revealed that SdrJ/K/L like other reported Sdr proteins had a YSIRKXXXGXXS-like motif ( Figure 2B). Deletion of this motif decreased the efficiency of signal peptidase, resulting in accumulation of precursors in the membrane and thus impacting on the secretion of CWA proteins (Dramsi et al., 2008).
The most prevalent CWA proteins are MSCRAMMs, which are defined by tandemly linked IgG-like folded domains that bind to their ligands through the "dock, lock, and latch" (DLL) mechanism. This binding mechanism involves characteristic structural features in the A-region, including two adjacent IgGlike folded domains with a conserved TYTFTDYVD-like motif presenting at the "back" of the latching trench in the first Frontiers in Microbiology | www.frontiersin.org domain in the tandem and a latch sequence at the C-terminal extension of the second domain (Ponnuraj et al., 2003). A latch sequence is not a conserved sequence of amino acids but is formed by alternating small residues (Ponnuraj et al., 2003). Domain and tertiary structure prediction revealed that residues 324-649 of SdrJ, 347-665 of SdrK and 339-653 of SdrL adopt two IgG-like folds (N2N3) (Figures 2A, 3). They all furthermore contained a TYTFTDYVD-like motif at the expected positions for a latching trench in the first domain of the predicted IgGlike tandem and a putative latch sequence in the extension of the second IgG-like folded domain (Figure 3). The presence of a latch sequence and a conserved TYTFTDYVD-like motif in the N2N3 subdomain indicated that they could bind a ligand peptide by the DLL mechanism. Overall, the A-regions of these three novel Sdr proteins contain the characteristic IgG-like folded tandem of a MSCRAMM.
The A-region of SdrJ shared the highest sequence identity (48.04%) with that of ClfB. In addition, the B-repeat, a Ca 2+binding domain, was absent in both ClfB and SdrJ, which have very similar structures (Figure 2A). Significantly, however, the R region of SdrJ contained a larger proportion of alanine (42.01%), even than the previously reported SdrI (37.00%) (Sakinc et al., 2006). The R regions of Clf-Sdr proteins are hypothesized to extend the A-region away from the cell surface to prevent obstruction of ligand binding by the cell wall (Hartford et al., 1997;Lizcano et al., 2012); however, whether a rich of alanine acids in the R region of SdrJ and SdrI would influence ligand binding is still unclear.
The A-region of SdrK was similar to that of SdrG at the amino acid level (53.77%), suggesting that it may also have fibrinogenbinding activity (Davis et al., 2001;Sellman et al., 2008). B-repeats not only help project the A-region on the cell surface but also engage in ligand binding. For example, SdrF binds type I collagen via its B-repeats in a temperature-dependent manner (Di Poto et al., 2015). Thus, the absence of B-repeats in SdrJ and SdrK might weaken their ligand binding activity.
SdrL proteins are more prevalent than SdrJ and SdrK proteins in S. warneri. The A-region shared a significant sequence identity (73.50%) with that of SdrC, which was reported to involve β-neurexin binding as well as bacterial biofilm formation (Barbu et al., 2010(Barbu et al., , 2014. The B-repeats of S. warneri SdrL can vary from 1 to 4 copies. Sequence alignments from multiple B-repeats revealed that they also contained one conserved EFhand motif (Figure 2C; Roman et al., 2016). To confirm the calcium binding ability of SdrL and investigate its change of hydrophobic surface in the presence of Ca 2+ , we purified rSdrL 178−748 (A-region and one B-repeat) protein of S. warneri WB224 and rSdrD 54−680 of S. aureus TCH60 (positive control). bis-ANS showed that compare with rSdrD 54−680 , rSdrL 178−748 had less decrease in fluorescence (14.23 a.u. versus 4.88 a.u.) when in the presence of Ca 2+ at 500 nm (Figure 4), which suggested that the B repeat of SdrL was functional. In summary, SdrL may be related to biofilm formation, and its predominance indicates its importance to the pathogenicity of S. warneri.
Staphylococcal MSCRAMMs do not usually contain an NTR domain, which is found in these three proteins between the signal sequence and the A-region (Arora et al., 2016). Unlike the NTR domains in SesJ and SdrI, the NTR domains found in the three S. warneri Sdr proteins were relatively short, and the tandem repeats were not identical ( Table 1). As a result of the unusually short N1 region that connects the N2N3 tandem to the preceding NTR region, the A-region of these three proteins is smaller than that of their closest homologs. MSCRAMM family proteins bind to host proteins and can mediate bacterial adhesion, evade the immune response and participate in biofilm formation. Sequence analysis and modeling studies reveal that SdrJ/K/L harboring an unusual NTR domain are novel members of the Staphylococcus Clf-Sdr subfamily, which provides a point of penetration for further research on the pathogenicity of S. warneri. It is noteworthy that, S. warneri is the first CoNS possessing structural homologs (SdrJ/K) of Clf proteins that are free of B repeats. The interactions of SdrJ/K/L and their possible roles in S. warneri pathogenesis are currently being investigated.

Genetic Organization of the Region Encoding SdrJ/K/L in S. warneri
Comparative genomic analysis revealed that SdrJ/K/L were encoded on the same locus (hereafter named the sdr locus), which was located between azo1 (encoding FMN-dependent NADPHazoreductase) and folE2 (encoding GTP cyclohydrolase) on the chromosome of S. warneri. Bioinformatic prediction  Frontiers in Microbiology | www.frontiersin.org revealed that there were predicted genomic islands surrounding the locus encoding SdrJ/K/L. A search of direct repeat sequences resulted in finding an ∼160 kb segment of the S. warneri chromosome, including the sdrJ/K/L genes flanked by 160 bp direct repeats (DRL1 and DRR1) (Figure 5 and Supplementary Figure 2). Approximately 40 kb upstream of the segment was a cluster of tRNA genes. Moreover, within the 160 kb segment, an ∼56 kb subregion (hereafter named Sw-Sdr) was found to be bordered by 8 bp perfect direct repeats (Figure 5).
In total, we identified 9 or 10 putative virulence genes (depending on the strains) and two resistance genes in the 160 kb region (Figure 5). Notably, the sdrJ/K/L genes and their immediate surroundings within Sw-Sdr seemed to be poorly conserved. These sdrJ/K/L genes were always followed by one or two glycosyltransferases, gtfA and gtfB (Figure 5). Similar organization was also observed in other Staphylococcus species, such as downstream of the sdrG gene and sdrC/D/E genes in S. epidermidis and S. aureus, respectively (Hazenbos et al., 2013). These gtf genes were proposed to glycosylate the R regions of Sdr proteins, which can protect Sdr proteins against host proteolytic activity and yet generate major epitopes for the human anti-staphylococcal antibody response (Hazenbos et al., 2013). Thus, SdrJ/K/L might also subject to similar posttranslational modifications.
On the right side of Sw-Sdr was a putative tyrosinetype integrase that was 185 aa in length. A domain search revealed that the integrase only possessed a catalytic domain and had several key residues substituted at the RHRY tetrad (Supplementary Figure 3; Kwon et al., 1997); thus, it was an atypical integrase or might be a remnant of a functional integrase (Jo et al., 2016). Further BLAST analysis revealed that such atypical integrase seems to be frequently present in Staphylococcus genomes, but its detailed function remains unknown. Additionally, phylogenetic analysis revealed that these shortened integrases were phylogenetically distinct from known integrases of bacteriophages, integrons, integrative conjugative elements and other GIs (Supplementary Figure 4).
Overall, the genetic environment of sdr loci in S. warneri also harbored other virulence and resistance determinants; however, whether the region is a pathogenicity island remains unknown due to a lack of convincing evidence (this region has a similar GC content than the overall S. warneri genome ( Supplementary  Figure 1), is free of efficient mobility genes, and is free of genes from phages) (Hacker et al., 1997).

Not Only MSCRAMM Genes but Also
Other CWA Genes Can Be Found at the sdr Locus Different S. warneri strains possessed different sdr genes at the sdr locus (between azo1 and folE2), suggesting that these sdr genes were a part of accessory genes or might be present in an MGE. After a large-scale investigation of Staphylococcus genomes, we found that species harbored CWA genes and (or) gtf genes at the sdr locus fell into a single clade (Figure 1), namely, group SD, while the remaining species contained transcriptional regulator or metabolism -related genes instead of CWA genes and (or) gtf genes (data not shown). To better characterize Sw-Sdr, we compared Sw-Sdr (S. warneri NCTC11044) with other relative regions from different Staphylococcus species. The results showed that except for the region containing CWA genes and their immediate surroundings, the other regions were relatively conserved among the investigated Staphylococcus species. In S. aureus and S. epidermidis, the sdr locus harbored the well-known Sdr family proteins SdrC/D/E and SdrG, respectively (Figure 6). In some Staphylococcus species, such as S. capitis and S. caprae, the sdr locus was free of CWA genes but still possessed the gtf gene. In S. haemolyticus, CWA genes at the sdr locus could encode not only MSCRAMM proteins but also G5-E repeat family proteins and Rib domain-containing proteins (Figure 6 and Table 2). Rib domain-containing proteins could also be found at the sdr loci in S. hominis and S. lugdunensis. Such proteins feature a C-terminal of tandem repeated Rib domains that are thought to define protective epitopes and may play a role in generating phenotypic and genotypic variation in group B Streptococcus (Michel et al., 1992;Wästfelt et al., 1996). These facts suggest that the downstream integrase was independent of the diversity of CWA genes at the sdr locus because in S. epidermidis, the downstream integrase was absent, while in S. lugdunensis, the sdr locus and the integrase-containing region were not continuous. The integrase might be associated with the initial integration of this region in Staphylococcus but lost its function during evolution, thereby leading to the settling of this region.

Insertion Sequences at sdr Locus
We noticed that within a Staphylococcus species, although the CWA genes at this locus varied from strain to strain, the repository of these CWA genes was settled. To prove this, we investigated the genome sequences of the 11 Staphylococcus species belonging to group SD (Figure 1). All the investigated species showed a settled repository of CWA genes at this locus, such as sdrC/D/E in S. aureus, sdrG in S. epidermidis, and sdrJ/K/L in S. warneri and a gene encoding the Rib domain-containing protein in S. lugdunensis (Table 2). Moreover, species belonging FIGURE 6 | Comparison of Sw-Sdr (S. warneri NCTC11044) with related regions in other Staphylococcus species. Domains of the Sdr protein in S. haemolyticus, the G5-E family protein in S. hominis and the Rib domain containing protein in S. lugdunensis are shown. WCG, species belonging to the "warneri cluster group"; ECG, species belonging to the "epidermidis cluster group"; HCG, species belonging to the "haemolyticus cluster group"; ACG, species belonging to the "aureus cluster group".  to the same cluster group tended to possess similar CWA genes at the sdr locus; for example, the sdrC/D/E derivatives in S. argenteus and S. schweitzeri share high sequence identities (>80%) with sdrC/D/E genes in S. aureus. We serendipitously observed that some Staphylococcus genomes occasionally integrated an insertion sequence (IS) at the sdr locus. More specifically, we noticed that 10.00% (15/150) of S. aureus strains possessed IS at this locus, probably due to a lack of complete genome sequences; only 2.38% (2/84) of S. epidermidis strains and 4.35% (1/23) of S. hominis strains were found to carry IS ( Table 2). Complete genome sequence data are essential for the analysis of genes containing repeat regions, such as the sdr locus. Unfortunately, due to lack of genome resources, the IS element was not observed at the sdr locus in some Staphylococcus species. S. haemolyticus was a special case with more than half (58.54%; 24/41) of the investigated genomes possessing the IS element at the sdr locus; of note, 78.57% (11/14) of the completely sequenced strains possessed the IS element at the sdr locus, implying that this number would be higher if more complete genome sequences were included. IS element was reported to involve in the modulation of expression of virulence factors in many bacterial species including Staphylococcus (Ziebuhr et al., 1999;Rajeshwari and Sonti, 2000;Conlon et al., 2004;Simser et al., 2005;Perez et al., 2015). In our dataset, S. lugdunensis was the only species with a relatively large number of complete genomes that no IS element was identified at sdr loci. Further investigation revealed that the Rib domain-containing protein in S. lugdunensis was not similar with other Sdr or G5-E proteins that possessed R region. Thus, the repeat region of Sdr or G5-E proteins at sdr loci could act as a recombination hotspot for the integration of such mobile genetic elements. However, further experiments will be needed to confirm this hypothesis. Therefore, we thought that the intraspecific differences in CWA genes or truncation of CWA genes (such as sdrL in S. warneri NCTC7291 and S. warneri WS479) at this locus were caused by these mobile genetic elements.
However, it is still unclear whether the putative site-specific integrase downstream of the sdr locus is involved in the formation of different categories of CWA proteins in various Staphylococcus species. Since sdr loci variably contained CWA genes within a Staphylococcus species, it can be a potential locus used for typing strains with different pathogenicity.

CONCLUSION
In this study, we determined the complete whole-genome sequence of S. warneri WS479, and confirmed that blaZ and aadD2 on pWS-31 were functional. The phylogenetic relationship between S. warneri and other Staphylococcus species was confirmed by a genome-wide phylogenetic tree, and S. warneri together with S. pasteuri were in a single clade. Combining sequence analysis and modeling studies, we identified three CWA proteins, namely, SdrJ/K/L, on the chromosomes of S. warneri. These proteins have structural features characteristic of Sdr proteins but also contain an unusual N-terminal repeat region. Further investigation of the genetic organization of sdrJ/K/L revealed that they were actually located at the same locus (sdr locus) on the chromosome sequence. Although a putative integrase and direct repeats were observed, our analysis still suggested that the region containing Sdr proteins (Sw-Sdr) was unlikely to be an efficient pathogenicity island. Comparative genomic analysis showed that Sw-Sdr was a conserved region found in different Staphylococcus species, and the MSCRAMM gene was not the only CWA gene at sdr loci. A large-scale investigation of Staphylococcus genomes showed that sdr loci were a potential hotspot of insertion sequences (ISs), leading to intraspecies diversity at these loci.

ETHICS STATEMENT
This study was approved by The Ethics Committee of the Affiliated Hospital of Wenzhou Medical University (China) and informed consent was obtained from the patient.

ACKNOWLEDGMENTS
The authors thank all the colleagues and reviewers who helped this work.