Determining the Genetic Characteristics of Resistance and Virulence of the “Epidermidis Cluster Group” Through Pan-Genome Analysis

Staphylococcus caprae, Staphylococcus capitis, and Staphylococcus epidermidis belong to the “Epidermidis Cluster Group” (ECG) and are generally opportunistic pathogens. In this work, whole genome sequencing, molecular cloning and pan-genome analysis were performed to investigate the genetic characteristics of the resistance, virulence and genome structures of 69 ECG strains, including a clinical isolate (S. caprae SY333) obtained in this work. Two resistance genes (blaZ and aadD2) encoded on the plasmids pSY333-41 and pSY333-45 of S. caprae SY333 were confirmed to be functional. The bla region in ECG exhibited three distinct structures, and these chromosome- and plasmid-encoded bla operons seemed to follow two different evolutionary paths. Pan-genome analysis revealed their pan-genomes tend to be “open.” For the virulence-related factors, the genes involved in primary attachment were observed almost exclusively in S. epidermidis, while the genes associated with intercellular aggregation were observed more frequently in S. caprae and S. capitis. The type VII secretion system was present in all strains of S. caprae and some of S. epidermidis but not in S. capitis. Moreover, the isd locus (iron regulated surface determinant) was first found to be encoded on the genomes of S. caprae and S. capitis. These findings suggested that the plasmid and chromosome encoded bla operons of ECG species underwent different evolution paths, as well as they differed in the abundance of virulence genes associated with adherence, invasion, secretion system and immune evasion. Identification of isd loci in S. caprae and S. capitis indicated their ability to acquire heme as nutrient iron during infection.

Staphylococcus caprae, Staphylococcus capitis, and Staphylococcus epidermidis belong to the "Epidermidis Cluster Group" (ECG) and are generally opportunistic pathogens. In this work, whole genome sequencing, molecular cloning and pan-genome analysis were performed to investigate the genetic characteristics of the resistance, virulence and genome structures of 69 ECG strains, including a clinical isolate (S. caprae SY333) obtained in this work. Two resistance genes (blaZ and aadD2) encoded on the plasmids pSY333-41 and pSY333-45 of S. caprae SY333 were confirmed to be functional. The bla region in ECG exhibited three distinct structures, and these chromosome-and plasmid-encoded bla operons seemed to follow two different evolutionary paths. Pan-genome analysis revealed their pan-genomes tend to be "open." For the virulence-related factors, the genes involved in primary attachment were observed almost exclusively in S. epidermidis, while the genes associated with intercellular aggregation were observed more frequently in S. caprae and S. capitis. The type VII secretion system was present in all strains of S. caprae and some of S. epidermidis but not in S. capitis. Moreover, the isd locus (iron regulated surface determinant) was first found to be encoded on the genomes of S. caprae and S. capitis. These findings suggested that the plasmid and chromosome encoded bla operons of ECG species underwent different evolution paths, as well as they differed in the abundance of virulence genes associated with adherence, invasion, secretion system and immune evasion. Identification of isd loci in S. caprae and S. capitis indicated their ability to acquire heme as nutrient iron during infection.
Keywords: Epidermidis Cluster Group, bla operon, resistance, isd locus, virulence, pan-genome INTRODUCTION Coagulase-negative staphylococci (CoNS) commonly live on the human skin (Piette and Verschraegen, 2009;Becker et al., 2014). They often caused infectious diseases in specific groups of patients, such as those with neonates, neutropenia, and so on (Ma et al., 2011;Zong et al., 2011), and infections at metastatic sites, such as joints, heart and bones. The infections in these populations are often not easy to treat (Casey et al., 2007). The virulence properties of CoNS species are mainly related to their ability to form biofilms and produce colonizing biomaterials (Becker et al., 2014). At present, these species are considered as important bloodstream pathogens usually with multidrug resistance (May et al., 2014). As a member of CoNS, Staphylococcus caprae commonly colonizes the milk gland and skin of goats and occasionally causes goat mastitis (Watanabe et al., 2018). In addition, S. caprae causes human infections, such as acute otitis externa (Shuttleworth et al., 1997), peritonitis (Shin et al., 2011), urinary tract diseases (Kanda et al., 1991), endocarditis (Vandenesch et al., 1995), meningitis (Benedetti et al., 2008), and many cases of bacteremia. However, the reasons why S. caprae can cause hospital-acquired infections haven't been fully elucidated.
Based on the 16S rRNA, tuf (elongation factor Tu), rpoB (β-subunit of RNA polymerase) and dnaJ (heat shock protein 40) genes (Lamers et al., 2012), Lamers et al. proposed a new classification and classified the Staphylococcus species into 15 cluster groups. This finding revealed that S. caprae was a member of "Epidermidis Cluster Group" (ECG) with S. epidrmidis as the leading causative organism which included S. caprae, S. saccharolyticus, S. epidermidis, S. capitis subsp. urealyticus and S. capitis subsp. capitis. The ECG species is composed of the "medium"-pathogenic staphylococci which means that when they are isolated from clinical specimens, it's uncertain that whether the infections are really caused by them (Becker et al., 2014). As S. caprae, S. capitis and S. epidermidis all belong to ECG, they are thought to share the basic mechanisms causing various hospital-acquired infections (Watanabe et al., 2018). Therefore, evaluating genomic structure divergences in virulence factors and metabolism is required to elucidate the mechanism of infections caused by ECG.
The complete set of conserved genes in all studied strains were defined as the core genome, while the accessory genome represents the genes existing in part of the strains, and the pan-genome comprises all genes in the core genome and accessory genome (Nourdin-Galindo et al., 2017;Wu et al., 2018). The pan-genome reflects the diversity among the species, host/environment adaptations as well as the variety of pathogenic mechanisms (Tettelin et al., 2008). In pan-genome analysis, unlike classical comparative genome analysis, all the genes were encompassed in a studied phylogenetic clade or a certain species (Chen et al., 2018) and will provide a better understanding of ECG genome diversity and virulence capabilities.
In this work, the whole genome sequence of a clinical S. caprae strain isolated from a puncture fluid specimen was determined. Furthermore, a genomic comparison among recently available ECGs was performed. Specifically, we analyzed the pan-genomes of each ECG species, as well as the virulence factors, and for the first time, we identified novel isd loci (iron uptake system allowing bacteria to steal iron from host heme) present in S. caprae and S. capitis. These genomic analyses will clarify the genomic differences, evolutionary relationships and pathogenic potentials of the strains.

MATERIALS AND METHODS
Bacterial Strains, Genome Sequencing, Gene Predictions, and Functional Annotations S. caprae SY333 was isolated from a puncture fluid specimen of a 29-year-old woman with fever in Lishui Hospital, Zhejiang, China. The strain was identified by Vitek-60 microorganism autoanalysis system (BioMerieux Corporate, Craponne, France), and then verified by homologous comparisons of 16S rRNA gene of S. caprae SY333 with those of the bacteria available in the nucleotide database of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov). Finally, the average nucleotide identity (ANI) was calculated to confirm the result.
The AxyPrep Bacterial Genomic DNA Miniprep kit (Axygen Scientific, Union City, CA, United States) was used to extract the genomic DNA of S. caprae SY333. Sequencing of the S. caprae SY333 genome was performed by a combination of technologies producing short (HiSeq 2500; Illumina) and long reads (MinION and PacBio RS II; Oxford Nanopore Technologies and Pacific Biosciences). Canu v1.8 (Koren et al., 2017) was used for initial assembly, and a hybrid assembly was subsequently performed using the Unicycler pipeline (Wick et al., 2017), with the contigs generated by Canu and all the sequenced reads (short and long reads) served as an input. The cyclization of final contigs was confirmed through the builtin tools of the unicycler. Other publicly available ECG genome sequences including 24 full genomes, and 44 draft genomes were downloaded from the NCBI public database (Table 1). Genes were predicted and annotated by using Prokka v1.14.0 (Seemann, 2014); furthermore, the predicted proteins were searched against the NCBI non-redundant (NCBI Resource Coordinators, 2016) and Swiss-Prot (UniProt Consortium, 2015) databases (Tatusov et al., 2003) using DIAMOND (Buchfink et al., 2015) with an evalue threshold of 1e-5. Annotation of the resistance genes was performed using ResFinder (Zankari et al., 2012) and Resistance Gene Identifier (RGI) software of Comprehensive Antibiotic Resistance Database version 4.0.3 (https://card.mcmaster.ca/) (McArthur et al., 2013) with an e-value threshold of 1e-10. CrisprCasFinder was employed to identify CRISPR/Cas system (Couvin et al., 2018).

Antimicrobial Susceptibility Testing and Cloning Experiments
The minimum inhibitory concentration (MIC) was detected by agar dilution method recommended by Clinical and Laboratory Standards Institute (CLSI). The result was interpreted following the CLSI breakpoint criteria for Staphylococcus (CLSI, 2019).  Enterococcus faecalis ATCC 29212 and Escherichia coli ATCC 25922 were used as reference strains for quality control. The resistance gene sequences (aadD2 and blaZ) along with their promoter regions were PCR-amplified using the primers 5'-GCTCTAGAGCTTTCTATTATTGCAATGTGGAATTG-3' and 5'-CGGGATCCCGTCAAAATGGTATGCGTTTTGAC ACA-3' for aadD2, and 5'-CGGGATCCCGATTTAGCCATTT TGACACCTTCTTT-3' and 5'-CCAAGCTTGGTTAAAATT CCTTCATTACACTCTTGGCG-3' for blaZ, with each having a pair of flanking restriction endonuclease adapters (Xball and BamHI for aadD2, and BamHI and HindIII for blaZ). The PCR products were then eluted from agarose gel, digested with the corresponding restriction endonucleases, and ligated into the pAM401 and pUCP24 vectors, respectively. The recombinant plasmid (pAM401-aadD2) was transformed into E. faecalis JH2-2 via the calcium chloride method, and the transformants were cultured on brain heart infusion agar plates with chloramphenicol (16 µg/mL). The recombinant plasmid (pUCP24-blaZ) was extracted and further transformed into E. coli DH5α. The cloned resistance gene was verified by restriction endonuclease digestion and sequencing (Shanghai Sunny Biotechnology Co., Ltd., Shanghai, China). All plasmids and strains of this work were shown in Table 2.

Phylogenetic and Phylogenomic Analyses
The ubiquitous, conserved single-copy genes, including atpD, recA, gyrA, gyrB, and ftsZ (Nourdin-Galindo et al., 2017;Chen et al., 2019), from each strain were used for phylogenetic analysis. Bacillus subtilis 168 was used as the outgroup. Initially, the nucleotide sequences of each gene was translated and then concatenated by a custom-derived shell script; Multi-FASTA alignment was performed using MAFFT v7.407 (Katoh and  Standley, 2013), and the resulting alignment was used to infer the phylogeny by the maximum likelihood algorithm (ML) using RAxML version 8.2.12 (Stamatakis, 2014) under the substitution matrix LG, which was selected by ProtTest version 3.4 (Darriba et al., 2011). For further verification, phylogenomic analysis (Comas et al., 2007) was conducted using 764 single-copy orthologous genes existing in all 69 ECG strains as well as Bacillus subtilis 168. These single-copy orthologous genes were identified using Orthofinder version 2.3.8 (Emms and Kelly, 2019). The methods of Multi-FASTA alignment and phylogeny inference were the same as those mentioned above.

Pan-Genome Inference and COG Functional Characterization
The pan-genomes of each ECG species were inferred with Roary version 3.12.0 (Page et al., 2015). The annotation files generated by Prokka were provided to Roary as an input. The gene presence/absence matrix produced by Roary was listed in Supplementary Table 1, and further analysis was based on this file. The gene accumulation curve was produced via ggplot2 (Wickham, 2009) using the results of Roary. COG categorization of each pan-genome was carried out using DIAMOND, and reference gene sequences provided by Roary were searched against the COG database. Only those hits with an e-value < 1e-10, an identity higher than 40% and a coverage higher than 70% were considered significant (Nourdin-Galindo et al., 2017).

Identification of the Virulence Genes in ECG Genomes
BLASTX program was used to search all coding sequences of ECG strains against Virulence Factor Database (VFDB) (http:// www.mgc.ac.cn/VFs/) . Compared with the virulent genes in the database at an e-value < 1e-10, only those query genes with an identity higher than 40% and a coverage higher than 70% were considered as the potential virulence genes (Nourdin-Galindo et al., 2017). Functional annotations were performed based on the categories and subcategories presented in VFDB.

Sequence Analysis of bla Region and isd Locus
Only the complete ECG genome sequences were selected to perform comparative genomic analyses of the bla region and isd locus. Nucleotide sequence alignment and construction of the neighbor-joining phylogenetic tree of the bla operon were performed using the MAFFT program and MEGAX (Kumar et al., 2018) with a bootstrap of 1,000 replicates. Since the bla regions belonging to the same structure are similar to each other, comparative genomics analysis was performed using several representative sequences of these three structures (Figure 2), and the results were visualized via the gggenes package in Rstudio. The near-iron transporter (NEAT) domains and secretion signal of isd proteins were predicted online using InterProScan software by searching against the InterPro database (Hunter et al., 2009). Typical features including a signal peptide (predicted using the SignalP server at www.cbs. dtu.dk/services/SignalP/) at the N terminus, the LPXTG-motif (identified using a custom-derived script written in Python) close to the C terminus followed by a hydrophobic transmembrane segment (predicted using the TMHMM server at www.cbs. dtu.dk/services/TMHMM-2.0/) and several positively charged residues at the C terminus (manually checked) of putative cell-wall-anchored proteins (four Isd proteins and OrfA) in S. caprae and S. capitits (Bowden et al., 2005) were predicted. Phylogenetic analysis of isd NEAT domains was carried out using the method described above. Comparisons of the isd loci were carried out using BLASTN and BLASTP. Other bioinformatics tools were written using Python and Biopython (Cock et al., 2009).
Up to date, there are only 10 genome sequences of S. caprae present in NCBI genome database, of which 6 are incomplete genome sequences. Among the 4 complete genome sequences, 3 (S. caprae JMUB145, JMUB590 and JMUB898) were isolated from human skin without any plasmid, and the other one (S. caprae 26D) was isolated from buffalo milk with a plasmid (carrying blaZ). The genome of S. caprae SY333 is the first one of a clinical S. caprae isolate carrying multiple plasmids, of which 2 are resistance plasmids.

Resistance Genes and Their Functions in S. caprae SY333
A total of 5 drug resistance genes associated with 3 antibiotics classes (β-lactams, aminoglycosides, and macrolides) were identified in the S. caprae SY333 genome, with 2 on the chromosome (norA and mgrA), 2 on pSY333-45 (aadD2 and msrA) and 1 on pSY333-41 (blaZ). Two resistance genes (aadD2 and blaZ) were cloned for functional evaluation. Compared with the control strain (JH2-2/pAM401), the cloned aadD2 gene increased 8-and >16-fold of the MIC levels to tobramycin and kanamycin, respectively. Meanwhile, the blaZ gene exhibited >512-, 128-, >256-, and >1024-fold increase in MIC levels in response to treatment with ampicillin, cephazolin, ceftazidime, and meropenem, respectively, compared with those of the control (DH5α/pUCP24) ( Table 3). The in vitro susceptibility testing of S. caprae SY333 exhibited  resistance to a number of antibiotics, including erythromycin and clarithromycin (macrolides), amikacin and azithromycin (aminoglycosides) and penicillin (β-lactam), according to CLSI breakpoint criteria for Staphylococcus. Moreover, the MIC level of roxithromycin against S. caprae SY333 was 128 µg/mL. Although there was no interpretation criteria of resistance breakpoint for roxithromycin, it was significantly higher than that for erythromycin (>8 µg/mL). Two mechanisms have been reported to confer penicillin resistance in staphylococci (Olsen et al., 2006). The primary mechanism is the production of β-lactamase encoded by blaZ. The second is the expression of PBP2a, a penicillin-binding protein encoded by mecA. In this work, only one mechanism (blaZ) conferring penicillin resistance in the S. caprae SY333 genome was confirmed.

Comparison of bla Region in ECG Species
The bla operon (blaI-blaR1-blaZ) carried by a Tn552-like element in S. caprae SY333 is encoded on the plasmid pSY333-41. According to serotyping and different hydrolyzing substrate profiles, the β-lactamases expressed by blaZ could be divided into four types A, B, C, and D (Pereira et al., 2014). Among them, type B is usually encoded in the chromosome, while the other three types (A, C, and D) are generally encoded on the plasmids (Bagcigil et al., 2012). Phylogenetic analysis of plasmidborne and chromosomally located bla operons (blaI-blaR1-blaZ) in ECG species identified two distinct clades (clade A and clade B, Figure 1). Clade A included a mixture of strains with bla operons encoded on either chromosomes or plasmids, while Clade B only included strains in which the bla operon was encoded on the chromosome (Figure 1).
Comparison of the bla region (the bla operon and its immediate surroundings) showed that there were three distinct structures: (1) the bla operon was associated with a Tn552like element that is commonly located on the plasmid, for example, S. caprae SY333 (plasmid-op) (except for S. epidermidis FDARGOS_529); (2) two tyrosine recombinase genes xerC were located downstream of the blaZ gene, which was only encoded on the chromosome, for example, S. epidermidis ATCC 12228 (chrop); (3) the bla operon had no significantly featured surroundings and was only located on the chromosome, for example, S. caprae JMUB898 (chr-op1) (Figure 2). The β-lactamase generelated transposon Tn552 and its derivatives belong to a group of transposons targeting resolution sites (resL) and are almost  The above results indicated that the bla operon located on the chromosome and the plasmid had followed two distinct evolutionary paths, which has been confirmed by the hypothesis raised by Olsen et al. (2006). The appearance of the Tn552-like element in the chromosomes of S. epidermidis FDARGOS_529 also indicated that the bla operon-related region could be translocated by Tn552 between chromosomes and plasmids. The fact that those chromosome-encoded bla regions clustered together with those plasmid-encoded regions indicated that they may originate from the same ancestors (Figure 1).

Phylogenetic Relationship Among ECG Species
To infer the phylogenetic relationship of the ECG strains, a phylogenetic tree using the five ubiquitously conserved core genes of the 70 strains (including Bacillus subtilis) was constructed. The resulting phylogenetic tree clearly grouped the 69 ECG strains into three distinct clades as expected. S. caprae was phylogenetically closer to S. capitis (Figure 3A). Through phylogenomics analyses of 764 concatenated genes, a tree similar to that reconstructed from five ubiquitously conserved core genes was obtained ( Figure 3B). This inferred phylogenetic relationship among ECG species was highly consistent with a former report in which the phylogenetic relationship was reconstructed using single-nucleotide polymorphisms in 82 Staphylococcus genome sequences, including ECG species (Watanabe et al., 2018).

Pan-Genome Inference and COG Functional Characterization
In order to compare the general genetic similarities and differences within the three species, the core and pan-genome of each species was determined. The result revealed that there were a total of 1,568 core genes, 2,788 accessory genes (genes of accessory genome present in at least two strains) and 2,263 unique genes (genes of accessory genome present in only one strain) among all S. epidermidis strains ( Figure 4A). Similar to S. epidermidis, 1,789 core genes, 1,441 accessory genes, and 946 unique genes, and 2,065 core genes, 780 accessory genes and 1,008 unique genes were predicted among S. capitis strains and S. caprae strains, respectively.
The rare faction curve ( Figure 4B) showed that as genomes were sampled, the genes never observed before are continuously added at a fairly steady rate, causing the pan-genome size to increase, with no sign of getting stable soon. This tended to indicate that the pan-genomes of the three species are "open" (Medini et al., 2005;Diene et al., 2013). In this work, the number and diversity of the S. epidermidis strains were greater than those of both S. capitis and S. caprae. This is the main reason why the pan-genome size of S. epidermidis is larger than the other two species (Figure 4A). To some extent, the pan-genome state ("open" or "close") for an organism partially depended on its capacity of acquiring exogenous DNA (Diene et al., 2013), especially for the species living in bacterial communities, such as those skin inhabitants [coagulase-negative staphylococci (CoNS)]. These species had a high horizontal gene transfer range FIGURE 3 | Phylogenetic and phylogenomics relationships of the housekeeping genes and single-copy orthologous genes of the ECG strains. (A) Phylogenetic relationship of ECG strains, according to the housekeeping genes shared by all complete genomes (i.e., atpD, recA, gyrA, gyrB and ftsZ). (B) Phylogenomics tree generated based on a total of 764 single-copy orthologous genes. Color shading indicates the species to which each stain belongs (blue: S. epidermids; green: S. capitis; orange: S. caprae). More information on strain characteristics is shown in Table 1. and were most likely to have an open pan-genome (Georgiades and Raoult, 2010). Moreover, several systems which prevent horizontal gene transfer (HGT) could also influence the pangenome state, for example, the concomitant identification of CRISPR/Cas, RM and T/AT loci that constitute specialized systems preventing HGT in S. lugdunensis result in a closed pangenome which was in contrast to all other staphylococci studied to date (Argemi et al., 2018). So far, however, these systems have rarely been identified in other CoNS (Argemi et al., 2018). In this study, only 2 (S. capitis CR01 and S. capitis CR03) and 3 strains (S. epidermidis FDAARGOS_153, S. epidermidis M0881 and S. epidermidis RP62A) of S. capitis and S. epidermidis were confirmed to harbor complete CRISPR/Cas system (Type III), while such system was more common in S. lugdunensis. To some extent, these factors led to open pan-genomes of ECG species analyzed in this work.
Functional classification according to COG category showed that, the core genes of the three species were successfully assigned to 23 subcategories which were more than those of the accessory genes and unique genes, suggesting that the core genes of the three species have been intensively studied. The core gene category repartition was highly similar among the 3 species, and Accumulation plots for ECG species pan-(blue) and core-genomes (red). The curves show the relationship among the pan-genomes, the core-genomes, and the number of genomes. As the number of genomes sequenced increased, pan-genomes also increased. (C) COG functional categories from the pan-genomes of ECG strains. COG categories are as follows: For cellular processes and signaling: D, cell cycle control, cell division and chromosome partitioning; M, cell wall/membrane/envelope biogenesis; N, cell motility; O, post-translational modification, protein turnover and chaperones; T, signal transduction mechanisms; U, intracellular trafficking, secretion and vesicular transport; V, defense mechanisms and Z, cytoskeleton. For information storage and processing: B, chromatin structure and dynamics; J, translation, ribosomal structure and biogenesis; K, transcription and L, replication, recombination and repair. For metabolism: C, energy production and conversion; E, amino acid transport and metabolism; F, nucleotide transport and metabolism; G, carbohydrate transport and metabolism; H, coenzyme transport and metabolism; I, lipid transport and metabolism; P, inorganic ion transport and metabolism and Q, secondary metabolite biosynthesis, transport and catabolism. X for mobilome or prophages. R for general function prediction only, and S for unknown function.
was found to be abundant in "Translation, ribosomal structure and biogenesis, " "Amino acid transport and metabolism, " and "Energy production and conversion" (Figure 4C). However, the functions of accessory and unique genes of the three species showed significant diversity which could result in the diverse biological characteristics of ECG species. Compared to the core genomes, the accessory and unique gene pools of the ECG species contained significant abundance of the genes belonging to the class "Mobilome or prophages." These genes often appear to transfer laterally between strains and result in the transmission of virulence and resistance genes between strains, thus influencing bacterial pathogenicity (Jackson et al., 2011).

Virulence Genes in ECG Genomes
To analyze the different pathogenic potential within ECG, a total of 164 virulence genes from 69 ECG genomes were identified, and 57 (57/164, 34.8%) virulence genes coexisted in all ECG genomes. Interestingly, S. caprae carried the highest average number of virulence genes (128), and S. caprae SY333 carried the most virulence genes (134) among all 69 ECG genomes.
The virulence genes could be grouped into 8 categories according to the Virulence Factor Database, including "adherence and invasion factor, " "immune evasion, " "enzyme, " "toxin, " "stress protein, " "metabolic adaptation, " "metal uptake, " and "regulation and other surface component." Remarkably, each species of ECG has its own specific virulence genes (Figure 5, Supplementary Figure 1). For S. caprae, the species-specific virulence genes were involved in "immune evasion" and "metal uptake"; for S. capitis, "metal uptake"; and for S. epidermidis, all categories except for "stress protein." Most species-specific virulence genes in S. caprae belonged to "immune evasion, " especially those biosynthesis genes of capsular polysaccharide (cap locus), while in S. epidermidis, they mostly belonged to "secretion system." In ECG genomes, most of the virulence genes were associated with adherence and immune evasion, indicating their strong ability to colonize their hosts and evade the innate immune system.
The genes involved in "enzyme, " "toxin, " "stress protein, " "metabolic adaptation, " and "regulation and other surface component" were relatively conserved in ECG genomes. Except for two toxin genes (sell and sec), which were only present in two strains of S. epidermidis (S. epidermidis SE90 and S. epidermidis SE95), the other toxin-related virulence factors were present in almost all strains of ECG. For secretion systems, the T6SS gene clpB and the T3SS gene mlr6326 were commonly found in the ECG, while EsaA, EssA, EssB, and EssC, the proteins of T7SS machinery (Jäger et al., 2018), were only present in all strains of S. caprae and some of S. epidermidis. Notably, a secreted serine protease SspA (V8 protease) that degrades fibronectin-binding microbial surface components and recognizes adhesive matrix molecules (MSCRAMMs) to promote invasion (McGavin et al., 1997) was present in all strains of S. caprae and S. epidermidis but not S. capitis. In terms of intercellular aggregation, the ica (intercellular adhesion) locus involved in PNAG/PIA biosynthesis (Otto, 2009), seemed to be present more frequently in S. capitis and S. caprae, while those genes (sdrFGH, ebp, ebh, and atlE) associated with primary attachment appeared to be present almost exclusively in S. epidermidis. Previous studies have reported that most of the S. epidermidis strains carried SdrF, SdrG, and SdrH genes (Bateman et al., 2005). It has been reported that SdrG could necessarily and sufficiently promote in vitro adhesion to fibrinogen of S. epidermidis Conrady et al., 2008) and in vivo central venous catheter (CVC) associated infectious disease (Guo et al., 2007). The enrichment of MSCRAMMs in S. epidermidis indicated that S. epidermidis achieved better in vivo colonization than the other two species. Although the ica locus is commonly present in S. caprae and S. capitis, the lack of MSCRAMMrelated proteins may limit their ability to form biofilms (Otto, 2009). It is worth mentioning that ica and cap loci which protect the bacteria from the important innate host defense mechanisms are commonly present in S. caprae and might lead to better immune evasion (Otto, 2009;Fournier et al., 2013). T7SS, a secretion pathway for the virulence proteins, protects the bacteria from the host defense system and makes them able to survive in abscesses for a long time (Warne et al., 2016). Identification of T7SS in all S. caprae and some S. epidermidis strains indicated that these strains might possess higher capacity for secretion of virulence factors, as well as higher potential for bacterial pathogenesis. A previous study had shown that T7SS played an essential role in keeping integrity and homeostasis of the staphylococcus aureus membrane. This is crucial when the bacterium faces antimicrobial fatty acids (Lopez et al., 2017). Since T7SS is commonly present in S. caprae, it may suggest that T7SS targeted therapeutics decreases the virulence of S. caprae and makes it more susceptible to fatty acids. Similarly, the distribution of virulence factors involved in "enzyme, " "toxin, " "stress protein, " "metabolic adaptation, " and "regulation and other surface component" in ECG genomes indicates that ECG also has many similarities in terms of pathogenicity. Interestingly, in regard to "metabolic adaptation"-related genes, the isd genes have been only reported to be present in the species S. lugdunensis of CoNS (Heilbronner et al., 2011), while strains of S. caprae and S. capitis analyzed in this research also possess a gene cluster similar to those of S. aureus and S. lugdunensis.

Comparative Genomics Analysis of isd Locus
In order to get iron from the host, pathogens have evolved several mechanisms. S. aureus uses the isd system as a fundamental heme-iron uptake pathway. The S. aureus genome encodes an isd system (isdABCDEFGHI) conferring heme uptake and sortase B (SrtB), which is responsible for anchoring its specific substrates (IsdC) to the bacterial cell wall (Grigg et al., 2010; Liang et al., 2016). The cell walls of S. aureus anchored four Isd proteins (IsdA, IsdB, IsdC, and IsdH). These proteins contain 1 to 3 conserved NEAT domains. Each Isd surface gene at least encodes a secretion signal, a cell wall-anchoring motif and a NEAT domain (Grigg et al., 2010;Heilbronner et al., 2016). IsdB (NEAT1) and IsdH (NEAT1 and NEAT2) can bind to hemoglobin and the haptoglobin-hemoglobin complex but not heme via N-terminal NEAT domains, while IsdH NEAT3 and IsdB NEAT2 contain heme-binding NEAT domains that transfer heme to IsdA, IsdC, and then to IsdEF, which is a membranelocating transporter (Liu et al., 2008;Muryoi et al., 2008). IsdG and IsdI, the heme oxygenases in the cytoplasm, are responsible to destroy the porphyrin ring, releasing the free iron (Reniere et al., 2007). To better illustrate the similarities and differences of the isd locus among S. capitis, S. caprae, S. aureus, and S. lugdunensis, we performed a comprehensive comparison of the isd genes within these four species. There are four NEAT proteins in S. caprae (isdA_2, isdC, isdA, and isdL) and S. capitis (isdA_2, isdC, isdA, and isdM), respectively, which contain NEAT domains similar to those of S. aureus and S. lugdunensis (Table 4). Except for isdA_2, the other three NEAT-containing proteins in S. capitis and S. caprae possessed a secretion signal and a putative cell wall-anchoring motif (LPXTG) (Figure 6), which indicated the capacity of cell wall attachment (Cabanes et al., 2002). A comparative genome analysis demonstrated that the isdGEFI and srtB genes were conserved across S. capitis, S. caprae, and S. aureus, while other NEAT domain-containing genes appeared to be less conserved (Figure 6). IsdD, a membrane protein in S. aureus, is absent in S. capitis and S. caprae (Grigg et al., 2010). Phylogenetic analysis of NEAT domains showed that IsdL NEAT2 and NEAT3 and IsdM NEAT were phylogenetically close to IsdB NEAT2, IsdH NEAT3 of S. aureus and IsdB NEAT2 of S. lugdunensis, which were responsible for heme binding. IsdL NEAT1 was clustered together with IsdH NEAT1 and NEAT2, IsdB NEAT1 of S. aureus and IsdB NEAT1 of S. lugdunensis which were responsible for binding to the haptoglobin-hemoglobin complex and to hemoglobin (Figure 7). Multiple sequence alignments of NEAT domains revealed that each putative hemebinding NEAT domain in S. capitis and S. caprae contained an essential YXXXY heme-binding motif ( Figure 8A). The first tyrosine of the heme-binding motif (e.g., Y166 of IsdA) noncovalently binds the central iron atom of heme, and the second tyrosine residue (e.g., Y170 of IsdA) provides a stabilizing hydrogen bond to the first tyrosine (Sheldon and Heinrichs, 2015). Except for IsdL NEAT2 of S. capitis, these putative hemebinding NEAT domains also contained a conserved SXXXXY sequence which forms a 3 10 -helix (α-helix) referred to as the "lip" region ( Figure 8A). Hemoglobin binding by these NEAT domains in S. aureus is mediated by a conserved five amino FIGURE 6 | Comparison of the isd loci from completely sequenced genomes of S. caprae and S. capitis and the complete genomes of S. aureus Newman and S. lugdunensis NCTC12217. Blue links connect orthologous genes, and the color depth indicates the degree of identity. Heme-binding NEAT domains are indicated by yellow boxes, while Hb-binding NEAT domains are indicated by purple boxes. Genes are denoted by large arrows, and heme-or Hb-binding genes are colored in green. Dashed lines indicate genes that are not adjacent. The putative cell wall-anchoring motifs, single sequences and the function of NEAT domains in S. aureus Newman are indicated with arrows. SS, secretion signal; orfA, an unknown function putative cell wall-anchoring protein; ABC, ABC transporter.
acid aromatic motif (YYHFF in IsdH-N1 and FYHYA in IsdH-N2 and IsdB-N1 at positions 164 to 168). Mutation of any one of these key residues severely hinders hemoglobin and/or haptoglobin binding (Sheldon and Heinrichs, 2015). Alignment of the hemoglobin-binding NEAT domains revealed amino acid substitution at position 166 (H166I) and 167 (Y167E) in IsdL NEAT1 which might confer reduction or loss of the function ( Figure 8B). These results indicate that these putative hemebinding NEAT domains in S. capitis and S. caprae have the ability to bind to heme, while whether the putative hemoglobinbinding domain (IsdL NEAT1) in S. capitis is functional needs further validation.
In this study, we haven't found any NEAT domain that might bind to hemoglobin in S. caprae, and the function of the putative hemoglobin-binding domain (IsdL NEAT1) is still questionable. Indeed, not all heme or hemoglobin-binding Isd proteins contain the LPXTG motif, for example, isdX1 (containing one NEAT domain) and isdX2 (containing five NEAT domains) which are located between isdC and isdE in the isd loci of Bacillus cereus group were secreted without a LPXTG motif (Sheldon and Heinrichs, 2015). Unlike S. aureus, whose NEAT proteins acquire heme from hemoglobin directly at the bacterial surface, Bacillus cereus group secretes IsdX1 to capture heme in the extracellular milieu and relies on NEAT-NEAT interactions to deliver the bound heme to the envelope via IsdC (Maresso et al., 2008). Of note, isdA_2 in S. caprae and S. capitis which is located between isdC and isdE also lacks LPXTG motif and that is similar to isdX1 and isdX2 although no obvious sequence identities were observed between isdA_2 and isdX (isdX1 and isdX2). Based on the characteristics of isdA_2, we hypothesized that IsdA_2 could be secreted out of bacteria cell rather than attached to the cell wall and interact with heme in the extracellular milieu. In addition, a putative cell wall-anchored protein OrfA was found in the isd loci of S. capitis and S. caprae, the function of OrfA remains unknown. In summary, the operation mechanism of Isd systems in S. caprae and S. capitis might be slightly different from that of S. aureus and S. lugdunensis.
S. lugdunensis was once considered as a unique species that harbors an iron-regulated surface determinant locus (isd) among coagulase-negative staphylococci. Discovery of the isd locus in S. capitis and S. caprae would provide evidence for their ability to use heme as an iron source during infection.

CONCLUSION
In this work, the complete genome sequence of a clinical S. caprae isolate with two resistance plasmids was reported for the first time. S. caprae SY333 showed resistance to several antibiotics, such as erythromycin, clarithromycin, amikacin, azithromycin and penicillin. Two plasmid-encoded resistance genes (blaZ and aadD2) were confirmed to be functional. The pan-genome analysis of the three ECG species showed that their pan-genomes tend to be "open" and functional annotation revealed that core gene category repartition was highly similar across the 3 species. Analysis of the bla region in ECG revealed that the chromosomeencoded and plasmid-encoded bla operons had two distinct evolutionary paths. Virulence factors in ECG differed mostly in adherence, invasion, immune evasion and secretion system. T7SS may play an important role in pathogenesis of S. caprae and S. epidermidis. Genes related to primary attachment are almost exclusively present in S. epidermidis, while intercellular adhesion-related genes are more frequently present in S. caprae and S. capitis. Identification of the isd locus in S. caprae and S. capitis discouraged previous claims that S. lugdunensis was the only coagulase-negative Staphylococcus species with a locus encoding iron-regulated surface determinant (Isd) proteins, as well as indicated that this two species may have the ability to use heme as the nutrient iron source during infection, which could enhance their pathogenic potential.

ETHICS STATEMENT
This study was approved by the Ethics Committee of the Central Hospital of Lishui City (China) and informed consent was obtained from the patient.

AUTHOR CONTRIBUTIONS
ZS, DZ, XZ, QL, HLin, WL, HLiu, and JL collected the strains and performed the experiments. ZS, XL, TX, and HZ analyzed the experimental results and performed the bioinformatics analysis. ZS, TX, QB, and HZ wrote the manuscript. KL, TX, QB, and HZ designed the experiments. All authors read and approved the final manuscript.