Comparative Genomic Analysis of Streptococcus dysgalactiae subspecies dysgalactiae Isolated From Bovine Mastitis in China

Streptococcus dysgalactiae subsp. dysgalactiae (SDSD) is one of the most prevalent pathogens causing bovine mastitis worldwide. However, there is a lack of comprehensive information regarding genetic diversity, complete profiles of virulence factors (VFs), and antimicrobial resistance (AMR) genes for SDSD associated with bovine mastitis in China. In this study, a total of 674 milk samples, including samples from 509 clinical and 165 subclinical mastitis cases, were collected from 17 herds in 7 provinces in China from November 2016 to June 2019. All SDSD isolates were included in phylogenetic analysis based on 16S rRNA and multi-locus sequence typing (MLST). In addition, whole genome sequencing was performed on 12 representative SDSD isolates to screen for VFs and AMR genes and to define pan-, core and accessory genomes. The prevalence of SDSD from mastitis milk samples was 7.57% (51/674). According to phylogenetic analysis based on 16S rRNA, 51 SDSD isolates were divided into 4 clusters, whereas based on MLST, 51 SDSD isolates were identified as 11 sequence types, including 6 registered STs and 5 novel STs (ST521, ST523, ST526, ST527, ST529) that belonged to 2 distinct clonal complexes (CCs) and 4 singletons. Based on WGS information, 108 VFs genes in 12 isolates were determined in 11 categories. In addition, 23 AMR genes were identified in 11 categories. Pan-, core and accessory genomes were composed of 2,663, 1,633 and 699 genes, respectively. These results provided a comprehensive profiles of SDSD virulence and resistance genes as well as phylogenetic relationships among mastitis associated SDSD in North China.


INTRODUCTION
Streptococcus dysgalactiae is an important bovine mastitis causing pathogen worldwide (Cameron et al., 2016;Gao et al., 2017;de Campos et al., 2021), which could result in economic loss and deteriorated animal welfare (Heikkilä et al., 2018). Streptococcus dysgalactiae consists of 2 subspecies: Streptococcus dysgalactiae subsp. equisimilis (SDSE) and Streptococcus dysgalactiae subsp. dysgalactiae (SDSD). They can be distinguished based on their hemolytic properties [SDSD (a-hemolytic) and SDSE (bhemolytic)] on blood agar (Alves- Barroco et al., 2021). SDSD is classically described as an animal pathogen, causing animal diseases, such as bovine mastitis (Zhang et al., 2018), and mostly results in persistent (sub)clinical mastitis (Botrel et al., 2010). Meanwhile, studies indicate that SDSD isolates can also be isolated from human with breast cancer and infect human primary keratinocyte cells in vitro (Alves- Barroco et al., 2019b;Koh et al., 2020), indicating the potential to infect human. Therefore, SDSD in cow milk should be considered a threat to public health.
The pathogenicity of Streptococcus dysgalactiae attributes to various virulence factors (VFs), which involved in the following categories: adherence, enzyme (including hyaluronidase, mitogenic factor, streptococcal enolase, and streptodornase et al.), immune evasion, immunoreactive antigen, iron uptake, manganese uptake, protease, superantigen, and toxin formation process, based on Streptococcus virulence factors database (VFDB). These virulence genes facilitate the process of adherence to epithelial cells and internalization as well as the subsequent dissemination into host cells (O'Halloran et al., 2016;Alves-Barroco et al., 2019a). Abuse of antimicrobials in mastitis treatment contributes the most to the spread of antimicrobial resistance (AMR) in mastitis associated SDSD isolates (McDougall et al., 2014;Cameron et al., 2016;Zhang et al., 2018). Specifically, the emergence of multiple-drug resistant strains can be a major challenge in the treatment of bovine mastitis and is a growing concern for public health (McDougall et al., 2014;El Garch et al., 2020).
Population structure of SDSD remains unknown, although various genotypes of S. dysgalactiae have been described (Rato et al., 2013;Santos-Sanches et al., 2015). Genetic characteristics of SDSD associated with bovine mastitis, including virulence factors and AMR genes, have been profiled using PCR-based methods (Tian et al., 2019). However, these methods are insufficient to capture the subtle genomic differences among strains. Whole genome sequencing (WGS) with high resolution could be the preferred method to determine strain diversity and to infer phylogenetic relationships as well as to identify virulence and resistance genes in SDSD isolates with subtle differences. Therefore, the objectives of this study were to (1) estimate the apparent prevalence of SDSD in samples from (sub)clinical mastitis; (2) determine genetic diversity and evolution of SDSD isolates by 16s RNA sequencing and whole genome sequencing analysis; (3) identify VFs and AMR genes in those SDSD isolates.

Statement of Ethics
This study was conducted in accordance with ethical guidelines and standard biosecurity and institutional safety procedures of China Agricultural University (CAU; Beijing, China). Prior to the start of the study, ethical approval was granted by the Departmental Committee of the College of Veterinary Medicine, CAU.

Milk Sample Collection and Streptococcus dysgalactiae subsp. dysgalactiae Identification
A total of 674 milk samples, including 509 clinical mastitis (CM) and 165 subclinical mastitis (SCM) milk samples, were obtained from 17 large dairy farms (each had > 1,000 lactating cows) in 7 provinces (Tianjin = 19 samples; Shanxi = 15 samples; Hebei = 189 samples; Heilongjiang = 87 samples; Inner Mongolia = 95 samples; Shandong = 259 samples; and Shaanxi = 10 samples) in China from November 2016 to June 2019. The CM samples, including changes in the milk (e.g., clots, discoloration, flakes, and wateriness) and either with or without visible abnormalities of the udder and/or systemic symptoms (e.g., red, swollen, firm or painful udder, or fever), were collected from a single quarter with abnormalities. The SCM samples were defined as intramammary infection without clinical symptoms and could be detected by California mastitis test and somatic cell counts (Godden et al., 2017). All milk samples were collected aseptically, stored in an ice box and subsequently transported to Mastitis Reference Laboratory at the College of Veterinary Medicine, CAU, Beijing, China.
Pathogens were identified according to bacteriological culture, colony morphology, gram staining, biochemical tests and 16S rRNA sequencing (NMC, 2017). In short, an aliquot (50 µl) of milk sample was spread on a tryptone soy agar (TSA; Oxoid, Basingstoke, United Kingdom) with 5% defibrinated sheep blood (Land Bridge Technology, Beijing, China) and incubated aerobically at 37 • C in a humidified condition for 24 h. Bacterial colony morphology was recorded and samples that yielded ≥ 3 morphologically distinct colonies were considered as contaminated and were excluded from the subsequent analysis. Gram staining and catalase testing were conducted to discriminate Staphylococci (gram positive, catalase-positive) and Streptococci-enterococci group (gram-positive, catalase-negative). Esculin test was used to differentiate esculin-positive cocci and Streptococci (esculin-negative). Streptococci were identified at subspecies level using 16S rRNA gene sequencing (Beijing Sunbiotech Inc., Beijing, China) and BLAST with sequences deposited in National Center for Biotechnology Information (NCBI). A total of 51 isolates were identified as SDSD and stored in Brain Heart infusion broth (BHI; Aobox, China) containing 25% glycerol and stored at −80 • C for subsequent analysis.

DNA Extraction
Genomic DNA was extracted with a bacterial genomic DNA extraction kit (CoWin Biosciences, Beijing, China) for each bacterial isolate according to manufacturer's instructions for Gram-positive bacteria. Additional RNase A (4 µl, 100 mg/ml) was used to exclude RNA. The extracted DNA was quantified with a NanoDrop One spectrophotometer (Thermo Fisher Scientific, Waltham, MA).

16S rRNA Phylogenetic Analysis
The 16S rRNA gene sequences of a total of 51 SDSD isolates, along with sequence from the selected reference strain ATCC 43078 (Accession no. NR_115275 REGION: 11.767) from GenBank, were edited and aligned using ClustalW multiple sequence alignment algorithm and then phylogenetic tree was constructed via Maximum Likelihood (Tamura-Nei model) with MEGA-X v10.1.8 (Tamura and Nei, 1993). Confidence values for each branch of the phylogenetic tree was estimated using bootstrapping with 1,000 resamplings. Loci with < 95% site coverage, including fewer than 5% alignment gaps, missing data and ambiguous bases at any positions (partial deletion option) (Kumar et al., 2018), were eliminated. The phylogenetic tree was visualized by the iTOL web server 1 (Letunic and Bork, 2019).

Multi-Locus Sequence Typing
MLST was applied to determine sequence types (STs) of the SDSD isolates based on 7 housekeeping genes, namely gki, gt, murI, mutS, recP, xpt, and atoB. Sequence at each locus was assigned with an allele number, and the corresponding combination of the 7 allele numbers for each isolate was submitted to the PubMLST database 2 to obtain the ST of the isolate. In this study, a clonal complex (CC) was defined as a group of STs in which every ST shared at least 5 of 7 identical allele profiles with at least 1 other ST in the group. The minimum spanning tree (MST) was constructed by the geoBURST algorithm and visualized by the PhyloViz web server 3 to infer phylogenetic relationships among STs of original isolates.

Genome Assembly and Annotation
A total of 10 SDSD isolates (of 3 dominant STs) from clinical mastitis cases and 2 from subclinical mastitis cases were randomly selected for whole genome sequencing. The DNA samples with concentration ≥ 40 µg/mL and purity indices A260/280 ≥ 1.8, A260/230 ≥ 2.0 were submitted for whole genome sequencing with an Illumina HiSeq 4000 system (Illumina, San Diego, CA, United States) at the Beijing Genomics Institute (Shenzhen, China). Genomic DNA was sheared randomly to construct 3 read libraries with lengths of 350 bp by a Bioruptor ultrasonicator (Diagenode, Denville, NJ, United States) and physicochemical methods. The paired-end fragment libraries were sequenced according to the Illumina HiSeq 4000 protocol. Raw reads of low quality from pairedend sequencing (those with consecutive bases covered by fewer than 5 reads) were discarded. Subsequently, the reads were assembled using SOAPdenovo Version 1.05. 4 The total number of reads, sequences, and contigs, genome size, N50 and total length were obtained. Gene prediction was performed on the SDSD genomes assembly by glimmer3 5 with Hidden Markov models. tRNA, rRNA and sRNAs were identified by tRNAscan-SE, RNAmmer, and the Rfam database, respectively. The tandem repeats annotation was obtained using the Tandem Repeat Finder. 6 Prophage regions were predicted using the PHAge Search Tool Enhanced Release (PHASTER) web server 7 (Arndt et al., 2016).

Identification of Virulence-Associated Genes and Antimicrobial Resistance Genes
Virulence-associated genes were identified based on the core dataset in VFDB Version 2019-07 (Liu et al., 2019). Antimicrobial resistance genes were identified by comparing the SDSD genomes Comprehensive Antibiotic Research Database (CARD) Version V6. Amino acid sequences of predicted genes were aligned against the proteins in these databases using blastp. A gene was assigned to a virulence or antimicrobial resistance protein by the highest score hit containing a minimum identity of 40%. The E-value 5.3E-08 was the highest among those E-values. Pseudo genes were excluded from the virulence and AMR genes. Location of the AMR genes were retrieved from MobileElementFinder 8 and ISFinder 9 and PHASTER (see text footnote 7).

Pan-Genome Analysis
The pan-genome of 12 SDSD isolates was computed with Bacterial Pan Genome Analysis tool (BPGA) Version 1.3, using a USEARCH algorithm to cluster orthologous gene families (Chaudhari et al., 2016). Core genes were defined  as the genes that exist in all the genomes, accessory genes were defined as genes present in ≥ 1 genomes but not all the genomes, while unique genes were defined as the genes only found in a single genome, according to BPGA analysis. Functional annotations of core, accessory, and unique genes were obtained after comparing sequences to those present in the clusters of orthologous groups (COGs) of proteins and Kyoto encyclopedia of genes and genomes (KEGG) databases. The phylogenetic tree was constructed based on core genes of 12 SDSD genome sequences, combined with 12 Streptococci genome sequences obtained from the National Center for Biotechnology Information (NCBI) (details in Supplementary

Statistical Analyses
Test for proportions was applied to compare the proportion of the genes fall into the functional categories among core, accessory and unique genome using SPSS 23.0 software (SPSS Inc., Chicago, IL, United States) and significance was considered at p < 0.05 in a two-tailed test.

Isolates
Detailed information of SDSD isolate is shown in Tables 1, 2.
Frontiers in Microbiology | www.frontiersin.org Inner Mongolia and Hebei). SDSD 62 was the isolate that most closely related to the reference genome ATCC 43078.

Multi-Locus Sequence Typing and Minimum-Spanning Tree
A total of 11 distinct STs were identified in the 51 SDSD isolates from 7 provinces in China ( Table 2 and Figure 1), 5 of which were novel, namely ST521, ST523, ST526, ST527, and ST529. Of these STs, ST521 (15 isolates from 7 herds in 2 provinces) was the most predominant one, followed by ST453 (11 isolates from 6 herds in 5 provinces) and ST523 (5 isolates from 3 herds in 3 provinces). The phylogenetic analysis and MST (Figure 2) involving 445 STs were performed. The 11 STs were grouped into 2 CCs and 4 singletons. In CC1, ST453 was the main evolutionary starter and 4 STs were involved, whereas ST461 was the main evolutionary starter in CC2.
Regarding enzyme-related genes, dltA, eno, glnA1, hylB, luxS, lplA1, lgt, mf, relA, sak, stp, sodB, and sda were present in all FIGURE 2 | Minimum spanning tree (MST) of MLST for Streptococcus dysgalactiae (including Streptococcus dysgalactiae subsp. equisimilis and subsp. dysgalactiae) involving 445 STs in MLST database performed by geoBURST algorithm and visualized by PhyloViz online. The STs identified in this study were in various colors, whereas remaining STs in the MLST database were gray. Cloning complexes (CCs) were defined as groups of similar isolates, each isolate having at least 5 of 7 identical alleles with at least 1 other ST within the group. ST453, ST305, ST523, ST454 and ST529 were grouped into CC1, and ST453 was the main evolutionary starter. ST461, ST521 and ST527 were grouped into CC2, whereas ST460, ST298, and ST526 were considered 3 singletons. SDSD isolates. There were 27 immune evasion-related genes, including 5 capsular genes (rgpA to G, except rgpC) and another 21 immune evasion-related genes were detected in all isolates. Among the 7 protease-related genes present in all STs, clpE had uniform distribution: isolates of ST453 had 3 clpE copies, whereas isolates of other STs only had 2. Notably, all isolates carried 4 cytolysin regulator gene cylR2 copies, which belong to the toxin-related genes category.

Identification of Antimicrobial Resistance Genes
A total of 23 AMR genes in the genomes of 12 SDSD isolates were identified, the occurrence and distribution of AMR genes are presented in Figures 3B, 5, respectively. Based on gene number, 23 AMR genes varied from 0 to 3 copies in the 12 SDSD isolates. Two isolates of ST453, namely SDSD20 and SDSD37, both had the most abundant 21 AMR genes, belonging to 9 distinct classes of antimicrobials (aminoglycoside, tetracycline, vancomycin, bacitracin, fluoroquinolone, lincomycin, nucleoside, peptide, and β-lactamase). Moreover, all 12 isolates harbored 1 lmrp gene which encodes a multidrug resistance efflux pump. We found a total of 4 AMR genes (from 4 isolates in ST453 and ST521) were located at mobile genetic elements, most of which were located in insertion sequences [ANT6-IA, SAT-4 and APH(3 )-IIIA] and one in transposon [APH(3 )-Ia]. Meanwhile, two integrative conjugative elements (ICEs) ICETn6009 were identified in SDSD20 and SDSD37 isolates. However, there was no AMR gene carried by ICETn6009 and prophage regions.

Pan-Genome Analysis
The pan-genome of the 12 SDSD contained 2,663 genes. The core genome (shared by all S. dysgalactiae isolates) consisted of 1,633 genes. The accessory genome comprised 699 genes, and the unique genome included 293 genes. The number of core genes was fairly constant at ∼1,600 genes, whereas the size of genes in the pan-genome continued to increase as the number of strains increased (Figure 6). Functional annotation of genes in the pan-genome revealed the distribution of functional categories among 3 pan-genome sets (Figure 7). Among which, metabolism was identified as the most abundant functional category in the core genes. The overall percentage of metabolic functions in the core genes was 38.6%, whereas that in the accessory and unique genes were 26.5 and 9.7%, respectively. Metabolism was almost 1.5 and 4 times more enhanced in the core genes compared to accessory and unique genes ( Figure 7A). The functional category of information storage and processing had a higher proportion in unique genes than in core and accessory genes. The functions of transcription and replication, recombination and repair were enhanced in unique genes, whereas the functions of translation, ribosomal structure and biogenesis were enhanced in core genes ( Figure 7B). According to blasting with KEGG databases, this distinct distribution was observed ( Figure 7C). The functional category of metabolism was the most abundant in three pan-genome sets components, with the overall proportion of 65.7, 60.0 and 53.1% in core, accessory and unique genes, respectively. Notably, the function of carbohydrate metabolism was enhanced in accessory genes rather than in core or unique genes. Meanwhile, the functional category of genetic information processing was significantly enhanced in unique genes than in core or accessory genes. Interestingly, the functions of replication and repair were enhanced in unique genes, whereas, the functions of folding, sorting and degradation, transcription, and translation were enhanced in core genes ( Figure 7D).
FIGURE 4 | Heatmap of virulence factor gene number distribution in 12 Streptococcus dysgalactiae subsp. dysgalactiae (SDSD) isolates. A total of 108 virulence-associated genes were determined by Virulence Factors of Database (VFDB) in all 12 SDSD strains, and the copy number, ranging from 0 to 4 were indicated by yellow to red. Eleven main virulence categories were marked with colored bars along the Y-axis. The phylogenetic tree based on the presence of virulence genes of the isolates was presented as a cladogram in the panel along the X-axis.
A phylogenetic tree was constructed based on the core genes of 24 Streptococci genomes, including 12 SDSD genomes obtained from the Illumina sequencing and 12 obtained from NCBI (Figure 8). The phylogenetic tree revealed that the 24 Streptococci were divided into 3 phylogenetic groups. Group A only contained S. pneumoniae CGSP14, and group C included S. dysgalactiae strains isolated from human, fish and swine. All SDSD isolates from cattle belonged to group B. Isolates of ST521 were more closely related to SDSD NCTC4670 collected from humans, whereas isolates with ST453, ST523 and ST526 were more closely related to SDSD ATCC27957 and NCTC13731 (both were isolated from dairy cows).

FIGURE 5 | Heatmap of antimicrobial resistance (AMR) gene distribution in 12
Streptococcus dysgalactiae subsp. dysgalactiae (SDSD) isolates. A total of 23 AMR genes were identified by CARD (Comprehensive Antibiotic Research Database) in all 12 SDSD strains. The copy number, ranging from 0 to 3, was indicated by yellow to red. Ten main AMR categories were marked with colored bars along the Y-axis. The phylogenetic tree based on the presence of AMR genes in the isolates was presented as a cladogram in the panel along the X-axis.

DISCUSSION
Several studies have described the epidemiology of Streptococcus dysgalactiae in dairy herds (Cameron et al., 2016;Horpiencharoen et al., 2019). In addition, studies using PCRbased techniques to identify virulence and AMR genes in bovine S. dysgalactiae isolates have been published (Fernandez et al., 2013;Horpiencharoen et al., 2019). However, no in-depth studies have investigated the population structure, profiles of virulence and AMR genes as well as the phylogenetic relationships of bovine mastitis derived SDSD isolates. Therefore, we conducted phylogenetic analysis based on 16S rRNA and MLST of 51 SDSD isolates obtained from 17 dairy herds in 7 provinces of China. In addition, we also conducted WGS of 12 representative SDSD isolates to determine the distribution of virulence and antimicrobial resistance genes. The core genomes were more related to metabolism, and the unique genomes were more relevant to genetic replication and repair according to functional pan-genomes analysis. Consequently, our study suggests that core genes of metabolism can be used as a tool to identify Streptococcus species in the microbiome. The phylogenetic tree based on the core genome of 24 Streptococcus revealed that human S. dysgalactiae isolates have closer genetic relationship with fish and swine S. dysgalactiae isolates than bovine SDSD isolates, which is consistent with Alves- Barroco et al. (2021).
Understanding the phylogenetic relationships among strains is important to determine the transmission of pathogens. The phylogenetic tree based on 16S rRNA sequences revealed 4 phylogenetic clusters. Cluster A-D contained 15, 15, 10, and 11 isolates derived from 3, 5, 5, and 3 provinces, respectively. Some identical STs (ST521, ST453, ST523, ST454, and ST527) were identified in multiple herds in this study, whereas the other STs were only detected in single herds. That identified STs were in different herds, which is consistent with a previous study (Lundberg et al., 2014). Furthermore, movement of infected animals likely promotes spreading of pathogens between herds (Widgren and Frössling, 2010).
Using the whole genome sequences of 12 representative SDSD genomes, we observed differences in the number of predicted prophage regions in 12 SDSD genomes among different strains and STs. A total of 35 prophage regions were detected in 12 SDSD genomes. The average number of prophages per genome was 2.9-the highest number of prophages was presented in ST453 (5.3) and the lowest was 1.3 in ST521. These values suggest that the number of prophage regions in SDSD could possibly be associated with ST/CC, which is in line with Lichvariková et al. (2020).
Adherence-related genes could facilitate adhesion and biofilm formation, which are important factors in Streptoccoci pathogenesis (Widgren and Frössling, 2010). Adhesion is the first step in biofilm formation or invasion into host cells, promoting survival of microorganisms in infected tissues and facilitating development of mastitis (Dahesh et al., 2012;Alves-Barroco et al., 2019a). Eleven adhesion-related genes [abpB (Huang et al., 2015), fbpA (Bao et al., 2014), fnBA (Yeswanth et al., 2017), shr (Dahesh et al., 2012), slrA (Bober et al., 2011), cbpD (Roig-Molina et al., 2020, fnz2 (Yi et al., 2013), fneB (Lannergård et al., 2005), lmb (Zhang et al., 2014), seM (Timoney et al., 2010), srr-1 (Sheen et al., 2011)] encode a number of surface proteins. These surface proteins are identified as important virulence factors that involve bacterial adhesion to the epithelium of the host cell mediated by microbial surface components recognizing adhesive matrix molecules, consequently contributing to host cell attachment and tissue colonization. Lmb was the most abundant adherence related gene in the 12 SDSD isolates. In previous studies, the lmb gene had a relatively high prevalence in Streptococcus uberis isolated from cattle (Fessia et al., 2019), but was less prevalent in bovine-associated Streptococcus dysgalactiae and Streptococcus agalactiae (Ding et al., 2016;Tian et al., 2019). Three biofilmrelated genes (srtA, aspA, and emm) were identified in 12 SDSD genomes. Biofilms are communities of microorganisms attached to a surface and are involved in chronic and recurrent infections in animals and humans (Veerachamy et al., 2014). Compared to planktonic bacteria, those with biofilms are more resistant to antibiotics (Rabin et al., 2015). Therefore, these genes FIGURE 6 | Pan-genome of 12 Streptococcus dysgalactiae subsp. dysgalactiae (SDSD) isolates. The pan-genome of 12 SDSD tested in this study had 2,663 genes; the size of the genome in the pan-genome continued to increase as the number of strains increased. However, the number of core genomes (shared by 100% of SDSD isolates) was fairly constant at ∼1,600 genes.
FIGURE 7 | Differential distribution of COG and KEGG functional categories in core, accessory and unique genes: (A) Proportion of 4 classes of COG functional categories in core, accessory and unique genes. (B) COG functional sub-categories in core, accessory and unique genes. (C) Proportion of 6 classes of KEGG functional categories in core, accessory and unique genes. (D) KEGG functional sub-categories in core, accessory and unique genes. D, Cell cycle control, cell division, and chromosome partitioning; M, Cell wall/membrane/envelope biogenesis; N, Cell motility; O, Post-translational modification, protein turnover, and chaperones; T, Signal transduction mechanisms; U, Intracellular trafficking, secretion, and vesicular transport; V, Defense mechanisms; J, Translation, ribosomal structure and biogenesis; K, Transcription; L, Replication, recombination and repair; C, Energy production and conversion; G, Carbohydrate transport and metabolism; E, Amino acid transport and metabolism; F, Nucleotide transport and metabolism; H, Coenzyme transport and metabolism; I, Lipid transport and metabolism; Q, Secondary metabolites biosynthesis, transport, and catabolism; P, Inorganic ion transport and metabolism; R, General function prediction only; S, Function unknown. AA, Cell growth and death; AB, Cell motility; AC, Cellular community; AD, Transport and catabolism; BA, Membrane transport; BB, Signal transduction; BC, Signaling molecules and interaction; CA, Folding, sorting and degradation; CB, Replication and repair; CC, Transcription; CD, Translation; DA, Cancers; DB, Cardiovascular diseases; DC, Drug resistance; DD, Endocrine and metabolic diseases; DE, Immune diseases; DF, Infectious diseases; DG, Neurodegenerative diseases; DH, Substance dependence; EA, Amino acid metabolism; EB, Biosynthesis of other secondary metabolites; EC, Carbohydrate metabolism; ED, Energy metabolism; EE, Glycan biosynthesis and metabolism; EF, Lipid metabolism; EG, Metabolism of cofactors and vitamins; EH, Metabolism of other amino acids; EI, Metabolism of terpenoids and polyketides; EJ, Nucleotide metabolism; EK, overview; EL, Xenobiotics biodegradation and metabolism; FA, Circulatory system; FB, Development; FC, Digestive system; FD, Endocrine system; FE, Environmental adaptation; FF, Excretory system; FG, Immune system; FH, Nervous system; FI, Sensory system. FIGURE 8 | Phylogenetic tree, based on the core genes of 24 Streptococcus genomes, including 12 SDSD genomes obtained from the Illumina sequencing, and 12 Streptococcus genomes obtained from NCBI. Scale bar indicates the base substitution per site. The phylogenetic tree revealed that the 24 Streptococcus genomes were divided into 3 groups. Group A only contained S. pneumoniae CGSP14. Group B harbored all SDSD isolates collected from bovine, including 12 SDSD isolates in this study and 2 SDSD genomes (NCTC13731 and ATCC27957) obtained from NCBI. Group C consisted of S. dysgalactiae strains obtained from humans (DB31752-13, FDAARGOS_654, DB49998-05 and DB60705-15), fish (kdys0611 and STREP97-15) and swine (NCTC6403). may contribute to the persistent infection induced by SDSD (Kaczorek et al., 2017).
A total of 23 AMR genes were identified with various copy numbers in 12 SDSD isolates according to the annotation of CARD. According to previous surveys, 5 categories of antimicrobials, including β-lactams (penicillin G, cefalexin, and ceftriaxone), macrolide (erythromycin), aminoglycoside (kanamycin and streptomycin), and tetracyclines, are frequently used in treatment of bovine mastitis on Chinese dairy farms (Shi et al., 2010;Zhang et al., 2018). The emergence of AMR genes are mainly through spontaneous mutations and acquisition of AMR genes from other bacteria (MacLean and Millan, 2019). The high prevalence and breadth diversity of AMR genes, which encode proteins conferring resistance to the 5 categories of antibiotics above, might be attributed to abuse of antimicrobials in Chinese dairy herds. In this study, only one integrative conjugative element ICETn6009 was detected in SDSD20 and SDSD37, both of which belonged to ST453 and harbored the most abundant AMR genes. ICETn6009, a Tn916 based element with Grampositive mer operon and directly linked to the tet(M) gene, was identified from two Gram-positive and three Gram-negative genera, including Streptococcus (Soge et al., 2008). Meanwhile, 4 AMR genes were located in insertion sequences from 4 isolates in ST453 and ST521 and one in transposon from SDSD 34 in ST521. This may indicate isolates of these two STs were more capable in acquisition of antimicrobial resistance genes from MGEs (Oppegaard et al., 2020).

CONCLUSION
To the best of our knowledge, this is the first report on characterization of Streptococcus dysgalactiae subsp. dysgalactiae isolated from mastitis cows in China using whole-genome sequencing. The apparent prevalence of SDSD was estimated at 7.6%. Eleven sequence types (ST298, ST305, ST453, ST454, ST460, ST461, ST521, ST523, ST526, ST527, and ST529) were determined according to MLST. A total of 108 VFs in 11 categories (adherence, enzyme, immune evasion, immune reactive antigen, iron and manganese uptake, protease, peptidase, superantigen, toxin, and others) and 23 AMR genes in 11 categories (aminoglycoside, tetracycline, vancomycin, bacitracin, fluoroquinolone, lincomycin, nucleoside, peptide, macrolide, β-lactams antimicrobials, and multidrug) were identified according to whole genome sequence analysis. These results can be used to study bovine-associated SDSD virulence and resistance genotypes, as well as to better understand the phylogenetic relationships among mastitis-derived SDSD in Northern China.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

ETHICS STATEMENT
This study was conducted in accordance with ethical guidelines and standard biosecurity and institutional safety procedures of China Agricultural University (CAU; Beijing, China). Prior to the start of the study, ethical approval was granted by the Departmental Committee of the College of Veterinary Medicine, CAU.