New Insights Into the Threshold Values of Multi-Locus Sequence Analysis, Average Nucleotide Identity and Digital DNA–DNA Hybridization in Delineating Streptomyces Species

Multi-locus sequence analysis (MLSA) has been proved to be a useful method for Streptomyces identification and MLSA distance of 0.007 is considered as the boundary value. However, we found that MLSA distance of 0.007 might be insufficient to act as a threshold according to the correlations among average nucleotide identity based on MuMmer ultra-rapid aligning tool (ANIm), digital DNA–DNA hybridization (dDDH) and MLSA from the 80 pairs of Streptomyces species; in addition, a 70% dDDH value did not correspond to a 95∼96% ANIm value but approximately to 96.7% in the genus Streptomyces. Based on our analysis, it was proposed that when the MLSA distance value between a novel Streptomyces and a reference strain was < 0.008, the novel strain could be considered as a heterotypic synonym of the reference strain; when the MLSA distance value was ≥ 0.014, the novel strain could be regarded as a new Streptomyces species; when the MLSA distance value was between 0.008 and 0.014 (not included), the dDDH or ANIm value between a new strain and a reference strain must be calculated in order to determine the taxonomic status of a novel strain. In this context, a 70% dDDH or 96.7% ANIm value could act as the threshold value in delineating Streptomyces species, but if the dDDH or ANIm value was less than but close to 70 or 96.7% cut-off point, the taxonomic status of a novel strain could only be determined by a combination of phenotypic characteristics, chemotaxonomic characteristics and phylogenomic analysis.


INTRODUCTION
In current prokaryote systematics, the classification of Bacteria and Archaea is based on a polyphasic taxonomic approach, comprised of phenotypic, chemotaxonomic and genotypic data, as well as phylogenetic information (Schleifer, 2009). Of these, the classical DNA-DNA hybridization (DDH) technology plays a key role in novel species identification. Although DDH has been the "gold standard" for bacterial species demarcation over the last 50 years, its procedures are known to be labor-intensive, error-prone and do not allow the generation of cumulative databases. Thus, there has been an urgent need for an alternative genotype-based standard (Stackebrandt et al., 2002;Gevers et al., 2005). With the rapid progress in the area of genome sequencing technology, many efforts have been made to develop a bioinformatic method to replace classical DDH for differentiating species. These efforts were mainly focused on devising values analogous to DDH values, such as genome BLAST distance phylogeny (GBDP) (Henz et al., 2005), average nucleotide identity (ANI) (Konstantinidis et al., 2006), maximal unique matches index (MUMi) (Deloger et al., 2009) and digital DNA-DNA hybridization (dDDH) (Auch et al., 2010). At present, ANI or dDDH has been most widely used as a gold standard for species delineation. Unfortunately, over the last two decades, even though a lot of efforts have been made to obtain genome data for prokaryotic organism, only approximately 2.1% of the global prokaryotic taxa are represented by sequenced genomes (Zhang et al., 2020). As far as Streptomyces species are concerned, genome data of about 30% type species with validly published and correct names are still unavailable at the time of writing this article 1 . In contrast, the nearly entire database of 16S rRNA gene sequences is available for the type strains of the genus Streptomyces. Nevertheless, when sequence similarity of 16S rRNA gene between two strains is over 97% (Stackebrandt et al., 2002;Tindall et al., 2010), it is hard to differentiate two species using 16S rRNA gene sequences alone. Therefore, in the modern classification of Streptomyces, 16S rRNA gene sequence similarity, ANI and dDDH values are usually used in combination to assess phylogenetic position of a novel species, and only species exhibiting ≥ 98.7% 16S rRNA gene sequence similarity are required to calculate ANI or dDDH values (Konstantinidis and Tiedje, 2005;Stackebrandt and Ebers, 2006;Meier-Kolthoff et al., 2013). If genome sequence data of the type strains with ≥ 98.7% 16S rRNA gene sequence similarity are unavailable, it is recommended to obtain their genome sequences, not only to measure ANI and dDDH but also to extend and improve the public genome database for taxonomic purposes (Chun et al., 2018). However, even though the whole-genome sequencing is accessible to most of the microbial taxonomists at the present, it is still time-consuming and costly. Thus, it is of great significance for many microbial taxonomists to find out an alternative to ANI or dDDH. In contrast to ANI or dDDH, multilocus sequence analysis (MLSA) based on housekeeping genes is a simple and low-cost approach and has been proved to be a useful method for identification of Streptomyces species (Guo et al., 2008;Rong et al., 2009Huang, 2010, 2012;Labeda et al., 2017). In an early comparative study between DDH and MLSA, Rong and Huang proposed that the MLSA evolutionary distance of 0.007 could act as the threshold value in delineating Streptomyces species . However, our recent findings are somewhat different from their conclusion. In addition, we also found that the 70% dDDH value was not equivalent to the 95∼96% ANI value in the genus Streptomyces. In the present work, new insights into the threshold values of MLSA, ANI and dDDH in delineating Streptomyces species were provided based on the correlation among ANI, dDDH and MLSA from 80 pairs of Streptomyces species (including heterotypic synonyms).

Correlation Among ANIm, dDDH and MLSA
Given that ANIm (average nucleotide identity based on MuMmer ultra-rapid aligning tool) provides more credible results when the pair of genomes compared share a high degree of similarity (ANI > 90%) (Richter and Rosselló-Móra, 2009), the ANIm value rather than the ANIb (ANI based on the BLAST algorithm) value was selected for comparative analysis in the current work. The calculations of ANIm and dDDH values were performed by using the JSpeciesWS online service (Richter et al., 2015) and the Genome-to-Genome Distance Calculator (Meier-Kolthoff et al., 2013), respectively. For calculating dDDH value, Formula 2 was used. The sequences of five protein-coding genes (atpD, gyrB, recA, rpoB, and trpB) were directly drawn from draft genome sequences. After trimmed manually using methods of Rong and Huang (2012), five gene sequences were concatenated head-totail in-frame in the order of atpD, gyrB, recA, rpoB, and trpB. The MLSA evolutionary distances between a set of type strains were calculated according to Kimura's two-parameter model (Kimura, 1980). The datasets for coherence analysis among ANIm, dDDH and MLSA were processed by Origin Pro 9.0. Coefficients of determination (R 2 ) among ANIm, dDDH and MLSA were calculated by exponential regression analysis.

Phenotypic Characterization
The cultural characteristics of ten tested strains, i.e., S. albidoflavus   S. osmaniensis JCM 17656 T , were evaluated on ISP serial agar media (Shirling and Gottlieb, 1966) following incubation at 28 • C for 14 days. The colors of colonies and soluble pigments were determined according to the Color Standards and Color Nomenclature (Ridgway, 1912). A range of physiological and biochemical tests were carried out according to Li et al.'s methods Li et al. (2020). Tolerance to different temperatures (4,10,15,20,25,28,30,37,40, and 45 • C) was tested on ISP2 agar for 14 days. Enzyme-activity tests were carried out using API-ZYM test system (France) according to the manufacturer's instructions. Other physiological characteristics including starch hydrolysis, gelatin liquefaction, milk coagulation and peptization, melanin production, Tweens (20, 40, 60, and 80) degradation, H 2 S production and nitrate reduction were performed according to the methods described by Xu et al. (2007). All these experiments were carried out in triplicate, and all these strains were grown under the same conditions for parallel comparison.

Chemotaxonomic Characterization
Cells were collected for chemotaxonomic analysis by centrifugation from five strains cultured at 28 • C in TSB medium for 7 days on a rotary shaker and then washed twice with distilled water. The diaminopimelic acid (DAP) isomer and whole-cell sugar compositions were analyzed using TLC according to the procedures described by Lechevalier and Lechevalier (1970) and Hasegawa et al. (1983). Cellular fatty acids analysis was carried out by China Center of Industrial Culture Collection (CICC) according to the protocol of the Sherlock Microbial Identification system [MIDI system, version 6.0B, MIDI (2005)]. Menaquinones were extracted according to Collins et al. (1977) and analyzed by HPLC (Wu et al., 1989). The polar lipids were extracted and identified by the method of Kates (1986).

Phylogenomic Analysis
The genome sequences of five pairs of Streptomyces and relevant reference strains for phylogenomic analysis were retrieved from NCBI database. Phylogenomic analysis was carried out using the Type (Strain) Genome Server (Meier-Kolthoff and Göker, 2019). A phylogenetic tree was inferred with FastME (Lefort et al., 2015) from the Genome BLAST Distance Phylogeny (GBDP) distances calculated from genome sequences.

RESULTS AND DISCUSSION
There is no doubt that MLSA plays an extremely important role in identifying Streptomyces species (Labeda et al., 2017). However, recently, during identifying a novel strain of endophytic Frontiers in Microbiology | www.frontiersin.org Streptomyces from a medicinal plant, we found that the MLSA evolutionary distance between S. stelliscabiei DSM 41803 T and S. bottropensis ATCC 25435 T was 0.011 (greater than the 0.007 critical point proposed for delineating Streptomyces species), suggesting that they should belong to different genomic species (Rong and Huang, 2012). This result was contradictory to Madhaiyan et al.'s (2020) conclusion that S. stelliscabiei is a later heterotypic synonym of S. bottropensis based on comparative genomic analysis. Is this case an exceptional one? To answer this question, firstly, we calculated ANIm values among the majority of validly published Streptomyces species whose genomes are available. Then, all strain pairs, whose ANIm values are greater than or equal to 90%, were collected for subsequent analysis. Finally, MLSA, dDDH and ANIm values of a total of 80 pairs of Streptomyces species were randomly selected from the above strain pairs to compare with each other ( , in which not only the MLSA evolutionary distance in each pair was higher than 0.007, but also the dDDH and ANIm values were more than the 70% or 95∼96% cut-off points recommended for delineating species (Stackebrandt and Goebel, 1994;Richter and Rosselló-Móra, 2009), respectively. In addition, there were four strain pairs, i.e., S. canarius JCM 4733 T and S. corchorusii DSM 40340 T , S. castelarensis NRRL B-24289 T and S. melanosporofaciens DSM 40318 T , S. chartreusis ATCC 14922 T and S. osmaniensis OU-63 T , and S. mirabilis JCM 4551 T and S. olivochromogenes DSM 40451 T , in which the MLSA evolutionary distances in each pair were greater than or equal to 0.007, and the dDDH values were lower than 70%, but the ANIm values were over 95∼96%. All these data indicated that the MLSA evolutionary distance of 0.007 might be insufficient to act as the threshold value in delineating Streptomyces species. Based on the above analysis, the correlation between dDDH and MLSA, and that between ANIm and MLSA from the aforementioned 80 strain pairs were evaluated by an exponential regression model in order to obtain a more reliable boundary value of MLSA in delineating Streptomyces species. As can be seen in Figure 1A, a 70% dDDH value recommended to delineate species approximately corresponded to a MLSA value of 0.008. Theoretically, the MLSA value should decrease with the increase of dDDH value in the light of the putative boundary of 70% dDDH for species circumscriptions. However, in the present work, there were seven scatter points that deviated from this rule. Therefore, the MLSA value of 0.008 could not be simply used as the boundary for Streptomyces species circumscriptions. Similarly, it may also be clear from Figure 1B that the proposed 95∼96% ANIm value for delineating species approximately corresponded to a MLSA distance range from 0.010 to 0.014. These results suggested that a certain MLSA value could not be used alone as the threshold for the definition of Streptomyces species. Then, what is a more reasonable MLSA value used for defining a Streptomyces species? From Table 1, we found that when the MLSA distance value was Frontiers in Microbiology | www.frontiersin.org FIGURE 2 | Phylogenomic tree of five pairs of Streptomyces species and related reference strains. The numbers above branches are GBDP pseudo-bootstrap support values > 60% from 100 replications, with an average branch support of 96.0%. The tree was rooted at the midpoint (Farris, 1972). greater than or equal to 0.014, each strain pair represented the different genomic species; when the MLSA distance value was less than 0.008, the ANIm and dDDH values between each strain pair (except S. canarius JCM 4733 T and S. corchorusii DSM 40340 T ) were more than the 95∼96% and 70% cut-off points recommended for delineating species, respectively. So, each pair should represent the same genomic species except for S. canarius and S. corchorusii whose taxonomic relationship needed be reevaluated because the dDDH value between them was 69.5%, below 70% boundary point a little, while ANIm value was 96.69%, higher than 95∼96% boundary point. When the MLSA value was between 0.008 and 0.014 (not included), there were seven strain pairs (mentioned above) in which ANIm and dDDH values in each pair were greater than the corresponding thresholds generally accepted by microbial taxonomists, suggesting each pair should represent the same genomic species. In addition, there were nine strain pairs, i.e., S. albidoflavus and S. koyangensis, S. bauhiniae and S. seoulensis, S. rhizosphaericola and S. cavourensis, S. longwoodensis and S. lasalocidi, S. longwoodensis and S. galbus, S. galbus and S. lasalocidi, S. mirabilis and S. olivochromogenes, S. chartreusis and S. osmaniensis, and S. castelarensis and S. melanosporofaciens, whose dDDH values were slightly below the threshold of 70%, suggesting each pair should represent the different genomic species. Nevertheless, there were at least four pairs among the foregoing nine strain pairs, for example, S. albidoflavus and S. koyangensis, S. mirabilis and S. olivochromogenes, S. chartreusis and S. osmaniensis, and S. castelarensis and S. melanosporofaciens, whose ANIm values were slightly greater than the threshold of 95∼96%, suggesting each pair should belong to the same genomic species. Consequently, what are the reasons for the aforesaid contradictory result? In addition, what is the taxonomic relationship between two strains whose dDDH or ANIm values are near the critical points?
To answer these problems, on the one hand, the correlation between ANIm and dDDH from the aforementioned 80 strain pairs were evaluated by an exponential regression model. It is shown in Figure 1C, the ANIm value revealed an extremely high correlation (R 2 = 0.99756) with the dDDH value, further supporting that ANI can accurately replace DDH values for strains whose genome sequences are available (Goris et al., 2007). However, a 70% dDDH value did not correspond to a 95∼96% ANIm value, but to a ANIm value of approximately 96.7%. Thus, the above contradiction can be well explained based on this corresponding relation. On the other hand, the taxonomic relationships of the five strain pairs (S. canarius and S. corchorusii, S. albidoflavus and S. koyangensis, S. mirabilis and S. olivochromogenes, S. chartreusis and S. osmaniensis, and S. castelarensis and S. melanosporofaciens) were reevaluated by using a polyphasic taxonomic approach. At present, it has become a generally accepted principle by biologists to classify living organisms according to the level of phylogenetic correlation since the birth of evolution theory (Ward, 1998). In current prokaryote taxonomy, phylogenetic analysis based on 16S rRNA gene sequences plays a key role in species discrimination. However, there have been evidence that phylogenomic analysis exhibits better resolution than phylogenetic analysis based on 16S rRNA gene sequences (Rodriguez-R et al., 2018;Duchêne, 2021). In the present work, phylogenetic analysis indicated that there were three strain pairs, i.e., S. albidoflavus CGMCC 4.1291 T and S. koyangensis JCM 14915 T , S. chartreusis CGMCC 4.1639 T and S. osmaniensis JCM 17656 T , and S. mirabilis CGMCC 4.7010 T and S. olivochromogenes CGMCC 4.2000 T , in which each pair did not belong to the same species cluster according to the labeled color in the phylogenomic tree (Figure 2), suggesting that these six Streptomyces species should represent different genomic species. This result has been further confirmed by differential comparisons of cultural, physio-biochemical and chemotaxonomic characteristics in each pair ( Supplementary  Tables 2-4); with regard to the remaining two pairs, i.e., S. canarius CGMCC 4.1581 T and S. corchorusii CGMCC 4.1592 T , and S. castelarensis CGMCC 4.3570 T and S. melanosporofaciens CGMCC 4.1742 T , strains within each pair should belong to the same genomic species according to the phylogenomic clustering patterns (Figure 2). This result could also be confirmed by the facts shown in Supplementary Tables 5-7, the vast majority of phenotypic features of each strain pair were very similar with only a few exceptions. For example, as far as the former pair was concerned, milk coagulation, milk peptization and α-mannosidase activity were negative for strain CGMCC 4.1581 T , while positive for strain CGMCC 4.1592 T ; cellular fatty acids such as iso-C 19:0 , anteiso-C 19:0 and C 20:0 were detected for strain CGMCC 4.1592 T , while not for strain CGMCC 4.1581 T ; the major menaquinone in strain CGMCC 4.1592 T was MK-9(H 8 ), up to 82.0%, while the major menaquinone in strain CGMCC 4.1581 T was MK-9(H 6 ), only 53.6%. As far as the latter pair was concerned, activities of α-chymotrypsin, β-galactosidase and valine arylamidase were positive for strain CGMCC 4.3570 T , while negative for strain CGMCC 4.1742 T ; Assimilation of L-Rhamnose was positive for strain CGMCC 4.3570 T , while negative for strain CGMCC 4.1742 T ; in cellular fatty acids, the percentage composition of Sum In Feature 8 was up to10.5% for strain CGMCC 4.3570 T , while only 0.5 for strain CGMCC 4.1742 T ; moreover, in cultural characteristics, color of aerial mycelia on ISP2 and ISP6 was, respectively, dark mouse gray and grayish white for strain CGMCC 4.3570 T , while both white for strain CGMCC 4.1742 T . The disagreement for phenotypic characteristics between each strain pair representing the same genomic species was probably due to different ecological niches or minor differences in genotype. All these data supported that 70% dDDH or 96.7% ANIm value could act as the threshold value in delineating Streptomyces species. But when dDDH or ANIm value between two closely related Streptomyces strains was less than but close to 70 or 96.7% cut-off point, the taxonomic status of a novel strain could only be determined by a combination of phenotypic characteristics, chemotaxonomic characteristics and phylogenomic analysis.

CONCLUSION
Based on the above analysis, on the one hand, a 70% dDDH value did not corresponded to a 95∼96% ANIm value but approximately to a 96.7% ANIm value in the genus Streptomyces. On the other hand, we proposed that when the MLSA distance value between a novel Streptomyces strain and a reference strain was less than 0.008, the novel strain could be considered as a heterotypic synonym of the reference strain; when the MLSA distance value was greater than or equal to 0.014, the novel strain could be regarded as a new Streptomyces species; when the MLSA distance value was between 0.008 and 0.014 (not included), ANIm or dDDH value between a new strain and a reference strain must be calculated in order to determine the taxonomic status of a novel strain. Although 70% dDDH or 96.7% ANIm value could act as the threshold value in delineating Streptomyces species, if dDDH or ANIm value was less than but close to 70 or 96.7% cut-off point, the taxonomic status of a novel strain could only be determined by a combination of phenotypic characteristics, chemotaxonomic characteristics and phylogenomic analysis. Arcamone et al., 1959(Approved Lists 1980) Is a Later Heterotypic Synonym of Streptomyces antimycoticus Waksman, 1957(Approved Lists 1980 In the present work, the MLSA distance value between S. antimycoticus NBRC 12839 T and S. melanosporofaciens DSM 40318 T is 0.01, higher than the boundary value of 0.008, but the ANIm and dDDH values between them are 96.9% and 72.0, respectively, greater than the 96.7 and 70% cut-off points recommended for delineating species, supporting that they represent the same genomic species. On the basis of these data and rule 42 of the Bacteriological Code (Parker et al., 2019), we propose that S. melanosporofaciens is a heterotypic synonym of S. antimycoticus.

Streptomyces melanosporofaciens
The description is as given by Komaki and Tamura (2020).

Emended Description of Streptomyces filipinensis (Approved Lists 1980)
Heterotypic synonym: Streptomyces durhamensis Gordon andLapa, 1966 (Approved Lists 1980). In the present work, the MLSA distance value between S. filipinensis JCM 4369 T and S. durhamensis NRRL B-3309 T is 0.01, higher than the boundary value of 0.008, but the ANIm and dDDH values between them are 96.97 and 72.9%, respectively, greater than the 96.7 and 70% cut-off points recommended for delineating species, supporting that they represent the same genomic species. On the basis of these data and rule 42 of the Bacteriological Code, we propose that S. durhamensis is a heterotypic synonym of S. filipinensis.
The description is as given by Ammann et al. (1955) with the following modification. The G + C content of the type-strain genome is 71.8%, its approximate size 9.03 Mbp, its GenBank deposit SAMD00245426.
In the present work, the ANIm, dDDH and MLSA distance values between S. recifensis NRRL B-3811 T and S. griseoluteus JCM 4765 T are 96.86, 72.4, and 0.005, respectively, far away from the 96.7%, 70%, and 0.008 cut-off points recommended for delineating species, supporting that they represent the same genomic species. On the basis of these data and rule 42 of the Bacteriological Code, we propose that S. recifensis is a heterotypic synonym of S. griseoluteus.
The description is as given Umezawa et al. (1950) with the following modification. The G + C content of the type-strain genome is 71.6%, its approximate size 6.51 Mbp, its GenBank deposit SAMD00245512.
In the present work, the ANIm, dDDH and MLSA distance values between S. olivaceoviridis JCM 4499 T and S. corchorusii DSM 40340 T are 96.88, 71.4 and 0.006, respectively, above the 96.7%, 70% and 0.008 cut-off points recommended for delineating species, supporting that they represent the same genomic species. Meanwhile, the ANIm and dDDH values between S. canarius JCM 4733 T and S. corchorusii DSM 40340 T are 96.69 and 69.5%, respectively, near the 96.7 and 70% boundary points, but the MLSA distance value of them is 0.007, below the 0.008 boundary value recommended for delineating species. In addition, this result is further confirmed by the clustering patterns resulting from phylogenomic analysis. Labeda et al. (2017) also recognized that S. corchorusii NRRL B-12289 T is a later heterotypic synonym of S. olivaceoviridis NRRL B-12280 T . Thus, On the basis of these data and rule 42 of the Bacteriological Code, we propose that S. corchorusii and S. canaries are latter heterotypic synonyms of S. olivaceoviridis.
The description is as given by Pridham et al. (1958) with the following modification. The G + C content of the type-strain genome is 72.1%, its approximate size 9.53 Mbp, its GenBank deposit SAMD00245462.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
JG and YZ: revising the manuscript critically for important intellectual content. SH, YW, LF, and YX: acquisition of data. XT: analysis and interpretation of data. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We are grateful to CICC (China Center of Industrial Culture Collection, Beijing, China) for providing excellent technical assistance.