Genomic Insights Into the Mycobacterium kansasii Complex: An Update

Only very recently has it been proposed that each of the existing Mycobacterium kansasii subtypes (I–VI) should be elevated to species rank. Consequently, the former M. kansasii subtypes have been renamed Mycobacterium kansasii (former type I), Mycobacterium persicum (II), Mycobacterium pseudokansasii (III), Mycobacterium innocens (V), and Mycobacterium attenuatum (VI). The present work extends the recently published findings by applying a three-pronged computational strategy, based on the alignment fraction-average nucleotide identity, genome-to-genome distance, and core-genome phylogeny, to an essentially independent and much larger sample, and thus delivers a more refined and complete picture of the M. kansasii complex. Furthermore, five canonical taxonomic markers were used, i.e., the 16S rRNA, hsp65, rpoB, and tuf genes, as well as the 16S-23S rRNA intergenic spacer region (ITS). The three major methods produced highly concordant results, corroborating the view that each M. kansasii subtype represents a distinct species. This work not only consolidates the position of five of the currently erected species, but also provides a description of the sixth one, i.e., Mycobacterium ostraviense sp. nov., to replace the former subtype IV. By showing a close genetic relatedness, a monophyletic origin, and overlapping phenotypes, our findings support the recognition of the M. kansasii complex (MKC), accommodating all M. kansasii-derived species and Mycobacterium gastri. None of the most commonly used taxonomic markers was shown to accurately distinguish all the MKC species. Likewise, no species-specific phenotypic characteristics allowing for species differentiation within the complex were found, except the non-photochromogenicity of M. gastri. To distinguish most reliably between the MKC species, and between M. kansasii and M. persicum in particular, whole-genome-based approaches should be applied.
In the absence of clear differences in the distribution of the virulence-associated region of difference 1 genes among the M. kansasii-derived species, the pathogenic potential of each of these species can only be speculatively assessed based on their prevalence among the clinically relevant population. Large-scale molecular epidemiological studies are needed to provide a better understanding of the clinical significance and pathobiology of the MKC species. The results of the in vitro drug susceptibility profiling emphasize the priority of rifampicin administration in the treatment of MKC-induced infections, while arguing against the use of ethambutol, given the high level of resistance to this drug.

INTRODUCTION
Non-tuberculous mycobacteria (NTM) comprise all species of the Mycobacterium genus except those aetiologically implicated in tuberculosis (TB) and leprosy, that is, members of the M. tuberculosis complex and M. leprae or M. lepromatosis, respectively. More than 180 NTM species have been recognized to date (LPSN database, 2019). This figure, however, will soon need to be revised, since several new species are added every year (Tortoli, 2014). With the increasing number of mycobacterial species descriptions, the number of reported infections, essentially those due to NTM, is also growing on a global level. This trend is driven less by the widening spectrum of NTM species than by heightened clinical awareness, an expanding population of vulnerable hosts, and the advancement of diagnostic and surveillance services (Sood and Parrish, 2017). Although many of the newly described NTM species are potentially pathogenic, having been isolated from clinically affected individuals, fewer than a third have consistently been associated with significant health disorders in humans.
Mycobacterium kansasii is one of the most virulent and prevalent NTM pathogens in human medicine. It was first described by Buhler and Pollak in 1953 from a series of respiratory samples of patients with a TB-like pulmonary disease (Buhler and Pollak, 1953). The species was originally named a "yellow bacillus" to emphasize its brilliant yellow pigmentation on exposure to light, but the name was later amended to M. tuberculosis luciflavum by Middlebrook (Middlebrook, 1956) and M. luciflavum by Manten (Manten, 1957). The current species name (M. kansasii) was proposed by Hauduroy in 1955 and refers to where its first isolations were performed (Kansas City, USA) (Hauduroy, 1955). Since the early 1960s, M. kansasii infections have ranked among the most frequently reported NTM diseases worldwide. A remarkable upsurge in the incidence of M. kansasii infections was seen at the turn of the 1980s and 1990s, with the burgeoning of the HIV/AIDS epidemic, and persisted until the first antiretroviral therapies became widely available (Horsburgh and Selik, 1989; Witzig et al., 1995; Santin and Alcaide, 2003). Currently, M. kansasii is one of the six most frequently isolated NTM species across the world. The prevalence of this pathogen is exceptionally high in Slovakia, Poland, and the UK, with isolation rates of 36, 35, and 11%, respectively, compared to a mean isolation rate of 5% in Europe and 4% globally (Hoefsloot et al., 2013). Chronic, fibro-cavitary lung disease, with upper lobe predominance, and with an overall clinical picture mimicking classical TB, is the most common manifestation attributable to M. kansasii (Matveychuk et al., 2012; Moon et al., 2015; Bakuła et al., 2018b). Much rarer are extrapulmonary infections, such as lymphadenitis, skin and soft-tissue infections, and disseminated disease (Liao et al., 2007; Chen et al., 2008; Park et al., 2012; Shaaban et al., 2014). The exact epidemiology of M. kansasii disease is difficult to ascertain because case reporting is not mandatory in most countries and differentiation between isolation (colonization) and infection may be diagnostically challenging. Moreover, the incidence rates are influenced by a combination of demographic and clinical factors, including patient geographical origin and HIV status. Pulmonary M. kansasii infections tend to cluster in specific geographical areas, such as central Europe or the metropolitan centers of London, Brasilia, and Johannesburg (Hoefsloot et al., 2013). Strong epidemiological disparities exist with respect to HIV status. The annual rate of M. kansasii infection among HIV-seropositive patients has been reported to be as high as 532 per 100,000 population, whereas in non-HIV infected individuals it has been calculated at 0.06-2.2 per 100,000 population (Marras and Daley, 2004; Ricketts et al., 2014). Not only the incidence rates, but also the sources of M. kansasii infections and routes of transmission are poorly defined. Similar to other NTM, M. kansasii infections are believed to be acquired from environmental exposures rather than by person-to-person transmission, although a case of interfamilial clustering has been described (Ricketts et al., 2014). Contrary to other NTM, M. kansasii has only sporadically been isolated from soil, natural water systems or animals. Instead, the pathogen has often been recovered from municipal tap water, which is considered its major environmental reservoir (Falkinham, 1996; Thomson et al., 2013).
The genetic structure of M. kansasii was first investigated in the early 1990s. A pioneering work by Ross et al. showed the existence of a genetic subspecies of M. kansasii by sequencing of the 5′ end of the 16S rRNA gene (Ross et al., 1992). Subsequent studies involving the amplification of the 16S-23S rRNA spacer region (Abed et al., 1995), PCR-restriction analysis (PRA) of the highly conserved hsp65 gene (Plikaytis et al., 1992; Telenti et al., 1993), and Southern blot hybridization with the major polymorphic tandem repeat (MPTR) (Hermans et al., 1992) or the insertion sequence-like element IS1652 (Yang et al., 1993) as a probe confirmed M. kansasii as a genetically heterogeneous species. The genetic variability of M. kansasii was clearly demonstrated in a study by Picardeau et al., who divided the species into five subspecies or (sub-)types, based on the analysis of restriction fragment length polymorphisms (RFLPs) using the MPTR probe, pulsed-field gel electrophoresis (PFGE), amplified fragment length polymorphism (AFLP) analysis, and PRA of the hsp65 gene (Picardeau et al., 1997). The validity of the five M. kansasii subtypes was further corroborated by sequencing of the 16S-23S rRNA gene spacer, or internal transcribed spacer (ITS), region (Alcaide et al., 1997). Somewhat later, two novel types (VI and VII) were described, according to their hsp65 restriction profiles and sequencing results of the 16S rRNA gene and the 16S-23S rRNA spacer (Richter et al., 1999; Taillard et al., 2003). Moreover, M. kansasii isolates with an intermediate type I (I/II) and atypical type II (IIb) have been reported. Whereas the former had a type I-specific sequence of the hsp65 gene and a type II-specific sequence of the spacer region, the latter displayed a type II-specific spacer sequence and a unique hsp65 gene sequence (Iwamoto and Saito, 2005). The separateness of the M. kansasii subspecies was further supported by polymorphisms at other genetic loci, including the RNA polymerase gene (rpoB) and the translational elongation factor Tu gene (tuf), successfully applied for the differentiation between the subspecies I-VI (Kim et al., 2001; Bakuła et al., 2016). Notably, inter-subspecies differences have also been detected at the protein level. For each of the six (I-VI) M. kansasii subspecies, specific matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectral profiles have recently been established (Murugaiyan et al., 2018).
Across all genetic studies performed so far on M. kansasii, a controversy has been growing over the taxonomic rank of the genetic variants of the species, best reflected by their terminology, which includes subspecies, subtypes, and genotypes.
The purpose of this study was to resolve the phylogenetic and taxonomic structure of M. kansasii by combining whole genome sequencing with traditional polyphasic classification approaches. The use of a polyphasic strategy, incorporating phylogenetic, biochemical, and chemotaxonomic criteria for resolving the taxonomic identity of mycobacterial species, especially within the NTM group, has been heavily advocated (Saini et al., 2009).
The issue of molecular taxonomy of M. kansasii has been addressed in a very recent work by Tagini et al. (2019), which appeared shortly before the completion of our own draft. The present study extends the recently published findings by using a new independent sample and somewhat different methodology, and thus delivers a more refined and complete picture of the M. kansasii complex.
All strains were maintained as frozen stocks and cultured on Löwenstein-Jensen or Middlebrook 7H10 agar (Becton-Dickinson, Franklin Lakes, USA) medium, supplemented with oleic acid, albumin, dextrose, and catalase, and incubated at either 30 or 37 °C.

Species Identification and Genotyping
The strains were identified as M. kansasii by using high pressure liquid chromatography (HPLC) of cell wall mycolic acids, in accordance with the Centers for Disease Control and Prevention (CDC) guidelines (Butler et al., 1996) and by the GenoType Mycobacterium CM/AS assay (Hain Lifescience, Nehren, Germany), according to the manufacturer's instructions. For genotypic identification, total DNA was purified from solid bacterial cultures with a standard extraction method described previously (Santos et al., 1992). Genotyping of M. kansasii strains was performed by PCR-RFLP analysis of the hsp65, rpoB, and tuf genes, as reported elsewhere (Telenti et al., 1993;Kim et al., 2001;Bakuła et al., 2016).
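The logic of PCR-RFLP typing can be illustrated with a minimal in silico digestion sketch. It assumes the well-known recognition sites of HaeIII (GG^CC) and BstEII (G^GTNACC), the two enzymes mentioned later in the text for hsp65; the function name and the toy sequence are hypothetical and are not taken from any strain in this study.

```python
import re

# Recognition pattern and cut offset (number of bases of the site that stay
# on the left-hand fragment). HaeIII cuts GG^CC; BstEII cuts G^GTNACC.
ENZYMES = {
    "HaeIII": ("GGCC", 2),
    "BstEII": ("GGT[ACGT]ACC", 1),
}

def fragment_lengths(seq, enzyme):
    """Return the fragment lengths obtained by cutting seq with enzyme."""
    pattern, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are not missed.
    cuts = [m.start() + offset for m in re.finditer(f"(?=({pattern}))", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

# Toy amplicon (hypothetical) carrying one HaeIII and one BstEII site.
toy = "AAAAGGCCTTTTTGGTAACCAAA"
print(fragment_lengths(toy, "HaeIII"))   # [6, 17]
print(fragment_lengths(toy, "BstEII"))   # [14, 9]

# A single C>T substitution inside GGCC abolishes the HaeIII site.
mutant = toy[:6] + "T" + toy[7:]
print(fragment_lengths(mutant, "HaeIII"))  # [23]
```

The last call shows why a single nucleotide polymorphism in a recognition site changes the banding pattern, which is the main weakness of restriction-based typing.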

Biochemical Profiling
The strains were evaluated for a panel of biochemical characteristics by conventional laboratory procedures (CLSI; Clinical and Laboratory Standards Institute, 2011). These comprised tests for niacin accumulation, nitrate and tellurite reduction, Tween 80 hydrolysis (10 days), catalase (thermostable and semi-quantitative), β-glucosidase, arylsulfatase (3 and 14 days), urease (5 days), and pyrazinamidase. Inhibition tests of tolerance to thiophene-2-carboxylic acid hydrazide (TCH), 5% sodium chloride, and acidic (pH 5.5) conditions were also carried out. In addition, cultural features, including colony morphology, photochromogenicity, the ability to grow on MacConkey agar without crystal violet, and growth at different temperatures (25, 35, and 45 °C) were assessed. For each assay, each strain was tested in triplicate. A test was considered complete only if at least two replicates produced identical results.
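The two-out-of-three acceptance rule for triplicate assays can be written down as a short sketch. The function name and the result codes ("+", "-", "w") are illustrative assumptions, not part of the original protocol.

```python
from collections import Counter

def replicate_consensus(results, min_agree=2):
    """Return the agreed result if at least min_agree replicates match,
    otherwise None (the test would have to be repeated)."""
    value, count = Counter(results).most_common(1)[0]
    return value if count >= min_agree else None

print(replicate_consensus(["+", "+", "-"]))  # +
print(replicate_consensus(["+", "-", "w"]))  # None
```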

Genome Sequencing and Assembly
For whole-genome sequencing, chromosomal DNA from all 27 M. kansasii strains under study was extracted by mechanical disruption of the cells (100 mg pellet) with zirconia ceramic beads in a FastPrep-24 instrument (MP Biomedicals, Valiant Co., Yantai, China), followed by chemical extraction with the DNAzol® reagent (Invitrogen, Carlsbad, USA). DNA concentration, purity, and integrity were determined using a Qubit high-sensitivity (HS) assay kit (ThermoFisher, Waltham, USA).
Paired-end libraries were prepared from 1 ng of high-quality genomic DNA with the Nextera XT DNA sample preparation kit according to the manufacturer's instructions (Illumina Inc., San Diego, USA). The libraries were sequenced on a HiSeq 2500 or a NextSeq 500 instrument (Illumina, San Diego, USA) at a read length of 2 × 150 bp. The quality of reads before and after pre-processing was assessed using FastQC (v0.11.5) (Andrews, 2010). The raw reads were trimmed with TrimGalore ver. 0.43 (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), and de novo assembled with the SPAdes Genome Assembler ver. 3.10.0 (Nurk et al., 2013). Scaffold-level assemblies were evaluated using Quast (QUality ASsessment Tool) to generate basic genome statistics.
In addition to the genomes of the 27 M. kansasii strains sequenced in this study, the genomes of 53 other Mycobacterium sp. strains were analyzed. The genomic sequences of these strains were retrieved from the GenBank database (http://www.ncbi.nlm.nih.gov/genbank/) and their accession numbers are provided in Supplementary Table 6.

Genomic Data Availability
The assembled genomes were deposited under NCBI BioProject accession numbers PRJNA374853 and PRJNA317047. The genomes were deposited in GenBank under the accession numbers provided in Table 1 and Supplementary Table 6.

Annotation and Comparative Genome- and Gene-Scale Analyses
Gene identification and annotation were achieved using the DFAST pipeline v1.0.5 with default settings (Tanizawa et al., 2017).
The genomic relatedness between the strains analyzed was assessed using the MiSI (Microbial Species Identifier) method, based on a combination of genome-wide average nucleotide identity (gANI) and alignment fraction (AF) of orthologous genes (Varghese et al., 2015). The gANI and AF values of 96.5 and 0.6, respectively, were assumed as cutoffs for species delimitation.
As a second method to evaluate the genomic relatedness, the Genome-to-Genome Distance Calculator (GGDC at http://ggdc.dsmz.de) was used. This algorithm was designed to replace standard DNA-DNA hybridization (DDH) by calculating DNA-DNA relatedness (Meier-Kolthoff et al., 2013a). The genome-to-genome distance (GGD) value of 0.0258 was identified as a maximum threshold to assign a genome pair to the same species.
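The species-delimitation cutoffs stated above can be summarized in a small sketch. The function names are hypothetical, and treating the AF and gANI cutoffs as inclusive lower bounds is an assumption; the inclusive upper bound for GGD follows the "≤0.0258" wording used in the Results. The example values are ranges quoted in the text.

```python
AF_CUTOFF = 0.6       # alignment fraction (MiSI)
GANI_CUTOFF = 96.5    # genome-wide average nucleotide identity, % (MiSI)
GGD_CUTOFF = 0.0258   # genome-to-genome distance (GGDC)

def same_species_misi(af, gani):
    """True if a genome pair meets both MiSI cutoffs (assumed inclusive)."""
    return af >= AF_CUTOFF and gani >= GANI_CUTOFF

def same_species_ggd(ggd):
    """True if the genome-to-genome distance is within the species cutoff."""
    return ggd <= GGD_CUTOFF

# Values taken from ranges reported in the text.
print(same_species_ggd(0.0044))         # True  (type II vs. M. persicum, upper bound)
print(same_species_ggd(0.0601))         # False (lowest inter-genotype GGD)
print(same_species_misi(0.86, 94.9))    # False (ANI below the cutoff)
```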

Identification of RD1-14 Genes
To establish the presence of genes within the 14 regions of difference (RD1-14), DIFFIND software with the -c 0.7 (sequence identity threshold) and -s2 0.5 (length difference cut-off) parameters was used (Marciniak et al., 2017). Identification of RD1-14 genes was based on the alignment of amino acid sequences of predicted genes from the analyzed genomes to the reference database. The references were the amino acid sequences encoded by the RD1-14 genes of the M. tuberculosis reference strain H37Rv (Accession no.: NC_000962.3), as listed by Brosch et al. (2002).
For single-gene phylogenies, the respective sequences of each genetic locus were subjected to multiple alignment in MEGA X software (ClustalW algorithm) (Kumar et al., 2018). The resulting fragments were further trimmed to remove unnecessary gaps or regions, to a final length of 1,537, 644, 3,439, 1,180, and 277 bp for the 16S rRNA, hsp65, rpoB, and tuf genes and the 16S-23S rRNA ITS region, respectively. The sequences thus prepared were used for evolutionary distance calculation according to the Jukes-Cantor model (Jukes and Cantor, 1969). Phylogenetic trees for each target locus were built using the neighbor-joining method and midpoint rooted with the MEGA X software (Saitou and Nei, 1987). Tree topologies were evaluated by bootstrap analysis based on 1,000 replications (Felsenstein, 1985).
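For reference, the Jukes-Cantor correction converts the observed proportion of differing sites, p, into an evolutionary distance d = -(3/4) ln(1 - 4p/3). The sketch below is a minimal stand-alone illustration, not the MEGA X implementation; skipping gapped columns pairwise is an assumption, and the two sequences are invented.

```python
import math

def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences.
    Columns with a gap ('-') in either sequence are ignored."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("distance undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# One mismatch in ten sites: p = 0.1, corrected distance slightly above 0.1.
d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
print(round(d, 4))  # 0.1073
```

The corrected distance is always at least as large as p, because the model accounts for multiple substitutions at the same site.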

RESULTS AND DISCUSSION
Sequencing of the 27 M. kansasii genomes yielded a mean coverage of 66.4× per genome. The average number of contigs per genome was 129 (±229), corresponding to an average N50 of 4,933 kb (±2,605 kb).
The genome sizes and GC contents ranged from 5.6 to 6.6 Mbp (avg. 6.2 Mbp ± 0.3 Mbp) and from 65.86 to 66.38% (avg. 66.14 ± 0.10%), respectively. These values were consistent with the data reported for previously assembled M. kansasii genomes (Table 2).
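The assembly statistics quoted above (N50, genome size, GC content) follow standard definitions, sketched here in a few lines. This is not the Quast implementation; the function names and toy contigs are hypothetical.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative length of contigs,
    sorted from longest to shortest, reaches half of the assembly size."""
    if not contig_lengths:
        raise ValueError("no contigs")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

def gc_content(contigs):
    """GC content (%) over all contigs of an assembly."""
    total = sum(len(c) for c in contigs)
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return 100.0 * gc / total

print(n50([2, 2, 2, 3, 3, 4, 8, 8]))   # 8
print(gc_content(["GGCC", "ATAT"]))    # 50.0
```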
The whole-genome-level relatedness among the 27 M. kansasii strains under study was assessed with three species identification-relevant parameters, namely the alignment fraction (AF) of orthologous genes, the average nucleotide identity (ANI), and the genome-to-genome distance (GGD) (Tables 3 and 4). The AF/ANI metrics were also computed for the genomic sequences of another 32 M. kansasii strains and 24 Mycobacterium sp. strains, representing 5 NTM species and 5 M. tuberculosis complex species, all extracted from NCBI databases (Supplementary Table 6). The GGD analysis, in turn, was performed on 69 genomes in total, including 59 M. kansasii genomes and 10 genomes representing single NTM (other than M. kansasii) and M. tuberculosis complex species (Supplementary Table 6).
The AF values for all M. kansasii strains ranged between 0.6 and 1.0. The lowest AF value recorded for two strains of the same M. kansasii genotype was 0.69 (type I, range 0.69-1.0). Within all other types, the AF values were 0.85 or higher.
All M. kansasii genotypes showed AF values equal to or below 0.75 with strains of other Mycobacterium species, except for M. gastri and M. persicum, which yielded AF values as high as 0.86 with M. kansasii type IV and V (range, 0.79-0.86) or 0.99 with M. kansasii type II (range, 0.89-0.99), respectively (Table 3A).
Pairwise ANI values for M. kansasii strains affiliated with different types never exceeded the 95-96% threshold commonly used as a boundary for species delineation (range, 88.6-94.9%) (Richter and Rosselló-Móra, 2009; Varghese et al., 2015). In contrast, the ANI values for strains of the same M. kansasii genotype were always higher than 98.8%. The ANI values between M. kansasii and other Mycobacterium species are listed in Table 3B.
The AF and ANI metrics were also analyzed combinatorially, using the Microbial Species Identifier (MiSI) algorithm. The MiSI method sorts the analyzed genomes into species-like taxa or cliques, based on the AF and ANI species-level cutoff values set at 0.6 and 96.5%, respectively (Varghese et al., 2015). For a total of 83 mycobacterial genomes studied, 15 different cliques were configured. The results were pictorially summarized in Figure 1. The genomes of 59 M. kansasii strains were clearly divided into six cliques, each containing strains of a distinct M. kansasii subtype (I-VI) only. All the remaining NTM species had their genomes clustered within separate cliques, except that four genomes of strains classified as M. persicum were allocated in the M. kansasii genotype II clique. The genomes of 19 strains, representing five M. tuberculosis complex species were accommodated in a single cluster (clique).
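The grouping idea can be sketched as a graph problem: genomes are nodes, and an edge links two genomes whose pairwise AF and ANI both meet the cutoffs. The sketch below is a simplification rather than the MiSI implementation: it reports connected components and flags whether each one is a true clique (all pairs linked). The genome labels and pairwise values are invented for illustration.

```python
from itertools import combinations

def species_groups(genomes, pair_metrics, af_cut=0.6, ani_cut=96.5):
    """Group genomes linked by AF >= af_cut and ANI >= ani_cut.
    pair_metrics maps frozenset({a, b}) -> (af, ani).
    Returns a list of (sorted members, is_clique) tuples."""
    linked = {g: set() for g in genomes}
    for a, b in combinations(genomes, 2):
        af, ani = pair_metrics[frozenset((a, b))]
        if af >= af_cut and ani >= ani_cut:
            linked[a].add(b)
            linked[b].add(a)
    groups, seen = [], set()
    for start in genomes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(linked[node] - component)
        seen |= component
        is_clique = all(b in linked[a] for a, b in combinations(component, 2))
        groups.append((sorted(component), is_clique))
    return groups

# Hypothetical pairwise values for four genomes.
names = ["I_a", "I_b", "II_a", "persicum"]
metrics = {frozenset(p): (0.70, 92.0) for p in combinations(names, 2)}
metrics[frozenset(("I_a", "I_b"))] = (0.95, 99.0)
metrics[frozenset(("II_a", "persicum"))] = (0.99, 99.5)
print(species_groups(names, metrics))
```

With these toy values, the type II genome and the M. persicum genome fall into one group, mirroring the pattern described in the text.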
The results of the AF/ANI calculations were fully corroborated by the GGD analysis, which serves as an in silico equivalent of the laboratory-based DNA-DNA hybridization (DDH) (Auch et al., 2010; Meier-Kolthoff et al., 2013b). Here, strains within each M. kansasii genotype shared enough sequence similarity to be assigned to the same species, whereas the genotypes themselves were divergent enough to be considered separate species (Table 4). The intra-genotype GGD values ranged from 0.0 to 0.0138, and thus fell under the recommended cut-off value of ≤0.0258, corresponding to a 70% DDH cut-off, for species demarcation (Meier-Kolthoff et al., 2013b). The GGD values were much higher between strains representing different M. kansasii genotypes (range, 0.0601-0.124) and between any M. kansasii and any other Mycobacterium species (range, 0.0563-0.1971). The only exception was the comparison of M. kansasii type II genomes with those of M. persicum, for which the GGD values were between 0.0003 and 0.0044 (Table 4).
Another whole genome-level approach to clarify the taxonomic relationships between the six M. kansasii subtypes was the core-genome phylogenetic analysis. A dendrogram based on the concatenated 615,565-amino-acid sequences from 1,752 single-copy orthologous genes clearly separated all M. kansasii subtypes (Figure 2). All strains belonging to the same subtype formed distinct clades, supported by high bootstrap values (76.7-100%). Notably, strains of M. persicum were located in the same clade as the M. kansasii type II strains, whereas M. gastri branched as a sister to M. kansasii type IV. The topology of the tree supports the separation of M. kansasii subtypes as distinct species, with M. kansasii type II being conspecific with M. persicum.
Table 4 notes: *The values were not given, since only one genome sequence per species was analyzed. GGDs calculated within the groups of strains (genomes) are indicated in bold. The GGD values lower than 0.0258 were used to assign a genome pair to the same species.
Frontiers in Microbiology | www.frontiersin.org
FIGURE 2 | Maximum-likelihood phylogenetic tree based on the amino acid alignment of concatenated single-copy orthologous genes. Bar, number of amino acid substitutions per amino acid site. Node supports were computed using the Shimodaira-Hasegawa test.
The taxonomic position of M. kansasii genotypes was also examined using phylogenetic inferences from five genetically conserved loci, including the canonical 16S rRNA gene, the ITS region, and three protein-coding genes, namely hsp65, tuf, and rpoB, all being widely applied as molecular markers for the classification of mycobacteria (Tortoli, 2014). Multiple alignment and phylogenetic analyses for each of the five loci were conducted on the sequences of all 59 M. kansasii strains and single strains of M. tuberculosis and five other NTM species (Supplementary Table 6).
Pairwise alignments of the 1,537-bp sequences of 16S rRNA gene from M. kansasii strains showed that they were highly similar or identical (99-100% sequence similarity), both within and between the subtypes (Supplementary Table 1). At the same time, strains of M. kansasii shared 98-98.6% similarity with M. tuberculosis and more than 98% similarity with other NTM species. The 16S rRNA gene sequences were identical between M. persicum and M. kansasii type II, and between M. gastri and M. kansasii types I and IV.
Comparisons of the 277-bp ITS sequences from M. kansasii strains showed at most 85 and 98% similarities with the corresponding sequences from M. tuberculosis and other NTM species, respectively. Only sequences from M. kansasii type II and M. persicum were almost identical, sharing 99.2-100% similarity (Supplementary Table 2). The ITS sequence similarities within and between different M. kansasii subtypes fell under relatively wide ranges, i.e., 92.1-100% and 81.2-100%, respectively. The highest inter-subtype similarity values (>95%) were observed between three M. kansasii type I strains (K14, K19, and NLA00100521) and strains of M. kansasii type II. The type II-specific ITS sequences of those three type I strains accounted for the high intra-type heterogeneity (92.1-100% sequence similarity).
Sequence analysis of the 644-bp hsp65 gene fragments showed similarities of 90.3-92.3% between M. kansasii and M. tuberculosis, and 90.9-97.9% between M. kansasii and other NTM species, except M. persicum, which shared 99.5-100% similarity with M. kansasii type II (Supplementary Table 3). Alignments of the hsp65 gene sequences from members of different M. kansasii subtypes yielded similarities of <98%. The only exceptions were three strains (K4, K14, and K19) of M. kansasii type I, which shared up to 99.8% similarity with M. kansasii type II strains.
The results of the partial tuf (1,180 bp) gene analysis were similar to those obtained with the hsp65 gene (Supplementary Table 4). The tuf gene sequence similarities between M. kansasii and M. tuberculosis were 93.4% at most, while those between M. kansasii and other NTM species were always below 98%, except that sequences of M. kansasii type II and M. persicum were identical. The similarity indexes calculated for M. kansasii of different subtypes did not exceed 98.3%, except that two strains of type I (K14 and K19) had the same tuf sequences as M. kansasii type II strains.
Finally, alignments of the partial rpoB (3,439 bp) gene sequences found all M. kansasii strains to share <89% sequence similarity with M. tuberculosis and <96% similarity with NTM species, but again not M. persicum, whose sequences were identical or nearly identical with those of M. kansasii type II (Supplementary Table 5). The level of the rpoB gene sequence similarity between members of different M. kansasii subtypes was consistently below 98%, excluding two type I strains (K14 and K19), which displayed high similarity or identity with type II strains.
To better illustrate the phylogenetic relatedness of M. kansasii subtypes, phylogenetic trees inferred from five individual loci were constructed (Figures 3-7). A separate tree was created using the concatenated 16S rRNA, hsp65, and rpoB genes (Figure 8), since such an approach is known to considerably increase the discrimination and robustness of the dendrogram analysis (Devulder et al., 2005). In all but one of the dendrograms, all M. kansasii strains could be divided into six highly supported (bootstrap values ≥ 93%) clusters, according to their subtype affiliation (Figures 4-8). The 16S rRNA gene-based dendrogram was different in that it contained no cluster specific for M. kansasii type IV. Two type IV strains clustered together with M. kansasii type I strains (Figure 3). Notably, the type strain of M. gastri was placed in the same cluster. A feature apparent across all the trees was that strains of M. kansasii type II clustered along with M. persicum. Moreover, there were four M. kansasii type I strains that branched within that cluster. Two of these strains (K14 and K19) were always present in the M. kansasii type II-M. persicum cluster, whereas the other two belonged to that cluster only in the trees based on the ITS region (strain no. NLA00100521), 16S rRNA gene (NLA00100521), and hsp65 gene (strain no. K4). Having the type I-specific hsp65 gene sequence (sequevar I) and the type II-specific ITS sequence (sequevar II), strain no. NLA00100521 represents the so-called intermediate type I (I/II), considered a transitional form between environmental type II and human-adapted type I (Iwamoto and Saito, 2005). Strains K14 and K19, in turn, can be identified as atypical type II (IIb) due to their type II ITS sequence and unique, yet most similar to type II, hsp65 sequences (Iwamoto and Saito, 2005). Interestingly, strain K4 had the same unique hsp65 gene sequence, with all the other sequences being characteristic of type I.
Thus, the strain represents a so far unreported variant of M. kansasii type I, which can be tentatively designated as atypical type I (Ib).
Altogether, the results from the genome-wide comparisons demonstrated the six (I-VI) M. kansasii subtypes to represent distinct species. This was also the conclusion of the recent study by Tagini et al., who based their results upon ANI and GGD analysis of the genomes of 21 M. kansasii strains comprising all six subtypes (Tagini et al., 2019) (Twenty of these strains were used in the present work). Also, singleand multigene phylogenies, highly congruent between this and already published study, were indicative of species-level demarcations between M. kansasii subtypes. From these findings, Tagini et al. were first to propose new species designations, namely M. pseudokansasii, M. innocens, and M. attenuatum, replacing the former types III, V, and VI, respectively. The most prevalent type I was preserved under the 'M. kansasii' designation. Whereas, M. kansasii type II was found, as in our     study, conspecific with M. persicum. Therefore, we share the view of assigning this name to all M. kansasii type II strains. In fact, the conspecificity of M. persicum and M. kansasii type II would have been disclosed upon the original description of the species (M. persicum), if the authors had included M. kansasii type II in the genome-based comparative analysis (Shahraki et al., 2017).
The two M. kansasii type IV strains analyzed in our study fully satisfied the genomic criteria for a separate species. This was also implied by our predecessors, but in the absence of any type strain, they could not formally establish the species. Here, we propose a new species name, Mycobacterium ostraviense sp. nov., to accommodate M. kansasii type IV strains, with strain no. 241/15 as the type strain. The description of this new species is given at the end of the article.
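How pairwise genomic metrics translate into a species-level verdict can be sketched as follows. The cut-offs used here (ANI ≥ 96.5% with an alignment fraction ≥ 0.6, and GGD ≤ 0.0258) are assumptions taken from commonly cited conventions and are not stated in this section; they should be replaced by the values applied in the actual analysis. The function names are hypothetical. The example values for M. ostraviense versus M. gastri (ANI 95.2, GGD 0.056) are those given in the species description.

```python
from typing import Dict, Optional

# Assumed cut-offs; adjust to the criteria actually applied.
ANI_MIN = 96.5    # average nucleotide identity, percent
AF_MIN = 0.6      # alignment fraction, 0..1
GGD_MAX = 0.0258  # genome-to-genome distance


def species_verdicts(ani: float, ggd: float,
                     af: Optional[float] = None) -> Dict[str, bool]:
    """Return, per criterion, whether the pair looks conspecific."""
    verdicts = {"ANI": ani >= ANI_MIN, "GGD": ggd <= GGD_MAX}
    if af is not None:  # AF is optional because it is not always reported
        verdicts["AF"] = af >= AF_MIN
    return verdicts


def conspecific(ani: float, ggd: float, af: Optional[float] = None) -> bool:
    """A pair is called conspecific only if every available criterion agrees."""
    return all(species_verdicts(ani, ggd, af).values())


# M. ostraviense vs. M. gastri, values from the species description.
ostraviense_vs_gastri = conspecific(ani=95.2, ggd=0.056)
```

Under these assumed thresholds, even the closest relative of the type IV strains falls outside the species boundary on both criteria.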
The taxonomic status of M. kansasii type VII remains an enigma. Neither the strain nor its genomic sequence is available, precluding any relevant phylogenetic analyses. This type was reported only once (Taillard et al., 2003), and given the similarity of its hsp65 RFLP banding patterns, which served as the only diagnostic means, to those of type III, it is plausible that type VII is a product of misidentification.
Since the mid-1990s, PCR-RFLP analysis, based on single-copy, orthologous genes (hsp65, rpoB, and tuf), has been widely used for the identification of a plethora of NTM species, including M. kansasii and its subtypes (Alcaide et al., 1997; Devallois et al., 1997; Kim et al., 2001; da Silva Rocha et al., 2002; Santin and Alcaide, 2003; Zhang et al., 2004; Kwenda et al., 2015; Bakuła et al., 2016). However, PCR-RFLP typing may not infrequently produce misleading results. Single nucleotide polymorphisms can alter the recognition sites of the restriction enzymes and thus generate patterns that are either unidentifiable or correspond to those of other species. Also, sequence analysis, even when conducted on a combination of genes, may not resolve the species identity adequately. This is best illustrated by the already discussed strains of atypical type IIb, which, despite sharing nearly 99% average nucleotide identity with type I, had their 16S rRNA, hsp65, and rpoB gene sequences, either individually or concatenated, more closely associated with type II. The type IIb strains would have been considered type II (M. persicum) if they had not been inspected at the whole-genome level. Thus, neither PCR-RFLP profiling nor single- or multilocus sequencing allows unequivocal identification of M. persicum. A definite diagnosis should be supported by genome-wide analysis. Still, the two type IIb strains in this study displayed the type I-specific hsp65 RFLP patterns upon digestion with HaeIII (but not with BstEII, which yielded the type II-specific pattern). This feature can be exploited for differentiation between types I and II if whole-genome sequencing is not affordable. Nevertheless, a new, robust genetic marker allowing for fast and accurate identification of all M. kansasii-derived species (former types I-VI), bypassing the need for whole-genome analysis, would be of great benefit. For this, a more in-depth, comparative analysis of the genomes of more strains representing the six M. kansasii-derived species, and other NTM species, will have to be undertaken.
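The effect of a single nucleotide polymorphism on an RFLP pattern can be shown with a minimal in silico digestion. The sketch relies only on the known recognition sites of the two enzymes (HaeIII, GG^CC; BstEII, G^GTNACC); the 40-bp amplicon is a made-up sequence and not an hsp65 fragment, and the function name digest is hypothetical.

```python
import re
from typing import List

# Recognition site (as a regex) and cut offset on the top strand.
ENZYMES = {
    "HaeIII": ("GGCC", 2),          # GG^CC
    "BstEII": ("GGT[ACGT]ACC", 1),  # G^GTNACC
}


def digest(seq: str, enzyme: str) -> List[int]:
    """Fragment lengths (5'->3') after complete digestion of a linear amplicon."""
    site, offset = ENZYMES[enzyme]
    # A lookahead is used so that overlapping sites are also found.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical 40-bp amplicon with two HaeIII sites and one BstEII site.
wild_type = "ATGCATGGCCTTAGCTAGGTCACCTTAGCAAGGCCTTAAC"
# A single C->T substitution destroys the second HaeIII site.
snp_variant = wild_type[:33] + "T" + wild_type[34:]

haeiii_wild = digest(wild_type, "HaeIII")   # three fragments
haeiii_snp = digest(snp_variant, "HaeIII")  # two fragments: one site lost
bsteii_wild = digest(wild_type, "BstEII")
```

One substitution is enough to merge two fragments into one, so a strain carrying it would produce a banding pattern that no longer matches the reference pattern of its own type.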
There has been a continuing debate on how the differences between M. kansasii subtypes translate into pathogenicity. The prevailing view is that only types I and II are true human pathogens, with the latter having been associated with immunodeficiency, and HIV infection in particular, whereas all the remaining types are considered non-pathogenic, and their sporadic isolation from clinical samples has been interpreted as colonization or environmental contamination (Tortoli et al., 1994; Taillard et al., 2003). Indeed, M. kansasii type I is the most commonly detected among clinical isolates and the predominant cause of M. kansasii disease worldwide (Alcaide et al., 1997; Kim et al., 2001; Gaafar et al., 2003; Santin and Alcaide, 2003; Taillard et al., 2003; Zhang et al., 2004; da Silva Telles et al., 2005; Shitrit et al., 2006; Thomson et al., 2014; Kwenda et al., 2015; Bakuła et al., 2016). Infections attributable to M. kansasii type II are much rarer (Taillard et al., 2003; Zhang et al., 2004; Shitrit et al., 2006; Bakuła et al., 2016), and those caused by other types are almost unreported in the literature. Still, types III, IV, and VI have been recognized among clinical isolates (Alcaide et al., 1997; Picardeau et al., 1997; Santin and Alcaide, 2003; Thomson et al., 2014; Kwenda et al., 2015), with types IV and VI implicated in the disease (Santin and Alcaide, 2003; Thomson et al., 2014). Due to the paucity of isolations of M. kansasii other than type I, their clinical relevance remains obscure. Some new light on this problem may be shed by the findings of an international, multicenter investigation, currently in progress, on the global distribution of M. kansasii subtypes. So far, we have documented nine confirmed cases of M. kansasii disease etiologically linked to types III, IV, V, or VI (Jagielski et al., data unpublished). In this context, the newly proposed species names for types V (M. innocens) and VI (M. attenuatum) may not reflect the true phenotype of those bacteria.
In search of the genetic determinants of pathogenicity, a key genomic region associated with M. tuberculosis virulence, known as "region of difference 1" (RD1), was interrogated across the genomes of M. kansasii subtypes for its functional integrity (i.e., presence of RD1 genes). The RD1 encodes a secretory apparatus (ESX-1 type VII secretion system) responsible for exporting two highly potent antigens and virulence factors, namely the 6-kDa early secreted antigenic target (ESAT-6) and the 10-kDa culture filtrate protein (CFP-10) (Berthet et al., 1998). These proteins, encoded by the same operon, play a critical role in modulation of the host immune response through inhibition of phagosome maturation, cytosolic translocation of mycobacteria, or granuloma formation (van der Wel et al., 2007; Volkman et al., 2010). The esat-6 (esxA) and cfp-10 (esxB) genes have also been demonstrated in some NTM species, including M. kansasii (types I-V), M. szulgai, M. marinum, and M. riyadhense (van Ingen et al., 2009). Furthermore, the ESAT-6/CFP-10-mediated translocation of bacilli into the cytosol has been proven to occur in M. kansasii type I, but not in M. kansasii type V (Houben et al., 2012).
Our analysis showed the presence of six RD1 genes (rv3871-5 and rv3877), including the ESAT-6 and CFP-10-coding genes and two other genes (rv3871 and rv3877) coding for the essential components of the ESX-1 secretion system, in all types of M. kansasii and other NTM species (Table 5). Only single strains of M. kansasii type I, M. marinum, and M. szulgai were devoid of the rv3872 gene coding for PE35, conjectured to play a role in the regulation of esxB/A expression (Brodin et al., 2006).
a The presence of a given gene was marked with a "+" and highlighted in gray; the superscript letters refer to isolation source: D -clinical strains implicated in NTM disease; N -clinical strains not implicated in NTM disease; R -clinical strain from the rhesus macaque (Macaca mulatta); U -clinical strains with unknown relation to NTM disease; E -environmental strains. Strains whose genomes were sequenced in this study are marked with an asterisk (*).

Frontiers in Microbiology | www.frontiersin.org
The rv3876 gene was demonstrated in all M. kansasii types except for types II (M. persicum) and VI (M. attenuatum). Nor was it present in M. szulgai or M. riyadhense. The protein encoded by this gene is the ESX-1 secretion-associated protein EspI. It was shown that inactivation of the rv3876 gene did not impair secretion of ESAT-6 (Brodin et al., 2006). More recently, however, EspI was found to negatively regulate the ESX-1 secretion system in M. tuberculosis in response to low cellular ATP levels. EspI was thus hypothesized to play a role during the latent phase of infection (Zhang et al., 2014).
The rv3879c gene, coding for another ESX-1 secretion-associated protein, EspK, was variably distributed among M. kansasii strains. It was detected in half of the strains of types I and III, while being absent in all but one strain of types II and VI, and in all type V strains. EspK seems not to be involved in virulence, since the rv3879c homolog deletion mutant of M. bovis was not attenuated in the guinea pig model (Inwald et al., 2003). Moreover, similar to rv3876, inactivation of the EspK-coding gene did not abolish ESAT-6 secretion (Brodin et al., 2006). In contrast, EspK of M. marinum was found to be crucial for the ESX-1-mediated secretion of EspB (Rv3881c), required for virulence and growth in macrophages (McLaughlin et al., 2007). It is thus conceivable that EspK may also influence pathogenicity in M. kansasii.
Collectively, based on the distribution of RD1 genes, none of the M. kansasii types could be categorized as being more or less pathogenic, given that all genes essential for the functioning of the ESX-1 secretion machinery were uniformly present in M. kansasii strains, and that the absence of certain genes was observed in both clinically relevant and clinically neutral strains. To explore the genetic background of virulence in M. kansasii in more depth, more advanced, functional studies should be performed, with a focus not only on the RD1 genes but also on several other genes flanking that cluster, which together form the "extended RD1" region. Furthermore, we cannot exclude that the pathogenic and non-pathogenic M. kansasii strains differ in terms of expression of the RD1 genes or activity of their proteins. More RD1-targeted experimental investigations are required to validate such scenarios.
Analysis of the regions of difference other than RD1 (RD2-14) did not show any consistent (i.e., shared across all strains of a given species) species-specific pattern of RD genes (Supplementary Data Sheet 2). It was noteworthy, however, that the only two genes found in the RD3 locus were confined to the two most pathogenic M. kansasii types, I and II. The rv1577 gene occurred in more than 90% of M. kansasii type I and almost 73% of M. kansasii type II strains, whereas the rv1586 gene was demonstrated exclusively in M. kansasii type I, at a frequency of nearly 81%. Likewise, only M. kansasii types I and II harbored genes of the RD11 locus. The rv2651 gene was present in 90 and 73% of M. kansasii type I and II strains, respectively. The rv2646 gene was found in slightly more than 60% of M. kansasii type I strains. Interestingly, both RD3 and RD11 represent phage inserts within the M. tuberculosis genome, and are thought to generate antigenic variation (Ahmed et al., 2007).
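The per-type gene frequencies quoted above, and the test for a consistent species-specific pattern, can be derived from a presence/absence table along the following lines. This is only a sketch: the strain labels and gene calls are hypothetical toy data rather than the content of Supplementary Data Sheet 2, and the function names are illustrative and unrelated to the Diffind script used in the study.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# strain -> (type, set of RD genes detected); hypothetical toy data.
Strains = Dict[str, Tuple[str, Set[str]]]


def gene_frequencies(strains: Strains, genes: List[str]) -> Dict[str, Dict[str, float]]:
    """Percentage of strains of each type that carry each gene."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for subtype, present in strains.values():
        totals[subtype] += 1
        for gene in genes:
            hits[subtype][gene] += int(gene in present)
    return {st: {g: 100.0 * hits[st][g] / totals[st] for g in genes}
            for st in totals}


def consistent_markers(freq: Dict[str, Dict[str, float]],
                       genes: List[str]) -> List[Tuple[str, str]]:
    """Genes present in all strains of exactly one type and absent elsewhere."""
    markers = []
    for gene in genes:
        full = [st for st in freq if freq[st][gene] == 100.0]
        absent = [st for st in freq if freq[st][gene] == 0.0]
        if len(full) == 1 and len(absent) == len(freq) - 1:
            markers.append((gene, full[0]))
    return markers


toy = {
    "s1": ("I", {"rv1577", "rv1586"}),
    "s2": ("I", {"rv1577"}),
    "s3": ("II", {"rv1577"}),
    "s4": ("II", set()),
    "s5": ("III", set()),
}
rd3_genes = ["rv1577", "rv1586"]
freq = gene_frequencies(toy, rd3_genes)
markers = consistent_markers(freq, rd3_genes)
```

In the toy data, as in the results reported here, a gene may be restricted to one or two types without being present in all of their strains, in which case it does not qualify as a consistent species-specific marker.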
It has been customary, upon description of a new species, to provide a detailed phenotypic characterization. However, in the era of genome-based bacterial taxonomy, the significance of the phenotype has much eroded and conventional biochemical testing has been increasingly abandoned. The algorithm for routine differential diagnostics of NTM species should obligatorily include only growth rate and pigment production (Tortoli et al., 2017).
For closely related species, the diagnostic value of phenotyping is virtually negligible, as also demonstrated in this study (Table 6). All strains, irrespective of subtype (species), were almost invariably photochromogenic, niacin-negative, and grew at 25 and 37 °C, but not at 45 °C, in the presence of 5% (w/v) NaCl or on MacConkey agar without crystal violet. They were all resistant to thiophene-2-carboxylic acid hydrazide (TCH) and presented a strong catalase activity, but none exhibited pyrazinamidase activity or arylsulfatase activity at 3 days. Some variability was observed when testing for nitrate and tellurite reduction, Tween 80 hydrolysis, and urease activity. The only marked difference between the subtypes (species) was that strains of M. ostraviense (formerly M. kansasii type IV) were, unlike all other species, unable to reduce nitrate and that their catalase was heat-labile. Whether these features are stable within the species needs to be verified on a larger set of strains. Interestingly, both these features are typical of M. gastri (Kent and Kubica, 1985), with which M. ostraviense shares the highest genetic similarity, as evidenced by whole-genome analysis. Mycobacterium gastri, a casual resident of the human stomach and only exceptionally pathogenic (Velayati et al., 2005), is easily distinguishable from all M. kansasii-derived species, as it is non-photochromogenic.
Finally, drug susceptibility profiles of 30 mycobacterial strains, including the type strains of the three species newly established by Tagini et al. (2019), were compared within and between the species (former M. kansasii types) (Table 7). In brief, all strains were susceptible to RIF, RFB, AMK, SXT, MFX, LZD, and CLR (except one CLR-resistant strain of M. kansasii). Of these drugs, only RIF and CLR showed interspecies differences in their activity, with the MICs for M. kansasii and M. persicum slightly higher than for other M. kansasii-derived species. More than 80% of strains were resistant to EMB. Among these were all strains of M. kansasii (former type I), M. persicum (II), and M. ostraviense (IV). Single strains of M. kansasii, M. pseudokansasii, and M. attenuatum were resistant to CIP. The MICs of STR and DOX varied widely (<0.5-16 mg/L and 1 to >16 mg/L, respectively), yet the highest values (16 and >16 mg/L, respectively) were observed only for strains of M. kansasii, M. persicum, and M. attenuatum. The INH and ETO MICs were low and within relatively narrow ranges (<0.25-2 mg/L and <0.3-0.6 mg/L, respectively). These findings confirm the key observations from previous studies on the susceptibility of M. kansasii strains to RIF and their high resistance to EMB (da Silva Telles et al., 2005; Wu et al., 2009; Shahraki et al., 2017; Bakuła et al., 2018c). Set against the ATS recommendation of a three-drug (INH-RIF-EMB) regimen for the treatment of M. kansasii disease, this argues for the exclusion of EMB and its replacement with another potent drug, such as moxifloxacin.
All test reactions are given as "+" (positive), "-" (negative) or "v" (variable); NT, not tested. Numbers in superscripts represent the percentage of strains reacting as indicated. a According to the WGS-based (MiSI method) grouping. The newly proposed names for each of the M. kansasii subtypes are given vertically; (T), type strain. *Results according to Kent and Kubica (1985); prior to subtype delineation. **Results according to Shahraki et al. (2017). ***Results according to Tagini et al. (2019).
In conclusion, the present paper updates and extends the findings of an earlier investigation on the taxonomy of M. kansasii. Not only does it further substantiate the delineation of new species from the M. kansasii group to replace the former subtypes I-VI, but it also consolidates the position of five of the so erected species, and provides a description of the sixth one, M. ostraviense, a successor of subtype IV. By showing a close genetic relatedness, a monophyletic origin, and overlapping phenotypes, our findings support the recognition of the M. kansasii complex (MKC), accommodating all M. kansasii-derived species and M. gastri. None of the most commonly used taxonomic markers can accurately distinguish all the MKC species. Likewise, no species-specific phenotypic characteristics exist that would allow identification of the species, except the non-photochromogenicity of M. gastri. In the context of the previously proposed polyphasic strategy for resolving species boundaries and their interrelatedness, FAME (fatty acid methyl ester) analysis, as an adjunct typing method, might be useful (Saini et al., 2009). However, preparatory techniques for FAME analysis are typically time-consuming, laborious, and material-intensive. Furthermore, chromatography-dedicated facilities require investment in instrumentation and training, and despite their services being offered by universities and other centres, they are often less accessible than sequencing facilities, even in developing countries.
To distinguish, most reliably, between the MKC species, and between M. kansasii and M. persicum in particular, whole-genome-based approaches should be applied.
Since no clear differences in the repertoire of the virulence-associated RD1 genes have been observed among the M. kansasii-derived species, the pathogenic capacity of each of these species can only be speculated based on their prevalence among the clinically relevant population. Large-scale molecular epidemiological studies are needed to gain a better understanding of the clinical significance and pathobiology of the MKC species.

Description of Mycobacterium ostraviense sp. nov. Jagielski and Ulmann
Mycobacterium ostraviense [os.tra.vi.en'se. N.L. neut. adj. ostraviense pertaining to Ostravia, the Latin name of Ostrava, a city in the north-east of the Czech Republic where one of the strains was isolated].
The species name refers to the former M. kansasii subtype IV. Mature colonies, of rough surface and photochromogenic, are observed on Löwenstein-Jensen medium after more than 7 days of incubation at 37 °C (Supplementary Figure 1). No growth occurs at 45 °C and on the media containing TCH or 5% (w/v) NaCl. Similar to other MKC species, it tests positive for the semi-quantitative catalase (>45 mm) and 14-day arylsulfatase activity, while negative for niacin accumulation, tellurite reduction, and urease and pyrazinamidase activities. Unlike other M. kansasii-derived species but similar to M. gastri, it does not reduce nitrates, nor does it produce a heat-stable (68 °C) catalase. Strongly resistant to ethambutol (>16 mg/L) but susceptible to amikacin, clarithromycin, co-trimoxazole, linezolid, fluoroquinolones, and rifamycins.
The species has the same 16S rRNA sequences as M. kansasii (former subtype I) and M. gastri, yet it displays unique sequences at the hsp65, tuf, and rpoB genes, and the ITS locus. At the genomic level, it is most closely related to M. gastri, with pairwise ANI and GGD values of 95.2 and 0.056, respectively.
The type strain, 241/15T, was isolated from the sputum of a patient with no NTM disease, based in Karviná, near Ostrava, in the Moravian-Silesian Region of the Czech Republic. The type strain has been deposited in the Leibniz Institute German Collection of Microorganisms and Cell Cultures (DSMZ; Braunschweig, Germany) under the accession number DSM 110538.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/Supplementary Material.

ETHICS STATEMENT
Analyses were based on data that did not contain any sensitive personal information. Therefore, informed consent and ethical approval were not required in line with local legislation.

AUTHOR CONTRIBUTIONS
TJ conceptualized and supervised the study, provided the funding, organized and integrated the data, and wrote the manuscript. PB, JL, and DS performed bioinformatic analyses including AF-ANI, GGD, and phylogenetic tree analysis. ZB performed culturing, subtyping, and phenotypic profiling of M. kansasii strains. BM carried out analysis on regions of difference 1-14 (RD1-14) with a homemade script Diffind. AB carried out DNA isolations for whole-genome sequencing. JD analyzed the results on the distribution of the RD1-14 genes in M. kansasii genomes. MD constructed the core-genome phylogenies. LP performed drug susceptibility testing. JI provided 13 M. kansasii strains of subtypes I-VI and critically reviewed the manuscript. MZ-D co-performed phenotypic assays.