Comparative genome analysis reveals high-level drug resistance markers in a clinical isolate of Mycobacterium fortuitum subsp. fortuitum MF GZ001

Introduction Infections caused by non-tuberculosis mycobacteria are significantly worsening across the globe. M. fortuitum complex is a rapidly growing pathogenic species that is of clinical relevance to both humans and animals. This pathogen has the potential to create adverse effects on human healthcare. Methods The MF GZ001 clinical strain was collected from the sputum of a 45-year-old male patient with a pulmonary infection. The morphological studies, comparative genomic analysis, and drug resistance profiles along with variants detection were performed in this study. In addition, comparative analysis of virulence genes led us to understand the pathogenicity of this organism. Results Bacterial growth kinetics and morphology confirmed that MF GZ001 is a rapidly growing species with a rough morphotype. The MF GZ001 contains 6413573 bp genome size with 66.18 % high G+C content. MF GZ001 possesses a larger genome than other related mycobacteria and included 6156 protein-coding genes. Molecular phylogenetic tree, collinearity, and comparative genomic analysis suggested that MF GZ001 is a novel member of the M. fortuitum complex. We carried out the drug resistance profile analysis and found single nucleotide polymorphism (SNP) mutations in key drug resistance genes such as rpoB, katG, AAC(2')-Ib, gyrA, gyrB, embB, pncA, blaF, thyA, embC, embR, and iniA. In addition, the MF GZ001strain contains mutations in iniA, iniC, pncA, and ribD which conferred resistance to isoniazid, ethambutol, pyrazinamide, and para-aminosalicylic acid respectively, which are not frequently observed in rapidly growing mycobacteria. A wide variety of predicted putative potential virulence genes were found in MF GZ001, most of which are shared with well-recognized mycobacterial species with high pathogenic profiles such as M. tuberculosis and M. abscessus. Discussion Our identified novel features of a pathogenic member of the M. fortuitum complex will provide the foundation for further investigation of mycobacterial pathogenicity and effective treatment.


Introduction
Non-tuberculosis mycobacteria (NTM) are ubiquitous, freeliving, environmental saprophytic organisms that can cause human infections, including respiratory and skin diseases in both immunocompetent and immunocompromised individuals (Chan and Iseman, 2013;Maya et al., 2022). The prevalence of NTM infections has risen over the years around the world (Reves and Schluger, 2014). The yearly incidence of lung infections caused by NTM has increased from 3.13 to 4.73 per 100000 people, whereas the prevalence of NTM rose dramatically from 4.24% in 2014 to 12.68% in 2021, indicating a significant increase in the NTM outbreak in China. (Winthrop et al., 2020;Jia et al., 2021;Sun et al., 2022). The majority of NTM are rapidly growing species that have been associated with serious infectious diseases (De Groote and Huitt, 2006). Importantly, previous investigations reported that M. abscessus could be transmitted from person to person, and its prevalence has been widely observed in hospitals (Aitken et al., 2012;Bryant et al., 2013). The fast-growing mycobacteria comprise some clinically relevant species which include M. fortuitum, M. abscessus, M. smegmatis, and M. chelonae (De Groote and Huitt, 2006). M. fortuitum is frequently isolated from both respiratory and non-respiratory specimens (Park et al., 2008;Bryant et al., 2013).
M. fortuitum complex comprises opportunistic pathogens usually found in water, soil, and dust that are of clinical relevance to both humans and animals (Pavlik et al., 2021). The  (Johansen and kremer, 2020;Brown-Elliott and Philley, 2017;Brown-Elliott and Wallace, 2002). M. fortuitum is a rapidly growing human-pathogenic species that cause pulmonary, eye, post-surgical, and catheter as well as skin and soft tissue infections (Koh et al., 2006;Brown-Elliott et al., 2012;Diaz et al., 2019;Erber et al., 2020). M. fortuitum in respiratory samples has been categorized based on colony morphologies, growth characteristics, and in vitro resistance to anti-mycobacterials (Daley and Griffith, 2002).
Whole-genome sequencing (WGS) technologies are the new strategies for understanding the molecular basis of drug resistance, metabolism, and evolution of pathogens. The conventional methodologies may have several drawbacks, particularly discordance between phenotypic and genotypic susceptibility testing outcomes. Illumina HiSeq sequencing can provide a variety of sequencing data for differentiating the distinct variable gene expressions between various samples. The sequencing data can also help in the assembly of de novo organism genomes (Austin et al., 2017). Highly accurate base-by-base sequencing is provided by this technique with almost no errors and up to 750 GB of data can be produced per sequencing run (Teng et al., 2017). Despite these advantages, Illumina sequencing's low-quality transcripts and short reads can significantly reduce the scope of analyses of transcriptional variations and accurate annotation (Hert et al., 2008). However, WGS employs a short-read sequencing platform that allows for the identification of additional resistance-associated mutations. For instance, due to the repetitive structure and high GC content of the mycobacterial genome, amplification bias occurs frequently during library preparation, resulting in fragmented genome assembly and other genetic variations such as INDEL and copy number variations (Treangen and Salzberg, 2011;Leung et al., 2017).
The recently developed PacBio RS II with single-molecule real-time (SMRT) sequencing methods overcome the drawbacks of conventional methods. PacBio RS II offers longread or whole transcriptomes (Eid et al., 2009) which enable the large-scale long-read transcript collection with complete coding sequences and the characterization of the various gene families. Moreover, SMRT has enabled de novo assembly of the mycobacterial genome much easier (Roberts et al., 2013). SMRT sequencing can easily span highly repetitive DNA sequences due to its average read length of 10-20 kbp (Qi et al., 2013;Zhu et al., 2016;Ferrarini et al., 2013). Additionally, it can reduce the number of gaps in the final assembled mycobacterial genome. Isolation of new rapidly growing mycobacteria (RGM) species and comprehensive analysis of the WGS of several mycobacterial isolates seem to be important for such kind of research.
In this study, we sequenced M. fortuitum subsp. fortuitum (designated as MF GZ001) isolated from a patient with pulmonary infection. This was deduced from conserved sequences of 16S rRNA hsp65, and rpoB genes. We performed comparative studies with RGM and slow-growing mycobacteria (SGM) to understand the genomics, phylogeny, pathogenicity of this mycobacterial species, and the evolution of drug resistance. This investigation provides a genome-based description, which is ensuring that it's related to the M. fortuitum complex and displayed novel features of a potential pathogenic species.

Strains collection and growth conditions
The MF GZ001 clinical strain was collected from the sputum of a 45-year-old male patient with a pulmonary infection. The patient with the symptoms of non-paroxysmal irritant cough with yellow and white sputum, chest pain, shortness of breath, headache, and low fever was admitted to the Guangzhou Chest Hospital, Guangzhou, China. The isolate was grown on Middlebrook 7H11 agar (Becton, Dickinson, and Company) supplemented with 0.2% glycerol (Shanghai Macklin Biochemical, Shanghai, China) and 10% oleic acid-albumin-dextrose-catalase (OADC) and in Middlebrook 7H9 (Difco, Becton, Dickinson and Company, New Jersey, USA) broth medium supplemented with 10% OADC, (Difco) and 0.05% Tween-80 (Amresco, USA).

Growth kinetics and morphology detection
To determine the bacterial growth curves, three mycobacterial strains MF GZ001, M. abscessus GZ002 (Guo et al., 2016;Chhotaray et al., 2020), and M. smegmatis C 2 155 were collected from Guangzhou Chest Hospital and performed phenotypic characterization. Firstly, the bacterial strains were grown in Middlebrooks 7H9 broth medium to log phase. Then the strains were diluted up to OD 600 0.01 in each 100 mL flask of every strain and placed in a shaking incubator for 72 hrs at 200 rmp to determine the bacterial growth curve. The bacterial growth rates were measured by detecting OD 600 readings of bacterial strains every 6 hrs intervals using a spectrophotometer. The bacterial growth curve analysis was performed by GraphPad Prism version (8.0.2).
For the colony morphology study, the mycobacterial strains were cultured in Middlebrook 7H9 broth medium up to OD 600 1. Then all bacterial strains were equalized by dilution at 10 -5 and cultured for 7 days at 37°C. Bacterial colony images were captured after 7 days of incubation and visualized by microscope  to determine the size and surface structure of the colony respectively.

Library construction and WGS
MF GZ001 strain was cultured in 7H9 broth and extracted the genomic DNA (gDNA) by using MAagNA Pure LC DNA Isolation Kit III. The gDNA was fragmented and collected for the preparation of SMRTbell DNA template libraries. Briefly, the DNA fragments were end-repaired and the barcode overhang adapter-ligated by removing the single-strand overhangs. The library was quantified using a Qubit (version 3.0) Fluorometer (Invitrogen, Carlsbad, CA), and checked the library size using an Agilent 2100 Bioanalyzer System. Subsequent steps were followed as per the manufacturer's instructions to prepare the SMRTbell library. The constructed library was sequenced using the Sequel II sequencing platform and PacBio reads were assembled using HGAP4 (version 4.0)/Falcon of WGS-Assembler (Version 8.2) (Myers et al., 2000;McCarthy, 2010). The genome sequence was re-corrected with Pilon software using previous Illumina data or Quiver using Pacbio reads to resolve the errors that were found during SMRT sequencing. The paired-end library was constructed by using the Illumina HiSeq instrument (Illumina, San Diego, CA, USA). The library construction protocol and bioinformatics analysis are broadly illustrated in the supplementary material (Supplementary material 1). The preliminary Illumina raw data reads were trimmed at the percentage of bases with a Phred value greater than 20 or 30 (less than 0.1%/1% probability of error).
The unnecessary bases and reads of Pass Filter data were removed to get clean data using Cutadapt (version1.9.1) and the Burrows-Wheeler Alignment tool (BWA) (version 0.7.12) (Li and Durbin, 2009) was used to align the clean data generated from MF GZ001 to M. fortuitum CT6 reference complete genome sequence (NZ_CP011269.1) (Costa et al., 2015). All statistical analyses of raw data which were obtained from Illumina and SMRT sequencing are attached in the supplementary materials ((Tables S1-S6). The Prodigal (version 3.0.2, prokaryote) (Delcher et al., 2007) and Augustus (version 3.3, eukaryotes) (Stanke et al., 2006) both gene-finding software were used for predicting the coding genes. Non-coding RNA (ncRNA) includes rRNA, tRNA, snRNA, snoRNA and microRNA. Among the ncRNA, tRNAs and rRNA were detected by using tRNAscan-SE (version 1.3.1) (Lowe and Eddy, 1997), and Barrnap (version 0.9) respectively in the genome assembly. Additionally mapping Rfam (version 12.2) (Nawrocki et al., 2015) method was used to predict other ncRNAs. The coding genes were annotated with National Center for Biotechnology Information (NCBI) NR database by Diamond blastp. Functional categories in the genome were assigned to the Gene Ontology (GO) by using InterProScan software (Harris et al., 2004), to the Clusters of Orthologous Groups of proteins (COGs/KOG) (Tatusov et al., 2003;Han et al., 2018) database using rpstblastn software, and to the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000) pathway database by performing the KEGG database (http:// www.genome.jp/kegg/) with Blastn software. Carbohydrate-Active Enzyme (CAZy) annotation was displayed using Diamond blastp (Luo et al., 2022). The Pfam was annotated for a large collection of protein families using the Pfam database (http://pfam.xfam.org/) (Punta et al., 2012), and Swiss_Prot was annotated by applying the Swiss_Prot database (https://www.ebi. ac.uk/uniprot) (Magrane, 2011). The Circos (version 0.69) software was used to display the circular plot and to describe the common feature of the genome.

Construction of phylogenetic trees
The evolutionary trees were constructed to represent the proximity of the relatedness between referred genome MF GZ001 and other 30 mycobacterial species. Single and combined gene-based (16S rRNA, hsp65, and hsp65-rpoB) three phylogenetic trees were constructed using FastTree and Mafft software (Price et al., 2010;Katoh and Standley, 2013;Robbertse et al., 2017). To generate the phylogenetic trees, the gene sequences were aligned with the reference sequences using Mafft software (version 7.310) (Katoh and Standley, 2013), and then evolutionary trees were constructed by using FastTree (Version 2.1.10.Dbl) (Price et al., 2010), which predicts nucleotide evolution using the Jukes-Cantor model and infers phylogenetic trees via approximately maximum-likelihood methods. Moreover, a comparison of the genetic relatedness between prokaryotic organisms is performed using average nucleotide identity (ANI) analysis. The online ANI tool was used to calculate the ANI value of the MF GZ001 genome as well as of its closely related species (https://www.ezbiocloud.net/ tools/ani) (Yoon et al., 2017). Annovar software (Yang and Wang, 2015;Wang, 2020) was used to understand the annotation of mutation sites, the construction of the SNP/indel distribution map, and the detection of amino acid change caused by mutations.

Drug-resistant genes and virulence factors distribution, prediction, and analysis
Drug-resistant gene prediction and distribution were done using the comprehensive antibiotic resistance database (CARD) (Alcock et al., 2020). The CARD is a rigorously curated collection of characterized, peer-reviewed resistance determinants and associated antibiotics, organized by the antibiotic resistance ontology (ARO) and AMR gene detection models. M. fortuitum CT6 (CP011269.1) strain genome sequence of drug-resistant associated genes was obtained from the NCBI database which was used as a reference sequence. A BLAST search was done against the newly identified M. fortuitum complex member MF GZ001 strain using these genes as the query. To find virulence genes and virulence factors related genes, the protein sequences predicted by RAST were BLAST searched against the comprehensive online resource virulence factors database (VFDB) (Yang et al., 2007). We selected the genes that were orthologous to virulence genes with at least 60% identity and 60% sequence coverage in query and subject using our own Perl scripts. For the comparative study of virulence genes, the reference sequences were compared with M. fortuitum complex strains and 21 other mycobacterial species by using the same approaches to indicate the alterations in species-specific genes.

Prophage and CRISPR predictions
For the prediction of prophage, the annotated sequence of the MF GZ001 genome was checked using PhiSpy software (version 4.1.16) (Akhter et al., 2012). Additionally, clustered regularly interspaced short palindromic repeats (CRISPRs) were detected in the genome by using MinCED (from the CRISPR recognition tool) (Bland et al., 2007;Grissa et al., 2007).

Data availability
The MF GZ001 genome sequence data were deposited in the NCBI database under accession number CP107719. Raw data of WGS were deposited in the NCBI Sequence Read Archive (SRA) under the accession SRR22164027. All bacterial strains and analyses are illustrated in this manuscript and its Supplementary materials.

Growth kinetics and morphology study
To compare the growth kinetics of MF GZ001 and other NTM, growth curves were plotted for MF GZ001, M. abscessus GZ002, and M. smegmatis C 2 155. After culturing in Middlebrook 7H9, OD 600 was determined at 6 hrs intervals to detect bacterial growth rates. Both MF GZ001 and M. smegmatis C 2 155 strains have shown faster growth than M. abscessus GZ002 strain in terms of their OD 600 at various time intervals and same growth conditions ( Figure 1A), though the differences were not statistically significant (P > 0.05). To better understand mycobacterial morphology, the MF GZ001 and M. abscessus GZ002 strains were grown on 7H11 Middlebrook agar plates and incubated for 7 days at 37°C. We have noticed a colony shape of MF GZ001 slightly larger than the M. abscessus GZ002 strain by measuring the diameter of a single colony ( Figure 1B). OLYMPUS TH4-200 microscopic visualization of colony structure revealed that MF GZ001 has a wrinkled colony surface and rough morphotype, whereas M. abscessus GZ002 colonies are non-wrinkled and exist in a smooth morphotype ( Figure 1C).
General overview of MF GZ001 genome MF GZ001 genome was sequenced using SMRT sequencing technology and Illumina HiSeq sequencing platform at high sequencing depth. The mapping ratio of the genome was 98.79% which covered 100% reads of the genome (Table S6, Figure S1). For the paired-end data of sequencing, the total number of reads and bases counts were 19187140 and 2878071000 respectively. After trimming, the average length of sequence reads was 149.50 bp. The quality evaluation of Pacbio raw reads of sequences and bases obtained from MF GZ001 were 1368432 and 4062640505, respectively. The size of the MF GZ001 genome is 6413573 bp. The final assembly of the genome was circularized with a high G +C content of 66.18% (Figure 2, Table 1). It contained 6156 protein-coding genes, whereas protein-coding genes with enzymes were 1450. In addition, a total of 98 ncRNAs, 6 rRNAs, 55 tRNAs, and 37 other ncRNAs were identified (Table 1). Moreover, a prophage with approximate size of 39521 bp (Table S7) and three repeat numbers of clustered regularly interspaced palindromic repeats (CRISPR) were predicted in the genome (Table S8).

Phylogenetic construction analysis
The phylogenetic taxonomic position of M. fortuitum MF GZ001 within the mycobacterium genus was constructed based on FastTree and Mafft's alignment of the target sequence of 33 mycobacterial species. Initially, the phylogenetic tree was constructed based on the worldwide known bacterial identification marker genes 16S rRNA and hsp65. The 16S rRNA-based phylogenetic tree indicated that MF GZ001 was the closest sub-species to M. fortuitum ( Figure S8A). Among several mycobacterium species, the other closest members were M. phocaicum, M. mucogenicum, M. septicum, and M. bacteremicum, which belongs to M. fortuitum complex. In addition, hsp65-based phylogenetic analysis showed that MF GZ001 is closely related to M. fortuitum W4 ( Figure S8B), which suggests that MF GZ001 is a member of the M. fortuitum complex. Later on, we reconstructed a phylogenetic tree based on multiple gene approaches (16S rRNA, hsp65, rpoB) to further confirm the placement of MF GZ001. The multiple gene-based reconstructed phylogenetic results were most related to M. fortuitum CT6 ( Figure S8C) and distantly related to SGM. Finally, we subjected MF GZ001 to ANI analysis using the whole genome sequences to clarify the inconsistency from the single gene-based phylogenetic predictions. The ANI values between MF GZ001 and M. fortuitum CT6, M. alvei CIP103464, M. septicum DSM44393, and M. mageritense JR2009 strains were 98.77%, 86.26%, 86.53%, and 80.38% respectively. This study revealed that the MF GZ001 strain shares the most genetic relatedness with M. fortuitum CT6.

Functional annotation study
The functional pathways of annotated genes were interpreted using protein-coding genes of the MF GZ001 strain in the KEGG database. A total of 2411 genes were annotated to six-factor types and 40 KEGG functional pathways along with 558 for amino acid metabolism ( Figure  S2; File.Xls.S1). In addition, there were 23 genes which are possibly related to different pathways in cancer, for instance central carbon metabolism in cancer, proteoglycans in cancer, K005215 prostate cancer, and K05219 bladder cancer. These genes are not only related to cancer-associated pathways but also other pathways like lipid metabolism, pyridine metabolism, and drug metabolism. Moreover, 1 aquaporin z, 113 uncharacterized proteins, drug resistance genes, and human infectious diseasesrelated genes (File.Xls.S1). COG functional annotation of the MF GZ001 genome was studied by the COG database. A total of 4315 common COGs have been found in the MF GZ001 genome. For the MF GZ001 strain, the genes were functionally categorized into 21 different groups ( Figure S3; File.Xl.S2). The general and unknown functions of functionally annotated MF GZ001 genome sequence have been predicted by R and S categories respectively. Notably, there were 361 genes found as unknown function categories and no homologs were identified in the COG database. The majority of the remaining functional annotation categories were represented by "transcription (K category; 530), lipid transport and metabolism (I category; 486), secondary metabolites biosynthesis, transport, and catabolism (Q category; 434), energy production and conversion (C category; 385), amino acid transport and metabolism (E; category; 383) (File.Xl.S2). In addition, the non-redundant homologous species distribution map was created for comprehensive information using the NCBI database. Among different mycobacterial species, the highest (64.83%) similarity was found with M. fortuitum ( Figure S4, File Xl.S3). However, other analyses based on blast to GO, CAZy, Pfam, and Swiss_Prot database gene annotation result statistics   Circular representation of MF GZ001 genome displayed with Circos (version 0.69). The circular plot has seven levels. From outside to inside, the first is the information of genome position, the second is GC content information, the third is positive strand genes (marked in red color), the fourth is negative strand genes (marked in green color), the fifth is positive strand ncRNA data (marked in blue color), the sixth is negative strand ncRNA data (marked in purple color) and the seventh shows long repeats (>100 bp).

Comparative genomic analysis
To investigate the genomic evolution of MF GZ001 strain, based on phylogenetic construction analysis and ANI analysis results, we compared the MF GZ001 strain with the more closely related species M. fortuitum CT6 as well as other RGM and SGM mycobacterium species. The evolutionary results emphasized that the MF GZ001 strain has a close relationship with M. fortuitum subsp. fortuitum. The reannotation results revealed that MF GZ001 strain genome size (6413573 bp) and CDS sequences were higher than M. fortuitum CT6 (6254616 bp) ( Table 2). Interestingly, the MF GZ001 has the highest number of conserved unknown functional genes (1157) compared to M. fortuitum CT6, M. abscessus GZ002, M. smegmatis C 2 155, and M. tuberculosis H37Rv (Table 2). Additionally, the MF GZ001 genome contains more unique genes (333) than the reference genome M. fortuitum CT6 (264) and has more gene density (base pairs per gene) compared to other mycobacterial genomes (Table 2). Moreover, the analysis endorsed that the MF GZ001 strain is substantially diversified compared to the M. fortuitum CT6 strain.
The advantage of high accuracy in obtaining de novo sequencing is to identify both major and minor variations in the genome that can reveal the origin of strains or have effects on the gene function. For instance, we have detected 74168 single-nucleotide variations (SNVs) and 2001 INDEL mutations in the MF GZ001 strains compared to the reference genome (File.Xl.S6). The sequencing data produced 50914 synonymous, 17563 non-synonymous SNVs, and 2001 INDEL mutation (Table 3). Interestingly, there were 174 intergenic mutations belonging to the two genes (1_3425, 1_3426), and also obtained stop loss (33) and stop gain (110) variants in the MF GZ001 strain (File.Xl.S6 and Table 3). To know the divergence of other mycobacterium species, we investigated the SNVs data of MF GZ001 not only comparing it with closely related species but also with distantly related species. Additionally, the highest number of SNVs was obtained from MF GZ001 compared to M. smegmatis C 2 155. Specific positions of SNVs/INDEL in genes and mapping distribution of SNVs have been illustrated in Supplementary materials (File.Xl.S6 and Figure S7).
To study the genome modification, the collinearity analysis of the MF GZ001 strain compared to the M. fortuitum CT6, M. abscessus GZ002, M. smegmatis C 2 155, and M. tuberculosis H37Rv was performed. The collinearity outcomes predicted from the MF GZ001 genome sequence was more aligned with the M. fortuitum CT6 strain, exhibiting no significant genomic modifications ( Figure 3). The highest matching collinearity results are consistent with our previous analysis. Using the blast core ratio algorithm, we have investigated the MF GZ001, M. alvei CIP103464, M. fortuitum CT6, M. brisbanense UM_WWY, M. septicum DSM44393, and M. mageritense JR2009 genome to determine the account number of core gene clusters, pan-gene clusters, specific gene clusters, unique, and other gene clusters. There were 11503 pan-gene clusters with 38843 genes, 3480 core gene clusters (21621 genes), and 4098 specific gene clusters (4359 genes) ( Figure 4A). The comparative analysis of the pan-genome of the M. fortuitum complex detected lower and nearly similar unique gene clusters  Figure 4B).

Drug resistance profile and variants found in the MF GZ001 strain
From a comprehensive drug resistance analysis, the MF GZ001 strain encodes multiple drug-resistant related genes against important drugs, including RIF, macrolides, fluoroquinolones, tetracyclines, triclosan, penem, peptide antibiotics, and cephamycin. The analysis reveals that the highest (18.59%) drug-resistant related genes were associated with triclosan (71), 15.71% macrolides (60), 12.3% fluoroquinolones (47), tetracyclines (45), 5.5% RIF (21) and Diagram depicting genomic comparisons obtained using Mauve software. The alignment display is organized into one horizontal "panel" per input genome sequence. Each genome's panel contains the name of the genome sequence, a scale showing the sequence coordinates for that genome, and a single black horizontal center line. Colored block outlines appear above and possibly below the center line. Each of these block outlines surrounds a region of the genome sequence that is aligned to part of another genome and is presumably homologous and internally free from genomic rearrangement. Regions outside blocks lack detectable homology among the input genomes. Inside each block, Mauve draws a similar profile of the genome sequence. The height of the similarity profile corresponds to the average level of conservation in that region of the genome sequence. Areas that are completely white were not aligned and probably contain sequence elements specific to a particular genome. The height of the similarity profile is calculated to be inversely proportional to the average alignment column entropy over a region of the alignment. 7.33% cephalosporin (28) ( Figure 5; Table S10). Moreover, the prevalence of the underlying SNP mutations was found in the most comprehensive drug resistance-related genes, for example, rpoB, katG, AAC(2')-Ib, gyrA, gyrB, embB, pncA, blaF, thyA, embC, embR and iniA. The most prevalent and more than 60% identity resistance-related genes have been shown in supplementary materials (Table S13). The SNP investigations showed some mutations that are not commonly observed in RGM species. For instance, the mutations in iniA, iniC, pncA, and ribD are conferred resistance to isoniazid, ethambutol, pyrazinamide, and paraaminosalicylic acid in M. tuberculosis (Table S13). Additionally, we have detected a mutation in the AAC (2')-Ib gene that causes streptomycin resistance and is more frequent in the M. fortuitum complex.

Comparative study of virulence genes
The virulence genes analysis reveals that the MF GZ001 genome included 167 putative virulence genes that are shared by other SGM and RGM pathogenic mycobacterium species (File.Xl.S8). We have studied the predicted virulence gene clusters across the MF GZ001 and 21 other mycobacterial strains (Table S14) based on the existence and non-existence of specific genes which indicated the relationship status between MF GZ001 and M. fortuitum complex. The identified 167 putative virulence genes in the mycobacterium species are responsible for inducing nitrate reductase activity, mycobacterial cell envelope, inhibition of apoptosis, lipid and protein metabolism, gene regulation, and resistance of antimycobacterial agents. For example, the MF GZ001 genome contains nark2 non-redundant virulence genes which encode the nitrite transporter proteins related to exhorting nitrate reductase activity in response to reducing oxygen level by regulating its transcriptional regulation level whereas narH, narG, and narI (Table S14) genes encoded the proteins responsible for nitrate respiration in the absence of oxygen in the M. tuberculosis and M. fortuitum complex (Weber et al., 2000;Giffin et al., 2012). The nitrate reductase virulence factor lack in other mycobacterial species like M. abscessus, M. leprae, M. marinum, and M. ulcerans. Importantly, we also reported the virulence genes in MF GZ001 for instance fbpA, fbpB, fbpC, hbhA, and mce, encoded proteins belonging to antigen 85 complex locus (File.Xl.S8) essential for the synthesis of the bacterial cell wall (Armitige et al., 2000). These genes encoded proteins function is mycolyltransferase activity, which is essential for the development of the cell wall and the survival of mycobacteria (Mandato and Chai, 2018). We have identified five sigma factors (transcription initiation factors) that are linked to virulence in M. tuberculosis H37Rv (SigAP/rpoV, sigE, sigF, sigH and sigL), as well as the mammalian cell entry mce operons  (Table S14) that are extensively distributed across mycobacteria (Arruda et al., 1993;Wee et al., 2016). The mce operon has been demonstrated to be crucial for mycobacterial invasion and persistence in host macrophages and non-phagocytic mammalian cells, with mce4 being associated with cholesterol catabolism (Zhang and Xie, 2011;Griffin et al., 2011). The MF GZ001 genome has seven operons mce1, mce3, mce4, mce5, mce6, mce7, and mce8, and the absence of mce2 and mce9 operons. MF GZ001 contained mce1 operon homolog to M. tuberculosis but has not been detected in most therapeuticresistant pathogen M. abscessus. The Type VII secretion machinery is essential for mycobacterial pathophysiology and virulence (Groschel et al., 2016). Mycobacterium species include ESX 1 to 5 virulence factors and ESX-2 and ESX-5 factors are noticed in SGM. The ESX-5 is only present in SGM that could appear differentiated between RGM and SGM mycobacteria (Beckham et al., 2017). MF GZ001 genome contained ESX-1 (espR, mycP1, eccD1, espI, esxA, eccA1, eccB1, eccCa1, eccCb1, PPE68, esxB) virulence factor, without espJ, and espK genes that are related to SGM (File.Xl.S8). This result is consistent with M. fortuitum. The ESX-1 virulence factor is considered a significant virulence determinant (Madacki et al., 2021) that is associated with the inhibition of T-cell responses, inducing the differentiation of macrophages into foam cells (MacGurn and Cox, 2007;Samten et al., 2009) as well as aids in the escape of mycobacteria from phagosomes by ESAT-6 mediated perforation of vacuolar membrane (Smith et al., 2008). Moreover, we have reported that ESX-3 included esxG, esxH, espG3, eccD3, mycP3, eccE3, eccA3, eccB3, and eccC3 virulence-related genes in the MF GZ001 genome. The ESX-3 factor is present in all mycobacteria and specifically essential for in vitro growth in M. tuberculosis (Sassetti et al., 2003). The ESX-3 factor is involved in iron homeostasis and uptake through the mycobactin pathway (Newton-Foot et al., 2016).
We found novel putative virulence genes homologs to Rv1837c, groEL2, Rv0926c, and Rv0204c associated with malate synthesis, dendritic cell responses, uncharacterized hypothetical protein, and transmembrane-related protein respectively that may only present in MF GZ001 strain and M. tuberculosis H37Rv (File.Xl.S8). Additionally, we have also identified unknown function virulence factors in the MF GZ001 strain which are related to RGM and SGM.
Furthermore, the MF GZ001 strain includes genes homologs to katG, sodA, and sodC that encode enzymes such as catalaseperoxidase and superoxide dismutase A which are responsible for oxidative stress tolerance (Forrellad et al., 2013). These genes may be essential for preventing the oxidative burst that occurs during macrophage survival. We have identified nuoG which is considered a key virulence gene of M. tuberculosis that encodes NADH dehydrogenase type I complex protein in the mycobacterial membrane. Similarly, some other important virulence genes are also identified for instance secA2, and ptpA (Cossu et al., 2012). These genes have been implicated as an antiapoptotic factor for macrophages (Velmurugan et al., 2007). Consequently, these genes enable the mycobacteria to avoid the host cells' built-in mechanism for cell death.

Discussion
Immunocompromised patients are particularly vulnerable to NTM infections, and the incidence of NTM infections has significantly increased globally in recent decades (Dohal et al., 2021). In this investigation, we have provided the whole genome sequencing and comparative genome analysis of a clinical strain of Drug resistance-related genes distribution in MF GZ001 strain. Alam et al. 10.3389/fcimb.2022.1056007 NTM species, MF GZ001. In bacterial growth kinetic analysis, there was no statistically significant growth difference compared with other RGM, indicating that MF GZ001 is an RGM NTM species. The MF GZ001 has a wrinkled colony surface and rough morphotype which are consistent with a previous study (Gharbi et al., 2021). The rough morphotype M. fortuitum and M. abscessus comparatively possess more pathogenic potential (Brambilla et al., 2016;Gharbi et al., 2021). Recently, Dedrick et al., predicted the prophage length from 39.1 kb bp to 80.7 kb in clinical isolates, with an average size of ∼55.3 kbp (Dedrick et al., 2021), which was closely related to prophage (39.521 kb) from our clinical strain MF GZ001 and distantly related to prophage (<11.0 kb) from M. smegmatis (Jordan et al., 2014;Sewell, 2017). The average prophage size indicates that the M. fortuitum MF GZ001 pathogenic profile may be the same as M. abscessus. Additionally, the DST revealed MF GZ001 strain was highly resistant to most of the therapeutic agents which also indicates that MF GZ001 strain is more virulent. Based on a single marker gene and multiple gene approaches, the phylogenetic study and ANI revealed that MF GZ001 is closely related to the M. fortuitum CT6. The functional reannotation analysis revealed the new mycobacterium species MF GZ001 (6254616 bp) genome size is larger than that of M. fortuitum CT6 (6254616 bp). Interestingly, MF GZ001 contains the highest number of conserved unknown functional genes compared to M. fortuitum complex members, and other RGM and SGM mycobacterium species. Moreover, MF GZ001 included more unique genes (333) than the reference genome M. fortuitum CT6 (264). The larger genome size of MF GZ001 may reveal the diverse genomic structure of M. fortuitum complex. This diverse structure was observed may be due to horizontal gene transfer, CRISPR elements, and the prophage found in this species which may have an important impact on the fitness and pathogenicity of their bacterial hosts (Cote et al., 2022). According to previous studies of prophage in NTM clinical strains, prophage was found more likely in RGM than SGM (Dedrick et al., 2021;Senhaji-Kacha et al., 2021). The clinical NTM prophages may have more abundant virulence genes than the prophages from the environmental mycobacteria (Glickman et al., 2020). Based on an overview of the MF GZ001 genome, the prophage analysis predicted that it contains a (39.521 kb) prophage region. The prophage sequence is flanked by 13 bp phage attachment sites, attL, and attR (Table S7). Though it has been studied that clinical NTM exhibits various levels of virulence, but it is not thoroughly verified yet that whether the prophage elements containing virulence genes have any impact on treatment outcomes (Gonzalez-Perez et al., 2013). The existence of virulence genes does not imply that they are being actively expressed, and the presence of a prophage in a genome does not prove that the virus is excisable or functional (Glickman et al., 2020). According to the literature contradictory review, it would be beneficial to conduct further research to characterize the prophage of the M. fortuitum complex. Mycobacterium species can also transfer DNA intrinsically and incorporate DNA from foreign sources between species (Rabello et al., 2012;Wee et al., 2016). In addition to comparative analysis based on genomes such as higher level of colinearity, a Venn diagram, and a pan-genome study showed a close relationship between MF GZ001 and M. fortuitum CT6. Although the genome variation was also observed, more characteristics were shared with M. fortuitum complex members than other mycobacterium species. This investigation uncovered a close link between MF GZ001 and M. fortuitum complex which provides insight into the deep evolution of pathogenesis in RGM species. To our knowledge, this is the first report of SNV/INDEL mutation based on de novo sequencing methods in M. fortuitum complex members. In this report, we have detected SNV/INDEL mutations compared to the reference strains and other mycobacterium species. We have identified 74168 SNVs and 2001 INDEL mutations in the MF GZ001 strains compared to the reference genome. Moreover, we have identified stop loss (33) and stop gain (110) variants in the MF GZ001 strain. Most putative resistance mutations were observed in nonessential areas where hundreds of loss-of-function mutations might cause resistance (Miotto et al., 2017). Several variants conferred in a loss of function due to INDEL or stop codons (Richard et al., 2019). Importantly, we have found 174 intragenic mutations in two genes (1_3425, 1_3426) that are of unknown function. This study further emphasizes investigating the role of genes and the therapeutic resistance mechanisms in the RGM M. fortuitum complex.
Furthermore, the MF GZ001 drug resistance profile was studied via in vitro DST and prediction of drug resistance genes and variants with in silico analysis. M. fortuitum infected cases resistant to clarithromycin and fluoroquinolones have also been reported previously (Kurokawa et al., 2020). In vitro testing of MF GZ001 showed high resistance to macrolides, clofazimine, sulfamethoxazole, levofloxacin, streptomycin, and RIF. No frequent studies have been conducted to investigate the M. fortuitum complex. In this first report, drug resistance gene prediction through in silico approaches showed 18.59% genes associated with resistance to triclosan, 15.71% to macrolides, and 12.3% to fluoroquinolones. We have found SNPs in drug resistance-related genes such as rpoB, arr, AAC(2')-Ib, katG, gyrA, gyrB, embB, blaF, thyA, embC and embR that are previously reported in M. fortuitum and other mycobacterium species (Nash et al., 2005;Ramon-Garcia et al., 2006). For instance, among them recently reported, the ribosylation ADP-ribosyltransferase arr gene in M. fortuitum 7G was linked to the high level of RIF resistance which is consistent with our findings (Morgado et al., 2022a). A mutation in the rpoB gene's particular region known as the rifampin-resistancedetermining region (RRDR) leads to the emergence of RIF resistance in NTM and M. tuberculosis (Saxena et al., 2021). Additionally, by expressing the arr gene, M. abscessus and M. smegmatis intrinsically reduced the function of RIF (Brown-Elliott et al., 2012). The erm gene may responsible for the intrinsic resistance to macrolides in M. fortuitum strains. Erm is a member of a large family of proteins that are encoded by a variety of alleles, some of which (erm37-41) are specifically linked to Mycobacteriaceae species (Nash et al., 2005). Furthermore, similar to our study the mutations in gyrA have also been reported to be responsible for quinolone resistance in M. fortuitum (Kamada et al., 2021).
Importantly, the SNPs found in genes of MF GZ001, for example, iniA, iniC, pncA, and ribD conferring resistance to isoniazid, ethambutol, pyrazinamide, para-aminosalicylic acid, and isoniazid resistance respectively, are not commonly observed in RGM. We have reported a mutation in a distinctive gene aph (3")-Ic that is related to streptomycin resistance which is consistent with other studies (Morgado et al., 2022a). The aph (3")-Ic distinctive gene was first reported in environmental M. fortuitum and later in clinical isolates as well (Ramon-Garcia et al., 2006;Morgado et al., 2022b). We anticipate that further investigation of the novel SNPs in iniA, iniC, ribD, pncA, and aph (3")-Ic detected in this study are necessary to explore their role in the development of resistance. Particularly, these well-known drug-resistant genes are found in the MF GZ001 strain, which will be useful for further study on strain identification and characterizations enriching the genetic data sources for pathogenic diseases.
The genome of MF GZ001 comprises a variety of putative virulence genes that may have enhanced its ability for intracellular replication and persistence. M. fortuitum exhibits robust pathogenicity linked to various virulence genes, for instance, mce, nar, sigma, and antigen 85 complex clusters. These genes could encode potential virulence factors associated with mycobacterial cell envelope, inhibition of apoptosis, lipid and protein metabolism, gene regulation as well as nitrate reductase activity. The mce operons distributed across mycobacterium species are responsible for mycobacterial invasion and persistence in host macrophages and non-phagocytic mammalian cells, with mce4 being associated with cholesterol catabolism (Zhang and Xie, 2011). M. fortuitum contains the nuoG gene, which encodes NADH dehydrogenase type I complex protein in the mycobacterial membrane which is considered as an emergent virulence factor in M. tuberculosis (Velmurugan et al., 2007;Cossu et al., 2012). MF GZ001 genome contains an important trpD gene which is also confirmed by the previous study. Among RGM, a gene copy of trpD was only detected in M. abscessus and M. fortuitum complex members (Gharbi et al., 2021). It encodes an anthranilate phosphoribosyltransferase associated with tryptophan biosynthesis which has contributed a significant role during infection of patients for SGM (Zhang et al., 2013). Particularly, the MF GZ001 genome has katG, sodA gene for oxidative stress resistance which may be crucial for preventing the oxidative burst that occurs during macrophage survival. These genes have been shown by Wee et al., in M. brisbanense which is a member of the M. fortuitum complex (Wee et al., 2016). Furthermore, in recent years it has been identified that the genes responsible for the biosynthesis of the many unique lipids are found in the mycobacterial cell wall. For instance, MF GZ001 contains the acyl-coenzyme A (CoA) synthase gene (fadD28) which is associated with the esterification of the acid to phthiocerol to produce dimycocerosyl phthiocerol which is involved in virulence (Fitzmaurice and Kolattukudy, 1997;Sirakova et al., 2002;Brodin et al., 2010).
Furthermore, we reported novel putative virulence genes that are homologous to Rv1837c, groEL2, Rv0926c, and Rv0204c, and are related to malate synthesis, dendritic cell responses, an unidentified hypothetical protein, and a transmembrane-related protein, respectively. These genes may only be found in the MF GZ001 strain and M. tuberculosis H37Rv. Recently, Luo et al., reported that the secretion proteins locus ESX-1, ESX-3, and ESX-5 are crucial for virulence in M. tuberculosis (Luo et al., 2022) and were found in the MF GZ001 strain which is consistent with the previous report (Gharbi et al., 2021;Morgado et al., 2022b). Therefore, the MF GZ001 genome sequence will provide insights into M. fortuitum complex potential pathogenesis. The existence of the potential virulence determinants still requires experimental evidence to detect whether the MF GZ001 strain is associated with crucial human diseases or not.
In summary, we have sequenced and analyzed the genome of the MF GZ001 strain that might be a new member of the M. fortuitum complex. Our comparative analysis indicated that it is a new member of the M. fortuitum complex and a human-pathogenic species. The findings of this study will establish a foundation for further investigations of the pathogenic mechanism of M. fortuitum as well as its diagnosis and treatment, especially in southern China.

Data availability statement
The original datasets generated in the study are publicly available. The MFGZ001 WGS and SRA data were deposited in the NCBI database under accession numbers CP107719 and SRR22164027.