Complete Genome Sequencing and Comparative Analysis of the Clinically-Derived Apiotrichum mycotoxinivorans Strain GMU1709

Over the past decade, Apiotrichum mycotoxinivorans has been recognized globally as a source of opportunistic infections. It is a yeast-like fungus, and its association as an uncommon pulmonary pathogen with cystic fibrosis patients has been previously reported. Immunocompromised patients are at the highest risk of A. mycotoxinivorans infections. Therefore, to investigate the genetic basis for the pathogenicity of A. mycotoxinivorans, we performed whole-genome sequencing and comparative genomic analysis of A. mycotoxinivorans GMU1709 that was isolated from sputum specimens of a pneumonia patient receiving cardiac repair surgery. The assembly of Oxford Nanopore reads from the GMU1709 strain and its subsequent correction using Illumina paired-end reads yielded a high-quality complete genome with a genome size of 30.5 Mb in length, which comprised six chromosomes and one mitochondrion. Subsequently, 8,066 protein-coding genes were predicted based on multiple pieces of evidence, including transcriptomes. Phylogenomic analysis indicated that A. mycotoxinivorans exhibited the closest evolutionary affinity to A. veenhuisii, and both the A. mycotoxinivorans strains and the formerly Trichosporon cutaneum ACCC 20271 strain occupied the same phylogenetic position. Further comparative analysis supported that the ACCC 20271 strain belonged to A. mycotoxinivorans. Comparisons of three A. mycotoxinivorans strains indicated that the differences between clinical and non-clinical strains in pathogenicity and drug resistance may be little or none. Based on the comparisons with strains of other species in the Trichosporonaceae family, we identified potential key genetic factors associated with A. mycotoxinivorans infection or pathogenicity. In addition, we also deduced that A. mycotoxinivorans had great potential to inactivate some antibiotics (e.g., tetracycline), which may affect the efficacy of these drugs in co-infection. In general, our analyses provide a better understanding of the classification and phylogeny of the Trichosporonaceae family, uncover the underlying genetic basis of A. mycotoxinivorans infections and associated drug resistance, and provide clues into potential targets for further research and the therapeutic intervention of infections.

Over the past decade, Apiotrichum mycotoxinivorans has been recognized globally as a source of opportunistic infections. It is a yeast-like fungus, and its association as an uncommon pulmonary pathogen with cystic fibrosis patients has been previously reported. Immunocompromised patients are at the highest risk of A. mycotoxinivorans infections. Therefore, to investigate the genetic basis for the pathogenicity of A. mycotoxinivorans, we performed whole-genome sequencing and comparative genomic analysis of A. mycotoxinivorans GMU1709 that was isolated from sputum specimens of a pneumonia patient receiving cardiac repair surgery. The assembly of Oxford Nanopore reads from the GMU1709 strain and its subsequent correction using Illumina paired-end reads yielded a high-quality complete genome with a genome size of 30.5 Mb in length, which comprised six chromosomes and one mitochondrion. Subsequently, 8,066 protein-coding genes were predicted based on multiple pieces of evidence, including transcriptomes. Phylogenomic analysis indicated that A. mycotoxinivorans exhibited the closest evolutionary affinity to A. veenhuisii, and both the A. mycotoxinivorans strains and the formerly Trichosporon cutaneum ACCC 20271 strain occupied the same phylogenetic position. Further comparative analysis supported that the ACCC 20271 strain belonged to A. mycotoxinivorans. Comparisons of three A. mycotoxinivorans strains indicated that the differences between clinical and non-clinical strains in pathogenicity and drug resistance may be little or none. Based on the comparisons with strains of other species in the Trichosporonaceae family, we identified potential key genetic factors associated with A. mycotoxinivorans infection or pathogenicity. In addition, we also deduced that A. mycotoxinivorans had great potential to inactivate some antibiotics (e.g., tetracycline), which may affect the efficacy of these drugs in co-infection. In general, our analyses INTRODUCTION Apiotrichum mycotoxinivorans (formerly Trichosporon mycotoxinivorans) is a basidiomycete yeast of the Trichosporonaceae family (Peng et al., 2019). It was first isolated from the hindgut contents of Australian lower termites (Schatzmayr et al., 2003), then identified as a new species of the genus Trichosporon based on the phylogenetic analysis of 26S rDNA, and named T. mycotoxinivorans for its beneficial detoxifying properties on the mycotoxins zearalenone and ochratoxin A (Molnar et al., 2004). Owing to the limitations of phenotype/26S rDNA-based classifications, Liu et al. (2015) proposed an updated classification for Tremellomycetes based on a seven-gene phylogeny in which T. mycotoxinivorans was assigned to the genus Apiotrichum under the name A. mycotoxinivorans. With the advances in sequencing technology, an increasing number of genomes belonging to the Trichosporonaceae family have been sequenced Close and Ojumu, 2016;Gil et al., 2018;Gorte et al., 2019;Sun et al., 2020) and released recently by different laboratories worldwide that provides an unprecedented opportunity to identify some potential controversial phylogenetic relationships in current classifications extensively (Rodrıǵuez et al., 2017).
Although A. mycotoxinivorans is universally regarded as a validated anti-mycotoxin feed additive (Khalel et al., 2012), it has been gradually recognized as an opportunistic pathogen with a known propensity for patients with cystic fibrosis. As early as 2009, Hickey et al. reported the first case of human disease caused by A. mycotoxinivorans in a non-transplant patient with cystic fibrosis. Later, Hirschi et al. (2012) reported the first case of disseminated fungal co-infection caused by A. mycotoxinivorans and another fungus, which emerged after liver and lung transplantation in a patient with cystic fibrosis. Subsequently, a case series of chronic A. mycotoxinivorans infection has been reported in patients with cystic fibrosis (Shah et al., 2014;Goldenberger et al., 2016). Unlike previous studies, Dabas et al. (2017) reported the emergence of A. mycotoxinivorans in three cases of bloodstream infections in a retrospective study. Almeida et al. (2017) detected A. mycotoxinivorans from peritoneal fluid in the early stage of disseminated infection and further from the blood in the late stage of the infection. Marcelo and Farooq (2018) presented the first case of A. mycotoxinivorans dissemination in the brain of a patient who had positive blood and stool cultures for A. mycotoxinivorans. In 2019, we also isolated and identified the A. mycotoxinivorans GMU1709 strain from sputum specimens of a pediatric patient with pneumonia, and then reported its morphological, biochemical, and molecular characteristics (Peng et al., 2019). Sadamatsu et al. (2020) reported a rare case of pulmonary co-infection with A. mycotoxinivorans and Cryptococcus neoformans. Although the aforementioned studies have adequately described the clinical cases of human disease caused by this opportunistic fungal pathogen, including the antifungal susceptibility patterns, different infection modes, and clinical consequences, the molecular genetic bases closely related to its infection, pathogenicity, and drug resistance remain largely unexplored.
As is well known, genome sequencing refers to adopting sequencing technology to obtain the whole-genome sequence of an organism, which is an important step toward associating genotypes with phenotypic characters (Verma et al., 2017). By investigating the genome database of the National Center for Biotechnology Information (NCBI), we deduced that more than 30 genomes of Trichosporonaceae have been recently published. Before performing this study, no genome sequence of A. mycotoxinivorans has been sequenced or released, although this yeast has multiple important biological properties, especially pathogenicity. Its evolutionary relationships with other species in Trichosporonaceae also need to be assessed at higher phylogenetic resolutions. Hence, we sequenced the genome and transcriptome of the clinically-derived A. mycotoxinivorans strain (GMU1709), and obtained and published its complete genome sequence for the first time on March 15, 2020. Soon after, from the NCBI genome database, we retrieved the genome of environmentally derived A. mycotoxinivorans strain (CICC 1454) that was published on May 27, 2020, which further provided important support for the intraspecific comparative analysis. We found that Sun et al. (2020) mainly determined the possible zearalenone-degradation enzymes by their bioinformatic method after they assembled and annotated the genome of CICC 1454 strain. Therefore, this study attempted to uncover the pathogenic genetic basis of A. mycotoxinivorans and the genetic differences between clinical and non-clinical strains as well as to provide a clearer phylogenetic relationship with other species. The results of this study would provide insights into the prevention and control of A. mycotoxinivorans infection and how to avoid the influence of its pathogenicity on its application in detoxification. centrifugation at 4500 ×g for 10 min and broken by grinding under liquid nitrogen for 20 min. Genomic DNA was extracted according to the cetyltrimethylammonium bromide protocol (Kim et al., 2010). Next, three different libraries were constructed by the Biomarker Technologies, Inc., according to the manufacturer's standard procedures, in which the paired-end library with a 350-bp insert size was sequenced on an Illumina HiSeq platform, the 30-kb library was sequenced using a GridION DNA sequencer, and the Hi-C library was sequenced with a 150-bp read length (PE 150) on the Illumina platform. Quality control was performed on the sequencing data as follows: For Illumina reads, adaptors and low-quality bases located at both read ends were first trimmed; then, the corresponding paired-end reads were abandoned if any read contained more than 5% unknown bases or was shorter than 30 bp after qualitybased trimming; PCR duplicates were routinely discarded using SAMTools (Li et al., 2009); For Nanopore reads, reads with mean quality < 7 or length < 500 bp were discarded; For Hi-C reads, the HiC-Pro software (Servant et al., 2015) was employed with default settings to perform quality control filtering. GCE 1.0.0 (ftp://ftp.genomics.org.cn/pub/gce) was adopted to estimate the genome size. Error correction of Nanopore reads was conducted using Canu v1.5 (Koren et al., 2017), and the corrected reads were assembled into contigs using WTDBG2 with defaults (Ruan et al., 2020). Illumina paired-end reads were mapped to the assembly, based on which dubious bases were corrected by the Pilon pipeline (Walker et al., 2014) to achieve a high-quality genome. Genomic completeness was estimated using the BUSCO value (Simão et al., 2015) and the ratio of Illumina reads mapped back to the genome. Finally, LACHESIS was used to determine the order and orientation of contigs by mapping the Hi-C reads to contigs (Burton et al., 2013).
Total RNA was extracted from GMU1709 using a TRIzol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. cDNA was synthesized from RNA using the PrimerScript RT reagent kit (Takara, Dalian, Liaoning, China). The cDNA library was prepared using the Illumina TreSeq RNA Sample Preparation Kit (Illumina, Inc., San Diego, CA, USA) following the manufacturer's protocol, in which oligo-dT beads were used to capture mRNA, the first strand DNA was synthesized using oligo dT primers. RNA-seq was conducted on an Illumina HiSeq 2500 platform with a 150-bp paired-end strategy. Transcriptome reads were mapped to the genome sequences using Hisat2 v2.0.4 (Pertea et al., 2016). Transcripts were assembled using Stringtie v1.2.3 (Pertea et al., 2016), and then adopted to provide evidence for subsequent ab initio gene prediction.

Gene Prediction and Functional Annotation
Before gene prediction, repetitive DNA sequences were identified by using RepeatModeler open-1.0.11 (Smit et al., 2015) and RepeatMasker v4.0.6 (Tarailo-Graovac and Chen, 2009) with default and masked. Gene prediction was performed as follows: First, we separately employed Genscan (Burge and Karlin, 1997), Augustus v2.4 (Stanke and Waack, 2003), GeneID v1.4 (Blanco et al., 2007), GlimmerHMM v3.0.4 (Majoros et al., 2004), and SNAP v2006-07-28 (Korf, 2004) to perform the ab initio gene prediction. Next, GeMoMa v1.3.1 (Keilwagen et al., 2016) was adopted for homology-based gene prediction, in which the protein sets from A. porosum, Cutaneotrichosporon oleaginosum, and T. asahii were taken as references. Transcriptome-based gene prediction was performed using the TransDecoder v2.0 (Haas et al., 2013) and PASA v2.0.2 (Haas et al., 2008), respectively. Finally, EVM v1.1.1 (Haas et al., 2008) was used to improve the results derived from the aforementioned three methods, as well as integrate them into the final gene set. Regarding noncoding RNAs, tRNA genes were predicted by tRNAscan-SE (Lowe and Eddy, 1997), whereas rRNA genes and other ncRNA genes were predicted based on the Rfam database (Nawrocki et al., 2014) using Infernal v1.1 (Nawrocki and Eddy, 2013). The GenBlastA (She et al., 2009) was adopted to align the predicted proteins against expertly curated proteins from the Swiss-Prot database, and then the local version of the Genewise program (Birney et al., 2004) was utilized to identify the premature termination codon and frameshift mutation of homologous genes to determine these potential pseudogenes. The functions of the predicted genes were annotated by BLAST (Altschul et al., 1997, E-value cutoff of 1e-5) based on the NCBI non-redundant database (NR) (Deng et al., 2006). SignalP v4.0 (Petersen et al., 2011) was used to identify proteins with signal peptides, and the Transmembrane Helices Hidden Markov Model program (Krogh et al., 2001) was adopted to predict proteins with transmembrane helices. Proteins with signal peptides, but without transmembrane helices, were considered as potential secretory proteins.

Whole Genome-Based Phylogenetic Analysis
We obtained all genome sequences of Trichosporonaceae (Table 1) from the NCBI GenBank database. Two Takashimella strains (T. koratensis JCM 12878 and T. tepidaria JCM 11965) in the family Tetragoniomycetaceae, which are closely related to the family Trichosporonaceae, were served as outgroups. Whole genome-based phylogenetic tree was constructed via the maximum-likelihood (ML) method as follows: First, Mugsy v1.2.3 (Angiuoli and Salzberg 2010) was adopted to identify the homologous regions, whereas the homologous blocks present in all genomes were extracted using a custom PERL script and then realigned using muscle v3.8.31 (Edgar 2004) with the default parameters. Next, nongaped sites were concatenated into a new multiple sequence alignment using a custom PERL script. Subsequently, a phylogenetic tree was constructed using raxmlHPC-PTHREADS-AVX v8.2.10 (Stamatakis 2014) with the GTRGAMMA model of evolution (100 bootstrap replicates), including a subsequent search for the best-scoring ML topology. Finally, we employed the Interactive Tree of Life v4 (Letunic and Bork, 2019) to display this tree and its relevant information.

Orthologous Gene Families and Synteny Analysis
Currently, 25 species and 36 strains from four genera ( Table 1) in the Trichosporonaceae family have been sequenced; however, most of the gene or protein sets are not available from public databases, which limits our comparative analysis. Here, we downloaded all these genomes from the NCBI genome database and carried out gene prediction using Funannotate v1.7.4 (https:// funannotate.readthedocs.io/en/latest/index.html). Subsequently, gene clustering analysis was conducted using the OrthoMCL software (Li et al., 2003) with the following parameters: p e r c e n t M a t c h C u t o ff = 8 0 , i n fl a t i o n v a l u e = 1 . 5 , percentIdentityCutoff = 80, and blastPvalueCutoff = 1e-5. The output was transformed into matrix data using a custom PERL script and then clustered using heatmap.2 in R (Zhao et al., 2014). JCVI v1.0.1 was adopted to construct chromosome-level synteny block plots with default parameters (Tang et al., 2015). For interspecies comparisons, two species with the most complete genomes (the A. brassicae JCM 1599, A. gracile JCM 10018, T. inkin JCM 9195, T. faecale JCM 2941, C. oleaginosum ATCC 20508, and C. daszewskae JCM 11166) were selected from each of three most closely related genera. For intraspecific comparisons, we used the A. mycotoxinovorans CICC 1454 and GMU1709 strains, as well as the previously reported Trichosporon cutaneum ACCC 20271 strain . Genomic comparison analyses would provide further evidence for their evolutionary relationships.

Identification of Virulence Genes and Resistance Genes
To identify the virulence and resistance genes in A. mycotoxinivorans and compare the intraspecific differences in pathogenic genetic basis between clinical and non-clinical strains and the interspecific differences between A. mycotoxinivorans and other Trichosporonaceae species (Table 1), the protein sequences of all these strains were aligned using the BLASTP command of DIAMOND (v2.0.11.149) (Buchfink et al., 2015) with an E-value cutoff of 1e-5 against the mycology antifungal resistance database (MARDy) (Nash et al., 2018), pathogen-host interactions database (PHI-base) (Urban et al., 2020), and comprehensive antibiotic database (CARD) (Jia et al., 2017). Group differences were analyzed using the Wilcox test, and the results were visualized using the pheatmap R package.

Bacterial Survival Assay
The yeast A. mycotoxinivorans strain GMU1709 was cultured in malt extract medium at 35°C and 200 rpm for 24 h. Then, the culture was collected by centrifugation for 2 min at 4500 ×g. The cell pellet was washed with 0.9% saline for 3 times, and the absorbance of suspension at 600 nm was adjusted to 2.0. The cultured Escherichia coli strain ATCC 25922 in Luria-Bertani (LB) medium was treated similarly to that of the GMU1709, but the absorbance of 600 nm was adjusted to 0.5. The experiment was divided into three groups: group A of live fungi mixed with bacteria, group B of inactivated fungi (sterilization at 121°C for 15 min) mixed with bacteria, group C of bacteria suspended in saline. The fungi and bacteria were mixed in a ratio of 2:1. All the three groups were added with 10 µg/mL of tetracycline and incubated at 35°C for 2 h, then the samples of three groups were collected respectively and cultured on LB agar plates with 10 times gradient dilution for survival bacterial counting. The relative survival rate of E. coli was expressed using the following equation: (survival bacteria of group B or C/survival bacteria of group A) ×100%. Experiments were conducted in triplicate.

Assembly and Characterization of A. mycotoxinivorans GMU1709 Genome
In this study, we completed the sequencing, de novo assembly, and annotation of clinically-derived A. mycotoxinivorans GMU1709 genome. First, approximately 11.6 Gb of highquality genomic data was generated using the Illumina platform. Based on these data, the genome size was estimated to be 32.7 Mb (actual) or 29.2 (fitted) Mb via a 21-mer analysis ( Figure S1). Next, 8.1 Gb of long-read data (~231X depth) with an average length of 29.8 kB was produced by the Nanopore platform (Table S1 and Figure S2). Via the de novo genome assembly, we obtained seven ungapped contigs with a total length of 30.5 Mb ( Table 2), an average GC content of 57.6%, and an N50 length of approximately 7.4 Mb. Subsequently, Illumina reads were used to correct the assembly. Based on Hi-C reads (10.6 Gb), six out of seven contigs (accounting for 99.9% of the total bases) were anchored to six chromosomes. The only one unallocated was found to be the mitochondrial sequence (16 895 bp) by BLAST against the NT database. It was reported that the combination of Illumina and Nanopore sequencing reads allowed for producing the telomere-to-telomere chromosome assemblies (Bliznina et al., 2021). However, we didn't detect typical telomere sequences from the GMU1709 genome according to previously reported rules, e.g. (CCCTAA/ TxAyGz)n (Peska and Garcia, 2020;Rahnama et al., 2021). Meanwhile, we also failed to find telomere sequences from the CICC 1454 genome. Current evidence has suggested that the interchromosomal contacts would result in strong enrichment of centromere-to-centromere Hi-C links (Duan et al., 2012). Using the Hi-C heatmap ( Figure 1A), five out of six chromosomes were determined to contain clear centromeric signatures. However, according to Varoquaux et al. (2015), the remaining one may be influenced by the sequence around centromere, thus showing indiscernible centromeric signature. We also examined the mapping ratio of Illumina reads against the genome and determined that 92.2% of the reads were properly mapped ( Table S2). BUSCO analysis indicated that 95.2% of the expected fungal genes were present in the gene set of GMU1709. These results suggest a high-quality chromosomallevel genome assembly ( Figure 1B). Using the RepeatModeler and RepeatMasker prediction, we totally identified 1 416 118 bp of repetitive sequences. Specifically, there were 340 424 bp, 1793 bp, 1612 bp, 937 936 bp, and 134 423 bp of interspersed repeats (including the short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), long terminal repeat (LTR) retrotransposons, DNA elements, and others), small RNA, satellites, simple repeats, and low complexity repeats, respectively. Compared to most of other strains, we found a high proportion of simple repeats and unclassified interspersed repeats in the genomes of A. mycotoxinivorans strains ( Figure S3). Using homology-based, RNA-seq-based, and ab initio methods, we identified 8066 protein-coding genes. The average length of protein-coding genes was 2371.5 bp, and they averagely contained 4.8 exons (average length: 416.6 bp) and 3.8 introns (average length: 99.4 bp). In this genome, there were two rRNAs, 917 tRNA, 17 ncRNAs, and eight pseudogenes. A total of 89.6% of proteins were annotated, and 5.6% of them were identified as putative secretory proteins.

Phylogenetic Relationship
The whole genome-based phylogenetic tree (Figure 2) demonstrated that the formerly called T. mycotoxinivorans undoubtedly belongs to the genus Apiotrichum, which is consistent with the findings of Liu et al. (2015). T. cutaneum ACCC 20271, A. mycotoxinivorans GMU1709, and CICC 1454 strains were located almost in the same phylogenetic position (Figure 2), thus indicating that the ACCC 20271 strain belongs to the A. mycotoxinivorans species. Aliyu et al. (2020) classified the ACCC 20271 strain into the genus Apiotrichum, but did not subdivide it into A. mycotoxinivorans because the genome of A.
mycotoxinivorans is yet to be published. In addition, the T. akiyoshidainum HP2023 strain should be reclassified as A. akiyoshidainum, which is consistent with the findings of Aliyu et al. (2020), while C. Cutaneum B3 and JCM 1462 strains are located in different branches of the genus Cutanetrichosporon, indicating that they belong to different species. Hence, our analysis presented highly reliable evolutionary relationships among different Trichosporonaceae species, together with the discovery of previously misclassified strains.

Gene Clustering Analysis
Via the Markov clustering analysis of protein sets (294 280 proteins in total) of 36 strains in four genera of Trichosporonaceae (Table 1), 76% (224 590) of the proteins were clustered into 48 804 orthologous groups, of which 2 (corresponding to the fibrillarin and DEAD/DEAH box helicase, respectively) and 395 were present in all strains as single and multiple copies, respectively. Based on the hierarchical cluster analysis of gene presence and absence patterns across 36 strains, we identified that different strains of the same species had nearly identical patterns of gene presence and absence ( Figure S4), which further supported that all three strains (GMU1709, CICC 1454, and ACCC 20271) belong to the same species. According to the Venn relationships of the aforementioned three strains and four Trichosporonaceae genera (Apiotrichum, Cutaneotrichosporon, Trichosporon, and Vanrija), we found that only 878 orthologous groups were shared by them; 2942 groups were exclusively shared by these three strains, whereas few groups were shared by these three strains and any of the other genera, except the Apiotrichum genus ( Figure 3A). This also supported the position that the formerly named T. mycotoxinivorans belongs to the Apiotrichum genus. In addition, we also determined that the Apiotrichum genus possessed the largest number of groups and genus-specific groups, followed by the Cutaneotrichosporon and Trichosporon genera, and the least abundant Vanrija genus ( Figure 3B), which may be influenced by the number of strains or species in each genus ( Table 1). In summary, these results strongly support the whole genome-based phylogenetic relationship.

Syntenic Relationship
The evolutionary relationship of chromosomes between strains was determined using the genomic synteny block analysis. Our results indicated strong genomic collinearity among GMU1709, CICC 1454, and ACCC 20271 strains ( Figures S5A-C), except for an inconsistent event (highlighted in Figure 4) in one genomic sequence of the ACCC 20271 strain. In fact, the genome of ACCC 20271 strain was sequenced only using a short-read sequencing platform (Illumina MiSeq), which resulted in poor quality of genome assembly (922 contigs, 21 scaffolds with 901 spanned gaps). More importantly, the scaffold with an inconsistent event contained 221 gaps, and some gaps were just located near the location of this inconsistent event. So, we deduced that this inconsistency may well be caused by low-quality assembly and incorrect scaffolding of contigs. Furthermore, we also investigated the genomic synteny of GMU1709 strain with other Trichosporonaceae members. The A. gracile JCM 10018 and A. brassicae JCM 1599 strains from Apiotrichum genus exhibited a relatively high genomic collinearity with GMU1709 strain ( Figure  S6), in which the 1:1 homologous block (one-to-one correspondence of homologous region, namely, each reference region matches only one target region) averagely accounted for 87.5% and 81% of the genome, respectively ( Figures S5D-E); however, the T. inkin JCM 9195, T. faecale JCM 2941, C. oleaginosum ATCC 20508, and C. daszewskae JCM 11166 strains from both Trichosporon and Cutaneotrichosporon genera exhibited low genomic collinearity with GMU1709 strain ( Figure S6), in which the 1:1 homologous block averagely accounted for 53% and 51.5% of two Trichosporon genomes respectively ( Figures S5F-J) and for 59.5% and 57.5% of two Cutaneotrichosporon genomes respectively ( Figures S5H-I). In addition, compared to intraspecific strains ( Figures S5A-C), there was a relatively high proportion of ≥ 2:1 homologous block (moreto-one correspondence of homologous region) between GMU1709 and other strains ( Figures S5D-I). These results

Genes Putatively Involved in Pathogenicity
PHI-base is a gene-centric database that catalogs experimentally verified virulence, pathogenicity, and effector genes from different pathogens, in which each gene has a corresponding PHI-base accession number (Urban et al., 2015) and the included pathogens interact with a wide range of hosts. This database is an invaluable resource for discovering pathogenicity-related genes from these emerging opportunistic pathogens, which could be potential intervention targets (Urban et al., 2020). Currently, the PHI-base 4.12 has cataloged 8411 genes and 18 190 interactions (http://www.phi-base.org/releaseNote.htm) to be associated with pathogenicity. A. mycotoxinivorans is an emerging opportunistic pathogenic fungus, which is closely related to invasive devices and immunodeficiency (Almeida et al., 2017) and primarily causes pulmonary infections in humans with significant predilection in patients with cystic fibrosis (Marcelo and Farooq, 2018). Using BLAST analysis, we obtained 2149, 2098, and 2068 homologs of PHI genes from GMU1709, CICC 1454, and ACCC 20271 genomes, respectively. By comparing A. mycotoxinivorans with other Trichosporonaceae species, we found that A. mycotoxinivorans strains averagely contained 2105 homologs of PHI genes, which was lower than the average number (2143) of PHI homologs in all strains ( Figure  S7). However, it is noteworthy that C. mucoides JCM 9939, T. coremiiforme JCM 2938, T. ovoides JCM 9940, and C. cutaneum B3 contained substantially more homologs of PHI genes than any other strain, thus indicating that these four strains might be closely associated with infection ( Figure S7). In the PHI-base, nine high-level phenotypic terms ("increased virulence," "increased virulence (hypervirulence)," "reduced virulence," "unaffected pathogenicity," "loss of pathogenicity," "lethal effector," "sensitivity to chemical," and "resistance to chemical") have been defined to compare the pathogen-host interactions between organisms across the tree of life (Urban et al., 2015). To make the results more reasonable, we retained only putative PHI genes, whose objects of interaction were the "Birds", "Bony fishes", "Flies", "Nematodes", "Primates", "Rabbits & hares", and "Rodents", because these objects have been used in several cases as model organisms for human diseases. Accordingly, only 704, 699, and 681 homologs of PHI genes (detailed in Table S3) were found in the GMU1709, CICC 1454, and ACCC 20271 strains, respectively. Then, we counted these putative PHI genes according to phenotypic terms, and the obtained results indicated that there was approximately the same number of genes with increased virulence (hypervirulence) or lethal factor between environmentally derived strain (ACCC 20271 and CICC 1454; Figures 5A, B) and clinically-derived strain (GMU1709; Figure 5C). So, these comparisons suggest that the differences between clinical and environmental strains of A. mycotoxinivorans may be small. We further compared the differences of PHI-base entries in the number of genes between A. mycotoxinivorans strains and other Trichosporonaceae strains via the Wilcox test. We deduced that there were 82 PHI-base accessions with a significant difference (P < 0.05) (Table S4 and Figure 5D). We further focused on these PHI-base accessions with the phenotype of increased virulence and found three PHI-base accessions with significantly more genes in A. mycotoxinivorans strains. More importantly, all three accessions (PHI:6096, PHI:5267, and PHI:8494) were closely linked to pulmonary infections (pneumococcal pneumonia and invasive pulmonary aspergillosis), which is consistent with the phenotype of this opportunistic pathogen. We found that PHI:6096 (Gene locus: Apimy_4244) and PHI: 5267 (Gene locus: Apimy_4588 and Apimy_6176) were present in all three strains; however, PHI:8494 was only present in the ACCC 20271 strain. According to the PHI-base, PHI:5267, PHI:6096, and PHI:8494 correspond to epimerase (UgeB), endopeptidase O (PepO), and phospholipase D (PLD), respectively. Based on above RNA-seq data, we investigated two pathogenic factors (UgeB and PepO) shared by three A. mycotoxinivorans strains, and found that genes encoding UgeB had extremely low expression levels with FPKM value ≤0.1, but PepO had a very high gene expression level with FPKM value >4500. Considering that gene expression may vary greatly under different conditions (Guan et al., 2013;Speth et al., 2019), we inferred that both enzymes could be the potentially important pathogenic factors of this emerging opportunistic yeast pathogen.

Genes Putatively Involved in Antibiotic Resistance
As we previously reported (Peng et al., 2019), the sputum bacterial and fungal cultures of a pediatric patient with pneumonia were positive for Elizabethkingia anophelis and A. mycotoxinivorans (GMU1709 strain in this study). Considering that A. mycotoxinivorans can efficiently degrade tetracycline (Huang et al., 2021), we inferred that the coexistence of E. anopheles and A. mycotoxinivorans may have improved the antibiotic resistance of E. anopheles by inactivating antibiotics. In fact, the interaction between bacteria and fungi is widely known, and the Candida albicans-Pseudomonas aeruginosa interaction is a well-studied model system (Peleg et al., 2010;Frey-Klett et al., 2011). To comprehensively estimate the possible inactivation of A. mycotoxinivorans against antibiotics, we identified all the putative antibiotic resistance genes (126, 122, and 122 genes) from the GMU1709, CICC 1454, and ACCC 20271 strains based on the CARD database (McArthur et al., 2013), respectively, which could be classified into 53 primary ontologies, and deduced that their antibiotic resistance patterns were substantially similar ( Table 3). Tetracycline resistance had the largest number of ontologies, in which three, six, and 10 primary ontologies were related to antibiotic inactivation, target protection, and efflux, respectively, which is consistent with the findings of Huang et al. (2021). In addition, A. mycotoxinivorans might be able to inactivate rifamycin, penam, and cephalosporin. Such inactivation effects are also present to a greater or lesser extent in other species ( Figure S8). Hence, we further tested the resistance of E. coli to tetracycline in the presence/absence of A. mycotoxinivorans. As shown in Figure S9, when compared to the bacterial survival rate of the group A (live GMU1709 mixed with E. coli) (defined as 100%), the bacterial survival rate was (50.3 ± 11.0)% and (52.3 ± 20.1)% for group B (inactivated GMU1709 mixed with E. coli) and group C (E. coli suspended in saline) respectively after incubation for 2 h (P<0.01), which strongly indicated that A. mycotoxinivorans was able to significantly enhanced the resistance of coexisting bacteria to related antibiotics. Moreover, all A. mycotoxinivorans strains possessed the same inactivation profiles ( Table 3, these ARO accessions with antibiotic inactivation). In summary, the coexistence of A. mycotoxinivorans with other bacteria may improve the antibiotic resistance of bacteria; hence, this aspect deserves more attention.
In our previous study (Peng et al., 2019), the sensitivity of the GMU1709 strain to antifungal drugs was determined using the commercial method ATB FUNGUS 3, and our results indicated that the A. mycotoxinivorans GMU1709 strain is sensitive to the antifungal drugs, including several triazoles, flucytosine, and amphotericin B. Notably, amphotericin B also exhibits high efficacy in treating A. mycotoxinivorans GMU1709 infection in our previous study (Peng et al., 2019), which is in contrast with previous clinical reports (Hickey et al., 2009;Goldenberger et al., 2016). Here, we further detected the genes possibly involved in antifungal drug resistance in the GMU1709, CICC 1454, and ACCC 20271 strains based on MARDy (Nash et al., 2018). Consistent with our previous experimental results obtained from the antifungal susceptibility testing for the GMU1709 strain, we did not detect any gene or genetic events associated with antifungal drug resistance. These results also indicated that all three strains were sensitive to these antifungal drugs, which is consistent with the findings of Almeida et al. (2017) that clinical and environmental isolates of A. mycotoxinivorans had similar spectral profiles in genetic, proteomic diversity, and antibiotic resistance, based on antifungal susceptibility tests.   The S1, S2 and S3 represent the strains GMU1709, CICC 1454, and ACCC 20271, respectively.

DISCUSSION
A. mycotoxinivorans is a versatile yeast that is not only able to efficiently detoxify mycotoxins (Molnar et al., 2004) and degrade tetracycline, but also turn organic wastes (such as oily waste and carbohydrates) or pollutants into microbial lipids (Sagia et al., 2020), and produce bioemulsifiers that could be used to bioemulsify oils (Domingues et al., 2017). Hence, it has important application value for realizing sustainable energy development and eliminating environmental and food contamination. More notably, this yeast is a potentially fatal cause of disseminated and localized infections in immunocompromised patients (Hickey et al., 2009). In a previous study, we isolated an A. mycotoxinivorans strain from the sputum of a pediatric patient with congenital ventricular septal defect and pulmonary E. anopheles infection (Peng et al., 2019). Accordingly, in addition to helping us better understand its genomic signatures, deciphering the genome of A. mycotoxinivorans also provides clear genetic information for promoting its potential applications and developing prevention and control strategies against infection. Here, we adopted the Illumina and Nanopore platforms, completed the genome assembly and annotation of A. mycotoxinivorans for the first time, obtained a genome assembly with a contig N50 of 7.4 MB, comprising seven ungapped sequences. Six out of seven genomic sequences were anchored by Hi-C data to six chromosomes ( Figure 1A), and the remaining genomic sequence was the mitochondrial sequence. Together with the evaluation of genomic integrity and fidelity (Tables S2, S5), these results indicate a highquality and complete genome. Combined with transcriptome data, 8066 protein-coding genes were identified, and the structural features of nuclear genes (such as exon number and exon length) were consistent with those of other species of the same genus. Hence, the high-quality genome of GMU1709 and its relevant information provide an important data source for studying the characteristics of A. mycotoxinivorans.
Although the basidiomycetous yeast is conventionally classified according to morphological characteristics, its phenotypic characteristics do not reflect phylogeny in many cases . Consequently, the phylogenetic analysis of universally conserved proteins or genes (especially rRNA genes) has gradually become a powerful tool for the classification of basidiomycetous yeast. However, phylogenetic analyses of single or multiple genes exhibit only low resolution of phylogenetic relationships and cannot provide sufficient information to resolve deep branches (Fitz-Gibbon and House, 1999). Furthermore, different evolutionary rates or horizontal gene transfers would distort the topology of the phylogenetic tree constructed based on a single gene or a few genes, which may incorrectly influence their phylogenetic relationships. Aliyu et al. (2020) constructed a phylogenetic tree of 33 strains in the Trichosporonaceae family based on 405 orthologous proteins and identified two distinct misidentifications in which T. akiyoshidainum HP2023 and C. cutaneum ACCC 20271 were reassigned to Apiotrichum. In a previous study (Peng et al., 2019), we identified the GMU1709 strain as A. mycotoxinivorans via a 26S rDNA-based phylogenetic relationship, including biochemical and morphological characteristics. In this study, we performed a whole genomebased phylogenetic analysis based on multi-genome alignments for four major genera of Trichosporonaceae. In our results, the formerly named T. cutaneum ACCC 20271 strain belonged to A. mycotoxinivorans (Figure 2), and the genome collinearity analysis (Figure 4), gene pattern clustering, and Venn diagrams of shared relationships between genus or/and strains (Figure 3) all supported this finding. In addition, the HP2023 strain was reassigned to A. akiyoshidainum based on our results and previous reports (Aliyu et al., 2020). Therefore, our genome-based phylogenetic analysis further clarified the evolutionary relationship between A. mycotoxinivorans and other Trichosporonaceae members.
Recently, opportunistic fungal infections have become more common, owing to the rapid increase in the number of immunocompromised patients, primarily caused by immunosuppressive therapies (such as organ transplants, cytotoxic chemotherapies, and the use of intravenous catheters) or the human immunodeficiency virus (Peŕez-Torrado and Querol, 2016). A. mycotoxinivorans has also been recognized as an emerging opportunistic yeast pathogen that causes infections in immunocompromised patients (do Espıŕito Santo et al., 2020). In this study, differences in pathogenic genetic basis between clinical and non-clinical strains of A. mycotoxinivorans seem small to nonexistent, which is consistent with the findings of Pfaller (2015), who reported that any fungus can cause infection in a sufficiently immunocompromised host. For example, although Saccharomyces cerevisiae has a well-documented history of safe use in food (Sewalt et al., 2016), its colonization of the human body, whether long-lasting or transient, may occasionally result in infections (Anoop et al., 2015). A. mycotoxinivorans primarily causes pulmonary infection in immunocompromised humans, with a higher propensity for patients with cystic fibrosis (Marcelo and Farooq, 2018). Via comparative analysis with other strains of the Trichosporonaceae family, in A. mycotoxinivorans strains, we deduced that there are three PHI-base genes (including the PHI:5267: UgeB, PHI:6096: PepO, and PHI:8494: PLD) with a phenotype of increased virulence (hypervirulence) and significantly more genes, which are closely related to pulmonary infections. PepO is a newly discovered virulence protein correlated with host cell adhesion and invasion (Zhang et al., 2016). In 36 strains from four genera of the Trichosporonaceae family, PepO was determined to solely exist in three A. mycotoxinivorans strains and was predicted to be a transmembrane protein (a portion of the PepO protein sequence may be extracellular), which could play an important role in the pathogen invasion of host cells mediated by host cell adhesion and the immune evasion mediated by binding to plasminogen and fibronectin. UgeB encodes the epimerase required for the synthesis of N-acetyl-galactosamine and subsequent production of galactosaminogalactan (Henriet et al., 2016). Animal experiments have demonstrated that the overexpression of UgeB increa ses the amount of cell wall-bound galactosaminogalactan, thereby enhancing the adherence and virulence of the pathogen (Lee et al., 2015). We determined that two or three genes encoded UgeB in A. mycotoxinivorans strains, which is significantly higher than almost all strains of other species (containing only zero or one gene; Table S4). According to the report by Lee et al. (2015), high amounts of cell wall-bound galactosaminogalactan may increase its virulence by mediating resistance to NADPH oxidase-dependent neutrophil extracellular traps. PLD is an essential enzyme responsible for producing the lipid second messenger phosphatidic acid, which is involved in several fundamental cellular processes, including actin cytoskeleton remodeling, membrane trafficking, cell survival, and proliferation (Oude Weernink et al., 2007). This enzyme is an important virulence factor for pathogen infections and can enhance the production of ROS by increasing the expression level of histone deacetylase (a regulator of inflammatory responses and ROS production) involved in immunomodulation during infection (Yu and Huo, 2018). Genes encoding PLD are solely present in the nonclinical ACCC 20271 strain, and a previous study has demonstrated that it can cause invasive pulmonary aspergillosis. This gene might be obtained via horizontal gene transfer (Van Etten and Bhattacharya, 2020), which may further enhance its virulence. From above discussion, it is clear that no evidence exists to support that clinical strain had stronger pathogenicity than non-clinical strain, and above three proteins may well be directly related to the opportunistic infections of A. mycotoxinivorans.
It has been reported that fungi and bacteria live together in a wide variety of environments and engage in several types of interactions that lead to their behavioral shifts (Deveau et al., 2018). In our previous study, we isolated and identified two pathogens (the A. mycotoxinivorans GMU1709 strain and an E. anopheles strain) from sputum specimens collected with bronchofiberscopes (Peng et al., 2019). Considering that A. mycotoxinivorans can degrade antibiotics, including tetracycline (Huang et al., 2021), A. mycotoxinivorans could help pathogenic bacteria survive by inactivating antibiotics. We examined the antibiotic resistance genes from three A. mycotoxinivorans strains and determined that they all contained putative genes responsible for inactivating tetracycline, rifamycin, penam, and cephalosporin. Therefore, A. mycotoxinivorans, when living with bacteria, is substantially likely to increase the resistance of the surrounding bacteria to these antibiotics, thus making antibiotic therapy ineffective. In addition, our previous experimental analysis indicates that the GMU1709 strain is not resistant to antifungal drugs (Peng et al., 2019), which is consistent with the fact that no gene or mutation associated with antifungal drug resistance was identified in any of three A. mycotoxinivorans strains (including the GMU1709 strain) in this study. This further shows that there is no difference between clinical and non-clinical strains in antifungal drug resistance, which is consistent with the findings of Almeida et al. (2017). Hence, these results indicate that A. mycotoxinivorans can interact with pathogenic bacteria to resist antibiotic treatment, and there is also no significant genetic difference in drug resistance between clinical and non-clinical strains.

CONCLUSION
For the first time, we reported a clinically-derived A. mycotoxinivorans (GMU1709) genome with high-quality annotation, using Illumina, Nanopore, and Hi-C technologies. The assembly exhibited a higher level of completeness and genome quality than other Apiotrichum genomes. Comparative genomic and phylogenomic analyses supported the classification of the formerly named T. cutaneum ACCC 20271 strain within the A. mycotoxinivorans species and provided further evidence for the evolutionary relationships of A. mycotoxinivorans and other Trichosporonaceae members. We found no obvious genetic difference in drug resistance and pathogenicity between clinical and non-clinical strains of A. mycotoxinivorans. More importantly, we identified the putative virulence factors and drug resistance genes of A. mycotoxinivorans, which could be potential targets for the further research and therapeutic intervention of such opportunistic infections. In conclusion, this study lays a solid foundation for understanding the phylogenetic relationship of Trichosporonaceae, promoting the potential application of A. mycotoxinivorans, and developing disease prevention and control strategies.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.