Comparative Analysis of Clinical and Environmental Strains of Exophiala spinifera by Long-Reads Sequencing and RNAseq Reveal Adaptive Strategies

Exophiala spinifera, a capsule-producing black yeast, is overrepresented as agent of disseminated infection in humans with inherited dysfunction of the CARD9 gene. In a review of published caspase recruitment domain-containing protein 9 (CARD9) deficiency cases, black fungi were linked to mutations other than those prevalent in yeast and dermatophyte cases, and were found to respond to a larger panel of cytokines. Here, we sequenced and annotated the genomes of BMU 08022 from a patient with CARD9 deficiency and two environmental strains, BMU 00051 and BMU 00047. We performed genomic and transcriptomic analysis for these isolates including published black yeasts genomes, using a combination of long-read (PACBIO) and short-read (Illumina) sequencing technologies with a hybrid assembly strategy. We identified the virulence factors, fitness, and the major genetic and gene expression differences between the strains with RNAseq technology. Genome assembly reached sub-chromosome level with between 12,043 and 12,130 predicted genes. The number of indels identified in the clinical strain was higher than observed in environmental strains. We identify a relatively large core genome of 9,887 genes. Moreover, substantial syntenic rearrangements of scaffolds I and III in the CARD9-related isolate were detected. Seventeen gene clusters were involved in the production of secondary metabolites. PKS-cluster 17 was consistently found to be absent in the clinical strain. Comparative transcriptome analysis demonstrated that 16 single-copy genes were significantly differentially expressed upon incubation in brain-heart infusion broth vs. Sabouraud glucose broth. Most of the single-copy genes upregulated with Brain Heart Infusion (BHI) were transporters. There were 48 unique genes differentially expressed exclusively to the clinical strain in two different media, including genes from various metabolic processes and transcriptional regulation. Up-regulated genes in the clinical strain with Gene Ontology (GO) enrichment are mainly involved in transmembrane transport, biosynthetic process and metabolic process. This study has provided novel insights into understanding of strain-differences in intrinsic virulence of the species and indicated that intraspecific variability may be related to habitat choice. This indicates that strains of E. spinifera are differentially prone to cause infection in susceptible patient populations, and provides clues for future studies exploring the mechanisms of pathogenic and adaptive strategies of black yeasts in immunodeficient patients.


INTRODUCTION
Melanized fungi of the order Chaetothyriales are renowned as agents of human infection. The number of cases is low when compared to dermatophytes, Aspergillus, and Candida, but with 86 species in 7 genera that have been proven as potential agents of vertebrate disease (de Hoog et al., 2020) the order ranges fourth in clinical biodiversity, after Onygenales, Eurotiales, and Saccharomycetales. In 19 chaetothyrialean species, systemic or disseminated infection has been observed, which frequently led to death of the patient. The neurotropic species Cladophialophora bantiana is one of the most feared fungi, because it causes brain abscesses in healthy-appearing patients, with a case fatality rate of 65% despite antifungal therapy (Kantarcioglu et al., 2017). Numerous investigations nevertheless suggest that most members of the order -with a possible exception of agents of chromoblastomycosis -are opportunists rather than pathogens. Opportunists are defined as fungi that complete their natural life cycle without involvement of an animal host, being able of unintentional infection only through coincidental similarities between conditions of host tissue and natural habitat. Gostincar et al. (2018) ascribed their infectious ability to polyextremotolerance, which is tolerance to environmental stress supplemented with efficient nutrient scavenging and toxin management. Properties that in pathogenic fungi have a role in virulence may have very different functionalities in the fungus' natural habitat. Song et al. (2017) noted that hypothesized virulence factors, which enhance infection in truly host-associated fungi like Candida or Histoplasma, in black yeasts may respond adversely to proxies of host resistance, and thus comparable genes can be functionally different between fungal groups.
An important consequence of opportunism is that all strains have an equal chance to be inoculated into the host, and thus clinical strains are not necessarily different from environmental ones. Pathogenic adaptation is then less likely to occur because of absence of host-related transmission. However, after coincidental inoculation, some strains may be more prone to cause symptoms than others, and thus there may be a difference between clinical and environmental strains. Black yeast-like fungi mostly cause local, (sub) cutaneous infections. The host conditions of patients with brain abscesses acquired via the inhalative route are not well understood. For disseminated cases, recent data have shown (Lanternier et al., 2013) that these are often associated with host-inherited mutations in caspase recruitment domaincontaining protein 9 (CARD9). This protein plays a role in the dectin pathway regulating innate immunity activating proinflammatory cytokines. Since most cases of disseminated disease in black yeast-like fungi were described prior to 2013, it is possible that the majority of these cases occurred in patients with CARD9 dysfunctional innate immunity. Several of the infection types of chaetothyrialean fungi show a certain degree of unexplained endemism, such as the prevalence of Cladophialophora brain abscess in the Indian subcontinent (Chakrabarti et al., 2016). Similarly, the majority of CARD9-associated black fungal cases is found in East Asia: China, Korea, Japan, although the etiologic agents have a global distribution. For an explanation of virulence of members of Chaetothyriales, we thus might expect answers from the interaction of the fungus with windows of opportunity associated with CARD9 regulation and human race.
To study the fungal side of this tripartite relationship, we sequenced the genomes and transcriptomes with different culture conditions of an Exophiala spinifera strain isolated from a patient with a CARD9-related disorder, and compared this with two environmental strains of the same species, and with other published genomes (Supplementary Table S1). We applied stateof-the-art technology, to quantify eventual differences between clinical and environmental strains with maximum precision and to describe the species' intraspecific variability. A number of genomes of Chaetothyriales were already available, but these were sequenced with somewhat older techniques. One of the aims is therefore to describe the technical and biological variation in these datasets, and determine whether the new techniques provide a better answer to questions as formulated above.

Literature Search
Keywords "CARD9" and "fungal infection" were used in PubMed to search the English literature including research articles, reviews and case reports published until November 2019.

Strains and Culture Conditions
Three strains of E. spinifera were analyzed ( Table 1): BMU 08022 (ESC1, from a CARD9-deficient patient, Jiangsu, China), BMU00051 (ESE1, from bark, Shenzen, China), and BMU00047 (ESE2, from soil, Colombia). To prepare DNA for genome sequencing, mycelia were harvested from fresh cultures on Sabouraud Dextrose Agar (SDA), frozen in liquid nitrogen, and stored at −80 • C until further processing. For RNA extraction, strains were inoculated in Sabouraud Dextrose Broth (SDB) and in Brain Heart Infusion Broth (BHI) (Das et al., 2010) and harvested after 20 h, centrifuged and frozen in liquid nitrogen. Liquid nitrogen was used to grind the samples for DNA/RNA extraction.

cDNA Synthesis
The cDNA synthesis was performed using IMPRON II reverse transcriptase kit (Promega, Madison, WI, United States) with 1 µg of total RNA added according to manufacturer's instructions. The efficiency of the cDNA synthesis process was assessed by PCR using the following reference genes: β-tubulin and ITS. The integrity of PCR amplification products was verified by electrophoresis on 1.2% agarose gels.

PacBio Sequencing and Raw Assembly
The extracted DNA of three E. spinifera strains were sequenced on the Pacific Biosciences (Pacific Biosciences, Menlo Park, CA, United States) Single Molecule, Real-Time (SMRT) DNA sequencing technology (platform: RS II; chemistry: P6-C4).
The raw reads were processed using the standard SMRT analysis pipeline v5.0.1 (Chin et al., 2013). The de novo assembly was carried out following CANU v1.4 assembly protocol (Koren et al., 2017) with QUIVER polishing (Chin et al., 2013). Assemblies were refined by manual curation. Pairwise alignments were carried out using BLASTN v2.6.0+ (Altschul et al., 1990). The draft assemblies were further improved with FINISHERSC (Lam et al., 2015) and polished with ARROW (Chin et al., 2013). For each genome assembly, contigs were examined and removed if redundant (i.e., aligning to any other contig in the same assembly with >90% identity). All contigs containing rDNA repeats were excluded from the above step.

Illumina Sequencing, Read Mapping and Error Correction
Our genome assembly strategy is summarized in Figure 1. In addition to the PACBIO sequencing, we also performed Illumina 150-bp paired-end sequencing for each strain. We examined the raw Illumina reads via FASTQC v0.11.5 1 and performed adaptorremoval and quality-based trimming by TRIMMOMATIC v0.36 (Bolger et al., 2014). For each strain, the trimmed reads were mapped to the corresponding PACBIO assemblies by BWA 0.7.12 and using BWAMEM v0.7.16 (Li and Durbin, 2009). The resulting read alignments were subsequently processed by SAMTOOLS v1.6 ). On the basis of Illumina read alignments, we further performed error correction with PILON v1.22 (Walker et al., 2014) to generate final assemblies for downstream analysis. BUSCO v3 (Simão et al., 2015) was used to assess the completeness of the final assembled genomes.

Repetitive Sequence Prediction and Masking
Repetitive elements were predicted using a de novo approach by applying REPEATMODELER v1.0.11 2 , which includes RECON v1.08 (Bao and Eddy, 2002) and REPEATSCOUT v1.0.5 (Price et al., 2005) to construct a strain-specific repeat library. To further classify repetitive sequences, BLAST v2.2.28 (Camacho et al., 2009) was used to search the repeat library against SWISS-PROT protein database. Sequences with similarities to known proteins were discarded from the repeat library. Repetitive sequences were masked with REPEATMASKER v4.0.7 2 using the de novo constructed library.

Annotation of rRNA, tRNA and Repeat Elements
The rRNA loci were predicted by using RNAMMER v1.2 (Lagesen et al., 2007) and tRNA by TRNASCAN-SE v2.0 FIGURE 1 | Genome assembly and annotation strategy used in this study. Assembly: the raw reads were processed using the standard SMRT analysis pipeline v5.0. The de novo assembly was carried out following CANU v1.4 assembly protocol with QUIVER polishing. Assemblies were refined by manual curation. Pairwise alignments were carried out using BLASTN v2.6.0+. The draft assemblies were further improved with FINISHERSC and polished with ARROW. Annotation: a hybrid strategy combining ab initio predictions and transcriptomic support (RNA-seq) was applied in gene prediction. Three ab initio gene finders, GENEMARK.HMM, FGENESH, and AUGUSTUS were used. (Lowe and Eddy, 1997). Repeat elements were identified using REPEATMASKER v4.0.6 2 based on REPBASE LIBRARY 20160829. REPEATMASKER was run with options that skipped the lowcomplexity DNAs masking for the purpose of gene prediction.

Gene Prediction and Functional Annotation
A hybrid strategy combining ab initio predictions and transcript alignments was applied in annotation of protein coding genes. Firstly, protein coding genes were predicted using BRAKER (Hoff et al., 2016), which was an ab initio approach combined with RNA-seq data. Then, RNA-seq was de novo assembled by using TRINITY (Grabherr et al., 2011). The assembled transcripts were aligned to the reference genome of corresponding strains by PASA pipeline (Haas et al., 2003). Finally, EVIDENCEMODELER (Haas et al., 2008) was used to combine ab initio predictions and transcript alignments into consensus gene structure. Functional annotation of predicted proteins was done by BLAST (Camacho et al., 2009) similarity search of the predicted proteins against NR database with an e-value of 1e-05. Gene Ontology (GO) terms were retrieved by using Blast2go (Gotz et al., 2008) with default annotation parameters. To further improve the annotations, INTERPROSCAN v5.33-72.0 (Jones et al., 2014) was used to identify known protein domains. BLAST (Camacho et al., 2009) results of NR database and results of the INTERPROSCAN analysis were combined to gain a complete functional annotation of all predicted genes. Genes predicted to encode carbohydrate-active enzymes (CAZYMES) were identified with DBCAN2 (Zhang et al., 2018). Also the software SMURF (Khaldi et al., 2010) was used to predict secondary metabolite gene clusters in three genomes of E. spinifera.

Comparison of Genome Assemblies
The synteny among three strains of E. spinifera was determined using MUMMER v3.23 . We used NUCMER implemented in MUMMER v3.23 to align the genomes of the three strains of E. spinifera. In addition, the DNADIFF package in MUMMER v3.23 was used to generate coordinates for the differences among genomes of the three strains analyzed. The alignment statistics, SNPs, indels, inversions, etc., were also calculated from the results of MUMMER. We plotted these features circularly with CIRCOS v0.69.6 (Krzywinski et al., 2009).

Identification of Homology Groups
For nuclear protein-coding genes, we used INPARANOID v4.1 3 to identify gene homology across the three strains of E. spinifera. We calculated protein length distributions for each homologous group (1:1:1 association among strains) of the strains. The correlation of exon length and intron length of homologous groups (1:1 association between strains) were calculated by pairwise comparisons among three strains.

Identification of Orthologous Genes
Orthologous groups were delimited using ORTHOFINDER v2.2.6 (Emms and Kelly, 2015), in which all predicted protein sequences were compared using a BLAST (Camacho et al., 2009) all-againstall search. The single-copy genes, duplicated genes, and strainspecific genes were extracted from the ORTHOFINDER output. To detect the core genome, specific and shared genes, the data was filtered based on the ORTHOFINDER output. The R package UPSETR (Conway et al., 2017) was used to construct statistics of orthologous groups among the three strains under study.

Identification of Genes Under Positive Selection
Single-copy orthologous proteins were aligned by using CLUSTALO v1.2.4 4 . To obtain the alignments of codons, the corresponding nucleotide-sequence alignments were derived by substituting the respective coding sequences from the protein sequences. For each single-copy orthologous group, genes from the two environmental strains and their orthologous genes in the clinical strain formed sequence triplets. The test for the asymmetric evolution was constituted as a relative rate test between clinical strain and environmental strains on an unrooted tree. The statistical tests were conducted with a codon-based branch-site model using the CODEML program of PAMLv4.9 (Yang, 2007). We compared ω0 and ω1 between single-copy orthologous genes to detect differences in proportion of selected sites in the two clades. Likelihood ratio (LR) was used to test for significance. To do this, two models were applied to the data: model 1 (ω0 = ω1) constrains the two ω values to be equal for the three sequences, and model 2 (ω0 = ω1) estimates the two ω values as free parameters. Collected maximum likelihood values ML1 and ML2 from the two models were used to calculate the LR, LR = 2 (lnML2−lnML1). LR is then compared against the χ 2 distribution with one degree of freedom.

Transcriptome Analysis
The raw reads were assessed for quality by FASTQC v0.11.5 and filtered to remove low-quality reads with TRIMMOMATIC v0.36 (Bolger et al., 2014). Filtered reads were mapped to the corresponding reference genome of three E. spinifera strains using STAR v2.5.3a (Dobin et al., 2013). The gene expression levels were calculated using FEATURECOUNTS v1.5.2 (Liao et al., 2014) and normalized based on the FPKM method. Differentially expressed genes were detected by R package DESEQ2 (Love et al., 2014). Differentially expressed genes were filtered by adjusted p-value < 0.05 and | log2 fold change| > 1. GO enrichment analysis was performed by R package TOPGO 5 . Only 1-1-1 orthologous genes were considered when detecting differentially expressed genes among different strains. For analysis of RNAseq data, differentially expressed genes were detected by R Bioconductor package DESeq2 with the Benjamini-Hochberg adjusted p-value < 0.05 and fold change >2. Student's t-tests were performed for the comparisons, and a p-value < 0.05 was considered to represent significance.

Literature Data on Black Fungi in CARD9 Patients
An overview was made of published cases of fungal infections in patients with CARD9 deficiencies ( Table 2; Glocker et al., 2009;Drewniak et al., 2013;Lanternier et al., 2013Lanternier et al., , 2015aGavino et al., 2014Gavino et al., , 2016Drummond et al., 2015;Grumach et al., 2015;Herbst et al., 2015;Jachiet et al., 2015;Alves de Medeiros et al., 2016;Celmeli et al., 2016;Yan et al., 2016;Boudghene et al., 2017;Zhang et al., 2017;Arango-Franco et al., 2018;Cetinkaya et al., 2018;De Bruyne et al., 2018;Zhong et al., 2018;Huang et al., 2019). Sixteen different mutations have been observed. When cases are arranged according to phylogeny of the fungus, three main groups can be recognized: ascomycetous yeasts (Candida: order Saccharomycetales), dermatophytes (order Onygenales), and black fungi (mainly order Chaetothyriales, single representatives of orders Pleosporales and Venturiales). A single proven case with a member of Mucorales has been published. Mutations are not randomly distributed in the three fungal groups (Figure 2; p = 0.001). For example, p.Q295X is found in 13 cases of Candida infection, but was not found among cases by other fungi. The main cytokine impaired was IL-6 (43/50 cases). In Onygenales, this was the only deficient cytokine reported, while, in contrast, in seven cases of Candida infection GM-CSF or TNFα were absent instead. Black fungi were associated with a relatively large panel of missing cytokines, particularly TNFα and IL-1β. The relative absence of cytokines in the three fungal main groups is illustrated in Figure 3.

Genome Sequencing and Assembly
We sequenced the genomes of three strains: BMU 08022 (clinical strain from human skin, ESC1), BMU 00051 (environmental strain from bark, ESE1), and BMU 00047 (environmental strain from soil, ESE2) (Table 1) using a combination of long-read (PACBIO) and short-read (Illumina) sequencing technologies. The third generation sequencing is supplemented by the second generation sequencing, and combined with a variety of assembly software and parameter optimizations (Figure 1). For each genome, PACBIO sequencing provided more than 100 × coverage and Illumina reached 170× coverage. For strain ESC1, we obtained a draft genome assembly of 33,559,924 bp organized into nine contigs with an L90 of 7, and an N50 of 4.0 Mb. The genome of ESE1 was 32,380,025 bp in size and comprised seven contigs with an L90 of 7, and N50 of 4.9 Mb. In ESE2, the genome assembly of 32,696,644 bp contained nine contigs with an L90 of 7, and an N50 4.6 Mb. For all three strains, more than 94% of the Illumina reads were aligned to the draft genome assemblies in post hoc validation ( Table 3). The results of genome assembly of the three strains reached sub-chromosome level.

Colinear Analysis
We used whole-genome alignment to identify homologous and strain-specific segments among three E. spinifera strains. We identified a total of 172,269 and 170,807 SNPs for the clinical strain compared to the environmental strains BMU     (Figure 4). Comparing the genomes of ESC1, ESE1, and ESE2, recombination was observed in scaffolds I and III between the clinical and environmental strains ( Figure 4B). Scaffold I of ESC1 is equal to part of scaffolds I and III of the environmental strains, and scaffold III of ESC1 is equal to the remainders of scaffolds I and III of the environmental strains. This is shown in more detail in Figure 4, where scaffold VIII of ESC1 is shown to be comparable to part of scaffold I of the environmental strains, and two areas proved to be inverted between clinical and environmental strains.

Gene Content and Functional Annotation
Genome functional annotation was performed with INTERPROSCAN, resulting in over 59% of genes being proteincoding genes annotated with GO term. A hybrid strategy combining ab initio predictions and transcriptomic support (RNA-seq) was applied in gene prediction. Annotation of the strains yielded 12,131 protein-coding genes for E. spinifera ESC1, 12,074 protein-coding genes for ESE1 and 12,043 for ESE2 (Table 3). A total of 9,887 genes represented the core genome (Supplementary Figure S1). The environmental strains shared more genes (336) than compared with clinical strain ESC1. The clinical strain had 647 specific gene clusters, included transporter, and binding protein. The list of the respective functional annotations is available in Supplementary Table S2.
Plotting the protein lengths of the three strains, considerable differences were noted (Figure 5): the distribution pattern of strain-specific genes length varied from clinical strain ESC1 to environmental strain, and ESC1 deviated by having more genes. Single-copy orthologous genes were compared to detect differences in proportion of selected sites, with LR as test of significance. In contrast, homologous introns were weakly correlated between ESC1 and environmental strain (ESE1 and ESE2), with significant length variations, while correlation were present between environmental strains (r 2 = 0.57, higher than the other two groups) (Figure 6).  showing an intraspecific variability. Greater differences were found between clinical strain ESC1 and environmental strains ESE1 and ESE2, while the protein length distribution of the two environmental strains was more consistent.

Secondary Metabolite Clusters
We identified a total of 9,329 homologous groups at the gene level (1:1:1 association among three strains). In the assumption that the core genomes with basic metabolic functions have similar biological functions, we focused on secondary metabolites. Seventeen gene clusters were involved in the production of secondary metabolites (Figure 7). Clusters 1, 2, 4, 14, and 16 are PKS-related clusters. Cluster 16 is present in environmental strain ESE2 only. The cluster contains the single-copy gene benzoate 4monooxygenase cytochrome P450, which catalyzes the benzoate degradation step in the toluene catabolic pathway ( Table 5).

RNA-seq
Sequence clustering at the mRNA level was used to compare the three strains to identify homologous groups among the three strains. For all three strains, poly (A)-enriched, strandspecific RNA-seq from two cultivation conditions (SDB and BHI broth) with different culture degrees was performed. Comparison of differentially expressed genes under different culture conditions demonstrated that ESC1 revealed significant differences in response to the two media in 16 single-copy genes, while the environmental strains ESE1 and ESE2 remained almost indifferent (Figure 8 and Supplementary Table S3). There were 48 unique genes (22 in BHI media and 26 genes in SDB media; Table 6) involved in metabolism and transcriptional regulation differentially expressed exclusively in the clinical strain compared to the two environmental strains, including 12 genes without functional annotation. Supplementary Table S4 displays the functional gene content (name, IPR families, KOG, and GO) and respective expression levels in each scenery/media, to enable the visualization/filters of this data. The table includes the gene ID, description, IPR numbers and the GO information. Differentially expressed genes between clinical isolate and environmental isolates under BHI culture are listed in Supplementary Table S5. There were 637 upregulated genes between ESC1 and ESE1, and 659 up-regulated genes between ESC1 and ESE2. These specific upregulated genes in the clinical strain mainly act in transmembrane transport, translational elongation, ribosome biogenesis, and some other biosynthetic processes (GO enrichment results are summarized in Supplementary Table S6). GO analysis showed that the gene annotations were mainly in three categories: biological processes (BP), molecular function (MF), and cell component (CC). GO categories among three strains showed a similar enrichment distribution (Supplementary Figure S2). However, GO annotations of genes unique to each strain were not identical in BP and MF ontologies. In the biological process, the differentially expressed genes were mainly enriched in response to cellular process and metabolic process. In the MF, differentially expressed genes were mainly related to binding and catalytic activity (Supplementary Figure S3). The correlation between differential expression genes and positive selection genes are listed in Supplementary Table S7. There were 53 positively selected genes that were also upregulated between ESC1 and ESE1 in BHI culture. Comparing ESC1 and ESE2 under BHI conditions, 53 positively selected genes were also up-regulated in ESC1. A total of 29 positively selected genes were up-regulated in ESC1 when compared with both ESE1 and ESE2 in BHI broth. These genes were involved in positive regulation of translation, regulation of translational termination, and ATP synthesis coupled proton transport.

Genes Under Positive Selection
Models 1 (ω0 = ω1) and 2 (ω0 = ω1) were used to estimate the ω values as free parameters, with LRs as parameters of significance. A total of 29 genes yielded significant values indicating positive selection (p ≤ 0.0001) (Supplementary Table S8). These genes were subjected to GO analysis and were mainly enriched for BP, including cellular response, positive regulation of phosphorylation and transferase activity. Some replication-and transcription-associated genes and several structural protein genes were also discovered under positive selection (Supplementary Table S9).

DISCUSSION
Black yeasts are generally considered to be opportunists; consequently, they should lack any specialized adaptation to the vertebrate host (Seyedmousavi et al., 2018). Infection is coincidentally promoted by factors that enhance survival in FIGURE 7 | Genome comparison of secondary metabolite clusters of three strains. Seventeen gene clusters were involved in the production of secondary metabolites. Environmental strains ESE1 and ESE2 are predicted to have more secondary metabolite clusters than the clinical strain ESC1.Clusters 1, 2, 4, 14, and 16 are PKS-related clusters. Cluster 16 is present in environmental strain ESE2 only.
the environmental niche of the fungus. Such factors must be present in chaetothyrialean black fungi, given the relatively large number of species reported from infections in humans and in cold-blooded animals. The Chaetothyriales are particularly overrepresented in patients with inherited deficiencies in caspase recruitment domain-containing signaling protein 9 (CARD9), a disorder impairing the dectin pathway of innate immunity. This protein enhances pattern recognition receptors to induce NF-κB and MAPK activation, leading to a cytokine cascade. Malfunctioning of the protein decreases protection against infection by larger microbes such as parasites and especially fungi. It has been puzzling why the patients, theoretically being susceptible to any fungal infection, usually carry only a  (Table 1), a possible explanation might be found in the fact that this protection deficiency appears to be highly specific.
Only three groups of fungi are prevalent: Candida spp. (order Saccharomycetales), dermatophytes (order Onygenales) and black fungi of the order Chaetothyriales. A single case of Mucor irregularis has been published. This is the only species of Mucor that is known to cause chronic infections in apparently healthy individuals (Lu et al., 2013), other Mucorales being acute in compromised patients. As noted by Vaezi et al. (2018), there is a regional bias in CARD9 deficiency cases in that the great majority of cases by Candida and dermatophytes are from northern Africa, Turkey, and Iran, while chaetothyrialean cases are particularly encountered in East Asia. Notably, severe and chronic black fungal infections are well-known in China (de Hoog et al., 2020). However, most cases of disseminated fungal infection in "healthy" patients were published before CARD9 deficiency was discovered, thus the possibility is not excluded that more patients had a hidden inherited immune disorder. Remarkably, common opportunists such as Aspergillus or Fusarium are nearly or completely lacking. The over-representation of chaetothyrialean black fungi in this susceptible patient population is also striking because such infections are rare in other patient cohorts, in healthy as well as in otherwise immunocompromised hosts. In black fungi, i.e., members of Chaetothyriales, Pleosporales, and Venturiales, 40% of the mutations were heterozygous. Mutation types tend to be different between the groups of yeasts, dermatophytes, and black fungi (Figure 2): black fungi responded to other mutations than yeasts and dermatophytes. Also the affected cytokines differed between groups. From these data it is obvious that black fungi, and particularly those belonging to the order Chaetothyriales, respond in a rather specific manner to impairment of the human immune system. Comparing the three main groups (yeasts, dermatophytes, black fungi) with respect to their response to cytokines, quite remarkably, the patients with chronic dermatophyte infections all were reported to have IL-6 deficiency alone, while in black fungi TNFα and often also IL-17A were impaired (Figure 3).
In Candida, a more variable picture was observed, but IL-6 deficiency was observed in 23 out of 31 cases. IL-6 is a proinflammatory cytokine controlling T-cell differentiation, particularly Th17 and regulatory T cells (Zhao et al., 2012). These conclusions should however be taken with some care, as cytokine measurements were not consistent between publications.
The data are nevertheless sufficient to surmise that black fungal infections, particularly of members of Chaetothyriales, are not randomly occurring in susceptible patients with impaired immunity in general, but respond specifically to the conditions provided by CARD9 impairment. Because of their pronounced ability to decompose monoaromatic toxins, black yeasts of this order are enriched in the domestic  (Moreno et al., 2018), and therefore close vicinity to humans is probable. Many species are oligotrophic and have efficient nutrient scavenging systems. Toxin management has been hypothesized to enhance survival strategies in the environment (Gostincar et al., 2018), and industrial pollution by monoaromates might promote growth of these fungi near humans. In this scenario, take-up by susceptible individuals followed by successful infection, may explain their high frequency in CARD9 patients. The cytochrome P-450 gene family has been suggested as a possible factor explaining fungal neurotropism, but our clinical strain lacked PKS cluster 16 which is present in one of the environmental strains ( Figure 5 and Table 4) and thus lacks benzoate 4-monooxygenase, an important gene in hydrocarbon catabolism. Over the past decade, advances in Next Generation Sequencing (NGS) technologies and decreasing sequencing costs allowed an increase in the number of sequenced genomes, with better quality. Long-read sequencing technologies have contributed greatly to comparative genomics among species and can also be applied to study genomics within a species (Kim et al., 2019). The long-read sequencing method, PACBIO, has allowed the ability to obtain complete and accurate sequences. Genome data of our strains are listed in Table 2, showing a range of intraspecific variability of 863,280 bp and 87 genes. These genomes were sequenced by a combined Illumina and PACBIO strategy. We identify a relatively large core genome of 9,887 genes from our three genomes. The two environmental strains shared more genes (336) than a comparison with clinical strain ESC1. The clinical strain had 647 specific gene clusters. Comparing the genomes of E. spinifera sequenced to date, the older Illumina genomes were within the range of the above established species variance. Similarly, comparable numbers of genes were found, suggesting that published genomes are reliable, despite the fact that the number of scaffolds in the Illumina-only genomes was considerably higher.
In our study, the utility of long read sequencing is the possibility to investigate complex fungal genomes and characterize genomic variation. Despite surmised opportunism in black yeasts, our clinical strain of E. spinifera, derived from a CARD9-deficient patient, differed significantly from two environmental strains which originated from different continents. In the environmental/clinical comparison, rearrangements were observed in scaffolds I and III (Figure 4), including two inversions, and ESC1 having an extra scaffold VIII (Figure 4). The rearrangements in this genome area are significant ( Figure 4B). In contrast, clinical strain ESC1 was intermediate between the two environmental strains in numbers of SNPs and Indels. This suggests that the genomic rearrangements were a saltational event in an otherwise homogeneous population, as was noted in Candida after passage through a murine host (Forche et al., 2009). If, in contrast to Candida, E. spinifera is an opportunist residing in a non-human habitat, no transmission takes place after infection, and thus mutations are unlikely to be maintained in the population . The possibility cannot be excluded that the rearrangements were acquired during presence in the CARD9-deficient patient. Whether or not these may contribute to increased virulence of the species in an evolutionary step, as suggested by Casadevall (2007), or if they are lost as supposed by De Hoog et al. (2018), requires study of a larger set of genomes and understanding of natural variability of E. spinifera in a wider array of habitats.
E. spinifera shows variable responses with hemolysis (Song et al., 2017). Although these authors noted that this ability did not match with the division environmental/clinical, the possibility is not excluded that hemolytic strains have an enhanced invasive potency for vertebrate hosts. We registered up-or downregulation of single-copy proteins in response to SDB vs. BHI, and noted that ESC1 showed significant differences with the two media in 16 single-copy genes, while the environmental strains remained almost indifferent; in vitro, ESC1 and ESE1 showed hemolysis, while ESE2 was negative. Most of the singlecopy genes upregulated with BHI were transporters in KOG classification. Moreover, there were 48 unique genes differentially expressed exclusively to the clinical strain in two different media, including genes from various metabolic processes and transcriptional regulation. Up-regulated genes in clinical strain in GO enrichment were mainly involved in transmembrane transport, biosynthetic process, and metabolic process.
In this study, significant genomic rearrangements between strains of the same species were demonstrated. Two environmental strains from different continents were largely identical, whereas a clinical strain from a CARD9-deficient patient was different and deviated in the number of genes, even though all strains were phenotypically similar (Song et al., 2017). In general, clinical and environmental strains of a single opportunistic fungus are taken to have comparable infectious abilities, because neither of them is equipped with specialized virulence factors. However, the observed quantitative genetic variability of strains of the single species E. spinifera possibly is associated with a differential chance to cause infection in susceptible patient populations; differences may be small but clinically relevant. We compared our NGS approach with older Illumina techniques, which showed some minor flaws e.g., in the number of scaffolds, but for quantitative description of the genome these data proved to be sufficient. The power of long reads obtained by PACBIO sequencing is to enable assembly of a complete genome sequence, which disclosed significant genomic rearrangements. Main improvement of PACBIO data are expected in qualitative studies of specific gene functions.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the NCBI. The RNA-seq data can be accessed by PRJNA600825, the PacBio assemblies (PRJNA600825), Illumina assemblies (PRJNA600036). The RNA-seq is available already (https://www. ncbi.nlm.nih.gov/sra/PRJNA600825).

AUTHOR CONTRIBUTIONS
RL and GS designed the experiments and supervised the data analysis. YS performed the experiments and wrote the relevant portions of the manuscript. MD and EY analyzed the data. NM and VV provided the technical support. All authors discussed the results and commented on the manuscript.