Metabolic potential, ecology and presence of endohyphal bacteria is reflected in genomic diversity of Mucoromycotina

We describe the genomes of six Mucoromycotina fungi representing distant saprotrophic lineages within the subphylum (i.e. Umbelopsidales and Mucorales). We selected two Umbelopsis isolates from soil (i.e. U. isabellina, U. vinacea), two soil-derived Mucor isolates (i.e. M. circinatus, M. plumbeus), and two Mucorales representatives with extended proteolytic activity (i.e. Thamnidium elegans and Mucor saturninus). We complement genome analyses with a description of their digestive capabilities, their cell wall carbohydrate composition, and total lipid profiles. Finally, we link the presence of endohyphal bacteria with observed characteristics. One of the genomes, Thamnidium elegans, harbours a complete genome of an associated bacterium classified to Paenibacillus sp. This fungus displays multiple altered traits compared to remaining isolates regardless of their evolutionary distance. T. elegans has expanded carbon assimilation capabilities particularly efficiently degrades carboxylic acids, has a higher diacylglycerol: triacylglycerol ratio and phospholipid composition suggesting a more rigid cellular membrane. Comparison of early-diverging Umbelopsidales with evolutionary younger Mucorales points at several differences particularly in their carbon source preferences and encoded carbohydrate repertoire. All tested Mucoromycotina shares features including the ability to produce 18:3 gamma-linoleic acid and fucose as a cell wall component. Author Summary In our paper, we report on the genomic sequences of six Mucoromycotina strains and an associated bacterium from Paenibacillus genus. Mucoromycotina are often studied in pathogenic context albeit their basic biology remains understudied. This manuscript expands on the collection of currently sequenced Mucorales and Umbelopsidales, including the first sequenced Thamnidium isolate, which was sequenced together with a Paenibacillus bacterium. The interaction with a bacterial partner alters the metabolism, cell membrane composition but not the exoskeleton of the fungus. The associated bacterium provided multiple enzymes that significantly expanded the digestive capabilities of the fungal host. Parallel sequencing and phenotyping of Mucorales and Umbelopsidales enabled us to look at the differences of both lineages within Mucoromycotina. We demonstrate that the predicted digestive capabilities are in line with experimental validation. Based on the cell wall composition data and genomic underpinnings of carbohydrate metabolism we were able to confirm the universal presence of fucose in Mucoromycotina cell walls. Fatty acid, phospholipid and acylglycerol composition support the usage of 18:3 gamma-linoleic acid as a chemotaxonomic marker of Mucoromycotina and corroborate TAG as a dominant storage lipid in these organisms. Genomic features, digestive capabilities, fatty acid composition differ between Mucorales and Ubelopsidales pointing at subtle but significant changes in the course of Mucoromycotina radiation.


Author Summary
In our paper, we report on the genomic sequences of six Mucoromycotina strains and an associated bacterium from Paenibacillus genus. Mucoromycotina  Mucorales [1]. While Umbelopsidales and Mucorales group mostly saprotrophic fungi living in the soil, on dung and litter, Endogonales are known to establish symbiotic interactions with plants [2].
The ancestors of extant Mucoromycotina were among the first colonizers of land. These early-branching fungi possess all of the traits commonly acknowledged as distinctive characteristics of the fungal kingdom such as apical growth, presence of ergosterol in the membranes and a cell wall made of chitin and beta-glucan [3]. Yet, their cell wall differs from ascomycetes and basidiomycetes by the presence of fucose and high amounts of N-acetylglucosamine and glucuronic acid [4,5] which is typical for other Opisthokonta rather than fungi. Mucoromycotina representatives form a fast-growing mycelium of haploid hyphae. The sexual phase includes the fusion of two gametangia and formation of a resting spore called zygospore [1].
Apart from decomposing organic matter as saprotrophs, some Mucoromycotina are capable of forming mutualistic and mycorrhiza-like associations with Haplomitriopsida liverworts [6,7]. Others are parasites of plants and animals [8,9]. The ecology of Mucoromycotina is poorly studied, which hinders understanding of their role in the ecosystem [10].
The best-studied order of Mucoromycotina is Mucorales. Its representatives are perceived mostly as a source of food spoilage, rotting crops and decomposing litter [11]. Due to their efficient plant matter degrading potential, Mucorales are used in various food production processes like the fermentation of soya beans known in the traditional Asian and African cuisine as tempeh [12] or cheese making [13]. Others are employed as bio producers of polyunsaturated fatty acids (eg. M. circinelloides), β-carotene (eg. Blakeslea trispora, M. circinelloides, and Phycomyces blakesleeanus) and diverse hydrolases [14][15][16]. Certain Mucorales (e.g. M. indicus) are also able to produce ethanol [17] while others (e.g. Rhizomucor miehei) are commercially used for lipase production [18] [11]. Mucorales lipid composition includes gamma-linolenic fatty acid 18:3 [19] which is found mostly in plants [20], algae [21] and not in Dikarya which produce the n-3 isomer of the C18 trienoic fatty acid alpha-linolenic acid.
Some Mucorales are involved in life-threatening, opportunistic infections with mortality rates reaching up to 40-80% [22]. There are general traits which predispose microorganisms to become

Genome assembly and annotation
Obtained assemblies showed diverse levels of fragmentation depending on genome size and abundance of repeats. Both Umbelopsis genomes assembled into fewer than 200 scaffolds whereas Mucor assemblies were significantly more fragmented ( Table 1).
Genome completeness was verified using single copy fungal orthologous genes searched by BUSCO [38]. Only a few BUSCOSs were duplicated from 27 in U. isabellina up to 57 in M.
saturninus suggests that sequenced genomes were not duplicated. The number of predicted genes, repeats and transposable elements (TEs) content correlates positively with genome size.

Detection of Paenibacillus bacteria in Thamnidium genome
Initial Thamnidium assembly contained genome fragments characterized by two clearly distinct GC content ratios and for that reason was additionally analysed as a metagenome. It is composed of two easily distinguishable fractions, one belonging to a presumed fungal host and the latter to an associated bacterium representing Paennibacillales, Firmicutes. Fungal and bacterial genomes were re-assembled separately. Remaining five Mucoromycotina assemblies did not contain significant amounts of sequences with sequence similarity to non-Mucoromycotina taxa.

Paenibacillus features
The bacterial genome is moderately complete (BUSCO score 85.7%) and encodes 5815 genes. RNA of related isolates shows its proximity to P. illinoensis isolates despite differences in GC content ( Fig.1.). GC content of the identified Paenibacillus genome could be altered by misplaced fungal reads. Despite discarding all reads mapping on eukaryote genomes when assembling the Paenibacillus genome it is still possible that some fungal fragments, not similar to other eukaryotic sequences, could not be filtered out.
The bacterial genome has only one complete antimicrobial locus, a rifamycin-inactivating phosphotransferase RphD, identified by scanning using ABRicate. We found a partial beta-lactam resistance pathway lacking the central beta-lactamase (four copies of beta-Nacetylhexosaminidase, BlaI family transcriptional regulator, oligopeptide transport system ATPbinding protein from oppA to oppF, penicillin-binding protein 1A and 2A). Additionally, the Peanibacillus genome encodes a single M56 peptidase and 15 proteins bearing an S12 domain.
Family M56 includes BlaR1 which is the antirepressor in beta-lactamase operon [46]. Family S12 groups carboxypeptidases B vital for cell wall synthesis and remodelling, as well as betalactamases [47]. Taken together, all these traces point at a potential beta-lactam resistance of Paniebacillus. Glycopeptide antibiotic resistance may also be present in some form since partial vancomycin resistance operon was also identified, once more only regulation and accessory components are preserved. The possible activity of these proteins and their relevance for resistance is not known.
The Paenibacillus genome harbours vitamin biosynthesis pathways e.g. cobalamin (B12), riboflavin (B2), menaquinone (K2) and thiamine (B1). It also has all the necessary genes for molybdenum cofactor synthesis and manipulation. Paenibacillus has many quorum sensing genes and those coding for flagellar and mobility proteins. Its metabolic potential is described below in parallel with its fungal host and remaining fungal genomes. The Paenibacillus genome encodes 3 copies of Peptidase_U32 (PF01136), a collagenase that may facilitate meat degradation by Thamnidium elegans which is one of the few fungi known for this property [48]. This protein is found mostly in Firmicutes and is absent from eukaryotic genomes. The U32 collagenases are considered as virulence factors in animal-infecting bacteria [49]. The U32 collagenase (PrtC) together with urease subunit alpha (UreB) are parts of the Helicobacter pylori arsenal used in epithelial cell invasion [50]. Paenibacillus sp. genome harbours all three urease subunits (alpha, beta and gamma). Moreover, we identified genes coding other urea processing enzymes including: cyanuric acid amidohydrolase, biuret amidohydrolase (BiuH), urea carboxylase and two copies of allophanate hydrolase (AtzF). The Paenibacillus genome encodes also one copy of ulilysin Peptidase_M43 (PF05572) with possible gelatinase function [51], and 33 copies of Peptidase_M23 (PF01551), which includes mostly bacterial peptidoglycan hydrolases but also prokaryotic collagenases [52].

Proteases
The overall profile of encoded proteases is very similar in all sequenced fungi ( Fig.2.), with high numbers of encoded pepsin-like A01A peptidases, FtsH-like M41 peptidases, M48 peptidases active on di-and tripeptides, proteasome peptidases T1, lysosomal peptidases C26, and ubiquitin-specific proteases C19. These proteases, except for pepsin, contribute to intracellular protein turnover and regulation. Pepsin and subtilisin proteases are found in high copy numbers in all of the genomes pointing at a high degradation potential of these saprotrophic organisms. In all analyzed fungal genomes, we found an expansion of the C44 family. Proteins from this family are homologs of glutamine-fructose-6-phosphate transaminase (GFPT) and its precursor. GFTP is known to control the flux of glucose into the hexosamine pathway which plays a crucial role in the regulation of chitin synthesis in fungi [53]. There is a two-fold expansion of S8A serine proteases in Mucorales when compared to Umbelopsidales (Supplementary Table S3 Among sequenced Mucorales, the average number of protein-coding genes per peptidase family is elevated, which could be a consequence of the whole genome duplication (WGD) described in Mucor and Phycomyces by Corrochano and coworkers [54]. However, BUSCO results show a low level of duplicated genes in all of the isolates. The genome of Thamnidium stands out as it encodes fewer proteases than remaining Mucorales. However, it may benefit from enzymes provided by its endohyphal bacterium which has genes encoding meat crumbling collagenase U32 and plenty of typically bacterial proteases representing families A36, A25, S15, S55, S66, S51, M29, and U57. NagA α-N-acetylgalactosaminidase, NagZ β-N-acetylhexosaminidase (GH109) involved in peptidoglycan recycling occur in multiple copies not only in Paenibacillus genome but also in sequenced fungal genomes. It is worth noting that Peanibacillus has three times more copies (30) of Nag genes than studied fungi. No proteins from this family had been characterized in Eukaryota.
Thamnidium elegans possesses the narrowest set of carbohydrate-processing enzymes which, like in the case of proteases, might be expanded by its associated bacterium. Paenibacullus

Metabolic clusters, secondary metabolites and cofactors
The genome of Paenibacillus harbours NRPS, terpene, bacteriocin, T3PKS and S-layer-glycan producing clusters. Mucoromycotina were long considered devoid of secondary metabolite clusters. A review by Kerstin Voigt and colleagues [16] showed genetic determinants for natural product synthesis present in all analysed genomes. Our newly sequenced genomes encode between 3 and 8 secondary metabolite clusters (according to AntiSMASH scans) belonging to different classes. Interestingly, all of the genomes encode terpene clusters (Supplementary Table S6) which potentially could produce new natural products like those isolated from Umbelopsidales additionally have a trans-2,3-dihydro-3-hydroxyanthranilate isomerase (PhzF) involved in phenazine biosynthesis with yet unknown biological product in fungi [58]. This gene is also present in other fungi and in Endogonales but seems to be missing in Mucorales.
Umbelopsidales produce also a citronellol/citronellal dehydrogenase which converts citronellol to citronellic acid an odorous compound with antimicrobial properties. Sequenced isolates belong to heterothallic genera [64] and genomic screening shows a single sex locus per genome. Generally, these loci contain a single high mobility group (HMG)-domain transcription factor gene (sexP or sexM), flanked by genes for an RNA helicase (rnhA), a triosephosphate transporter (tptA) and an alginate lyase (agl) [65].
The sex locus of T. elegans and M. plumbeus is organized like in other Mucorales with genes in the following order agl/tptA/sex/rnhA [65]. M. saturninus lacks the tptA gene and has a sex locus architecture (agl/sexP/rnhA) like M. mucedo [66] . M. circinatus has remains of an integrase instead of the sex gene between tptA and rnhA genes (agl/tptA/rve/rnhA). Mobile element insertions in the sex locus have already been documented in Phycomyces blakesleeanus [67].
The rnhA gene seems to be excluded from the sex locus in both Umbelopsidales genomes, in which the sex locus is organized like in Mucorales but followed by a gene with an additional protein of unknown function belonging to DUF2405 (PF09435) family (agl/tptA/sexP/DUF2405). Schultz and colleagues described a similar architecture with the DUF2405 for Umbelopsis ramaniana from JGI database [68].

Carbon assimilation profiles
Carbon assimilation profiles obtained for six Mucoromycotina strains by screening on Biolog FF microplates are summarized in Supplementary Table (Supplementary Table S7). None of the analysed strains was able to use a full set of 95 tested carbon sources. Each strain was able to grow on 40 to 70 different substrates and had a unique carbon assimilation profile ( Fig.4) with T.
elegans being the most versatile degrader (Fig.5).    Table   S8). The lowest levels were observed for myristic (14:0) and palmitoleic (16:1) acids. A high percentage of oleic acid (18:1) is characteristic for the Umbelopsis strains and has also been reported by others [70][71][72]. It is also the dominant fatty acid for all isolates except for Thamnidium elegans. A higher share of linolenic acid (18:3) was observed in Mucorales 18.5% compared to an average of 7.5% in Umbelopsidales. However, the efficiency/quantity of the production/synthesis of particular fatty acids depends, inter alia, on the type of substrate used [73].  The genome of Paenibacillus sp. encodes processive diacylglycerol beta-glucosyltransferase required for the synthesis of beta-diglucosyl-DAG -a predominant glycolipid found in Bacillales [84], as well as the Ugp sn-glycerol-3-phosphate transport system which transports glycerol-3phosphate, essential for phospholipid biosynthesis.  Cell-wall carbohydrates A quantitative analysis of the cell wall carbohydrates revealed the presence of high amounts of glucosamine and fucose, and low amounts of mannose, galactose, and glucose compared to an ascomycetous fungus Trichoderma reesei TU-6 ( Fig.9). Glucosamine content in their cell wall was up to 10 fold higher compared to Trichoderma. High glucosamine fraction can be a highlight of the chitin-chitosan cell wall typical for Mucoromycotina [5]. Melida and coworkers [4] reported that genes encoding chitin biosynthesis (CHS) and its respectively. Presence of fucan in the cell wall was previously reported for P. blakesleeanus and R. oryzae [4]. During the synthesis of fucan, fucose is transferred from GDP-fucose to polysaccharides by α-fucosyltransferase and this enzyme was detected in the membrane fraction of M. circinelloides and partially characterized [85].
The α-fucosyltransferase encoding genes were found in 2 and 4 copies in the genomes of P. blakesleeanus and R. oryzae respectively, and they were not detected in N. crassa, which is in accordance with the fact that no fucose was detected in the cell wall of Neurospora [4] and Trichoderma (this study). Differences in the copy number of α-fucosyltransferase encoding genes do not explain the significant differences in the amount of fucose in the cell wall of Mucor spp.  Australia, most often in soil, root, and shoots probes from forest or grassland biomes. The representatives of this genus are well-known late wood colonisers, that probably feed on the substrates which were decomposed by other organisms able to degrade complex substrates like cellulose [95]. However, Umbelopsis representatives are also often isolated from living plant material or forest soil [96] and are considered to be plant growth-promoting organisms and root endophytes [97,98]. They can alter plant metabolism leading to the enhanced production of complex metabolites which are not produced without the endophyte [99]. U. isabellina and U.

Discussion
vinacea described in this study also encoded metabolic clusters including terpenoid clusters.
The representatives of Umbelopsidales were also shown to represent relatively high resistance to some heavy metals like Zn, Mn, Ni or Pb [100] and xenobiotics, such as herbicides [101]. The isabellina cultivated on glucose has presented exceptionally high lipid production (comparable to the highest values achieved for genetically engineered SCO-producing bacterial strains) [73].
Analysed strains displayed a typical composition of PUFAs with a predominance of oleic acid (18:1), and higher levels of gamma-linolenic acid (18:3) in Mucorales compared to Umbelopsidales. Surprisingly, Thamnidium mycelium showed a relatively high saturation index possibly due to interaction with Paenibacillus. Bacteria produce more saturated fatty acids during biofilm formation yet it remains to be elucidated if the bacteria-fungus interaction has a similar effect on fatty acid composition in both partners [103]. Paenibacillus sp. associated with Thamniudium elegans is in line with this trend. Diverse Bacillus bacteria were found living with truffles [109,110]. Paenibacillus has been reported from Laccaria bicolor [111], Sebacina vermifera [112], and is known to produce pre-symbiotic and symbiotic interactions with Glomus [113]. The genus Paenibacillus was erected from Bacillus by Ash et al. [114]; it belongs to the family Paenibacillaceae and comprises 253 highly variable species (https://lpsn.dsmz.de/genus/paenibacillus). The representatives of this genus were isolated from a wide range of sources of plant and animal origin. Paenibacillus tend to occupy a similar niche to Mucoromycotina moulds as they inhabit soil and dung. The best-studied species -P. larvae is known to cause lethal disease of honeybees. However, some other species are known for their plant growth promotion capacities (via siderophores or phytohormones synthesis), others produce a variety of antimicrobials and insecticides. Bacteria from this genus were also shown to produce a plethora of enzymes, like amylases, cellulases, hemicellulases, lipases, pectinases, oxygenases or dehydrogenases [115]. The proteome of P. larvae includes a wide range of virulence factors including proteases and toxins [116]. Paenibacillus validus stimulated the growth of Glomus intraradices [117] and P. vortex facilitates dispersal of Aspergillus fumigatus [118].
Although the representatives of Paenibacillus are known to promote plant growth and secrete several antimicrobial compounds, the endohyphal strain from our study did not encode any antibiotic compounds except for rifamycin-inactivating phosphotransferase. Rather, it provided multiple enzymes that significantly expanded the digestive capabilities of Thamnidium while reducing its genome size. Observed genome shrinking cannot be explained solely by the fragmented assembly of Thamnidium (its incompleteness is estimated at approximately 5%) because the differences in CaZyme and peptidase abundance from remaining isolates are far greater.
One of the major differences between Thamnidium and the remaining isolates is in the lipid composition potentially contributing to cell membrane stiffness. EHB are known to influence host lipid production and bias the TAG/PE ratio [33]. Thamnidium had the lowest TAG/PE ratio among tested isolates which might be a sign of symbiotic interaction with Paenibacillus, yet the noted ratio was still far from the 1:1 "symbiotic equilibrium" described in Lastovetsky's report Thamnidium elegans) were also selected ( Table 2). Related organisms were tested for biotechnological usage and reference genomic information for some of them (e.g. U. isabellina) was already available. The species-level identification of all isolates was confirmed by sequencing of ITS rDNA fragments (according to the protocol proposed by Walther and co-workers [119]) prior to genome sequencing.  formic acid were used. The solvent gradient was initiated at 35% B, increased to 100% B over 4 min, and maintained at 100% B for 11 min before returning to the initial solvent composition over 2 min. The column temperature was maintained at 40 °C, and the flow rate was 600 µL min−1.
The QTRAP 4500 was operated with positive ionization at an electrospray voltage of 5500 V and a targeted multiple reaction monitoring (MRM) approach containing transitions for known precursor/product mass-to-charge ratio. Under these conditions, the TAG ionize as ammonium adducts.
Fatty acid analysis. A lipid sample was diluted in 1.

Culture conditions and DNA extraction
All fungal strains were cultured on 4% Potato Dextrose Agar for 7 days at 20ºC. Total genomic DNA was extracted from 30 mg of fresh mycelium following a CTAB-based chloroform extraction protocol [124] and cleaned-up following caesium chloride density gradient centrifugation method [125]. DNA quality and concentration were estimated by 1% agarose gel electrophoresis and NanoDrop ® (Thermo Fisher Scientific). Quantification of purified DNA was performed on a Qubit Fluorometer. The identity of all strains was confirmed by the preliminary sequencing of the internal transcribed spacer (ITS rDNA) region and with standard morphological identification procedures.

Sequencing
DNA was sequenced at the High Throughput Sequencing Facility of UNC, Chapel Hill, NC, USA.
Whole-genome sequencing was accomplished using a hybrid approach, combining Illumina short-read data with PacBio long-read data.
Total cellular DNA was sheared using a Covaris E220 sonicator to achieve fragments with an average size of 500 bp. Then libraries were prepared using the Kapa Hyper kit. Libraries were size selected for insert fragments around 500 base pairs using Pippin Prep automatic DNA size selection system (Sage Science). Libraries were analyzed and quantified using a LabChip GX automated electrophoresis system (Caliper) and pooled. The pools were sequenced on Illumina MiSeq sequencer paired-end sequencing (2×300 cycles) to obtain longer reads and on Illumina HiSeq sequencer 2500 (2x150 cycles) to obtain more coverage.
For the PacBio RSII data, ten microgram aliquots of genomic DNA were sheared in a Covaris g-TUBE to a target fragment size of 20 kb using the shearing conditions provided in the Covaris g-TUBE user manual. The protocol for preparing a 20 kb library (Pacific Biosciences Procedure and Checklist-20kb Template Preparation Using Blue Pippin TM Size-Selection system) was subsequently followed, using 5 µg of purified, sheared DNA as starting material. Template

Assembly
PacBio reads were processed with SMRT Analysis Software. Spades (v.3.10.1) [126] was chosen as a slow but high-quality assembler [127]. An initial assembly of Illumina reads alone with Abyss [128] produced very fragmented and incomplete assemblies. Quast (v.4.6.1) was used to select the best assembly [129]. The completeness of genome assembly and gene annotation was assessed using BUSCO (v.3b) [38], using BUSCO's fungal dataset universal single-copy orthologs from OrthoDB9 as reference.
BlobTools2 [130] was used to partition the scaffolds, on the basis of read coverage, G/C content, and taxonomic affiliation. The bacterial and fungal reads were assembled separately.  [135] and RMBlast (NCBI blast package of RepeatMasker), based on Repbase database 2017 edition [136] as described in [39].
Phylogenetic analysis of bacteria found in the genome of Thamnidium 16S rRNA sequence was extracted from the bacterial reads found in Thamnidium using blast.
Then it was combined with publicly available 16S sequences of several species and strains of Paenibacillus and Bacillus (GB accession numbers can be found on Fig. 1). Sequences were then aligned using MAFFT [149] and trimmed using trimal -automated1 [150]. Then the best evolution model was detected using modeltest-ng across all evolutionary models [151] (TrN+I+G4 model selected based on AIC, BIC and AICc criteria) and phylogenetic tree was calculated using raxmlng [152] with 1000 bootstrap replicates. The tree was then rooted using four Bacillus sequences. and S.P. performed experimental procedures and prepared the samples, T.A-P and K.SA.
performed carbon source usage experiments U.P-L and J.S.K. analysed cell wall composition, P.B. analysed lipid composition, J.P. analysed carbon source usage.

Supplementary Materials
Supplementary Figure S1. Tree inferred with FastME 2.1.6.1 [153] from Genome BLAST Distance Phylogeny approach distances calculated from genome sequences of Paenibacillus isolates [154]. The numbers above branches are pseudo-bootstrap support values > 60 % from 100 replications, with average branch support of 47.8 %. The tree was rooted at the midpoint by the program. Pae_T is the Paenibacillus sequenced as associated with Thamnidium elegans groups with P. illinoisensis strains.

Supplementary Tables
Supplementary Table S1 Repetitive elements in analysed genomes