Genomic Analysis of Sarcomyxa edulis Reveals the Basis of Its Medicinal Properties and Evolutionary Relationships

Yuanmo [Sarcomyxa edulis (Y.C. Dai, Niemelä & G.F. Qin) T. Saito, Tonouchi & T. Harada] is an important edible and medicinal mushroom endemic to Northeastern China. Here we report the de novo sequencing and assembly of the S. edulis genome using single-molecule real-time sequencing technology. The whole genome was approximately 35.65 Mb, with a G + C content of 48.31%. Genome assembly generated 41 contigs with an N50 length of 1,772,559 bp. The genome comprised 9,364 annotated protein-coding genes, many of which encoded enzymes involved in the modification, biosynthesis, and degradation of glycoconjugates and carbohydrates or enzymes predicted to be involved in the biosynthesis of secondary metabolites such as terpene, type I polyketide, siderophore, and fatty acids, which are responsible for the pharmacodynamic activities of S. edulis. We also identified genes encoding 1,3-β-glucan synthase and endo-1,3(4)-β-glucanase, which are involved in polysaccharide and uridine diphosphate glucose biosynthesis. Phylogenetic and comparative analyses of Basidiomycota fungi based on a single-copy orthologous protein indicated that the Sarcomyxa genus is an independent group that evolved from the Pleurotaceae family. The annotated whole-genome sequence of S. edulis can serve as a reference for investigations of bioactive compounds with medicinal value and the development and commercial production of superior S. edulis varieties.


INTRODUCTION
Sarcomyxa edulis (Y.C. Dai, Niemelä & G.F. Qin) T. Saito, Tonouchi & T. Harada is a fungus that is native to the temperate regions of Northeastern China, Northern Japan, United States, and Russian Far East (Jin et al., 2001;Dai et al., 2003;Saito et al., 2014). It is commonly known as the late oyster and is also called "Yuanmo, " "Huangmo, " or "Dongmo" in Chinese and "Mukitake" in Japanese. The fruiting body of S. edulis is fan-shaped, which is similar to Ganoderma lucidum ("Lingzhi"), but because of its yellow color, it is also known as "Huanglinggu" in China.
Sarcomyxa edulis was first described and assigned to the genus Panellus (Dai et al., 2003) through a comparison with the terribly bitter mushrooms from Finland to the specimens of China. S. edulis is under the same genus as Sarcomyxa serotina. As the type species, the taxonomic status of S. serotina has changed multiple times from the Agaricus genus to Pleurotus, Sarcomyxa, Hohenbuehelia, and finally Panellus (Dai et al., 2003). Phylogenetic studies have shown that S. serotina and S. edulis are excluded from the Panellus clade. A maximum likelihood (ML) tree constructed based on the D1/D2 region of the large subunit of the 28S ribosomal RNA gene showed that the Sarcomyxa genus formed a clade that was independent of Mycena and Panellus (Matheny et al., 2006;Saito et al., 2014). Thus, the classification of Sarcomyxa is controversial.
Sarcomyxa edulis was domesticated in the last decade in China; like Hericium erinaceus, it is an important specialty mushroom cultivated in Northern China, the largest production region. The commercial cultivation of S. edulis is profitable because of high demand. However, production has markedly declined in recent years as a result of diseases and the lack of resistant varieties.
Knowledge of the biology and evolution of S. edulis is limited because molecular-level information from the genome is lacking. The availability of a whole-genome sequence can clarify the taxonomy of S. edulis and aid future breeding efforts to improve this species for commercial cultivation. To this end, in the present study, we sequenced and annotated the genome of a monokaryotic strain of S. edulis and carried out a phylogenetic analysis that included 32 sequenced fungi, with the aim of establishing the taxonomic status of S. edulis and identifying secondary metabolites of medicinal importance.

Sarcomyxa edulis Specimens
Sarcomyxa edulis basidiomes were collected from maple wood in Antu County, Jilin Province, China. Specimen 2016092521 was identified by morphologic and molecular analyses (Dai et al., 2003;Saito et al., 2014). The specimen was deposited in the herbarium of the Culture Center of Mycophyta in Jilin Agricultural University under accession number CCMJ18024 (Figure 1). The monokaryotic strain SE1 was germinated from one of the spores of specimen 2016092521 (Choi et al., 1999;Senanayake et al., 2020) and used for whole-genome sequencing. The mycelia of SE1 were cultured on an improved potato dextrose agar medium containing 5% wheat bran for 10 days at 24 • C in the dark and collected for genome sequencing.

Whole-Genome Sequencing
Total DNA of S. edulis strain SE1 was extracted using the NuClear Plant Genomic DNA kit (Tiangen Biotech, Beijing, China). The DNA was detected by agarose gel electrophoresis and quantified with a Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, United States). The whole genome of SE1 was sequenced using PacBio RSII Single Molecule Real-Time (SMRT) technology, which generated 20-kb SMRTbell libraries. Highthroughput sequencing on an HiSeq PE150 system (Illumina, San Diego, CA, United States) was carried out to polish the DNA sequence; additionally, paired-end read libraries were obtained by sequencing 350-bp inserts.

Gene Annotation and Functional Analysis
In order to determine the functions of the predicted genes, we compared homologous genes to protein and nucleotide sequences in the general functional BLAST databases. The basic steps of functional annotation were as follows: (1) the predicted gene protein sequences were compared with the functional databases by DIAMOND (with an e-value ≤ 10 −5 ; Buchfink et al., 2014) and (2) filtering of alignment results: for the alignment results of each sequence, select the alignment result with the highest score (default identity ≥ 40%, coverage ≥ 40%) for annotation (Altschul et al., 1990).
Secondary metabolites were annotated using the antiSMASH (fungiSMASH option) database 1 and NaPDoS 2 (Ziemert et al., 2012;Weber et al., 2015;Blin et al., 2017). To validate the predicted results, the obtained gene clusters were manually verified. Each gene model within the database was searched with BLASTP and TBLASTN algorithms (Grigoriev et al., 2012). For polyketide synthase (PKS)/non-ribosomal peptide synthetase (NRPS) analysis, we selected a query type and entered the sequence data to identify ketosynthase KS and/or condensation C domains using the NaPDos pipeline 3 . Carbohydrate-active enzymes (CAZymes) were determined using the dbCAN 2 meta server (Zhang et al., 2018). Cytochrome P450 monooxygenase (P450) analysis was performed by searching a reference dataset 4 using the BLASTP program (Nelson, 2009).

Identification of Specimen 2016092521
Specimen 2016092521 was identified as S. edulis. It is mild-tasting with ventricose cystidia, while the related species S. serotina is very bitter-tasting with hymenial cystidia. The basidiospores of the specimen are slightly longer than in S. serotina, 4.5-6.0 µm × 1.0-1.3 µm (Supplementary Figure 1). Maximum parsimony phylogenetic tree based on ITS gene sequences showed that specimen 2016092521 clusters with S. edulis and is distinct to S. serotina (Supplementary Figure 2).

Features of the S. edulis Genome
In total, 5,318 Mb of clean data were obtained, from which a 35.65-Mb assembly was obtained. The genome consisted of 41 contigs with N50 of 1,772,559 bp, N90 of 554,178 bp, and 48.31% G + C content ( Table 1). A BLAST search of repeat sequences yielded 1,371,373 bp, covering 3.85% of the SE1 genome; meanwhile, interspersed nuclear elements and TRs accounted for 1.79 and 2.05% of the genome, respectively. Approximately 1.53% of the genome was long terminal repeats, 0.09% was DNA transposons, and 0.16% was long interspersed nuclear elements.
The proportion of TRs in the assembled genome was 1.36%, while minisatellite and microsatellite DNA accounted for 0.60 and 0.01% of the genome, respectively.

Functional Annotation
There were 9,364 gene models predicted in the different databases with a total sequence length of 15,135,900 bp, accounting for 42.45% of the whole genome with an average sequence length of 1,616 bp. We predicted 143 tRNAs (13,216 bp), 14 rRNAs (29,757 bp), 37 snRNAs (3,044 bp), 143 miRNAs (8,449 bp), and six sRNAs (359 bp; Supplementary Table 2). Among the 143 tRNAs, 21 were putative pseudogenes and 122 were anticodon tRNAs corresponding to the 20 common amino acid codons.
Cytochrome P450 (CYP) is a superfamily of hemoproteins that use heme as a cofactor. CYPs have various substrates in different enzymatic reactions and are present in all kingdoms. We identified 83 putative CYP genes in S. edulis through a BLAST search and classified them into 34 families (Supplementary Table 4). The CYP5144 family had the highest number of enriched genes (14), followed by CYP5037 (13).
The CAZymes play important roles in the degradation of renewable lignocelluloses to provide carbohydrates for fungal growth, development, and reproduction (Xie et al., 2018). We determined CAZymes in the SE1 genome using the dbCAN 2 meta server. A total of 313 CAZyme-encoding gene models were assigned, including 93 superfamilies, viz., six carbohydrate esterases, 41 glycoside hydrolases, 23 glycosyltransferases, 11 carbohydrate-binding modules, eight auxiliary activities, and four polysaccharide lyases (Table 3). There were 63 AA (auxiliary activities) genes in the SE1 genome, including 13 AA1 (multicopper oxidase), 10 AA2 (lignin-modifying peroxidase),  20 AA3 (glucose-methanol-choline oxidoreductase including cellobiose dehydrogenase, arylalcohol oxidase/glucose oxidase, alcohol oxidase, and pyranose oxidase), five AA5 (copper radical oxidase), two AA6 (1,4-benzoquinone reductase), one AA8 (cellobiose dehydrogenase), and 11 AA9 (lytic polysaccharide monooxygenase) genes. We further investigated genes involved in the biosynthesis of secondary metabolites in S. edulis based on those identified in previous studies and found genes encoding fatty acid synthetases, NRPS, PKS, siderophore synthetases, and terpene synthases (Schwarzer et al., 2003;Finking and Marahiel, 2004;Sims and Schmidt, 2008;Chen et al., 2012;Lackner et al., 2012;Quin et al., 2014) that were often in contiguous gene clusters . Using the fungiSMASH database, we identified 39 secondary metabolite gene clusters in the SE1 genome, including four terpenes, one NRPS, one T1PKS, one siderophore, and one fatty acid gene cluster (Supplementary Table 5). Unlike primary metabolism, secondary metabolism does not participate in the growth and development of an organism but instead promotes its survival in a given environment, for example, by enhancing the defense response to pathogens. A Natural Product Domain Seeker analysis identified two C domains (C2 bacitracin_DCL and C1 cyclosporin_dual) and four KS domains (naphthopyrone_iterative, avermectin_modular, myxothiazol_modular, and epothilone_modular) in the SE1 genome.

Phylogenetic Analysis of S. edulis
A phylogenetic tree was constructed based on single-copy orthologous protein genes from 32 species of fungus, including 30 from the phylum Basidiomycota and two from the phylum Ascomycota serving as outgroups (Figure 7). The result showed that the Sarcomyxa genus formed a distinct clade to Panellus and Mycena in the Mycenaceae family. The topology suggested that Sarcomyxa was also independent of Pleurotaceae. Fungi in the Basidiomycota and Polyporales clades were clearly separated from those in the Agaricales clade, with Polyporales diverging before Agaricales. Notably, the Agaricales order followed a certain evolutionary pattern from "on log" to "on ground" growth.

CAZyme Analysis
GHs, GTs, PLs, CEs, and AAs-which catalyze the modification, biosynthesis, or degradation of glycoconjugates and carbohydrates-were the main CAZymes in the SE1 genome. CMBs were non-catalytic modules that appended to the enzymes stated above. There were twice as many GH genes as there were GT genes; this may be related to the lignocellulose degradation capacity of S. edulis that is necessary for its survival. Polysaccharide decomposition is more important for S. edulis than anabolism. S. edulis had fewer genes related to initial lignin degradation than the average number present in Basidiomycota fungi (Supplementary Table 6; Yuan et al., 2019). The protein products of CE, GH, and PL superfamilies are involved in the breakdown of the polysaccharide portion of plant cell walls, which mainly consists of pectin, cellulose, and hemicellulose and are known as cell wall-degrading enzymes (Ospina-Giraldo et al., 2010;Yap et al., 2014). S. edulis had fewer candidate CAZymes than other edible fungi, but it could make better use of hardwood. Therefore, the mechanism of substance degradation of S. edulis needs to be further explored.

Secondary Metabolite Analysis
Medicinal fungi have been the focus of many pharmacologic studies because of their secondary metabolites, which have antioxidant, antitumor, and antimicrobial properties. S. edulis has shown promising pharmacodynamic activities (Li, 1991;Inafuku et al., 2012;Kim et al., 2012;Inoue et al., 2013;Li et al., 2013Li et al., , 2016Zhang et al., 2014). Type I PKSs constitute a family of multifunctional proteins with high molecular weight and multiple catalytic domains that play an important role in the biosynthesis of reducing macrolide polyketides (Fujii et al., 2001). The chemical properties of predicted type I PKS genes in the SE1 genome have been characterized. Two C domains on contig 7-bacitracin and cyclosporine-were identified. Bacitracin is a branched cyclic peptide antibiotic active against Gram-positive bacteria and acts by binding to undecaprenyl pyrophosphatea lipid carrier of cell wall precursors-and interfering with cell wall and peptidoglycan biosynthesis. Furthermore, bacitracin has been used for the prevention and treatment of skin and ophthalmic diseases (Ishihara et al., 2002;Hiron et al., 2011).
As an immunosuppressant that is administered orally or via injection, cyclosporine is used to treat nephrotic syndrome (Cattran et al., 2007), psoriasis (Kumar et al., 2016), and keratoconjunctivitis sicca (dry eyes; Kasper et al., 2017). Four KS domains were also identified: naphthopyrone (KS4 on contig 12), avermectin (KS1 on contig 17), myxothiazol (KS2 on contig 17), and epothilone (KS3 on contig 17). Naphthopyrone prevents advanced glycation end product formation (Lee et al., 2006), while avermectin belongs to a family of macrocyclic lactones and functions as a biological pesticide that is highly active against a variety of nematodes. Avermectin has been widely used as an antiparasitic agent in aquaculture (Sheng et al., 2015).
Myxothiazol is an antibiotic that inhibits ubiquinol. Epothilones are a class of water-soluble compounds similar to taxanes and with anticancer potential that has shown greater efficacy than paclitaxel in P-glycoprotein-expressing multidrug-resistant cell lines (Kowalski et al., 1997). At present, research on the active components of S. edulis are mainly focused on polysaccharides and lipids. However, study on PKS has not been reported. Therefore, the active constituents of the mushroom are still waiting to be tapped.

Phylogenetic Analysis of S. edulis
Sarcomyxa edulis is an important edible and medicinal fungus that belongs to the order Agaricus, but the taxonomy of this genus is still controversial. In order to comprehensively analyze the relationship between S. edulis and related species and genera, 32 fungal species from Basidiomycota and Ascomycota were used in the phylogenetic analysis. The whole genome sequences of one to two representative species from different groups of edible and medicinal fungi and several adjacent genera of Sarcomyxa were analyzed to show the correctness of the analysis results. Phylogenetic analyses of the concatenated dataset using ML methods resulted in an identical and well-supported topology in all alignment strategies compared to the study of Lin et al. (2013).  It can reasonably model the large stochastic processes with lower variance. Even with very short sequences, it may outperform alternative methods such as parsimony or distance methods.
It also has an explicit evolutionary model for data and better accounting for branch length (Mishler, 2006). For example, Li et al. (2021) and Xu et al. (2020) used the ML algorithm in RAxML to construct a genome-based phylogenetic tree. In this study, based on the ML/NJ phylogenetic analysis of S. edulis and 31 other species of fungus, it was shown that the Sarcomyxa genus is a distinct group. This is consistent with the previous classification of Sarcomyxa as a clade that is independent of and emerged before the Mycena and Panellus genera based on a ML tree constructed using the D1/D2 region of S. edulis comb. nov. and other species (Saito et al., 2014). A study of the major clades of the Agaricales order arrived at a similar conclusion based on a maximum probability tree that combined rpb1, rpb1-intron2, rpb2, 18S, 25S, and 5.8S nucleotide sequences (Matheny et al., 2006). Additionally, we found that Basidiomycota, Polyporales, and Agaricales constituted distinct clades. The evolution of the Agaricales order followed a trend from "on log" to "on ground" growth. We also observed a transition from white rot to brown rot in the Polyporales clade over the course of evolution. This is in accordance with a previous analysis of lignin-degrading peroxidases in 31 fungal genomes, which showed that the lignin degradation mechanism of white rot fungi was retained in the early evolution of Agaricomycetes, while the loss of lignin peroxidases led to the emergence of brown rot fungi exhibiting a non-ligninolytic mode of wood decay (Floudas et al., 2012; Supplementary Table 1).
The comparative analyses of secondary metabolites revealed that S. edulis harbors terpene and PKS biosynthesis-associated genes. The annotated whole-genome sequence of S. edulis can serve as a reference for identifying genes related to the synthesis of bioactive compounds with medicinal or nutritional value FIGURE 7 | Phylogenetic tree of Sarcomyxa edulis and 31 other fungal species. Maximum likelihood and neighbor-joining above 50% were placed close to topological nodes and separated by "/". The bootstrap values below 50% were labeled with "-". Asc, Ascomycota; Bas, Basidiomycota; Sor, Sordariales; Sac, Saccharomycetales; Ust, Ustilaginales; Tre, Tremellales; Seb, Sebacinales; Pol, Polyporales; Bol, Boletales; and Aga, Agaricales. and the development and commercial production of superior S. edulis varieties.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are publicly available. This data can be found here: The genome sequence data and assembly reported in this paper are associated with NCBI BioProject: PRJNA483858 and BioSample: SAMN09748201 in GenBank. The Whole Genome Shotgun (WGS) number is INSDC: QUOL00000000.1).

AUTHOR CONTRIBUTIONS
FT contributed to conceptualization, methodology, software, data analysis, investigation, resources, data curation, and manuscript writing-original draft preparation. CL contributed to data validation, supervision, and project administration. CL and YL contributed to writing-review and editing. YL contributed to visualization and funding acquisition. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We are grateful to Francis Martin and Christoffer Bugge Harder and the 1000 Fungal Genome consortium for access to unpublished genome data. The genome sequence data were produced by the United States Department of Energy Joint Genome Institute in collaboration with the user community, and we would like to thank the National Center for Biotechnology Information for the genomic data provided. We thank the Mushroom Base of Jilin Agricultural University for assistance with cultivation.