Uncovering Diagnostic Value of Mitogenome for Identification of Cryptic Species Fusarium graminearum Sensu Stricto

Fungal complexes are often composed of morphologically nearly indistinguishable species with high genetic similarity. However, despite their close relationship, they can exhibit distinct phenotypic differences in pathogenicity and production of mycotoxins. Many plant pathogenic and toxigenic fungi have been shown to consist of such cryptic species. Identification of cryptic species in economically important pathogens has added value in epidemiologic studies and provides opportunities for better control. Analysis of mitochondrial genomes or mitogenomics opens up dimensions for improved diagnostics of fungi, especially when efficient recovery of DNA is problematic. In comparison to nuclear DNA, mitochondrial DNA (mtDNA) can be amplified with improved efficacy due to its multi-copy nature. However, to date, only a few studies have demonstrated the usefulness of mtDNA for identification of cryptic species within fungal complexes. In this study, we explored the value of mtDNA for identification of one of the most important cereal pathogens Fusarium graminearum sensu stricto (F.g.). We found that homing endonucleases (HEGs), which are widely distributed in mitogenomes of fungi, display small indel polymorphism, proven to be potentially species specific. The resulting small differences in their lengths may facilitate further differentiation of F.g. from the other cryptic species belonging to F. graminearum species complex. We also explored the value of SNP analysis of the mitogenome for typing F.g. The success in identifying F.g. strains was estimated at 96%, making this tool an attractive complement to other techniques for identification of F.g.


INTRODUCTION
Fusarium graminearum sensu stricto (F.g.) ranks number 4 in the top 10 most important and best-studied species that cause diseases on agriculturally important plants (Dean et al., 2012). The fungus is often involved in two diseases of cereals: fusarium head blight (FHB) of wheat and barley and fusarium ear rot (FER) of maize, which lead to major losses for grain production worldwide. Besides yield reduction, F.g. is able to produce mycotoxins, among which trichothecenes pose a serious hazard to human and animals (Dong et al., 2020;Yao et al., 2020). F.g. belongs to the monophyletic fungal complex referred to as F. graminearum species complex (FGSC). This complex includes 16 genetically characterized cryptic species (Sarver et al., 2011), several of which are involved in cereal diseases in certain agricultural areas. The species are difficult to identify by morphological characters and often share a high DNA sequence similarity (Walkowiak et al., 2016). However, despite their close relationship, FGSC species and even strains within species can exhibit distinct phenotypic differences in pathogenicity and mycotoxin production, while some strains lack pathogenicity on certain hosts (Jianbo, 2014;van der Lee et al., 2015;Walkowiak et al., 2016).
The increasing number of genomic sequences that are deposited in public databases has enabled research in genomic features that contribute to phenotypic variation and niche specialization within FGSC. Genome comparisons revealed significant divergence between them that can be mostly linked to single-nucleotide polymorphisms (SNPs), insertions/deletions (indels), and gene content variation, which is species specific or fixed in certain populations. Comparative genomics provides insights into the evolutionary processes contributing to pathogen divergence at both the macroevolutionary and microevolutionary scales (Walkowiak et al., 2016;Kelly and Ward, 2018).
Besides nuclear genome analyses, analysis of fusarium mitogenomes may provide useful data to study phylogenetic relationships and evolution. Despite high conservation, mitogenomes of fungi within complexes often exhibit a significant degree of polymorphism (Brankovics et al., 2017(Brankovics et al., , 2020. This variation largely results from the irregular distribution of mobile genetic elements (MGEs) such as introns and associated homing endonucleases (HEGs; Basse, 2010;Joardar et al., 2012;Pogoda et al., 2019). MGEs comprise a substantial fraction of fungal mitogenomes and display a different evolutionary history from other mitochondrial loci (Guha et al., 2018;Kolesnikova et al., 2019). Their mosaic distribution throughout evolution can be explained with the "aenaon" model, which combines "intronearly" evolution enriched by "intron-late" events through recombination involving vertical and horizontal gene transfer (Megarioti and Kouvelis, 2020). Notably, variation in MGE content appears to be dependent on taxonomic sampling. This variation can be highly conserved, indicating a low mobility of ancestrally acquired MGEs (Kulik et al., 2020b). In contrast, more recent enrichment events result in increased mosaicism of MGE patterns that can be observed at the population level (Wolters et al., 2015).
Fungi of the genus Fusarium display a high variation in MGE content, from MGE-poor to MGE-rich mitogenomes (Losada et al., 2014;Brankovics et al., 2017). Our previous study performed on a large collection of F.g. strains has shown that the mitogenome of this fungus is MGE rich and mainly includes introns from the group I intron family, which harbor either LAGLIDADG or GIY-YIG HEGs. Comparison of mitogenomes of geographically diverse F.g. strains did not reveal any population-specific profiles, thus supporting the hypothesis on ancestral acquisition of HEGs. Assuming their early acquisition and low mobility, recently diverging cryptic species may thus share a similar content of MGEs (Kulik et al., 2020b). This hypothesis will be tested using mitogenomes obtained in this study.
Variation in mitogenomes of closely related fungi may also be detected by phylogenetic conflict between single-gene trees. This phenomenon was discovered among and within members of closely related fungal complexes: the Fusarium fujikuroi species complex (FFSC) and Fusarium oxysporum species complex (FOSC; Brankovics et al., 2020). Conflicting tree topologies indicate incomplete lineage sorting from ancestral polymorphism or more recent interspecies gene flow since both can result in similar phylogenetic patterns (Fourie et al., 2013(Fourie et al., , 2018. The phylogenetic discordance complicates interpretation of phylogenetic reconstructions and largely limits clustering-based identification approaches (Wu et al., 2018).
Besides its evolutionary value, mitochondrial DNA (mtDNA) is promising in the diagnostics of fungi, especially with regard to its multi-copy nature, which facilitates its high recovery and amplification success (Santamaria et al., 2009). Both MGE content variation and SNP in mitogenomes could be used for identification purposes of fusaria (Wu et al., 2019;Kulik et al., 2020b). However, discriminating cryptic species based on mitochondrial sequences is challenging due to the high degree of conservation, and detecting species-specific sites requires screening of a large set of genomic data to differentiate between intraspecific and interspecific variation (Kulik et al., 2015(Kulik et al., , 2020a. The analysis of SNPs provides an excellent tool for identifying and typing species (Uelze et al., 2020a). In general, SNPs display relatively low mutation rates and are evolutionarily stable (Leekitcharoenphon et al., 2012). Usually, SNPs are identified by mapping sequence data from individual strains against a closely related reference genome. This type of analysis relies on the generation of a certain number of SNPs. It includes all studied genomes, including the reference genome (Uelze et al., 2020b).
The aims of this study are (1) to explore whether MGE polymorphisms can be used for identification of F.g., (2) discover the value of SNP analysis for typing F.g., and (3) confirm the utility of mt-based phylogenetic approach for the determination of F.g. identity.
In total, 122 strains were analyzed: 88 strains of F.g. and 34 strains representing other members of the FGSC. We discovered that HEGs present in mitogenomes of the FGSC display small indel polymorphism, which facilitates recognition of F.g. We also showed that whole-mitogenome SNP analysis enables typing of F.g. with 96% confidence, which can make this tool a valuable complement to other diagnostic tools for F.g. A phylogenetic analyses based on core mitochondrial genes showed different topologies in the reconstructed phylogenetic trees and did not cluster strains of F.g. into single Frontiers in Microbiology | www.frontiersin.org 3 August 2021 | Volume 12 | Article 714651 species-specific clades, which neglects the value of this approach for determination of species identity.

DNA Extraction and Sequencing
DNA from fungal strains was extracted from 0.

Exploration of MGE Diversity
A total of 122 mitogenomes were compared through multiple sequence alignments. The analysis was performed with progressive Mauve (Darling et al., 2010) implemented in Geneious Prime software to examine the distribution of MGEs in the mitogenomes.

SNP Analysis
SNP analysis was performed on the interactive web-based platform Edge bioinformatics. 1 SNPs were detected in multiple comparisons of assembled mitogenomes of the strains to the reference whole genome of PH-1 of F.g. (accession number: MH412632). Eight mitogenomes from closely related morphospecies (Fusarium culmorum, Fusarium cerealis, and Fusarium pseudograminearum) were also included in the analysis.
The remaining single-gene alignments were excluded for further phylogenetic analyses due to low sequence polymorphism. The best partition schemes and corresponding substitution models were estimated using PartitionFinder2 (Lanfear et al., 2016). Afterwards, based on the alignment and obtained models, maximum likelihood analysis was conducted using IQ-TREE 2.0.3 with 1,000 ultrafast bootstrap (Minh et al., 2020). F. pseudograminearum was used as an outgroup.
To reveal variation in mitochondrial CDS, core mitochondrial genes were extracted and aligned separately using MAFFT software (v.7.453; Katoh and Standley, 2013). Gene polymorphism analyses were conducted for each CDS based on the alignment of 130 strains. Variation within each CDS was identified as a SNP or indel and counted with the use of an in-house Python script. Additionally, each SNP was characterized as synonymous or nonsynonymous. Nucleotide diversity values (π) for each CDS were calculated with TASSEL software (v.5.2.40;Bradbury et al., 2007). As nucleotide diversity is based only on nucleotide substitutions, the number of indels and percentage of polymorphic sites are given for each CDS. To reveal variation in intronic sequences, conserved introns (found in all studied strains) were extracted and analyzed as described for mitochondrial CDS.

General Characteristics of Fungal Mitogenomes
Mitogenomes of species within the FGSC are highly conserved in terms of a set of 15 protein-coding genes, two rRNA genes (rns and rnl), and 28 tRNA genes, which were localized in the same order and orientation. Their mitogenomes displayed a similar GC content between 31.3 and 32.1%. However, mitogenome comparisons showed considerable differences in their size ranging from the smallest 88.409 bp mitogenome of Frontiers in Microbiology | www.frontiersin.org CBS 119183 Fusarium cortaderiae to the largest 106.714 bp in CBS 119177 F. vorosii. This variation was mainly associated with a mosaic distribution of MGEs, which is described in one of the following sections.
All FGSC strains contained a large open reading frame with unknown function (LV-uORF;Al-Reedy et al., 2012), which is in every strain located between the rnl and nad2 genes. This LV-uORF differed in size from 4.536 to 6.273 bp due to indel polymorphism (Figure 1).

Mobile Genetic Elements and Their Distribution in FGSC
A total of 45 introns and 56 HEGs were identified in the set of mitochondrial protein-coding genes of the studied strains. Nineteen introns were highly conserved and present in all studied strains. More than half of the introns (n = 24) identified were located in the three subunits of the cytochrome c oxidase (cox genes): 14 in cox1, 5 in cox2, and 5 in cox3. Positions of introns were highly conserved among different strains. All introns belonged to the group I intron family except i1 cob found in three strains: F.g.  and F. vorosii (CBS 119177), which was classified as belonging to the group II intron. This intron encoded a protein of the reverse transcriptase family. Most of group I introns harbored HEGs, of either the LAGLIDADG or GIY-YIG type, which can be determined based on differences in conserved amino acid motifs. Among 56 distinct HEGs, 33 were assigned as LAGLIDADG endonucleases and 16 as GIY-YIG endonucleases, while seven HEGs could not be precisely predicted as LAGLIDADG or GIY-YIG.
Eight introns showed very high similarity with no size difference variation, while 11 introns displayed some size variation due to small indels mostly below 100 bp in size. Nine introns showed large size variation (the i4 nad2 intron, the i1 and i2 cox2 introns, the i2 and i5 cob introns, the i3 cox1 intron, the i1 atp6 intron, and the i1 and i3 cox3 introns), mostly due to the irregular presence of HEGs. HEGs and/or introns that are absent in at least one studied strain are termed "optional" throughout this manuscript.
From the total 46 optional HEGs, 11 were very limited in distribution. A single strain of F. mesoamericanum (CBS 110252) harbored two optional HEGs (144 aa and 170 aa) in the i5 nad2 intron. Due to loss of conserved amino acid motifs reflecting their functional differences, these two HEGs were not precisely determined to either LAGLIDADG or GIY-YIG endonucleases. Similarly, a single strain of F. vorosii (CBS 119177) contained an optional LAGLIDADG (352 aa) found in the i1 atp6 intron. The presence of two optional GIY-YIG (314 aa and 315 aa) and one LAGLIDADG (196 aa) sequences in intron i9 of the cob gene was found in a single strain of F. asiaticum (CBS 110258). More frequent distributions were observed for the remaining MGEs found in the cob gene. An optional LAGLIDADG (474 aa) located in the i2 cob intron was shared by two (CBS 316.73 and CBS 119170) of the three strains of F. boothii. Three strains, CBS 123662 (F. acaciaemearnsii), CBS 110244 (F. austroamericanum), and CBS 123754 (F. ussurianum), harbored an optional LAGLIDADG (234 aa) in the i8 cob intron. Two other HEGs found in the cob gene were distributed in FGSC at 12.5% frequency and included an optional GIY-YIG (127 aa) in the i2 cob intron of   A large collection of F.g. strains enabled the study of MGE distribution in their mitogenomes and possible cryptic divergence. Forty different MGEs were found in F.g.; however, none showed a species-specific conservation. We previously found that a unique MGE pattern in the cox3 gene allows reliable differentiation of F.g. from other closely related morphospecies. This unique pattern mainly concerns the species-specific loss of the i3 intron, which was evident in all strains of F.g., regardless of their geographic origin. In this study, we found that in contrast to F.g., the i3 cox3 intron was present in almost all other members of the FGSC, except for two strains of F. asiaticum (191,317 and 180,363;Yang et al., 2020; Supplementary File S1). However, besides their irregular distribution, HEGs can differ in size due to small indel polymorphism ( Table 1). To explore the value of this type of polymorphism for identification of F.g., we compared a total of 1,564 individual HEGs and found that the difference in HEG size observed in eight HEGs (Table 1) can be used as a unique diagnostic feature for F.g.

SNP Analysis
We performed a whole-mitogenome SNP analysis to reveal the utility of this type of approach for differentiation of F.g. from the other species. We calculated the number of SNPs for every mitogenome to reveal if interspecific variability in mtDNA is greater than intraspecific variability, indicating the existence of the so-called "barcoding gap" in the data set. SNPs were detected from the core genome of assembled mitogenomes by multiple comparisons of studied mitogenomes against the reference mitogenome of PH-1 strain (CBS 123657, accession number: MH412632). Analyses of variation in both CDS and conserved introns (Tables 2 and 3) showed that most SNPs could be found in intronic sequences. In general, SNP variability was low, ranging from 9 to 230 SNPs per strain (Supplementary File S2). In most cases, intraspecific variability did not exceed 30 SNPs. However, two exceptional strains of F.g., INRA-156 and CBS 134070, yielded 32 and 36 SNPs, respectively, which exceeded the least interspecies variability (31 SNPs) found for F. vorosii strain CBS 119177. Thus, assuming intraspecific variability below 30 SNPs, two out of 88 strains of F.g. could not be determined based on SNP analysis, which indicates 96% accuracy of this SNP-based approach. Besides exceptional strain (CBS 119177), the least interspecific variability was found for two strains of F. gerlachii, yielding 36 SNPs each. Surprisingly, the highest number of interspecific SNPs was found for two strains of F. ussurianum (227 and 230 SNPs, respectively), which exceeded the number of SNPs found in case of three morphospecies: F. cerealis, F. culmorum, and F. pseudograminearum. This result may indicate that some cryptic species may have an accelerated  evolutionary rate with a unique evolutionary pattern in their mtDNA. In addition, the difference in evolutionary rate could be even observed at the strain level. Among the three strains of F. vorosii, CBS 119178 and CBS 123664 exhibited a similar number of interspecies SNPs (147 and 148 SNPs, respectively), which is in contrast to the exceptional strain (CBS 119177), which, as indicated above, yielded only 31 SNPs. An evidence for different evolutionary rates at the strain level was also observed by positioning some strains of the same species into scattered clades on a phylogenetic tree. This has been indicated in the next section.

Phylogenetic Analyses
Phylogenetic trees of atp9, cox1, and cox2 genes possess different topologies, which can be explained by their different evolutionary histories. In addition, majority of the well-supported clades were multispecies. These results arise from low CDS variation in the studied strains, which were further confirmed by calculating nucleotide diversity values, number of polymorphic sites, and SNPs ( Table 2). We also found that some strains of the same species were scattered in different clades of the tree. This was mostly evident in the cox1 tree for strains from F.g. and other cryptic species such as F. boothii, F. mesoamericanum, F. cortadariae, and F. vorosii (Supplementary File S4), suggesting incomplete lineage sorting and introgression in the course of the evolution of the studied Fusaria.

DISCUSSION
The introduction of molecular tools has revolutionized the diagnostics of microorganisms, including fungal plant pathogens. The use of DNA-based markers opens possibilities for improved control of specific pathogens and pathotypes (Luchi et al., 2020). Identification of fungi based on mtDNA provides a valuable alternative to existing molecular approaches based on nuclear data, especially when efficient recovery of DNA is problematic. mtDNA can be amplified with higher success than nuclear DNA due to its high copy number in fungal cells (Krimitzas et al., 2013;Franco et al., 2017). However, exploration of mtDNA for diagnostic purposes may be challenging due to difficulties in detecting species-specific polymorphism. These difficulties are mostly reflected by (1) the highly conserved nature of mtDNA , (2) intrastrain mosaicism in MGE patterns (Losada et al., 2014), and (3) phylogenetic discordance between different mitochondrial genes (Brankovics et al., 2017;Fourie et al., 2018). A previous comparative analysis of a large set of F.g. strains underlined the high conservation and similarity of mitogenomes in closely related morphospecies: F. cerealis, F. culmorum, and F. pseudograminearum. In general, mitogenomes of these closely related species were MGE rich, and their irregular distribution among strains determines, to a large extent, mitogenome variation. MGEs that do not display  Frontiers in Microbiology | www.frontiersin.org mobility among other lineages can be fixed in species. Such targets are good candidates for developing diagnostic tools for identification of species; however, their identification requires comparison of a large set of target and nontarget strains. On the other hand, frequent mobility of MGEs could be incorporated into population genetic studies illustrating the local genetic variation and gene flow (Wolters et al., 2015;Wu et al., 2019).
In this study, we confirmed that the irregular distribution of MGEs does not follow divergence of cryptic species within the FGSC. This finding supports the hypothesis of low mobility of ancestrally acquired MGEs that are consequently shared by geographically diverse cryptic species (Kulik et al., 2020b). These ancestral elements may undergo changes throughout evolution, and assuming the lack of horizontal gene flow, these alternations should, at least in part, exhibit species-specific polymorphism. Indeed, in this study, we found that HEGs shared by different species can display small indel polymorphism that is species specific. The small differences in lengths of certain HEGs (Table 1) may facilitate further differentiation of F.g. from the other members of the FGSC.
In this study, we also explored the value of SNP analysis for typing F.g. This approach is increasingly used in the epidemiologic analyses of bacterial (Schürch et al., 2018) and fungal pathogens (Uelze et al., 2020b); however, to our knowledge, it has not yet been tested using fungal mtDNA. Successful identification of strains to the species level is achieved based on difference in intraspecific and interspecific SNP divergences (Bae et al., 2021). We found that intraspecific variation among F.g. strains was very low. Based on our assumption that intraspecific variation may not exceed 30 SNPs, the success rate in identifying strains of F.g. was estimated at 96%. The high confidence of this approach, its simplicity to perform, and the limited time needed for the analysis make this tool an attractive complement to other more refined techniques for identifying of F.g. (Wu et al., 2019;Kulik et al., 2020b).
Mitochondrial sequences can contribute to species identification due to their efficacy in revealing phylogenetic relationships among the studied taxa. However, we found that for F.g. and the other cryptic species within the FGSC, this approach is not discriminative enough mostly due to the highly conserved nature of mtDNA and lack of concordance between the different gene genealogies, which presumably results from incomplete ancestral lineage sorting. Our findings on the lack of concordance between different phylogenies of mitochondrial genes are in line with previous studies by Fourie et al. (2013Fourie et al. ( , 2018 and Brankovics et al. (2020) who detected interspecies gene flow among mitogenomes of closely related members of the FFSC and FOSC. The success in detecting ancestral gene flow events through mitochondrial comparisons is due to the greater levels of conservation and synteny of fungal mitogenomes than observed in nuclear compartments (Brankovics et al., 2020). However, from a diagnostic point of view, this phylogenetic discordance largely limits clustering-based identification approaches (Wu et al., 2018).
Overall, the results presented in this study showed that the use of mtDNA provides valuable information for identification of cryptic species within fungal complexes. Although fungal mitogenomes lack a "universal barcode" for tagging cryptic species, they might display other patterns of species-specific conservation. Uncovering of these sites requires testing a large collection of geographically diverse strains to differentiate between strain-and species-specific variation.

CONCLUSION
Mitogenomes show promise for identification purposes of important cryptic species like F.g. in the FGSC. Improved identification could be achieved by the combination of intronic variation analysis and whole-mitogenome SNP analysis. However, mitogenomes also show evidence of ancestral gene flow among other members of the FGSC, which largely limits clusteringbased identification.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
JW and TK wrote the manuscript. JW, AD, MŻ, AS, and KB contributed to isolation and strain growth. JW, MŻ, and KB extracted the fungal DNA and prepared libraries for wholegenome sequencing. TK assembled the mitogenomes and was responsible for the study design. KM and TM performed the mitogenome annotation. TM performed the phylogenetic analysis. AD and AS edited the manuscript. All authors contributed to the article and approved the submitted version.