Insights into the genomic architecture of a newly discovered endophytic Fusarium species belonging to the Fusarium concolor complex from India

In this study, a new species, Fusarium indicum, belonging to the Fusarium concolor species complex, is established to accommodate an endophytic fungus isolated from Bambusa sp. collected in Himachal Pradesh, India. The identity of this isolate was confirmed on the basis of its asexual morph, cultural characteristics, and phylogenetic analyses. Molecular sequence data showed the isolate to be distinct from described Fusarium species, with only approximately 93.9% similarity in the translation elongation factor 1-alpha gene and 94.2% similarity in the RNA polymerase II second largest subunit gene. To further characterize this novel species, its whole genome was sequenced. The genome of Fusarium indicum NFCCI 5145 is 40.2 Mb in size, with a GC content of 48.39%. Approximately 12,963 protein-coding genes were predicted and functionally annotated against databases including UniProt, Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Ontology (GO), Pathogen Host Interactions (PHI), Clusters of Orthologous Groups (COG), and Carbohydrate-Active enzymes (CAZy). Orthologous proteins were identified with OrthoFinder and used for phylogenetic analysis, and ANIb analysis confirmed that the isolate is closely related to the F. concolor species complex. Because Fusarium strains are known to produce a wide range of bioactive secondary metabolites, the genome of Fusarium indicum NFCCI 5145 was mined in depth for secondary metabolite biosynthetic gene clusters (BGCs) using Antibiotics and Secondary Metabolites Analysis Shell (AntiSMASH) annotation, which identified 45 BGCs.
These findings improve our understanding of Fusarium indicum NFCCI 5145 and point to possible applications in sectors such as industry, based on the secondary metabolites and enzymes it may produce.


Introduction
The genus Fusarium was first described by Link in 1809 (Link, 1809). Fusarium species occur in almost every ecosystem worldwide (Leslie and Summerell, 2006). They have been reported from nearly all bioclimatic regions of the world, including tropical and temperate grasslands, shrublands, forests, harsh deserts, and alpine environments, as well as from soils associated with plants, organic debris, and every part of the plant, from the deepest roots to the highest flowers (Leslie and Summerell, 2006). Fusarium species are thus able to colonize multiple habitats in almost all ecosystems worldwide (Young et al., 1978; Nelson et al., 1994; Arney et al., 1997). Fusarium is one of the most economically significant fungal genera and is best known as a pathogen capable of infecting essential agricultural and horticultural crops worldwide (Booth, 1971; Leslie and Summerell, 2006). Fusarium species are mainly responsible for wilts, blights, root rots, and cankers (Ingle, 2017). Its species can also occur as saprophytes, endophytes, parasites, and pathogens of plants, and as pathogens or mutualists of animals (Torbati et al., 2021).
Fusarium concolor was first described from Hordeum vulgare in Montevideo, Uruguay (Reinking, 1934), and was later reported from various hosts and substrates, such as wheat, banana, koa tree, guarana, Hybanthus prunifolius, plant debris, and soil (Chambers, 1972; Saadabi, 2006; Murad and Baiti, 2017; Almeida et al., 2021). It is also known to cause keratitis and fusariosis (Al-Hatmi et al., 2016). The Fusarium concolor species complex currently consists of four species: F. anguioides, F. austroafricanum, F. bambusarum, and F. concolor. Isolates of F. anguioides have been reported from bamboo, Cordyline stricta, and Alocasia odora, and F. bambusarum has been reported from bamboo (Nelson et al., 1995; Wang et al., 2022). Fusarium austroafricanum was isolated as an endophyte of Kikuyu grass associated with putative mycotoxicosis of cattle (Jacobs-Venter et al., 2018). In the present study, an endophytic fungus isolated from Bambusa sp. is described as a new species of Fusarium within the Fusarium concolor species complex on the basis of morphology and phylogeny. In addition, its whole genome was sequenced to gain insight into the genomic architecture of this newly discovered species.

Collection, isolation, and morphological characterization of endophytic Fusarium indicum
Healthy leaves of Bambusa sp. were collected from Panchrukhi, Palampur, District Kangra, Himachal Pradesh, India, on 4 October 2021, placed in sterile polythene bags, and carefully transported to the laboratory. Surface adherents were removed by thorough washing under tap water. Larger leaves were then chopped into small pieces and surface-sterilized following a modified version of the method of Dobranic et al. (1995). Briefly, bamboo leaves were dipped in 70% ethanol for 5 s, then in 4% sodium hypochlorite for 90 s, and finally rinsed four times in sterile water for 10 s each. The surface-sterilized leaves were cut into small pieces with a sterile sharp blade and placed on potato dextrose agar (PDA) plates, which were incubated at 25°C until vegetative growth emerged from the tissues. Individual colonies arising from the tissues were transferred to fresh PDA plates by hyphal tipping and allowed to grow (Bills and Polishook, 1992), and a pure culture was then obtained by single-spore isolation. Colony characteristics of the isolate were studied on PDA and synthetic nutrient agar (SNA), and colony colors on the different media were recorded with reference to Methuen's Handbook of Color (Kornerup and Wanscher, 1978). Microscopic structures were examined from pure culture mounted in lactophenol cotton blue, a combined staining and mounting medium, under a Carl Zeiss Image Analyzer 2 microscope (Germany). Measurements and photomicrographs of the fungal structures were recorded with Axiovision Rel 4.8 software and a Digi-Cam attached to the same microscope. The holotype specimen is deposited in the Ajrekar Mycological Herbarium (AMH 10381), and the ex-type pure culture is deposited in the National Fungal Culture Collection of India (NFCCI 5145).
PCR was carried out in 25 μL reactions containing 12.5 μL of 2× Invitrogen Platinum SuperFi PCR Master Mix, 2 μL of template DNA (10-20 ng), 1.5 μL of 10 pmol primer, 5 μL of 5× GC enhancer, and sterile ultrapure water (Sigma, St. Louis, MO, United States) to a final volume of 25 μL. Thermocycling conditions were as follows: for tef1-α, initial denaturation at 94°C for 5 min, 30 cycles of 94°C for 45 s, 57°C for 30 s, and 72°C for 1 min, and a final extension at 72°C for 7 min; for LSU, initial denaturation at 94°C for 5 min, 35 cycles of 94°C for 1 min, 52°C for 50 s, and 72°C for 1.2 min, and a final extension at 72°C for 8 min; and for rpb2, initial denaturation at 95°C for 5 min, 35 cycles of 95°C for 45 s, 52°C for 1 min, and 72°C for 1.5 min, and a final extension at 72°C for 10 min.
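Because the recipe above fixes every component except water, the water volume, and any multi-sample master mix, follow by simple arithmetic. The Python sketch below is illustrative bookkeeping only, not the authors' protocol: the component labels, the 10% pipetting overage, and the treatment of the primer as a single 1.5 μL entry are assumptions.

```python
"""Illustrative volume bookkeeping for the 25 uL PCR described above (a sketch)."""

REACTION_VOLUME_UL = 25.0

# Per-reaction volumes (uL) as listed in the text. Assumption: the primer is
# entered once at 1.5 uL; add a second entry if forward and reverse primers
# were each added at 1.5 uL.
PCR_COMPONENTS_UL = {
    "2x Platinum SuperFi PCR Master Mix": 12.5,
    "template DNA (10-20 ng)": 2.0,
    "primer (10 pmol)": 1.5,
    "5x GC enhancer": 5.0,
}


def water_to_add(components, total):
    """Volume of water (uL) that brings the listed components up to `total` uL."""
    used = sum(components.values())
    if used > total:
        raise ValueError(f"components ({used} uL) exceed the reaction volume ({total} uL)")
    return total - used


def final_strength(stock_x, volume, total):
    """Working strength (in 'x') of a stock added at `volume` uL to a `total` uL reaction."""
    return stock_x * volume / total


def master_mix(components, total, n_reactions, overage=0.10):
    """Scale the shared components (template excluded) for n reactions plus overage."""
    factor = n_reactions * (1 + overage)
    shared = {name: vol for name, vol in components.items() if not name.startswith("template")}
    shared["water"] = water_to_add(components, total)
    return {name: round(vol * factor, 2) for name, vol in shared.items()}


if __name__ == "__main__":
    print("water per reaction (uL):", water_to_add(PCR_COMPONENTS_UL, REACTION_VOLUME_UL))
    print("SuperFi mix strength (x):", final_strength(2, 12.5, REACTION_VOLUME_UL))
    print("master mix for 8 reactions:", master_mix(PCR_COMPONENTS_UL, REACTION_VOLUME_UL, 8))
```

With the listed volumes, 21 μL of components leaves 4 μL of water, and both the 2× master mix and the 5× GC enhancer end up at 1× working strength. The same subtraction applied to the 20 μL sequencing reaction described next (4 + 2 + 4 + 4 μL) gives 6 μL of water.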
The PCR amplicons were purified with a FavorPrep™ PCR purification kit according to the manufacturer's instructions. Purified PCR products of all marker genes were checked by 1.2% agarose gel electrophoresis with 0.5 μg/mL ethidium bromide staining and were then subjected to sequencing PCR with a BigDye® Terminator v3.1 Cycle Sequencing Kit, according to the manufacturer's instructions. In brief, each 20 μL sequencing reaction contained 4 μL of 5× sequencing buffer, 2 μL of BigDye™ Terminator premix, 4 μL of primer (5 pmol), 4 μL of purified amplicon, and sterile ultrapure water (Sigma) to a final volume of 20 μL. Thermal cycling consisted of an initial denaturation at 96°C for 3 min, followed by 30 cycles of 94°C for 10 s, 50°C for 40 s, and 60°C for 4 min. The BigDye® terminators and salts were removed using the BigDye Xterminator®

Phylogenetic analysis
To determine the phylogenetic position of this isolate, the tef-1α and rpb2 gene regions were used to compare it with authentic strains of known Fusarium species, whose sequences were retrieved from NCBI. Sequences of 83 Fusarium isolates were aligned together with those of Fusarium indicum NFCCI 5145, and Geejayessia zealandica CBS 111.93 and Geejayessia cicatricum CBS 125549 were selected as outgroup taxa. The strains used to construct the phylogenetic tree, with their accession numbers and other details, are listed in Supplementary Table 1. Each gene region was aligned individually with MAFFT v. 6.864b (Katoh and Standley, 2013), and the alignments were checked and manually adjusted in AliView (Larsson, 2014). The alignments were then concatenated for the phylogenetic analyses. The best-fit substitution model was selected with ModelFinder (Kalyaanamoorthy et al., 2017), and the phylogenetic tree was reconstructed with the Windows version of IQ-TREE v.1.6.11 (Nguyen et al., 2015). Branch support was assessed with 1,000 ultrafast bootstrap replicates (UFBoot) and the SH-like approximate likelihood ratio test (SH-like aLRT) with 1,000 replicates. The resulting tree was visualized in FigTree v.1.4.4.
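The concatenation step between per-locus alignment and tree search can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: the helper names, the "?" padding for a taxon missing a locus, and the IQ-TREE 1.6-style argument list (`-m MFP` for ModelFinder, `-bb` for ultrafast bootstrap, `-alrt` for SH-aLRT) are assumptions that should be checked against the documentation of the IQ-TREE version actually used.

```python
"""Sketch: build a tef1/rpb2 supermatrix and an IQ-TREE 1.6-style command line."""


def read_fasta(text):
    """Parse aligned FASTA text into an ordered {taxon: sequence} dict."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def concatenate(alignments, missing="?"):
    """Join per-locus alignments taxon by taxon.

    Taxa absent from a locus are padded with `missing`. Returns the supermatrix
    and 1-based (locus, start, end) partition coordinates.
    """
    taxa = []
    for aln in alignments.values():
        for taxon in aln:
            if taxon not in taxa:
                taxa.append(taxon)
    supermatrix = {taxon: [] for taxon in taxa}
    partitions, start = [], 1
    for locus, aln in alignments.items():
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{locus}: sequences differ in length; is it aligned?")
        width = widths.pop()
        for taxon in taxa:
            supermatrix[taxon].append(aln.get(taxon, missing * width))
        partitions.append((locus, start, start + width - 1))
        start += width
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}, partitions


def to_fasta(matrix):
    """Serialize a {taxon: sequence} supermatrix back to FASTA text."""
    return "".join(f">{taxon}\n{seq}\n" for taxon, seq in matrix.items())


def iqtree_command(alignment_path, replicates=1000):
    """IQ-TREE 1.6 arguments: ModelFinder (-m MFP), UFBoot (-bb) and SH-aLRT (-alrt)."""
    return ["iqtree", "-s", alignment_path, "-m", "MFP",
            "-bb", str(replicates), "-alrt", str(replicates)]
```

Keeping the partition coordinates lets the same supermatrix be analysed either as one unpartitioned alignment, as reported here with a single TIM2e + I + G4 model, or with per-locus models if desired.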

High molecular weight DNA extraction for whole-genome sequencing
A total of 100 mg of fungal biomass was ground to a powder in liquid N₂ using a mortar and pestle, and the powder was transferred to a sterile 2 mL Eppendorf tube. Then, 1 mL of pre-heated CTAB buffer (20 mM EDTA, 100 mM Tris-HCl, 1.4 M NaCl, and 2% CTAB) was added together with 20 μL of β-mercaptoethanol and 1 mg of polyvinylpyrrolidone (PVP). The mixture was mixed thoroughly and incubated at 65°C ± 2°C for at least 30 min. An equal volume of phenol:chloroform:isoamyl alcohol (25:24:1, v/v) was added, mixed well, and centrifuged at 10,000 rpm for 10 min. The upper aqueous layer was transferred to a fresh 1.5 mL Eppendorf tube, mixed well with an equal volume of chloroform:isoamyl alcohol (24:1, v/v), and centrifuged at 10,000 rpm for 10 min. The upper aqueous layer was again transferred to a fresh 1.5 mL Eppendorf tube, an equal volume of isopropanol was added, and the mixture was incubated at −20°C for 20 min and then centrifuged at 10,000 rpm for 10 min at 4°C. The supernatant was removed, and the pellet was carefully washed with 500 μL of 70% ethanol and centrifuged again at 10,000 rpm for 5 min at 4°C. The supernatant was discarded, and the pellet was dried and dissolved in 70 μL of 1× TE buffer. Then, 1 μL of RNase A solution (20 mg mL⁻¹) was added, and the solution was incubated at 37°C for at least 30 min. DNA integrity was evaluated by 1% agarose gel electrophoresis, and purity was assessed with a NanoDrop™ 1000 spectrophotometer (Thermo Fisher Scientific).
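Centrifugation speeds above are given in rpm, so the equivalent relative centrifugal force depends on the rotor radius, which is not stated. The sketch below uses the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm) and the conventional A260 factor for double-stranded DNA (1 absorbance unit ≈ 50 ng/μL at a 1 cm path-length equivalent), together with the usual rule-of-thumb purity ratios. The 8 cm rotor radius in the example is hypothetical.

```python
"""Sketch: rpm <-> x g conversion and A260-based DNA quantity/purity checks."""
import math

RCF_CONSTANT = 1.118e-5  # standard rpm-to-g conversion factor, with radius in cm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at `radius_cm` for a rotor spinning at `rpm`."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Inverse conversion: rotor speed needed to reach `rcf` x g at `radius_cm`."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


def dsdna_ng_per_ul(a260, dilution=1.0):
    """dsDNA concentration from A260 (1 absorbance unit ~ 50 ng/uL, 1 cm path equivalent)."""
    return a260 * 50.0 * dilution


def purity_ratios(a260, a280, a230):
    """A260/A280 (~1.8 for clean DNA) and A260/A230 (~2.0-2.2) rule-of-thumb ratios."""
    return {"A260/A280": a260 / a280, "A260/A230": a260 / a230}


if __name__ == "__main__":
    radius = 8.0  # hypothetical microcentrifuge rotor radius (cm); not given in the text
    print(f"10,000 rpm at {radius} cm is about {rcf_from_rpm(10_000, radius):.0f} x g")
```

Reporting spins as x g rather than rpm makes the protocol transferable between centrifuges with different rotors.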

Library preparation and sequencing
Libraries were constructed using the QIASeq FX DNA library preparation protocol (Cat# 180475) according to the manufacturer's instructions. In total, 50 ng of Qubit-quantified DNA was enzymatically fragmented, end-repaired, and A-tailed in a single-tube reaction using the FX enzyme mix supplied with the QIASeq FX DNA kit. The end-repaired, adenylated fragments were then ligated to index-incorporated Illumina adapters to generate sequencing libraries. The libraries were enriched for adapter-tagged fragments by 6 cycles of indexing PCR (initial denaturation at 98°C for 2 min; cycles of 98°C for 20 s, 60°C for 30 s, and 72°C for 30 s; final extension at 72°C for 1 min). The amplified libraries were purified with JetSeq Beads (Bio-68031) and quantified with a Qubit fluorometer (Thermo Fisher Scientific, MA, United States). The libraries were sequenced on an Illumina NovaSeq 6000 sequencer (Illumina, San Diego, CA) using 150 bp paired-end chemistry, according to the manufacturer's procedure.

Gene prediction and annotation
Libraries were sequenced in paired-end mode on an Illumina NovaSeq 6000 sequencer, and the raw paired-end data were stored in FASTQ format. Fastp (v0.20.1) was used to remove low-quality reads, adapters, and polyG tails from the FASTQ files. To get the draft genome sequence, St. Petersburg genome (Benson, 1999).
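A paired-end cleaning step of this kind can be scripted as below. This is a hedged Python sketch: the file names, thread count, and choice of options are assumptions, because the exact fastp settings used in the study are not reported. The flags shown (`-i`/`-I`, `-o`/`-O`, `--detect_adapter_for_pe`, `--trim_poly_g`, `-w`, `-j`, `-h`) are standard fastp options. The `filtering_result` key read by `filtering_summary` is assumed to follow fastp's JSON report layout; verify both against the fastp version in use.

```python
"""Sketch: run fastp on a read pair and summarise read fates from its JSON report."""
import json
import subprocess


def fastp_command(r1, r2, prefix, threads=4):
    """Argument list for trimming a read pair with fastp (file names are hypothetical)."""
    return [
        "fastp",
        "-i", r1, "-I", r2,                              # raw paired-end input
        "-o", f"{prefix}_R1.clean.fq.gz",                # cleaned read 1
        "-O", f"{prefix}_R2.clean.fq.gz",                # cleaned read 2
        "--detect_adapter_for_pe",                       # adapter detection for PE reads
        "--trim_poly_g",                                 # polyG tails (two-colour chemistry)
        "-w", str(threads),
        "-j", f"{prefix}.fastp.json", "-h", f"{prefix}.fastp.html",
    ]


def filtering_summary(report):
    """Fraction of input reads in each filtering outcome of a parsed fastp JSON report."""
    outcomes = report["filtering_result"]  # assumed key, per fastp's JSON layout
    total = sum(outcomes.values())
    return {name: count / total for name, count in outcomes.items()}


def run_and_summarise(r1, r2, prefix):
    """Run fastp (must be on PATH) and return the filtering summary."""
    subprocess.run(fastp_command(r1, r2, prefix), check=True)
    with open(f"{prefix}.fastp.json") as handle:
        return filtering_summary(json.load(handle))
```

With the read counts reported later for this genome (7,847,766 passed; 33,424 low quality; 2,328 with too many N; 1,034 too short), `filtering_summary` gives a pass fraction of about 0.99533, consistent with the reported 99.533442%.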

Comparative genomics and phylogenetic analysis
Orthologous proteins were identified with OrthoFinder version 2.5.4 (Emms and Kelly, 2019), and the results were used to build a species tree, which was visualized in FigTree. The Fusarium species used for the orthology analysis and phylogenetic tree construction are listed in Table 1. Average nucleotide identity (ANI) was calculated with the pyani script using the BLASTN-based ANIb algorithm (Pritchard et al., 2016). The Fusarium species included in the ANI analysis are listed in Table 2.
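ANIb rests on a simple averaging rule: the query genome is cut into consecutive 1,020 nt fragments, each fragment is searched against the other genome with BLASTN, and ANI is the mean identity of the fragments whose hit exceeds 30% identity over more than 70% of the fragment length. The sketch below implements only that averaging step on pre-computed hits. The BLASTN search is omitted, the thresholds are the commonly used ANIb defaults rather than values reported in this study, and the hit tuples in the example are hypothetical.

```python
"""Sketch of the ANIb averaging rule applied to pre-computed fragment hits."""

FRAGMENT_SIZE = 1020  # nt; conventional ANIb fragment length


def fragment(sequence, size=FRAGMENT_SIZE):
    """Split a genome sequence into consecutive fragments (the last may be shorter)."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def anib(hits, min_identity=30.0, min_coverage=0.70):
    """Mean percent identity over fragments whose hit passes both thresholds.

    `hits` holds one (percent_identity, aligned_length, fragment_length) tuple per
    query fragment that produced a BLASTN hit. Returns None if nothing passes.
    """
    kept = [identity for identity, aligned, length in hits
            if identity > min_identity and aligned / length > min_coverage]
    return sum(kept) / len(kept) if kept else None


def reciprocal_anib(hits_ab, hits_ba):
    """ANIb is directional; one simple summary averages the two directions."""
    a, b = anib(hits_ab), anib(hits_ba)
    return None if a is None or b is None else (a + b) / 2


if __name__ == "__main__":
    example_hits = [(98.0, 1020, 1020), (96.0, 800, 1020), (25.0, 1020, 1020), (99.0, 500, 1020)]
    print(anib(example_hits))
```

In the example, the third fragment fails the identity threshold and the fourth fails the coverage threshold, so only the first two contribute to the average.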
Cultural characteristics: Colonies on PDA reaching 39 mm after 4 days at 25°C, yellowish white (1A2) to white (1A1), reverse pale yellow (3A3) after 1 week; aerial mycelium produced; colonies without distinct odor, slightly raised, cottony, with smooth, entire margins. Colonies on SNA reaching 33 mm after 4 days at 25°C, grayish white (1B1) to white (1A1) after 1 week; colonies cottony, raised at the center and flat toward the periphery, with smooth, entire margins.

Phylogenetic analysis
The concatenated tef-1α and rpb2 alignment was used to confirm the identity of this isolate. The concatenated dataset comprised 86 taxa (Supplementary Table 1). The alignment contained 1,671 columns, of which 777 were parsimony-informative, 113 were singleton sites, and 781 were constant sites, with 1,061 distinct site patterns. TIM2e + I + G4 was selected as the best-fit model according to the Bayesian Information Criterion (BIC), and the phylogenetic tree was inferred by maximum likelihood under this model. The log-likelihood of the consensus tree was −24373.628. Rate parameters were A-C: 1.95361, A-G: 4.81458, A-T: 1.95361, C-G: 1.00000, C-T: 9.32964, and G-T: 1.00000; base frequencies were A: 0.250, C: 0.250, G: 0.250, and T: 0.250; the proportion of invariable sites was 0.387; and the gamma shape parameter (alpha) was 1.138. In the combined tef-1α and rpb2 analysis, Fusarium indicum was resolved in a distinct clade within the Fusarium concolor species complex, with strong SH-like aLRT and ultrafast bootstrap (UFBoot) support (Figure 2).
5 https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn accessed on 18.07.2023
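To make the reported TIM2e parameters concrete, the sketch below builds the normalized instantaneous rate matrix Q of this time-reversible model from the rates and equal base frequencies listed above, and computes substitution probabilities P(t) = exp(Qt). It is a minimal plain-Python illustration: the +I and +G4 rate-heterogeneity components are not modelled, and the matrix exponential uses a Taylor series with scaling and squaring rather than IQ-TREE's own routines.

```python
"""Sketch: TIM2e rate matrix Q and P(t) = exp(Qt) from the reported parameters."""
import math

BASES = "ACGT"
# Relative exchange rates reported for the best-fit model (TIM2e: A-C = A-T, C-G = G-T = 1).
RATES = {("A", "C"): 1.95361, ("A", "G"): 4.81458, ("A", "T"): 1.95361,
         ("C", "G"): 1.00000, ("C", "T"): 9.32964, ("G", "T"): 1.00000}
FREQS = [0.25, 0.25, 0.25, 0.25]  # equal base frequencies (the "e" in TIM2e)


def rate_matrix(rates, freqs):
    """Normalized instantaneous rate matrix of a time-reversible model (mean rate = 1)."""
    idx = {b: i for i, b in enumerate(BASES)}
    q = [[0.0] * 4 for _ in range(4)]
    for (x, y), r in rates.items():
        i, j = idx[x], idx[y]
        q[i][j] = r * freqs[j]
        q[j][i] = r * freqs[i]
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[v / mean_rate for v in row] for row in q]


def _matmul(a, b):
    """Product of two 4 x 4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transition_probabilities(q, t, terms=20):
    """P(t) = exp(Qt) via a truncated Taylor series with scaling and squaring."""
    norm = max(sum(abs(v) for v in row) for row in q) * t
    squarings = max(0, math.ceil(math.log2(norm))) + 1 if norm > 0 else 0
    a = [[v * t / 2 ** squarings for v in row] for row in q]
    p = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    term = [row[:] for row in p]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in _matmul(term, a)]  # A^k / k!
        p = [[p[i][j] + term[i][j] for j in range(4)] for i in range(4)]
    for _ in range(squarings):
        p = _matmul(p, p)
    return p


if __name__ == "__main__":
    q = rate_matrix(RATES, FREQS)
    print("C->T rate:", round(q[1][3], 4), " A->G rate:", round(q[0][2], 4))
    for base, row in zip(BASES, transition_probabilities(q, 0.1)):
        print(base, [round(v, 4) for v in row])
```

Because C-T carries the largest exchange rate (9.32964), Q[C][T] is the largest off-diagonal entry, and at long branch lengths every row of P(t) converges to the equal base frequencies of 0.25.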

Genome sequencing and assembly of Fusarium indicum NFCCI 5145
The genome sequence of Fusarium indicum NFCCI 5145 was assembled and deposited in the NCBI GenBank database (SRA accession No. PRJNA993417; BioProject PRJNA993417; BioSample SAMN36633317; accession number JAWDHB000000000). The circular genome diagram of Fusarium indicum NFCCI 5145 (Figure 3) consists of nine circles, which from the inside outward show: (1) in-paralog regions (blue lines); (2) GC skew, with positive GC skew in green and negative GC skew in orange; (3) GC content (%; negative gene; positive gene); (4) secondary metabolites; (5) ncRNA; (6) repeats; (7) CDS annotation on the positive strand; (8) CDS annotation on the negative strand; and (9) the contigs, on the outer rim.
The draft genome of Fusarium indicum NFCCI 5145 is 40.2 Mb in size, with a GC content of 48.39%. Illumina sequencing generated 7,884,552 raw reads, of which 7,847,766 (99.53%) were retained as clean, high-quality reads. Before filtering, the data comprised 7.884552 M reads and 1.182683 Gb, including 1.162804 Gb of Q20 bases (98.319170%) and 1.121368 Gb of Q30 bases (94.815622%), with a GC content of 48.005470%. After filtering, the data comprised 7.847766 M reads and 1.176427 Gb, including 1.158196 Gb of Q20 bases (98.450277%) and 1.117366 Gb of Q30 bases (94.979678%), with a GC content of 48.000585%. Of the raw reads, 7.847766 M (99.533442%) passed the filters, 33.424 K (0.423918%) were removed as low quality, 2.328 K (0.029526%) were removed for containing too many N bases, and 1.034 K (0.013114%) were removed as too short. The assembly comprised 513 scaffolds totalling 40,226,774 bp (~40.2 Mb), with a minimum scaffold length of 300 bp, a maximum scaffold length of 1,668,425 bp, and an N50 of 461,173 bp (~0.46 Mb); 513 scaffolds were ≥ 200 bp, 347 were ≥ 500 bp, 246 were ≥ 1 kbp, 169 were ≥ 10 kbp, and 6 were ≥ 1 Mbp. Protein-coding genes were predicted with Augustus, yielding 12,724 genes, which were annotated against databases such as UniProt. The total gene length was 6,336,400 bp, with an average gene length of 498 bp, a maximum of 8,599 bp, and a minimum of 24 bp.
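The N50 reported above is the scaffold length at which scaffolds of that length or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows, together with the size-threshold counts used above and a rough mean-depth estimate. The toy scaffold lengths are hypothetical, and the depth figure (about 29×, from 1.176427 × 10⁹ clean bases over the 40,226,774 bp assembly) ignores uneven coverage and reads that do not map to the assembly.

```python
"""Sketch: N50/L50, scaffold size-class counts, and a rough mean depth."""


def n50_l50(lengths):
    """Return (N50, L50): the length and rank at which the cumulative sum of
    lengths, sorted longest first, first reaches half of the total."""
    if not lengths:
        raise ValueError("no scaffold lengths supplied")
    total = sum(lengths)
    running = 0
    for rank, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, rank
    raise AssertionError("unreachable: the cumulative sum always reaches the total")


def count_at_least(lengths, threshold):
    """Number of scaffolds of length >= threshold (e.g. 200, 500, 1_000 bp)."""
    return sum(1 for length in lengths if length >= threshold)


def mean_depth(total_bases, genome_size):
    """Rough sequencing depth: sequenced bases divided by assembly length."""
    return total_bases / genome_size


if __name__ == "__main__":
    toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical lengths
    print("N50, L50:", n50_l50(toy))
    print(f"depth: {mean_depth(1_176_427_000, 40_226_774):.1f}x")
```

Applied to a real assembly, the lengths would come from the scaffold FASTA; the same `count_at_least` helper reproduces the ≥ 200 bp, ≥ 500 bp, and larger size classes.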
Tandem repeats were predicted with the Tandem Repeats Finder program (TRF), which identified 3,232 repeats.

Genome sequence annotation of Fusarium indicum NFCCI 5145 using KEGG, COG, and GO
For functional annotation, the 12,963 non-redundant predicted protein sequences of Fusarium indicum NFCCI 5145 were subjected to similarity searches against various public databases. Genes were mapped to the UniProt database (12,638 genes, 97.49%), Clusters of Orthologous Groups (COG; 6,509 genes, 50.21%), and the Kyoto Encyclopedia of Genes and Genomes (KEGG; 9,155 genes, 70.62%).
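The percentages above are simple fractions of the 12,963 genes queried. The small sketch below tallies, from per-database sets of annotated gene IDs, how many genes each database covers and how many are annotated by at least one database; the gene IDs in the test example are hypothetical.

```python
"""Sketch: per-database annotation rates and the union across databases."""


def percent(count, total):
    """Share of `total` represented by `count`, as a percentage."""
    return 100.0 * count / total


def annotation_rates(total_genes, hits_by_db):
    """(count, percent) of genes with a hit in each database, plus the union of all hits."""
    rates = {db: (len(ids), percent(len(ids), total_genes)) for db, ids in hits_by_db.items()}
    union = set().union(*hits_by_db.values())
    rates["any database"] = (len(union), percent(len(union), total_genes))
    return rates


if __name__ == "__main__":
    # Reported counts for F. indicum NFCCI 5145 (12,963 genes queried).
    for db, n in {"UniProt": 12_638, "COG": 6_509, "KEGG": 9_155}.items():
        print(f"{db}: {percent(n, 12_963):.2f}%")
```

The union count is useful alongside the per-database figures because a gene annotated in COG is usually also annotated in UniProt, so the per-database percentages cannot simply be added.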
KEGG functional classification assigned the predicted proteins to various categories, including amino acid metabolism (2,451), metabolism of cofactors and vitamins (1,682), global maps (1,526), and carbohydrate metabolism (1,525; Figure 5; Yamada et al., 2021). These results indicate a diverse and enriched repertoire of metabolic functions, which may support efficient secondary metabolism.
GO annotation assigned the genes of Fusarium indicum to a range of biological process, cellular component, and molecular function categories (Figure 6; Huntley et al., 2014).

Genome sequence annotation of Fusarium indicum NFCCI 5145 for carbohydrate genes
Carbohydrate-active enzymes (CAZymes) are enzymes that catalyze the breakdown and assembly of glycoconjugates and glycans (oligosaccharides and polysaccharides; Garron and Henrissat, 2019). These enzymes play an important role in fungal metabolism because they are responsible for carbohydrate degradation, modification, and biosynthesis (Liu et al., 2022). CAZy is a specialized database of enzymes that create, modify, and degrade glycosidic bonds (Liu et al., 2022). The analysis identified 1,012 genes encoding carbohydrate-active enzymes (CAZy).

Functional annotation of Fusarium indicum NFCCI 5145 genes encoding proteins using the Clusters of Orthologous Groups (COG) database.

Genome sequence annotation of Fusarium indicum NFCCI 5145 for pathogen host interactions
The Pathogen Host Interactions Database (PHI-base) is an expert-curated database, based on experimental evidence, of genes related to virulence, … Pernas, 2021). The predominant PHI phenotype categories among the annotated genes were unaffected pathogenicity and reduced virulence, suggesting that Fusarium indicum NFCCI 5145 is not a pathogenic strain, as expected for an isolate obtained as an endophyte.

FIGURE 6
Functional annotation of Fusarium indicum NFCCI 5145 genes encoding proteins using Gene Ontology (GO) analysis.
AntiSMASH results revealed the potential of this isolate to produce compounds of interest, such as α-acorenol, oxyjavanicin, chrysogine, equisetin, bikaverin, and squalestatin S1, along with many other known and unknown secondary metabolites. α-Acorenol is a highly oxygenated sesquiterpene with antioxidant activity (Elshamy et al., 2019). Oxyjavanicin belongs to the naphthoquinone class of compounds and is similar to fusarubin (C₁₅H₁₄O₇), a red antibiotic (Ruelius and Gauhe, 1950). Many studies have reported its antituberculosis, cytotoxic, and antimicrobial activities, and it may be useful in the treatment of asthma because it regulates cytokine balance in an OVA-sensitized model (Hong et al., 2022). Fusarubin is also known to inhibit proliferation and increase apoptosis in cell lines derived from hematological cancers (Adorisio et al., 2019). Chrysogine is a yellow pigment reported to lack antimicrobial and anticancer activity (Viggiano et al., 2018). Equisetin is best known for its antibiotic and cytotoxic activities, and it also inhibits HIV-1 integrase. It can also potentiate antibiotic activity against multidrug-resistant gram-negative bacteria (Zhang et al., 2021). In addition, it inhibits bacterial acetyl-CoA carboxylase (ACC), which catalyzes the first step of fatty acid synthesis (Larson et al., 2020). Bikaverin is a reddish polyketide pigment with many biological properties, including antitumor activity against different cancer cell lines (Limón et al., 2010). Squalestatin S1 is a potent inhibitor of squalene synthase with potential use in controlling cholesterol biosynthesis (Lebe and Cox, 2018). The functions of the remaining cryptic BGCs could be characterized through gene knockout and heterologous expression experiments combined with LC-MS analysis.

The average nucleotide identity
Average nucleotide identity (ANI) analysis of the Fusarium indicum NFCCI 5145 genome gave an overview of its sequence identity with the other Fusarium strains compared, as shown in the heatmap in Figure 10. The ANIb analysis included 51 genomes of various Fusarium species. Fusarium indicum NFCCI 5145 grouped with F. concolor and F. austroafricanum, confirming that the strain falls within the Fusarium concolor species complex.

Phylogeny based on orthologous proteins
In the phylogenetic tree constructed from the orthologous proteins of the Fusarium strains, identified with OrthoFinder, Fusarium indicum strain NFCCI 5145 grouped within the Fusarium concolor species complex, which is represented in this analysis by two species, F. concolor and F. austroafricanum. This confirms the placement of strain NFCCI 5145 in the Fusarium concolor species complex (Figure 11).

Conclusion
Fusarium species produce a variety of secondary metabolites. In this study, a novel endophytic species, Fusarium indicum, isolated from Bambusa sp., was characterized in detail through genome sequencing, gene prediction, and annotation. The isolate possesses many functional genes related to energy production and conversion, amino acid and carbohydrate metabolism, and secondary metabolite biosynthesis, transport, and catabolism. AntiSMASH analysis indicated that it has the genetic potential to produce secondary metabolites of interest for drug development. Members of the Fusarium concolor species complex have been reported to produce ligninolytic enzymes, such as lignin peroxidase, laccase, and manganese peroxidase, on wheat straw, leading to efficient delignification under solid-state fermentation conditions (Li et al., 2008). Species in this complex have also been used to treat poplar chemithermomechanical pulp to inhibit light-induced yellowing and enhance pulp brightness (Daolei et al., 2020), and an endophytic Fusarium concolor has been used to synthesize silver nanoparticles (Almeida et al., 2021). The newly isolated strain may be suitable for similar applications, such as the pretreatment of lignocellulosic materials before biopulping or bioconversion to fuel (Li et al., 2008). Nanoparticles are increasingly used for the controlled release of pesticides or nanocides and for the production of nanofertilizers, and this isolate could potentially be used to produce nanoparticles with applications in agriculture, biotechnology, and medicine.

FIGURE 11
Phylogenetic analysis of 53 Fusarium strains based on the orthologous proteins identified using OrthoFinder. The new species is shown in bold blue, and the Fusarium concolor species complex is marked with a blue rectangular box.

FIGURE 2
Molecular phylogenetic analysis of the new species Fusarium indicum (NFCCI 5145) based on the maximum-likelihood (ML) method using combined tef-1α and rpb2 sequence data. The new species Fusarium indicum (NFCCI 5145) is shown in blue. Statistical support values (UFBoot and SH-aLRT, each from 1,000 replicates in IQ-TREE under the TIM2e + I + G4 model) are shown next to each node.

FIGURE 7
Carbohydrate-active enzymes (CAZy) functional classification and corresponding genes present in the genome of Fusarium indicum NFCCI 5145.

FIGURE 8
Distribution map of mutation types in the pathogen PHI phenotype of Fusarium indicum NFCCI 5145.

FIGURE 10
Heatmap of ANIb percentage identity between Fusarium indicum strain NFCCI 5145 and the other Fusarium strains compared. ANIb was calculated from the genome sequences of all 51 genomes.

TABLE 2
Details of Fusarium species used for the calculation of genome-scale average nucleotide identity with Fusarium indicum NFCCI 5145.