Comparative Genomics of Spatholobus suberectus and Insight Into Flavonoid Biosynthesis

Spatholobus suberectus Dunn (S. suberectus), a member of the Leguminosae family, has been widely used as a plant source in traditional medicine. Its vine stem plays an important role in the prevention and treatment of various diseases because it contains various flavonoids. Comparative genome analysis suggested well-conserved genomic components and genetic collinearity between the genome of S. suberectus and those of other Leguminosae genera such as Glycine max. We discovered two whole-genome duplication (WGD) events in S. suberectus and found that the G. max lineage underwent an additional WGD after its speciation from S. suberectus. Analysis of the expansion and contraction of orthologous gene families revealed 1,001 expanded and 3,649 contracted gene families in the S. suberectus lineage. Compared with model plants, many novel flavonoid biosynthesis-related genes were predicted in the S. suberectus genome, and the expression patterns of some of these genes [such as the isoflavone synthase (IFS) genes] in the roots were similar to those in the stems. The expansion of IFS from a single copy in the Leguminosae ancestor to four copies in S. suberectus likely accelerates the biosynthesis of flavonoids. MYB genes, which are widely involved in plant flavonoid biosynthesis, form the most abundant transcription factor (TF) family in S. suberectus. Activated retrotransposons positively regulate flavonoid accumulation in S. suberectus by introducing cis-elements recognized by tissue-specifically expressed MYBs. Our study not only provides significant insight into the evolution of specific flavonoid biosynthetic pathways in S. suberectus but will also facilitate the development of tools for enhancing the production of bioactive compounds, either by metabolic engineering in microbes or by molecular breeding, to alleviate the resource shortage of S. suberectus.


INTRODUCTION
Spatholobus suberectus Dunn (S. suberectus) is a leguminous plant widely used in Traditional Chinese Medicine. Pharmacological and clinical studies have demonstrated that the dried stems of S. suberectus (Figure 1) exhibit various functions and that flavonoids are the main bioactive components (Wang et al., 2011; Zhou et al., 2017). Four flavonoid compounds have been found to have important pharmacological activities: formononetin, genistein, and isoliquiritigenin are effective in cancer prevention or therapy (Wang et al., 2011; Peng et al., 2016), and catechin can promote the proliferation of hematopoietic progenitor cells (Wang et al., 2008). S. suberectus is therefore widely used in patented Chinese medicines, and market demand for the wild resource is increasing rapidly. In addition, S. suberectus must grow for more than 7 years before its crude drug can be used in medicine. Owing to this long growth cycle and its increasing use in medicines, the wild resources of S. suberectus in China are on the verge of extinction.
Comparative genome analysis is an effective means of investigating the evolution of the S. suberectus genome and identifying its functional elements. Based on the reported 798 Mb genome sequence of S. suberectus (Qin et al., 2019), we compared it with the genomes of other sequenced leguminous plants, such as Lotus japonicus (Sato et al., 2008), Glycine max (Schmutz et al., 2010), Medicago truncatula (Young et al., 2011), Glycyrrhiza uralensis (Mochida et al., 2017), and Cicer arietinum (Gupta et al., 2017), to trace the origin and evolution of these genomes. Whole-genome duplication (WGD) events, which create additional copies of an organism's entire genome, have had a considerable influence on plant evolution and speciation.
Although many genes involved in flavonoid biosynthesis have been identified (Bowerman et al., 2012; Saito et al., 2013), comprehensive genetic information on the flavonoid biosynthetic pathways of S. suberectus is still lacking. Transcription factors, which play important roles in flavonoid biosynthesis, have not been studied in S. suberectus. In this study, the expansion of some gene families is likely related to the abundance of these compounds, and their expression correlates with the content of the main bioactive components. These results will be valuable for biosynthesis studies seeking rapid accumulation of bioactive components by metabolic engineering in microbes, or by molecular breeding to alleviate the resource shortage of S. suberectus.

Plant Materials
S. suberectus plants were grown in the Guangxi Botanical Garden of Medicinal Plants (22°51'28" N, 108°22'2" E). Roots, stems, leaves, flowers, and fruits were collected from 8-year-old plants, with three biological replicates per tissue. Each sample was randomly collected from five plants and divided into three groups for different purposes: quick-frozen samples for RNA isolation, dried samples for determination of total flavonoid content, and freeze-dried samples for measurement of bioactive compound content.

… acids and retained the longest protein among alternative splicing variants. The filtered proteins were then compared with BLASTP (E-value < 1e−7) and clustered into orthologous groups using OrthoMCL with an inflation parameter of 1.5 (Li et al., 2003). Clusters containing exactly one protein per species were defined as single-copy orthologues; these were aligned with MUSCLE (Edgar, 2004) and used to construct a phylogenetic tree by the maximum likelihood method. Divergence times were estimated with MCMCtree in PAML (http://abacus.gene.ucl.ac.uk/software/paml.html) using two calibration points from TimeTree (http://www.timetree.org): A. thaliana vs. G. max, 97–109 Mya; and A. thaliana vs. Salvia miltiorrhiza, 110–124 Mya. Gene family expansion and contraction analyses were conducted with the CAFÉ software, and whole-genome duplication (WGD) analysis was performed with MCScanX using default parameters.
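The single-copy orthologue criterion described above can be illustrated with a short Python sketch. It assumes that OrthoMCL output has already been parsed into a mapping from cluster ID to gene IDs, and that each gene ID carries a species prefix separated by `|` (a hypothetical naming convention chosen for this example). It is an illustration of the filtering logic, not the pipeline used in the study.

```python
from __future__ import annotations

from collections import Counter


def species_of(gene_id: str) -> str:
    """Species prefix of an ID such as 'Ssu|chr5.1660' (hypothetical 'species|gene' format)."""
    return gene_id.split("|", 1)[0]


def longest_isoforms(proteins: dict[str, tuple[str, str]]) -> dict[str, str]:
    """Keep the longest protein per gene; `proteins` maps transcript ID -> (gene ID, sequence)."""
    best: dict[str, str] = {}
    for gene, seq in proteins.values():
        if len(seq) > len(best.get(gene, "")):
            best[gene] = seq
    return best


def single_copy_clusters(clusters: dict[str, list[str]], species: set[str]) -> list[str]:
    """IDs of clusters containing exactly one gene from each species and no other species."""
    selected = []
    for cluster_id, genes in clusters.items():
        counts = Counter(species_of(g) for g in genes)
        if set(counts) == species and all(n == 1 for n in counts.values()):
            selected.append(cluster_id)
    return selected


# Toy OrthoMCL-style clusters (hypothetical IDs, not data from this study).
clusters = {
    "OG1": ["Ssu|g1", "Gma|g7", "Ath|g3"],
    "OG2": ["Ssu|g2", "Gma|g8", "Gma|g9", "Ath|g4"],  # two G. max copies
    "OG3": ["Ssu|g5", "Gma|g10"],                      # no A. thaliana member
}
print(single_copy_clusters(clusters, {"Ssu", "Gma", "Ath"}))  # ['OG1']
```

Requiring the set of species to match exactly, rather than only checking for duplicates, also excludes clusters in which one of the compared species is absent.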

Transcriptome Library Preparation and Gene Expression Analysis
Three technical replicates were prepared for each sample for RNA isolation and transcriptome library construction. Total RNA was extracted using TRIzol® Reagent (Thermo Fisher Scientific, USA) according to the manufacturer's instructions, and RNA-seq libraries were constructed using the NEBNext Ultra Directional RNA Library Prep Kit (NEB, USA). The raw data were filtered according to standard quality control (QC) procedures, and the retained paired-end reads were mapped to the S. suberectus reference genome using HISAT2 (http://ccb.jhu.edu/software/hisat2/index.shtml). The numbers of aligned reads were normalized by gene length and sequencing depth to estimate expression levels accurately, expressed as reads per kilobase per million mapped reads (RPKM), and used to identify differentially expressed genes (DEGs) with DESeq2 (https://bioconductor.org/packages/release/bioc/html/DESeq2.html).
These RPKM values were used as the expression level of each gene. Heatmaps of log10(RPKM + 1) values were plotted in R (version 3.3.1) with the heatmap package.
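The normalization and the heatmap scale can be written out explicitly. The sketch below uses the standard RPKM definition (reads on a gene × 10^9 divided by gene length in bp × total mapped reads) and the log10(RPKM + 1) transform; the gene names and counts are hypothetical, and per-gene read counts are assumed to have been obtained from the HISAT2 alignments beforehand (the counting tool is not specified above).

```python
from __future__ import annotations

import math
from typing import Optional


def rpkm(counts: dict[str, int], lengths_bp: dict[str, int],
         total_mapped: Optional[int] = None) -> dict[str, float]:
    """RPKM = reads on gene * 1e9 / (gene length in bp * total mapped reads)."""
    # Total mapped reads should include reads outside genes; fall back to the sum if not given.
    n = total_mapped if total_mapped is not None else sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * n) for g in counts}


def log10_plus_one(values: dict[str, float]) -> dict[str, float]:
    """log10(RPKM + 1), the heatmap scale; the +1 keeps unexpressed genes finite (0)."""
    return {g: math.log10(v + 1.0) for g, v in values.items()}


# Hypothetical counts for two genes in one library with 10 million mapped reads.
counts = {"geneA": 500, "geneB": 1500}
lengths = {"geneA": 1500, "geneB": 1200}
expr = rpkm(counts, lengths, total_mapped=10_000_000)
print(expr)                  # geneA ≈ 33.3, geneB ≈ 125.0
print(log10_plus_one(expr))
```

Note that DESeq2 itself models raw integer counts, so length-normalized values such as these are mainly suited to visualization and cross-tissue comparison.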
Because the IFS genes had been masked as repeat sequences, IFS protein sequences were determined from the originally predicted coding sequences with GeneWise, using G. max protein sequences as references, combined with AUGUSTUS for gene structure prediction. For phylogenetic analyses, all IFS proteins from five Leguminosae plants (S. suberectus, G. max, L. japonicus, G. uralensis, and C. arietinum) were aligned using MUSCLE. A neighbor-joining tree was built from the multiple alignment using TreeBeST with the Jones-Taylor-Thornton (JTT) model and 100 bootstrap replicates. We also analyzed the structure of the IFS genes in the five Leguminosae plants and examined the positional relationship between repetitive elements and the IFS genes.
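The 100 bootstrap replicates mentioned above are generated by resampling alignment columns with replacement; each pseudo-alignment is then passed to the tree builder (TreeBeST in the study), and the support for a clade is the percentage of replicate trees that recover it. The sketch below shows only the resampling step; the sequences are hypothetical and the tree-building step is not reproduced.

```python
from __future__ import annotations

import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap pseudo-replicate: alignment columns resampled with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same sampled column indices are applied to every sequence, so sites stay aligned.
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: dict[str, str], n: int = 100,
                         seed: int = 1) -> list[dict[str, str]]:
    """`n` pseudo-alignments (100 matches the replicate count given in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


# Toy protein alignment (hypothetical sequences, not real IFS proteins).
aln = {"Ssu_IFS_a": "MLLELAV", "Gma_IFS_a": "MLVELAI", "Gur_IFS": "MIVDLSV"}
reps = bootstrap_replicates(aln)
print(len(reps), reps[0])
```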

Total Flavonoid Content
The total flavonoid content (TFC) was determined according to Chen et al. (2016). Dry samples (0.3 g) were sonicated in 25 ml of 50% ethanol at a ratio of 1:20 (w/v) for 1 h using an SB-800 DTD sonicator (Ningbo Xinzhi Biotechnology Co., Ltd, Ningbo, China; power: 100 W; frequency: 40 kHz). Rutin was used as the standard, and flavonoid contents were expressed as rutin equivalents based on a rutin calibration curve. Aliquots of rutin standard solution (0.5, 1.0, 1.5, 2.0, 2.5, and 3.0 ml) or 1 ml of extract solution were transferred into 25 ml volumetric flasks, and 1 ml of 5% NaNO2 was added. After 5 min, 1 ml of 10% Al(NO3)3 was added to the mixture. After another 5 min, 10 ml of 4% NaOH was added, and the volume was made up to 25 ml with 50% ethanol. Absorbance was measured at 505 nm using a UV-visible spectrophotometer.
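The rutin-equivalent calculation can be made explicit. The Python sketch below fits the calibration line by ordinary least squares and back-calculates TFC in mg rutin equivalents per g of dry sample. The calibration concentrations and absorbances are invented for illustration (the rutin stock concentration is not stated above), and the dilution bookkeeping assumes that the 1 ml aliquot is taken from the 25 ml extract and reacted in a 25 ml flask.

```python
from __future__ import annotations


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def tfc_mg_per_g(absorbance: float, slope: float, intercept: float,
                 flask_ml: float = 25.0, extract_ml: float = 25.0,
                 aliquot_ml: float = 1.0, sample_g: float = 0.3) -> float:
    """TFC in mg rutin equivalents per g dry sample (dilution scheme assumed, see lead-in)."""
    conc = (absorbance - intercept) / slope          # mg rutin per ml in the reaction flask
    mg_in_flask = conc * flask_ml                    # rutin equivalents in the 25 ml flask
    total_mg = mg_in_flask * extract_ml / aliquot_ml # scale the 1 ml aliquot up to the whole extract
    return total_mg / sample_g


# Hypothetical calibration: rutin concentration in the flask (mg/ml) vs. A505.
std_conc = [0.004, 0.008, 0.012, 0.016, 0.020, 0.024]
std_abs = [0.053, 0.101, 0.149, 0.197, 0.245, 0.293]
slope, intercept = fit_line(std_conc, std_abs)
tfc = tfc_mg_per_g(0.149, slope, intercept)
print(round(slope, 3), round(intercept, 4), round(tfc, 2), "mg/g =", round(tfc / 10, 2), "%")
```

Dividing mg/g by 10 gives the percentage scale used when reporting total flavonoid content.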

Ultraperformance Liquid Chromatography-Electrospray Ionization-Mass Spectrometry/Mass Spectrometry Analysis
Each standard compound was accurately weighed and dissolved in methanol-water (80:20, v/v) to prepare a stock solution of 100 mg/ml. Working standard solutions containing the five reference standards were prepared by diluting the stock solutions with methanol-water (80:20, v/v) to construct the standard curves. The solutions were stored at 4°C until analysis.
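Preparing working standards from a concentrated stock follows C1·V1 = C2·V2. The sketch below computes the stock volume for one working standard and a serial-dilution series; the target concentration and final volume are hypothetical, since the working-standard concentrations are not given above.

```python
from __future__ import annotations


def stock_volume_ul(stock_mg_per_ml: float, target_ug_per_ml: float, final_ml: float) -> float:
    """Stock volume (µl) for one working standard, from C1 * V1 = C2 * V2."""
    stock_ug_per_ml = stock_mg_per_ml * 1000.0  # mg/ml -> µg/ml
    if target_ug_per_ml > stock_ug_per_ml:
        raise ValueError("target concentration exceeds the stock concentration")
    v1_ml = target_ug_per_ml * final_ml / stock_ug_per_ml
    return v1_ml * 1000.0  # ml -> µl


def serial_dilution(start: float, fold: float, steps: int) -> list[float]:
    """Concentrations after `steps` successive `fold`-fold dilutions of `start`."""
    concs = []
    c = start
    for _ in range(steps):
        c /= fold
        concs.append(c)
    return concs


# A hypothetical 10 µg/ml standard in 10 ml from the 100 mg/ml stock needs only ~1 µl,
print(stock_volume_ul(100.0, 10.0, 10.0))
# so intermediate 10-fold dilutions (mg/ml) are easier to pipette accurately.
print(serial_dilution(100.0, 10.0, 3))
```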
All freeze-dried samples were cut into small pieces and ground into powder. Each sample powder (0.05 g) was accurately weighed and soaked in 1.2 ml of extraction solution (80% methanol containing 0.01 mol/L butylated hydroxytoluene (BHT) and 0.1% formic acid). The mixture was vortexed for 10 s and ground for 2 min, followed by ultrasonic extraction for 2 h and centrifugation at 12,000 rpm for 10 min. The precipitate was re-extracted by ultrasonication and centrifugation. The combined supernatants were vacuum freeze-dried and redissolved in 100 µl of 80% methanol for UPLC-ESI-MS/MS analysis.

Expression Network Construction
Hierarchical cluster analyses were performed separately for the PC genes [mean fragments per kilobase of transcript per million mapped reads (FPKM) ≥ 2] using the OmicShare tools (www.omicshare.com/tools). WGCNA (v1.47) was used to construct unsigned co-expression networks based on the transcript expression matrix. Networks were constructed and modules detected step by step using the "cutreeDynamic" and "mergeCloseModules" functions with the following parameters: power = 13, minModuleSize = 30, and cutHeight = 0.25. We then examined the relationships between module transcripts and samples, and identified the modules significantly associated with the content of total flavonoids, formononetin, isoliquiritigenin, genistein, and catechin. To understand the biological functions of the modules, the genes in each module were subjected to GO enrichment analysis. Finally, the co-expression network was visualized with Cytoscape (v3.5.0).
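In an unsigned WGCNA network, the adjacency between two genes is the absolute Pearson correlation of their expression profiles raised to the soft-thresholding power β (β = 13 above), so strong correlations of either sign are kept while weak ones are suppressed: a correlation of 0.8 becomes 0.8^13 ≈ 0.055. The pure-Python sketch below reproduces only this adjacency step and the mean-expression filter; the expression values are hypothetical, and the study itself used the WGCNA R package.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_by_mean(expr: dict[str, list[float]], min_mean: float = 2.0) -> dict[str, list[float]]:
    """Keep genes whose mean expression across samples is at least `min_mean` (FPKM ≥ 2)."""
    return {g: v for g, v in expr.items() if sum(v) / len(v) >= min_mean}


def unsigned_adjacency(expr: dict[str, list[float]], power: int = 13) -> dict[tuple[str, str], float]:
    """a_ij = |cor(x_i, x_j)| ** power for every pair of genes (symmetric)."""
    genes = list(expr)
    adj: dict[tuple[str, str], float] = {}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** power
            adj[(gi, gj)] = adj[(gj, gi)] = a
    return adj


# Hypothetical FPKM profiles across five tissues.
expr = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10],
    "g3": [5, 4, 3, 2, 1],
    "g4": [0.1, 0.2, 0.1, 0.3, 0.2],  # mean < 2, removed by the filter
}
kept = filter_by_mean(expr)
adj = unsigned_adjacency(kept)
print(sorted(kept), adj[("g1", "g2")], adj[("g1", "g3")])
```

Because the network is unsigned, the perfectly anticorrelated pair g1/g3 receives the same adjacency as the positively correlated pair g1/g2.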

Yeast One-Hybrid Assays
A one-hybrid system in the yeast strain YM4271 was used to test the binding of MYBs to the promoters of DFR, LAR, and IFSs. Promoter fragments were inserted into pLacZi as reporters, and MYBs were expressed in yeast cells from pGADT7-AD.
Total DNA was isolated from fresh young leaves of 8-year-old S. suberectus plants using the Plant DNA Kit (TIANGEN) according to the manufacturer's instructions. Messenger RNA (mRNA) was extracted with the Oligotex mRNA Mini Kit (Qiagen), and long-distance PCR (LD-PCR) was performed with the PCR cDNA Synthesis Kit (SMART). The promoters were cloned from the DNA template, and the full-length cDNAs of two MYBs were cloned from the cDNA library. Details of these sequences are provided in Supplemental Data S1, and the primers are listed in Supplemental Table S5.

Comparative Genome Analysis and Divergence Time Estimation
To investigate the evolution of the S. suberectus genome, we compared it with the genomes of seven other sequenced plant species (Figure 2A): the Leguminosae plants G. max, L. japonicus, G. uralensis, Cicer arietinum, M. truncatula, and Cajanus cajan, and Arabidopsis thaliana as an outgroup. A total of 24,523 (77.5%) S. suberectus genes were clustered into four groups, including 853 unique genes and 6,253 single-copy orthologs. Overall, 108 single-copy genes were shared among the eight angiosperm species (Supplemental Table S1). A total of 361 gene families, consisting of 853 genes, were unique to S. suberectus (Figure 2B). The accuracy of these results was further validated by phylogenetic analysis. A total of 1,073 single-copy orthologs were obtained from the eight species, and the phylogeny indicated that the Leguminosae plants can be divided into galegoid (M. truncatula, G. uralensis, and C. arietinum) and Millettioid (G. max, C. cajan, and S. suberectus) clades. Within the Leguminosae, G. max is more closely related to C. cajan than to S. suberectus. We estimated the divergence times of S. suberectus from the other plants; the results suggested that the galegoid clade diverged from the Millettioid clade approximately 30.8 million years ago, and that S. suberectus diverged from the common ancestor of G. max and C. cajan approximately 18 million years ago (Supplementary Figure 1).

Whole-Genome Duplication and Gene Family Expansion Analysis
WGD events are common in plant genomes and play a significant role in plant evolution and speciation. To investigate WGDs in the S. suberectus lineage, we identified syntenic regions across the S. suberectus, G. max, and G. uralensis genomes, each consisting of at least five collinear homologous genes. Two ancient WGD events occurred in all three species: the γ event [an ancient WGD shared by all core eudicots; 4DTv (transversion rate at fourfold degenerate sites) = 0.6] and a WGD shared by Leguminosae plants (4DTv = 0.25), suggesting that both occurred before the divergence of these species. A third WGD event in G. max might have contributed to the divergence of S. suberectus and G. max (Figure 3A). The presence of 1,409 syntenic blocks between S. suberectus and G. max that occur in multiple copies in G. max strongly suggests that the G. max lineage underwent a WGD after speciation from S. suberectus (Figure 3B). Analysis of the expansion and contraction of orthologous gene families revealed 1,001 expanded and 3,649 contracted gene families in the S. suberectus lineage (Figure 3C).
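The 4DTv values used above to date the WGD events are the proportion of transversions at fourfold degenerate third codon positions between paralogous (or orthologous) gene pairs; peaks in the distribution of 4DTv over syntenic pairs mark duplication events. The sketch below computes the raw, uncorrected proportion for one pair of aligned coding sequences. Published pipelines often apply an additional substitution-model correction (for example HKY), which is omitted here, and this is not the script used in the study.

```python
from __future__ import annotations

# Codon prefixes whose third position is fourfold degenerate in the standard genetic code
# (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def is_transversion(a: str, b: str) -> bool:
    """True if a and b differ by a purine <-> pyrimidine change."""
    return a != b and ((a in PURINES) != (b in PURINES))


def four_dtv(seq1: str, seq2: str) -> float:
    """Uncorrected 4DTv for two codon-aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    sites = transversions = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if set(c1 + c2) - set("ACGT"):      # skip codons with gaps or ambiguous bases
            continue
        # The third position counts only if it is fourfold degenerate in both codons.
        if c1[:2] == c2[:2] and c1[:2] in FOURFOLD_PREFIXES:
            sites += 1
            if is_transversion(c1[2], c2[2]):
                transversions += 1
    return transversions / sites if sites else float("nan")


# CTA/CTC: fourfold site, A->C transversion; GCA/GCG: fourfold site, transition;
# AAA/AAG: not fourfold degenerate, ignored.
print(four_dtv("CTAGCAAAA", "CTCGCGAAG"))  # 0.5
```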

Metabolism of Bioactive Flavonoids and Gene Families Involved in Flavonoid Biosynthesis
We investigated flavonoid metabolism in various tissues of S. suberectus (Figures 4A–E). Flavonoids were detected in all tissues. Quantitative metabolite profiles of total flavonoids showed that the stem had the highest accumulation (up to 2.3%), the root and flower had moderate accumulation (~1%), and the other two tissues had the lowest accumulation (Figure 4A). Formononetin and isoliquiritigenin mainly accumulate in the roots (Figures 4E and 5C). Genistein can be detected in other tissues but is mainly synthesized in the fruit (Figure 4C). As a medicinal ingredient, catechin is more abundant in the stems of S. suberectus than other flavonoids (Li et al., 2017), which may explain why the stem had the highest flavonoid content (Figure 4B).
In this study, through comparative analysis with homologous genes of Arabidopsis, soybean, and other reported plants, we identified 14 gene families involved in flavonoid synthesis in the S. suberectus genome (Supplemental Tables S2 and S3). The detailed flavonoid biosynthetic pathways in the various tissues are shown in Figure 4F. Based on their expression levels in the five tissues, each family had at least one highly expressed member in every tissue (Figure 4F and Supplementary Table S2). Notably, the numbers of PAL, 4CL, and CHS homologs in S. suberectus were dramatically lower than in G. max (Supplementary Table S2).
Differences in gene expression among tissues may determine the differences in flavonoid content among these tissues. The high flavonoid content in S. suberectus tissues is likely due to the constant and high expression of flavonoid biosynthesis-related genes. IFS catalyzes the oxidation of naringenin to genistein and plays an important role in the biosynthesis of formononetin. Gene expression profiling and RNA-seq data mining showed that three IFS copies (chr5.1661, chr5.1664, and chr5.1665) maintained their transcriptional activity in root and fruit tissues, and that IFS chr5.1660 is highly expressed in the root. These organ-specific expression patterns are consistent with the tissue distribution of the formononetin and genistein synthesis pathways (Figures 4C, E).

Differential Expression Networks of the Biosynthesis and Metabolism of Different Flavonoids
Flavonoid content (%) varies among the stems, leaves, and other tissues. To explore genes related to the biosynthesis and metabolism of different flavonoids, we performed a weighted gene co-expression network analysis (WGCNA) with the RNA-seq and content data, obtaining 14 distinct modules (Figure 5A). The modules, labeled with different colors, were correlated with the content of total flavonoids and of formononetin, isoliquiritigenin, genistein, and catechin, which were of particular interest in this study. The "greenyellow" and "red," "turquoise," "green," "yellow" and "brown," and "red" modules were significantly associated with the content of total flavonoids, formononetin, isoliquiritigenin, genistein, and catechin, respectively (Figure 5B). Notably, genes in the "greenyellow" and "red" modules were highly expressed in the stem, and the "turquoise" module comprised transcripts highly expressed in the root, whereas genes related to genistein content were highly expressed in the fruit, with some (such as IFS) also highly expressed in the root. These results are consistent with the genistein biosynthetic pathways in the various tissues of S. suberectus (Figure 4C). Gene Ontology (GO) classification of the genes related to genistein content showed that they were primarily involved in aromatic compound biosynthetic processes (genistein contains an aromatic A ring; e.g., PAL1, Chr7.1096/Chr7.1095) and nucleic acid metabolism, and a number of them were nucleus-localized transcription factors (such as WRKY, Chr8.1000; bHLH, Chr6.3388; and MYB, Chr8.323/Chr9.1209) (Figure 5D). Together, these results indicate that IFS plays an important role in the biosynthesis of flavonoids.
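Module–trait associations of this kind are usually computed by summarizing each module with its eigengene (the first principal component of the standardized expression of its member genes across samples) and correlating that eigengene with each metabolite's content. The pure-Python sketch below illustrates this calculation with power iteration; the expression values and trait values are hypothetical, and the study relied on WGCNA's own functions (such as `moduleEigengenes()` in R) rather than this code.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(row: list[float]) -> list[float]:
    """Scale one gene's profile across samples to mean 0 and SD 1."""
    n = len(row)
    m = sum(row) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in row) / (n - 1))
    return [(v - m) / sd for v in row]


def module_eigengene(rows: list[list[float]], iters: int = 200) -> list[float]:
    """First principal component (over samples) of a module's standardized expression matrix."""
    z = [standardize(r) for r in rows]                  # genes x samples
    n = len(z[0])
    # S = Z^T Z (samples x samples); its leading eigenvector is the first right singular vector of Z.
    s = [[sum(r[i] * r[j] for r in z) for j in range(n)] for i in range(n)]
    v = z[0][:]                                         # start from one member gene's profile
    for _ in range(iters):                              # power iteration
        w = [sum(s[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the eigengene so that it correlates positively with average module expression.
    avg = [sum(r[j] for r in z) / len(z) for j in range(n)]
    if sum(a * b for a, b in zip(avg, v)) < 0:
        v = [-x for x in v]
    return v


# Hypothetical module of three genes across five tissues, and a hypothetical metabolite content.
rows = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 7], [1, 1, 2, 3, 4]]
trait = [0.5, 0.8, 1.0, 1.6, 2.3]
me = module_eigengene(rows)
print([round(x, 3) for x in me], round(pearson(me, trait), 3))
```

With only five tissues per correlation, such module–trait coefficients carry few degrees of freedom, which is worth keeping in mind when reading "significant" associations.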

Gene Expansion Involved in Flavonoid Biosynthesis
To further explain how IFS affects flavonoid content, we analyzed the expansion of gene families involved in flavonoid biosynthesis. Interestingly, IFS was expanded in S. suberectus compared with other Leguminosae species (red star; four copies in S. suberectus, two copies in G. max, C. arietinum, and L. japonicus, and one copy in G. uralensis; Figure 6A). Studies in Leguminosae have shown that IFS is the key enzyme for genistein biosynthesis (Figure 6B). To investigate potential mechanisms underlying the expansion of IFS from a single copy in the Leguminosae ancestor to four copies in S. suberectus, we performed a phylogenetic analysis of IFS in Leguminosae (Figure 6C). The four S. suberectus IFSs were located close together on the same chromosome, whereas the G. max IFSs were located on two separate chromosomes, and G. uralensis had only one IFS copy (Figure 6C). This result suggests that IFS initially expanded in the G. max lineage through a large-scale genomic duplication event (such as a WGD), whereas the IFS copies in S. suberectus, C. arietinum, and L. japonicus arose through unequal tandem duplication events. Microsynteny analysis provided clear evidence of regional synteny among Leguminosae IFS genes (Figure 6D). Together, these results demonstrate that IFSs in Leguminosae evolved through lineage-specific whole-genome and tandem duplications. We also found differences in the presence of retrotransposons around the IFS homologs of Leguminosae (Figure 6D). …Copia at the end of the second exon. The L. japonicus and S. suberectus IFS genes had no internal insertions of repetitive elements, but a large number of repetitive elements were distributed upstream (in the promoter regions) or downstream of these genes (Figure 6D).
Qin et al.
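Whether family members form a tandem array can be checked by grouping them by chromosome and splitting wherever consecutive members are too far apart. The sketch below assumes that the numeric suffix of IDs such as chr5.1660 reflects gene order along the chromosome (an assumption; the ID scheme is not described above) and uses a hypothetical `max_gap` threshold in gene ranks. Dedicated tools, such as the duplicate-gene classification program distributed with MCScanX, combine homology searches with positional information in a more complete way.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical ID format: 'chr<chromosome>.<gene rank along the chromosome>'.
GENE_ID = re.compile(r"^chr(\w+)\.(\d+)$", re.IGNORECASE)


def tandem_clusters(gene_ids: list[str], max_gap: int = 5) -> list[list[str]]:
    """Group homologous genes into tandem arrays: same chromosome, neighbours within `max_gap` ranks."""
    by_chr: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for gid in gene_ids:
        m = GENE_ID.match(gid)
        if not m:
            raise ValueError(f"unrecognized gene ID: {gid}")
        by_chr[m.group(1).lower()].append((int(m.group(2)), gid))
    clusters = []
    for chrom in sorted(by_chr):
        members = sorted(by_chr[chrom])
        current = [members[0]]
        for prev, nxt in zip(members, members[1:]):
            if nxt[0] - prev[0] <= max_gap:
                current.append(nxt)                     # still within the same array
            else:
                clusters.append([g for _, g in current])
                current = [nxt]                         # gap too large: start a new array
        clusters.append([g for _, g in current])
    return clusters


ifs = ["chr5.1660", "chr5.1661", "chr5.1664", "chr5.1665"]
print(tandem_clusters(ifs))             # one array of four genes
print(tandem_clusters(ifs, max_gap=2))  # split into two pairs under a stricter threshold
```

The result depends directly on `max_gap`, which is why tandem-duplicate counts should always be reported together with the distance criterion used.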

Identification of the Upstream Regulatory Transcription Factors of Flavonoid Biosynthesis
Because a number of genes correlated with flavonoid content are transcription factors, we performed motif discovery with MEME-ChIP on the promoter regions (−2,000 to 0 bp relative to the ATG) of the 163 genes participating in flavonoid biosynthesis. Apart from the TATA-box (ATATATAT, E-value = 3.6e−12) and CAAT-box (CAAT, E-value = 2.7e−9), cis-acting elements commonly found upstream of the ATG, the binding sites of the MYB TF family (CAACA/CG/A) showed the highest occurrence frequency (E-value = 5.5e−7). The promoter region of IFS (Chr5.1660) contained only one ABRE site (a cis-acting element involved in abscisic acid responsiveness), three AuxRR-core sites (cis-acting regulatory elements involved in auxin responsiveness), and six MYB TF binding sites (Figure 6E). Five of these MYB binding sites were introduced by the insertion of a DNA/MULE-MuDR element. In addition, all the MYB TFs in the "yellow" and "brown" modules were specifically and highly expressed in the stem (Figure 6E), consistent with the high total flavonoid content of the stem (Figure 4A). We therefore preliminarily speculate that activated retrotransposons positively regulate flavonoid accumulation in S. suberectus by introducing cis-elements recognized by stem-specific TFs (such as MYBs).
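The promoter window and binding-site counts above can be illustrated with a small scan. MEME-ChIP discovers motifs de novo; the sketch below only does the simpler complementary task of extracting the −2,000 to 0 bp region upstream of the ATG and locating a known consensus on both strands. The coordinates are assumed to be 0-based and half-open on the + strand (BED-style), and the IUPAC pattern CAACMR is a hypothetical reading of the "CAACA/CG/A" consensus, used only for illustration.

```python
from __future__ import annotations

import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]", "M": "[AC]",
         "K": "[GT]", "S": "[CG]", "W": "[AT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def promoter(chrom_seq: str, cds_start: int, cds_end: int, strand: str, upstream: int = 2000) -> str:
    """Sequence from -upstream to 0 relative to the ATG, 5'->3' on the gene's strand."""
    if strand == "+":
        return chrom_seq[max(0, cds_start - upstream):cds_start].upper()
    if strand == "-":
        # On the minus strand the ATG appears as CAT ending at cds_end; upstream lies to the right.
        return revcomp(chrom_seq[cds_end:cds_end + upstream])
    raise ValueError("strand must be '+' or '-'")


def find_motif(seq: str, iupac_motif: str) -> list[tuple[str, int]]:
    """(strand, 0-based start on the given sequence) for every hit, overlaps included."""
    pattern = "".join(IUPAC[b] for b in iupac_motif.upper())
    rx = re.compile(f"(?=({pattern}))")                 # lookahead allows overlapping hits
    seq = seq.upper()
    hits = [("+", m.start()) for m in rx.finditer(seq)]
    length, k = len(seq), len(iupac_motif)
    # A hit at index i of the reverse complement covers [length - i - k, length - i) on this sequence.
    hits += [("-", length - m.start() - k) for m in rx.finditer(revcomp(seq))]
    return sorted(hits, key=lambda h: (h[1], h[0]))


print(promoter("AAAACCCCATGTTT", 8, 14, "+", upstream=4))   # 'CCCC'
print(find_motif("GGCAACAGTTTTTT", "CAACMR"))               # [('+', 2)]
```

Palindromic motifs are reported once per strand by this scan, so counts like "six MYB binding sites" depend on whether both-strand hits at the same position are merged.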
To further identify the crucial MYB TFs and their candidate downstream genes, two MYB TFs (Chr6.2653 and Chr8.494) (Figure 7A), whose expression (based on RNA-seq) was significantly correlated with the content of total flavonoids or catechin, respectively, were cloned into pGADT7-AD as effectors. Meanwhile, one DFR (Chr5.129) and one LAR (Chr2.1366), whose expression was highly correlated with total flavonoid content, and two IFSs (Chr5.1661 and Chr5.1665), whose expression was highly correlated with catechin content, were cloned into pLacZi; all four genes showed higher absolute expression intensity than their homologs (Figure 7A) and contained MYB binding motifs in their promoter regions (Supplemental Data S1). A yeast one-hybrid assay in strain YM4271 showed that AD-MYB_Chr6.2653 interacted with pLacZi-DFR_Chr5.129 and pLacZi-LAR_Chr2.1366, and that AD-MYB_Chr8.494 interacted with pLacZi-IFS_Chr5.1661 and pLacZi-IFS_Chr5.1665 (Figure 7B). These results indicate that the two MYB TFs recognize the promoter regions of their candidate target genes in yeast, and that MYB_Chr6.2653 and MYB_Chr8.494 participate in the biosynthesis of flavonoids and catechin, respectively, by regulating the expression of key genes in the pathway.

The Evolution of Spatholobus suberectus and the Genome Expansion
Our previous study provided a high-quality genome sequence for S. suberectus (Qin et al., 2019). Notably, the genome size of S. suberectus (748 Mb) is smaller than that of Glycine max (1.1 Gb) but larger than that of G. uralensis (379 Mb) (Mochida et al., 2017). Based on the genomic data published at the National Center for Biotechnology Information (NCBI), we identified three WGD events in G. max (Figure 3A, yellow line); WGD analysis also showed that S. suberectus underwent two WGD events (Figure 3A, purple line), the more recent of which occurred before the latest WGD event in G. max. Meanwhile, repetitive elements occupied 47.82% of the S. suberectus genome and long terminal repeats (LTRs) 17.32%, both lower than in the G. max genome (53.9% repetitive elements, 34.1% LTRs) (Qin et al., 2019; Xie et al., 2019). G. uralensis also underwent two WGD events (Figure 3A, green line), and transposable elements and unclassified repeats made up only 36.48% of its genome (Mochida et al., 2017). Taken together, these data indicate that transposable element (TE) insertion resulted in the larger genome size of S. suberectus relative to G. uralensis, whereas both WGD and the expansion of repetitive sequences (especially LTRs) contributed greatly to the genome expansion of G. max.

Correspondence Analysis Between the Flavonoids Content and the Key Gene Expression Profiles
To further identify the crucial genes for flavonoid biosynthesis, the RNA-seq results were analyzed together with the flavonoid content data. Formononetin accumulated specifically in the root, consistent with the root-specific expression profiles of CHI_Chr7.105, IFS_Chr5.1660, OMT_Chr6.3726/4.60/4.59/4.3073, and HID_Chr2.3174/2.3175. Genistein content was highly correlated with the root- and fruit-specific expression of IFSs (Chr5.1661/5.1004/5.1665). Notably, however, isoliquiritigenin content showed no clear correlation with the expression of its upstream biosynthetic genes, such as CHS (highly expressed in the leaf and flower) and CHR (highly expressed in the stem and fruit). This might be because isoliquiritigenin is an intermediate whose content is determined not only by the expression of upstream biosynthetic genes but also by its downstream metabolism. Regarding catechin accumulation, DFR_Chr5.129 showed the highest absolute expression intensity in the stem, consistent with the stem having the highest catechin content. DFR_Chr7.2986 and LAR_Chr7.1162 showed low expression in all tissues, whereas DFR_Chr5.130 and LAR_Chr2.1366 were highly expressed in all tissues except the flower. We speculate that DFR_Chr5.129 is the dominant DFR gene in catechin biosynthesis, and that DFR_Chr5.130 and LAR_Chr2.1366 might play important roles in vegetative development.
The Expansion of Isoflavone Synthase Accelerated the Biosynthesis of Flavonoids in Spatholobus suberectus
S. suberectus has been widely used as a plant source for traditional medicines because it contains various flavonoids at levels higher than those in other Leguminosae plants. In particular, the stems and roots of S. suberectus are used medicinally to treat anemia in minority areas of China, which may be related to the flavonoids that serve as the main bioactive components and to the similar gene expression patterns in these two tissues. Tandem gene duplication is one of the most intriguing features of the S. suberectus genome. Tandem duplication has contributed to the development and environmental adaptation of different plants: for example, nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes in leguminous plants such as soybean were duplicated, improving tolerance to various diseases (Shao et al., 2014), and most TCP, cytochrome P450, and NB-ARC domain gene families in the genome of Antirrhinum majus L. derived from tandem duplication events that affect flower development and abiotic stress responses. With the high-quality genome sequence of S. suberectus, a series of flavonoid-related genes were identified in the S. suberectus genome in comparison with model plants. IFS is the key enzyme for the biosynthesis of genistein and 2,7,4'-trihydroxyisoflavanone (the substrate for formononetin) (Figure 4F), and the IFS expression patterns closely matched the genistein content in roots and fruits (Figure 4C) and the formononetin content in roots (Figure 4E). We identified four tandemly duplicated IFS genes with high sequence similarity (Figure 6C), and a synteny block assay further indicated that these four IFSs are tandemly located on Chr 5 (Figure 6D). Collectively, the IFSs, which participate in the biosynthetic pathways of genistein and formononetin and were generated by tandem duplication, might strengthen the active control of the accumulation of various flavonoids in the stems and roots of S. suberectus.
(Figure legend, yeast one-hybrid assay, partial: Promoter fragments of DFR, LAR, and IFSs were fused to pLacZi and transformed into YM4271 strain harboring GAL4-AD-MYBs; β-galactosidase activity was validated using X-gal staining. Bars represent means ± s.d. of three biological replicates; ** indicates significant differences at P < 0.01, Student's t-test.)
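The yeast one-hybrid comparisons are reported as means ± s.d. of three biological replicates with Student's t-test at P < 0.01. The sketch below computes the pooled-variance (Student) t statistic and compares it with the two-tailed critical value for α = 0.01 and 4 degrees of freedom (4.604, from standard t tables), which applies when each group has three replicates. The reporter values are hypothetical; in practice `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` gives the same Student statistic, also returns the p-value directly.

```python
from __future__ import annotations

import math
from statistics import mean, variance


def students_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Two-tailed critical t for alpha = 0.01 with 4 degrees of freedom (3 + 3 - 2).
T_CRIT_ALPHA_001_DF4 = 4.604

# Hypothetical reporter activity: MYB effector vs. empty-vector control, three replicates each.
effector = [12.1, 11.4, 12.8]
control = [3.2, 2.9, 3.6]
t, df = students_t(effector, control)
print(round(t, 2), df, "** (P < 0.01)" if abs(t) > T_CRIT_ALPHA_001_DF4 else "n.s.")
```

Student's test assumes equal variances in the two groups; with only three replicates per group this assumption is hard to check, and Welch's test is the usual alternative when it is doubtful.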

The Crucial Roles of MYB TFs in the Flavonoids Biosynthesis Regulation
TFs play a major role in regulating gene expression in plant secondary metabolism, and overexpression of TFs regulates the expression of multiple genes in biosynthetic pathways. Notably, MYB TFs, which are widely involved in plant phenylpropanoid metabolic pathways and play important regulatory roles in flavonoid biosynthesis in various species such as peach and buckwheat (Matsui et al., 2018; Cao et al., 2019), were the most abundant TF family in S. suberectus. For example, PbMYB12b can activate other genes involved in flavonoid metabolism and promote flavonoid synthesis (Zhai et al., 2019). In our study, over 70% of the genes participating in flavonoid biosynthesis had MYB binding sites in their promoter regions, supporting an intermediate regulatory role of MYB TFs in the flavonoid biosynthesis pathway. Further, cis-acting elements recognized by stem-expressed MYB TFs were introduced by activated retrotransposons: in yeast, MYB_Chr6.2653 recognized the promoter regions of DFR_Chr5.129 and LAR_Chr2.1366, and MYB_Chr8.494 bound to the promoter regions of IFS_Chr5.1661 and IFS_Chr5.1665. The expression of these two MYBs correlated with the content of catechin or genistein, respectively, and showed higher absolute intensity than that of their homologs in the stem or root. Together, these results demonstrate the crucial roles of MYB TFs in regulating flavonoid biosynthesis at the molecular level.
In summary, our study provides abundant valuable information on the genomic resources of S. suberectus, one of the most important Chinese medicinal plants. The genome and transcriptome data provided here should be valuable for both fundamental biological research and breeding. On the basis of this study, the germplasm resources of S. suberectus can be further optimized to alleviate the resource shortage.