Identification of Transcription Factor Genes and Functional Characterization of PlMYB1 From Pueraria lobata

Kudzu, Pueraria lobata, is a traditional Chinese food and medicinal herb that has been commonly used since ancient times. Kudzu roots are rich sources of isoflavonoids, e.g., puerarin, with beneficial effects on human health. To gain global information on the isoflavonoid biosynthetic regulation network in kudzu, de novo transcriptome sequencing was performed using two genotypes of kudzu with and without puerarin accumulation in roots. RNA-Seq data showed that genes of the isoflavonoid biosynthetic pathway were significantly represented among the genes upregulated in the kudzu with puerarin. To discover regulatory genes, 105, 112, and 143 genes encoding MYB, bHLH, and WD40 transcription regulators, respectively, were identified and classified. Among them, three MYB, four bHLH, and one WD40 gene were found to be highly identical to their orthologs involved in flavonoid biosynthesis in other plants. Notably, the expression profiles of the PlMYB1, PlbHLH3-4, and PlWD40-1 genes were closely correlated with isoflavonoid accumulation profiles in different tissues and cell cultures of kudzu. Over-expression of PlMYB1 in Arabidopsis thaliana significantly increased the accumulation of anthocyanins in leaves and proanthocyanidins in seeds by activating the AtDFR, AtANR, and AtANS genes. Our study provides valuable comparative transcriptome information for further identification of regulatory or structural genes involved in the isoflavonoid pathway in P. lobata, as well as for bioengineering of bioactive isoflavonoid compounds.


INTRODUCTION
Pueraria lobata, commonly known as kudzu, belongs to the Leguminosae family. The roots of kudzu are enriched in starch, which has traditionally been used as a source of food consumption and beverage production in East Asia. Additionally, kudzu has been used for centuries in Chinese traditional medicine as an antipyretic, antidiarrheic, and antiemetic agent (Keung and Vallee, 1998;Wong et al., 2011). Kudzu roots are rich resources of natural product isoflavonoids, including daidzein, genistein, formononetin, and puerarin (also called daidzein 8-C-glycoside) (Prasain et al., 2003;Li et al., 2014). Among these isoflavonoids, puerarin is the major effective ingredient with antioxidative, antidiabetic, and antithrombotic effects (Hien et al., 2010;Liu et al., 2013), and it could also help to cure non-alcoholic fatty liver diseases, alcohol-induced adipogenesis, and osteonecrosis (Xia et al., 2013). The beneficial therapeutic effects of isoflavonoids, in particular puerarin, have made P. lobata an interesting plant species in investigating isoflavonoid biosynthesis and regulation.
Isoflavonoids are almost exclusively limited to the Leguminosae family, and their biosynthesis shares the common upstream pathway with flavonoids (Supplementary Figure 1). Three molecules of malonyl-CoA are condensed with one molecule of 4-coumaroyl-CoA to form naringenin chalcone or isoliquiritigenin, under the catalysis of chalcone synthase (CHS) or chalcone synthase/chalcone reductase (CHR). Chalcone isomerase (CHI) then catalyzes the following reaction to form naringenin and liquiritigenin, respectively. Subsequently, these two products are converted to daidzein and genistein by the sequential actions of isoflavone synthase (IFS) and hydroxyflavanone dehydratase (HID). Finally, UDP-glycosyltransferases (UGTs) can add a glucose moiety to these two aglycone intermediates to form different O-glucosides at the C7 or C5 position (He et al., 2011). Puerarin has been shown to be synthesized via daidzein (He et al., 2011;Li et al., 2014;Wang et al., 2017). Although several structural genes involved in the isoflavonoid pathway, such as 4-coumarate:coenzyme A ligases and UGTs, have been identified in kudzu (He et al., 2011;Li et al., 2014;Wang et al., 2017), few transcription factors regulating isoflavonoid biosynthesis in kudzu have been identified.
MYB proteins, R2R3-MYBs in particular, are major players as the positive or negative regulators toward key biosynthetic genes required for the production of flavonoids (Jiu et al., 2021;Shi et al., 2021). In Arabidopsis, flavonoid biosynthesis is mainly regulated by a set of R2R3-MYB transcription factors. Transparent Testa 2 (AtTT2), Production of Anthocyanin Pigments 1 (AtPAP1), and AtMYB11 could activate flavonoid biosynthesis, whereas AtMYBL2 and others repressed the biosynthesis of flavonoids (Xu et al., 2014).
The first bHLH (the basic helix-loop-helix) protein Lc was reported from maize, which cooperates with the MYB transcription factors C1 and PL1 to regulate the anthocyanin biosynthetic pathway in maize (Ludwig et al., 1989). Several corresponding orthologs of the Lc gene have been identified in other plant species (Nesi et al., 2000;Ramsay et al., 2003;Park et al., 2007;Montefiori et al., 2015). The bHLH type protein AtTT8 from Arabidopsis is required for normal expression of late flavonoid biosynthetic genes in the siliques of Arabidopsis (Nesi et al., 2000), which interacts directly with MYB transcription factors for flavonoid biosynthesis.
The biosynthesis of flavonoids/isoflavonoids has been extensively studied in model plants, but it has lagged behind in non-model plants due to the lack of genomic and genetic information. The fast-growing RNA-Seq technique makes it possible to take advantage of global gene expression data in species without available genomic information. In this study, we integrated comparative transcriptome information from two kudzu genotypes with contrasting isoflavonoid concentrations in roots and identified a set of putative genes that are possibly involved in isoflavonoid production in kudzu roots, with an emphasis on transcription factors (MYB, bHLH, and WD40). Moreover, our investigation demonstrated that PlMYB1 is a critical transcription factor involved in the isoflavonoid pathway in P. lobata. These findings thus provide insights into the regulation network of isoflavonoid biosynthesis in kudzu and offer an important target for bioengineering of bioactive isoflavonoids with beneficial effects on human health.

Transcriptome Sequencing and Gene Annotation
In a previous report, it was revealed that a kudzu genotype (No. 1, Figure 1A) accumulates a massive amount of puerarin in the roots, whereas the other genotype did not (No. 2, Figure 1B; He et al., 2011). In the present study, we further carried out transcriptome sequencing using the roots of these two types of kudzu plants, attempting to obtain global transcriptome information and to discover new genes involved in the regulation of isoflavonoid biosynthesis by transcriptome comparison.
After sequence cleaning and assembly with the Trinity program (trinityrnaseq_r2013-02-25) (Grabherr et al., 2011), a total of 88,398 unigenes were obtained from these two kudzu plants, and the number of unigenes decreased as length increased from short (201 bp) to long (more than 3 kb) (Supplementary Figure 2 and Supplementary Table 1). Among them, 27,515 and 25,175 unigenes were detected solely in Nos. 1 and 2 kudzu, respectively. For the genes expressed in both samples, the reads per kilobase per million reads (RPKM) value of 22,389 unigenes changed less than twofold between the samples, whereas 5,938 and 7,373 unigenes increased or decreased significantly more than twofold in roots of No. 1 kudzu than in No. 2 kudzu (Supplementary Table 1).
The functions of 14,896 significantly upregulated unigenes in roots of the No. 1 sample were further annotated and classified into 25 groups based on the Cluster of Orthologous Groups of proteins (COG) database. Among them, the largest group (2,289 unigenes) was signal transduction mechanisms (Figure 2A). Overall, 863 unigenes were predicted to be involved in the biosynthesis, transport, and catabolism of secondary metabolites (Figure 2A). Among them, 20 unigenes encoded enzymes related to the isoflavonoid pathway, including 1 IFS, 14 CHS, 2 CHR, and 3 CHI genes (Supplementary Table 2).
In particular, KEGG pathway analyses showed that these upregulated genes were significantly enriched in the phenylpropanoid pathway and flavonoid pathway (Figures 2B,C). It was speculated that the accumulation of a higher amount of isoflavonoids resulted from the higher expression of entire pathway genes, such as CHS, CHI, and IFS, which should be coordinately regulated by some unknown transcription factors. Therefore, this study mainly focused on the discovery of transcription factor genes, especially MYB, bHLH, and WD40 genes.

Identification and Classification of MYBs in the Isoflavonoid Biosynthetic Pathway
In this study, 105 MYB genes encoding transcription factors from kudzu roots were identified and confirmed by their conserved domains (Supplementary Table 3). These putative MYB transcription factors were classified into seven different super-families, including DNA_pol_phi superfamily, GAT_SF superfamily, H15 superfamily, Myb_CC_LHEQLE superfamily, SANT superfamily, SKIP_SNW superfamily, and VHS_ENTH_ANTH superfamily (Supplementary Table 3). Among them, the SANT superfamily is the largest superfamily with 65 unigenes (Supplementary Table 3).
R2R3-MYB transcription factors of the SANT superfamily in Arabidopsis were divided into 25 different subgroups, and the members from the Sg4, Sg5, Sg6, Sg7, and Sg15 subgroups were reported to be involved in the regulation of anthocyanin and proanthocyanidin biosynthesis (Stracke et al., 2001;Hichri et al., 2011). In kudzu, two MYB genes, PlMYB4-1 (comp739_c0_seq1) and PlMYB4-2 (comp41186_c0_seq1), were classified in the Sg4 subgroup, and they shared 34% identity with AtMYB32 and 88% identity with AtMYB4, respectively. The available transcripts of PlMYB4-1 (204 bp) and PlMYB4-2 (1,110 bp) were predicted to encode a truncated peptide and a full-length protein of 370 amino acid residues, respectively.

Identification and Classification of bHLH Genes in the Isoflavonoid Biosynthetic Pathway in Kudzu
A total of 111 bHLH unigenes were predicted in the transcriptome of kudzu roots, and they were classified into corresponding subgroups by sequence identity comparison with orthologs in Arabidopsis (Supplementary Table 4). In particular, four kudzu bHLH proteins were grouped into Subgroup III, the orthologs of which are involved in flavonoid biosynthesis in Arabidopsis (Heim et al., 2003). The protein encoded by PlbHLH3-1 (Comp36398_c2_seq1, 201 bp) shared the highest identity at the amino acid level with AtbHLH93 (62%), which belongs to Subgroup IIIb. The other three were assigned to Subgroup IIId. Among them, PlbHLH3-2 (comp37021_c0_seq1, 474 bp) and PlbHLH3-3 (comp37021_c1_seq1, 1,332 bp) shared 63% and 45% identity with AtbHLH13, respectively. PlbHLH3-4 (comp45038_c0_seq1, 1,539 bp, GenBank Accession No. KT236099) shared the highest identity (48%) with AtbHLH3.
Sequence alignments of the deduced PlbHLH3-1, PlbHLH3-2, PlbHLH3-3, and PlbHLH3-4 proteins with other representative bHLH proteins involved in the flavonoid pathway showed that only PlbHLH3-4 had the intact bHLH-MYC_N region (Figure 4A), which is usually present at the N-terminus of bHLH transcription factors regulating phenylpropanoid biosynthesis (Marchler-Bauer et al., 2015). The bHLH-MYC_N domain commonly has the specific DNA-binding ability attributed to the amphipathic affinity of its N-terminus. This was especially apparent in the critical His-Glu-Arg (H-E-R) residues located at positions 5, 9, and 13 in the basic region of PlbHLH3-3 and PlbHLH3-4 (Figure 4A), which could bind to DNA targets as reported previously (Atchley and Fitch, 1997;Massari and Murre, 2000;Toledo-Ortiz et al., 2003). The bHLH-MYC_N region was composed of two hydrophobic α-helices linked by a divergent loop, but only PlbHLH3-3 and PlbHLH3-4 contained an intact domain in this region (Figure 4A).
The phylogenetic relationship showed that PlbHLH3-3 and PlbHLH3-4 were separated from the other bHLH transcription factors regulating the anthocyanin pathway, such as AtTT8 from Arabidopsis and Lc from maize (Figure 4B), suggesting that they are likely involved in other flavonoid branch pathways, e.g., the isoflavonoid pathway.

Identification and Classification of Unigenes Encoding WD40 Repeat Domain Proteins in Kudzu
WD40 repeat proteins are key components of the MBW complex. We therefore searched for and identified a total of 143 unigenes encoding WD40 repeat domain proteins in the transcriptome of kudzu roots (Supplementary Table 5). They were grouped into 16 different superfamilies, of which the WD40 superfamily was the largest (96 members, Supplementary Table 5). Among all these predicted WD40 repeat domain proteins, the deduced amino acid sequence of PlWD40-1 (comp42449_c1_seq1_24, GenBank Accession No. AKR04124.1) showed the highest identity (64%) to Transparent testa glabra 1 (AtTTG1) from Arabidopsis, 62% to MtWD40-1 from M. truncatula, and 61% to PhAN1 from P. hybrida. The WD40 repeat domains are highly conserved among these WD40 proteins (Figure 5A). Phylogenetic analysis showed that PlWD40-1 is most closely related to ZmPAC1, which regulates anthocyanin production in Zea mays (Figure 5B). As these WD40 orthologs are involved in the flavonoid pathway, the high identity and close phylogenetic relationship of PlWD40-1 with them suggested that PlWD40-1 may have a similar function in the isoflavonoid pathway in kudzu.

Expression of Key Transcription Factor Genes Was Closely Correlated With Total Flavonoid Accumulation in Kudzu
To further screen candidate transcription factors that play key roles in the isoflavonoid pathway, the association between flavonoid accumulation level and the expression level of candidate genes was investigated. We found that the total flavonoid level was relatively higher in leaves than in roots or stems in both kudzu genotypes (Figure 6A). Furthermore, total flavonoid content was higher in the roots of No. 1 than in No. 2, a difference that was essentially contributed by puerarin, as confirmed by high performance liquid chromatography (HPLC) in the present study (Figure 1). By contrast, total flavonoids were the lowest in stems in both kudzu genotypes (Figure 6A). Accordingly, the transcript level of PlMYB1 determined by qPCR was relatively higher in leaves than in other tissues of both kudzu plants (Figure 6C). Notably, the transcript level of PlMYB1 was consistent with the accumulation levels of total flavonoids in various tissues. This result implied that PlMYB1 was likely involved in the regulation of isoflavonoid biosynthesis. By contrast, the PlMYB4-2 gene was highly expressed in the roots of both kudzu genotypes, with a very low level in stems and leaves (Figure 6E), implying a weaker association with total flavonoid accumulation.
PlbHLH3-4 was expressed at the highest level in leaves of the No. 1 kudzu genotype, but at a low level in the leaves of the No. 2 kudzu genotype (Figure 6G), which differs markedly from the total flavonoid accumulation pattern. As with PlbHLH3-4, the transcript level of PlWD40-1 was relatively high in leaves of the No. 1 kudzu plant (more than 8-fold higher than in other tissues), but lower in other tissues of the No. 1 kudzu plant (Figure 6I, left), implying less correlation with total flavonoid accumulation.

Key Transcription Factor Genes Were Consistently Associated With Puerarin Content in Cell Cultures
Puerarin is the predominant isoflavonoid compound in kudzu. The cell culture produced from the No. 1 kudzu genotype accumulated an evident amount of puerarin, and its content was not significantly affected by sugar or naphthyl acetic acid (NAA) concentration, or by the presence of SA, MeJA, or light, but was affected by the pH value of the medium (Supplementary Figure 3). In comparison with the control pH value of 5.8, the puerarin content doubled when the pH value dropped to 4.8, whereas it was reduced about sixfold when the pH value was increased to 6.8 (Figure 6B, right).
Quantitative PCR analyses showed that the transcript level of PlMYB1 was relatively higher at a pH value of 4.8 than at 5.8 or 6.8 (Figure 6D). The transcript profile of PlMYB1 is consistent with the accumulation levels of puerarin in cell cultures grown under different pH conditions. This result implied that PlMYB1 is likely involved in the regulation of isoflavonoid biosynthesis, in particular that of puerarin. However, the expression level of PlMYB4-2 was not affected by the pH value of the cell culture medium (Figure 6F, right), suggesting a weaker association with puerarin biosynthesis. Similar to PlMYB1, the expression level of PlbHLH3-4 in cell culture was increased at a pH value of 4.8 and decreased at a pH value of 6.8 as compared to the control at a pH value of 5.8 (Figure 6H). In addition, the transcript level of PlWD40-1 was also higher at a pH value of 4.8 but lower at a pH value of 6.8 when compared with that of the control at a pH value of 5.8 (Figure 6J, right).
Taken together, PlMYB1, PlbHLH3-4, and PlWD40-1 were highly expressed in cell culture under a pH value of 4.8, and their expression patterns were highly correlated with levels of puerarin in cell cultures under various pH treatments, suggesting they might cooperate as an MBW complex to regulate the biosynthesis of isoflavonoids, e.g., puerarin, under different pH treatments. In particular, the expression pattern of PlMYB1 matched very well with the accumulation of puerarin, implying it is likely a key player in the MBW complex.
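The kind of agreement described above, between transcript level and puerarin content across the three pH conditions, can be summarized with a rank correlation. The following is a minimal sketch, not the analysis used in this study. The relative puerarin values follow the approximate changes reported in the text (doubled at pH 4.8, about sixfold lower at pH 6.8, relative to pH 5.8), whereas the expression values are hypothetical placeholders, since the qPCR readings are given only in Figure 6. The helper names `ranks` and `spearman_rho` are illustrative, and the simple formula assumes no tied values.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation using the no-ties formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Conditions in order: pH 4.8, pH 5.8 (control), pH 6.8.
# Puerarin content relative to control, approximated from the text.
puerarin_relative = [2.0, 1.0, 1 / 6]
# HYPOTHETICAL relative expression values, for illustration only.
expression_relative = [3.0, 1.0, 0.5]

rho = spearman_rho(puerarin_relative, expression_relative)
```

With only three conditions, a perfect rank agreement is weak statistical evidence on its own, so such a summary is best read as a description of the trend rather than a test.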

Subcellular Localization of PlMYB1
To validate the function of the putative transcription factor in the regulation of the isoflavonoid pathway, the PlMYB1 gene was successfully cloned for further characterization. The open reading frame (ORF) of PlMYB1 was fused with green fluorescent protein (GFP) at the C-terminus and transferred into Arabidopsis leaf protoplasts. Green fluorescence signals for PlMYB1:GFP were detected in the nucleus (Figure 7A), which was evidently distinct from that of the control GFP in the cytosol (Figure 7B). This result indicated that PlMYB1 is localized in the nucleus as a transcription factor to exert its function.

In vivo Functional Characterization of PlMYB1 in Arabidopsis
To further determine the regulatory function of PlMYB1, it was also over-expressed in wild-type Arabidopsis. The expression levels of PlMYB1 were confirmed by qPCR analysis in different lines (Figure 8A). Quantitative analysis revealed that anthocyanin levels increased by more than 0.4-, 1.4-, and 0.9-fold in rosette leaves of three transgenic lines as compared to the wild-type control (Figure 8B and Supplementary Figure 4A). In addition, soluble and insoluble proanthocyanidins in the mature seeds of these transgenic lines increased by 0.6- to 1.5-fold and 0.7- to 1-fold relative to the wild-type control, respectively (Figure 8C and Supplementary Figure 4B).
To further investigate the effects of PlMYB1 on the expression of anthocyanin/proanthocyanidin pathway genes, the expression levels of several key pathway genes were determined in rosette leaves by qPCR analysis (Figures 8D-I). The early pathway genes AtCHS and AtF3H were both upregulated, most notably in the No. 7 transgenic line (Figures 8D,E). In particular, the expression levels of three late pathway genes, AtDFR, AtANS, and AtANR, were highly increased in these transgenic lines (Figures 8G-I). Notably, the expression level of AtFLS was significantly decreased in the transgenic lines compared with the wild-type control (Figure 8F). Taken together, these data indicated that the over-expression of PlMYB1 activated the expression of anthocyanin/proanthocyanidin pathway genes, and consequently increased the accumulation of these compounds.

DISCUSSION
Pueraria lobata is a legume plant endemic to China, which is well known for its special accumulation of the health-beneficial isoflavonoid compound puerarin in the roots. In a previous study, we found that one kudzu genotype produces high puerarin content, while another genotype has low puerarin content in the roots (He et al., 2011; Figure 1). In the present study, we explored the root transcriptomes of these two previously established kudzu plants. Comparative transcriptome analysis revealed that several transcription factor genes were differentially expressed between the two kudzu genotypes. Among them, the expression of PlMYB1 showed a close correlation with the biosynthesis of puerarin, and it was able to significantly increase the accumulation of anthocyanins/proanthocyanidins in transgenic Arabidopsis through activating AtDFR, AtANR, and AtANS (Figure 8).

PlMYB1 Regulates Flavonoid Pathway in Kudzu and Arabidopsis
It is well known that flavonoid and isoflavonoid pathway genes are mainly regulated by the MBW complex comprising MYB, bHLH, and WD40 proteins, and the regulatory function of MBW is conserved in most plants (Goodrich et al., 1992). In the MBW complex, MYB transcription factors are the major players, and they have been classified into distinct groups. The MYBs in the Sg4 subgroup share the ERF-associated amphiphilic repression (EAR) motif core (Goodrich et al., 1992). The Sg4 subgroup members are involved in stress responses and plant evolution (Bedon et al., 2010), and also act as repressors of phenolic acid metabolism and lignin biosynthesis (Zhao et al., 2013). The Arabidopsis MYB members of the Sg4 subgroup (AtMYB3, AtMYB4, AtMYB7, and AtMYB32) are able to repress the biosynthesis of polyphenols by interacting with bHLH proteins (Zhao et al., 2013). AtMYB4 has been shown to be a transcriptional repressor involved in the inhibition of genes in the polyphenol biosynthetic pathway, such as the Cinnamate 4-Hydroxylase gene (C4H) (Jin et al., 2000).
Our study showed that two MYB members of the Sg4 subgroup were present in kudzu, namely PlMYB4-1 and PlMYB4-2. Of these, the expression pattern of PlMYB4-2 was not consistent with the accumulation profiles of puerarin (Figures 6E,F); therefore, PlMYB4-2 is unlikely to be related to puerarin accumulation in kudzu. By contrast, PlMYB1 showed a high expression level in roots and leaves in both kudzu plants, which is consistent with the accumulation pattern of total flavonoids in various tissues of kudzu.
Moreover, a previous study demonstrated that pH condition is a key regulatory factor in flavonoid biosynthesis. It was found that treatment with a high medium pH value induced a dramatic decrease in the concentration of cyanidin in crabapple leaves, whereas high medium pH values increased the content of flavones and flavonols. Several MYB TFs have been suggested to be involved in the regulation of pH responses. In our study, we found that low pH treatment increased puerarin content in kudzu cell cultures, and the transcript level of the PlMYB1 gene was also elevated (Figure 6). In particular, PlMYB1 expression was evidently increased under low pH treatment and significantly decreased under high pH treatment (Figure 6), which is consistent with the accumulation levels of puerarin under the various pH conditions. Therefore, PlMYB1 possibly affected puerarin content by regulating the transcript level of downstream key structural genes. Furthermore, the ectopic over-expression of PlMYB1 led to significant increases in the expression level of several pathway structural genes (e.g., AtDFR, AtANS, and AtANR; Figure 8), and accordingly increased the content of anthocyanins and proanthocyanidins in Arabidopsis. These results demonstrated that PlMYB1 functioned as an activator for the anthocyanin/proanthocyanidin branch. Notably, the transcript level of FLS was reduced significantly (Figure 9), which might block the flavonol branch and switch the flux to the anthocyanidin/proanthocyanidin branch as illustrated in Figure 9, indicating that PlMYB1 functioned as a repressor for the flavonol branch. Therefore, PlMYB1 played a dual role as activator and repressor in the flavonoid biosynthetic pathway in Arabidopsis, similar to VviMYB86, which oppositely regulates different flavonoid subpathways in grape berries (Cheng et al., 2020).

PlbHLH3-4 and PlWD40-1 Co-expressed With PlMYB1 Under Various pH Treatments
Most bHLH proteins can interact with R3 repeat domains of MYB proteins at the N-terminal acidic region to form the MYB-bHLH complex which frequently occurred in flavonoid biosynthetic pathways (Zhao et al., 2013). The bHLH members in subgroup IIIf were involved in anthocyanin biosynthesis (Spelt et al., 2000), seed coat differentiation, trichome formation, and root hair formation (Nesi et al., 2001). In addition, bHLH genes in subfamily 2 were found to generally respond to wounds, insects, drought, oxidative stress, jasmonic acid, abscisic acid, and chitin, but they also are able to regulate anthocyanin metabolism (Carretero-Paulet et al., 2010). AtbHLH13 is a member of subfamily 2 (Heim et al., 2003) and subgroup IIId (Song et al., 2013) as well.
In this study, three bHLH genes in kudzu, namely PlbHLH3-2, PlbHLH3-3, and PlbHLH3-4, belong to subfamily 2. They showed high similarities to AtbHLH13, with 63%, 65%, and 48% identity at the amino acid level, respectively. The expression pattern of the PlbHLH3-4 gene was consistent with total flavonoid content in the roots and leaves of high puerarin kudzu, but inconsistent with that in low puerarin kudzu, implying that PlbHLH3-4 might be involved in isoflavonoid biosynthesis but is not the determinant factor in kudzu. Interestingly, the expression of the PlbHLH3-4 gene was evidently increased under low pH medium and significantly decreased under high pH conditions, which is consistent with the accumulation of puerarin, suggesting that PlbHLH3-4 is possibly responsible for regulating puerarin biosynthesis under various pH conditions. Moreover, PlbHLH3-4 was found to be localized in the nucleus, as was PlMYB1 (Supplementary Figure 5), implying that it is likely to interact with PlMYB1 in the nucleus as a major regulator of the flavonoid pathway.
In addition to bHLH proteins, Arabidopsis WD40 proteins such as AtTTG1 also play a key role in the regulation of flavonoid biosynthesis. AtTTG1 plays an important part in the regulation of AtDFR, AtANS, and AtANR by interacting with bHLH transcription factors (AtGL3, AtEGL3, or AtTT8) and MYB transcription factors (AtPAP1, AtPAP2, AtMYB113, or AtMYB114) in the MBW complex. This ternary MBW complex is known to control proanthocyanidin accumulation in seeds and anthocyanin accumulation in leaves (Walker et al., 1999;Payne et al., 2000;Baudry et al., 2004;Hichri et al., 2011). MdTTG1 identified from M. domestica was capable of fully replacing AtTTG1 to activate the AtBAN promoter in cooperation with AtTT2 and AtTT8 in a co-transfection system (An et al., 2012). In M. truncatula, the deficiency of MtWD40-1 expression strongly suppressed the expression of flavonoid structural genes and thus blocked the accumulation of a range of flavonoid compounds (Pang et al., 2009).
In the present study, kudzu PlWD40-1 showed high sequence similarity with MdTTG1 and MtWD40-1 (Figure 5). The expression profiles of the PlWD40-1 gene were less correlated with the accumulation of total flavonoids in either high or low puerarin kudzu plants. However, the expression of the PlWD40-1 gene was highly consistent with puerarin accumulation under various pH treatments, suggesting that PlWD40-1 is possibly involved in regulating puerarin biosynthesis under various pH conditions. As PlbHLH3-4 and PlWD40-1 showed very high similarity with AtTT8 and AtTTG1 of Arabidopsis, and displayed similar transcript profiles to PlMYB1 under various pH treatments (Figure 6), PlMYB1, PlbHLH3-4, and PlWD40-1 might form an MBW complex to regulate the accumulation of isoflavonoids in kudzu. In particular, PlbHLH3-4 and PlWD40-1 possibly play key roles in isoflavonoid biosynthesis under various stresses such as pH stimuli.
In summary, PlMYB1 acted as a potent transcription factor regulating the production of various flavonoids/isoflavonoids. The expression profile of the PlMYB1 gene was consistent with the total flavonoid level in kudzu. Furthermore, expression levels of PlMYB1 together with PlbHLH3-4 and PlWD40-1 were consistent with the puerarin content under various pH treatments. Therefore, it is reasonable to speculate that PlMYB1, PlbHLH3-4, and PlWD40-1 cooperate to fine-tune the production of various isoflavonoids in kudzu. Overexpression of PlMYB1 induced a significant increase in anthocyanins/proanthocyanidins as well as in the expression of related biosynthesis pathway genes. Our investigation could shed some light on the regulation network of isoflavonoid biosynthesis in kudzu and provide a potential gene target for the bioengineering of particular flavonoids in plants.

Transcriptome Sequencing and de novo Assembly
The roots of the two previously reported kudzu plants were propagated and collected separately at 7, 14, and 21 days after rooting. Kudzu plant No. 1 accumulates puerarin, but No. 2 does not (He et al., 2011). Root materials were immediately frozen in liquid nitrogen (LN) and stored at −80 °C prior to further analysis. Total RNAs were extracted with Tri-reagent according to the protocol of the manufacturer (Invitrogen, Waltham, MA, United States), followed by cleaning and purification with DNase I. Equal amounts of RNA from 7-, 14-, and 21-day-old root samples from Nos. 1 and 2 kudzu plants were pooled, respectively, for sequencing in biological triplicate. Poly(A) mRNA was purified from total RNA with oligo(dT)-attached magnetic beads and then broken into short fragments, which were used as templates for double-stranded cDNA synthesis using random hexamer primers. The double-stranded cDNAs were purified, ligated to sequencing adapters, and separated by gel electrophoresis (Liuyi, Beijing, China). The purified double-stranded cDNAs with an average insert size of 400 bp were sequenced on the Illumina sequencing platform (San Diego, CA, United States). Reads were then assembled into contigs using Trinity software.

Functional Annotation of Unigenes
Functional annotation of the unigenes was performed by aligning the assembled unigenes against the NCBI Nr, SwissProt (UniProt Consortium, Switzerland), KOG, and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases using BLASTX (E-value < 10^-5). The proteins from the Nr database with the highest hits to the unigenes were used to assign functional annotations. GoPipe software was used to analyze the GO annotations and GO functional classifications (Chen et al., 2005) (IGDB-CAS, Beijing, China).
The expression levels of unigenes were calculated using the reads per kb per million reads (RPKM) method, which eliminates the influence of differences in gene length and sequencing depth (Mortazavi et al., 2008). Thus, this method can be used directly for comparing the differences in gene expression levels between the two types of P. lobata roots. The fold-change in the expression of each gene between the two samples was calculated from the ratio of the RPKMs. In this study, differentially expressed genes (DEGs) were screened with an absolute value of log2 ratio > 2 and a false discovery rate (FDR) threshold lower than 0.005 (Wang et al., 2010). The identified DEGs were mapped to each term of the GO database, and the gene numbers in each GO term were calculated. In addition, DEGs were also used in pathway enrichment analysis by calculating the number of genes mapped to each KEGG pathway (Kanehisa et al., 2010).
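The RPKM normalization and the DEG screen described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in this study: the function names are hypothetical, the read counts in the usage lines are made-up example numbers, and the FDR is taken as a precomputed input because the text does not describe how it was estimated.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_ratio(rpkm_a, rpkm_b):
    """log2 of the RPKM ratio between sample A and sample B."""
    return math.log2(rpkm_a / rpkm_b)


def is_deg(rpkm_a, rpkm_b, fdr, min_abs_log2=2.0, max_fdr=0.005):
    """Apply the thresholds stated in the text: |log2 ratio| > 2 and FDR < 0.005."""
    if rpkm_a <= 0 or rpkm_b <= 0:
        # The ratio is undefined when a unigene is absent from one sample;
        # such unigenes need separate handling (assumption of this sketch).
        return False
    return abs(log2_ratio(rpkm_a, rpkm_b)) > min_abs_log2 and fdr < max_fdr


# Made-up example: 500 reads on a 2,000 bp unigene, 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
example_call = is_deg(40.0, 5.0, fdr=0.001)
```

Unigenes detected in only one genotype have no finite ratio, which is why the sketch returns `False` for them instead of guessing a pseudocount.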

Sequence Analysis
Alignments were performed using Clustal W algorithm-based AlignX module (UCD, Dublin, Ireland). The rooted trees were constructed using the ML method with MEGA X Software (Kumar et al., 2018). Tree nodes were evaluated by bootstrap analysis for 1,000 replicates (pairwise deletion, uniform rates, and Poisson correction options). Evolutionary distances were computed using the p-distance method and expressed in units of amino acid differences per site. All positions containing gaps and missing data were eliminated prior to the construction of phylogenetic trees.
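The p-distance mentioned above is the proportion of compared sites at which two aligned sequences differ, with gapped or missing positions excluded. A minimal sketch is shown below; the function name and the toy peptide fragments are hypothetical and are not sequences from this study, and treating "-" and "?" as the gap and missing-data symbols is an assumption of the sketch.

```python
def p_distance(seq_a, seq_b, skip="-?"):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap or missing-data symbol
    are dropped before counting.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites after removing gaps")
    return differing / compared


# Toy aligned fragments: five comparable sites, one difference.
distance = p_distance("MKR-LE", "MKRALD")
```

The percent identity values quoted for pairwise comparisons correspond to one minus this quantity, expressed as a percentage, when computed over the same set of sites.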

Cloning and Expression of Candidate Genes in P. lobata
Putative transcription factor genes were cloned from the roots of P. lobata No. 1. First-strand cDNA was synthesized from total RNA of roots using the FastQuant RT Kit (TIANGEN, China). The primary PCR was performed using cDNA from the roots, and the PCR conditions were as follows: 5 min of initial denaturation at 94 °C, followed by 35 cycles of 94 °C for 30 s, 54 °C for 30 s, and 72 °C for 1 min, and a final elongation step at 72 °C for 7 min. Primer sequences are listed in Supplementary Table 6.
Quantitative PCRs were performed on a Bio-Rad CFX96™ Real-time PCR System using SYBR Real Master Mix (Kangwei, China). The Arabidopsis pp2A gene (accession No. U39568) was used as an internal reference gene for the calculation of relative transcript levels. Primers used for genes from kudzu are listed in Supplementary Table 6, and primers for genes from Arabidopsis were the same as in our previous study (Su et al., 2020). Each reaction (20 µl) contained 1 µM of each primer, 1 µl cDNA (1:10 diluted), 7 µl RNase-free H₂O, and 10 µl PCR buffer of SYBR Real Master Mix. Thermal cycling conditions were as follows: pre-incubation at 95 °C for 10 min, followed by 40 cycles of 95 °C for 20 s, 60 °C for 30 s, and 72 °C for 20 s. Data were calculated from three biological replicates, and each biological replicate was examined in triplicate.
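The text does not state which formula was used to derive relative transcript levels from the reference gene. One common choice is the 2^(-ΔΔCt) calculation, sketched below in Python purely as an assumption. The function name, the calibrator sample, and the Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct,
                        calibrator_target_ct, calibrator_reference_ct):
    """Relative transcript level by the 2^(-ddCt) calculation.

    Each argument is a list of technical-replicate Ct values. The sample is
    normalized to the reference gene and then to a calibrator sample.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)
    delta_calibrator = mean(calibrator_target_ct) - mean(calibrator_reference_ct)
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical technical triplicates for one biological replicate.
fold = relative_expression(
    target_ct=[22.0, 22.5, 21.5],
    reference_ct=[18.0, 18.0, 18.0],
    calibrator_target_ct=[25.0, 25.5, 24.5],
    calibrator_reference_ct=[18.0, 18.0, 18.0],
)
print(fold)
```

With these invented values the target gene is normalized to ΔCt = 4 in the sample and ΔCt = 7 in the calibrator, giving an eightfold higher relative level. In practice the calculation would be repeated for each of the three biological replicates before averaging.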

Subcellular Localization Analysis of Candidate Genes
The coding regions of the transcription factor genes were amplified with primer pairs containing Sal I and Bam HI restriction sites, respectively (Supplementary Table 6). The resulting amplification products were digested and ligated into the destination vector PJIT163hGFP digested with the same enzymes. After confirmation by sequencing, the recombinant constructs were used in Arabidopsis protoplast transformation as described previously (Sheen, 2002). GFP fluorescence in Arabidopsis protoplast cells was detected by laser scanning confocal microscopy using a Leica TCS SP5 (Wetzlar, Germany). Emission was collected from 500 to 560 nm for GFP and from 605 to 700 nm for chlorophyll.

Establishment and Treatment of P. lobata Cell Suspension Culture
Stems of kudzu plant No. 1 were surface sterilized in 75% (v/v) ethanol for 1-2 min, followed by three washes in sterile distilled water, 15 min in 10% hydrogen peroxide, and another three washes in distilled water. The axenic stems were then cut into 1 cm pieces and placed on MS basal medium (pH = 5.8) with 3% sucrose, 0.8% agarose, 1 mg/L L-NAA, and 2 mg/L 6-benzylaminopurine (BA). Two weeks later, the emerged calli were transferred to B5 liquid medium (pH = 5.8) supplemented with 1 mg/L 2,4-dichlorophenoxyacetic acid (2,4-D), 1 mg/L NAA, 0.5 mg/L kinetin (KT), and 1% casein hydrolysate. Calli were then sub-cultured every week and incubated at 25 ± 2 °C with a 16/8 h photoperiod. Cell culture was incubated in a flask on a rotary shaker (110-130 rpm) under the same photoperiod at 25 °C. After about 1 month, soft, loose, and pale green calli were obtained. Once established, cultures were periodically sub-cultured into 100 ml flasks by transferring 15 ml of 7-day-old cells into 40 ml of fresh B5 liquid medium for treatment. For the treatment with salicylic acid (SA, 0.1 mg/L) and methyl jasmonate (MeJA, 1 mg/L), the cell cultures were collected at time points of 0, 2, 4, 8, 16, 24, 48, and 72 h. The fresh samples were collected and used for flavonoid analysis by HPLC in triplicate.

Generation of Transgenic Arabidopsis Plants
To produce PlMYB1-overexpressing Arabidopsis plants, the 651 bp CDS (coding sequence) fragment was amplified by PCR and then cloned into the binary vector pCXSN at the Xcm I site for gene over-expression in wild-type A. thaliana. The resulting sequence-verified pCXSN-PlMYB1 construct was transformed into Agrobacterium strain GV3101 and used to generate transgenic A. thaliana plants by the inflorescence dip method (Clough and Bent, 1998). The transgenic A. thaliana plants were screened on Murashige and Skoog (MS) medium supplemented with hygromycin (30 mg/L). The 30-day-old rosette leaves and seeds of T₃ generation homozygous plants were collected for further analyses.

Analyses of Flavonoid Compounds
For the extraction and quantification of total flavonoids, plant materials were all ground into powder and freeze-dried. Ten milligrams of dry powder were extracted by sonication for 30 min with 500 µl of 80% methanol in biological triplicates. After an additional overnight extraction at 4 °C, the extracts were centrifuged at 12,000 rpm for 10 min. To every 100 µl of supernatant, 400 µl of deionized water and 30 µl of 5% NaNO₂ were added, followed by 30 µl of 10% AlCl₃ after 5 min. After a further 10 min, 200 µl of 1 M NaOH was added, followed by 240 µl of deionized water to bring the final volume to 1 ml. The absorbance at 510 nm was measured using a spectrophotometer (BIO-RAD, CA, United States) with quercetin as the standard.
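The reagent volumes in the colorimetric assay can be checked against the stated 1 ml final volume, and absorbance readings can be converted to quercetin equivalents with a linear standard curve. The Python sketch below assumes a simple least-squares line. The standard concentrations and absorbances are invented for illustration, since the text does not report them.

```python
# Volumes (µl) added per assay, as listed in the text.
ASSAY_VOLUMES_UL = {
    "supernatant": 100,
    "deionized water (first addition)": 400,
    "5% NaNO2": 30,
    "10% AlCl3": 30,
    "1 M NaOH": 200,
    "deionized water (final addition)": 240,
}
total_volume_ul = sum(ASSAY_VOLUMES_UL.values())  # 1000 µl, i.e. 1 ml


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quercetin_equivalents(absorbance, slope, intercept):
    """Invert the standard curve: concentration giving the measured A510."""
    return (absorbance - intercept) / slope


# Hypothetical quercetin standards (µg/ml) and their A510 readings.
slope, intercept = fit_line([0.0, 20.0, 40.0, 80.0], [0.0, 0.1, 0.2, 0.4])
print(total_volume_ul, quercetin_equivalents(0.25, slope, intercept))
```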
The above methanolic extracts were also used for the identification and quantification of isoflavonoids by HPLC. The analyses were carried out using an Agilent 1260 chromatographic system (Santa Clara, CA, United States) equipped with a quaternary pump, an autosampler, a photodiode array detector, and an Eclipse XDB-C18 reverse-phase column (4.6 mm × 250 mm, 5 µm). Flavonoids were separated with a linear eluting gradient (5-70% solvent B over 30 min) with solvent A (0.1% formic acid in water) and solvent B (0.1% formic acid in acetonitrile) at a flow rate of 1 ml/min and detected at 254 nm. The synthetic standards used in this study were all purchased from Sigma-Aldrich (Darmstadt, Germany).
Anthocyanins were extracted in acidified MeOH (0.1% HCl) overnight in the dark at 4 °C, followed by sonication for 30 min. After centrifugation at 12,000 rpm for 10 min, the supernatant was mixed with the same volume of water and extracted with chloroform. The absorbance of the supernatant was then measured at 530 nm using a spectrophotometer. Three independent replicates were collected for each infiltration.
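For reference, the mobile-phase composition at any time point of the linear gradient follows from simple interpolation between 5% and 70% solvent B over 30 min. The Python sketch below is illustrative only. The function name is hypothetical, and holding the composition constant before 0 min and after 30 min is an assumption, since the text does not describe any wash or re-equilibration steps.

```python
def percent_b(t_min, start=5.0, end=70.0, duration=30.0):
    """Percentage of solvent B at time t_min for a linear gradient.

    Outside the 0-30 min window the composition is simply held at the
    nearest endpoint (an assumption, not stated in the text).
    """
    if t_min <= 0:
        return start
    if t_min >= duration:
        return end
    return start + (end - start) * t_min / duration


# Composition at the start, midpoint, and end of the run.
for t in (0, 15, 30):
    print(t, percent_b(t), 100.0 - percent_b(t))  # time, %B, %A
```

At the midpoint (15 min) the mobile phase is 37.5% solvent B and 62.5% solvent A, corresponding to a slope of about 2.17% B per minute.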

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The RNA-Seq data presented in the study are deposited in the NCBI SRA repository with accession number PRJNA747842. The sequences of the PlMYB1, PlbHLH3-4, and PlWD40-1 genes are deposited in GenBank with accession Nos. KR698796, KT236099, and AKR04124.1, respectively.