Genome-Wide Identification and Comparative Analysis of WRKY Transcription Factors Related to Momilactone Biosynthesis in Calohypnum plumiforme

Momilactones are diterpenoid phytoalexins with allelopathic functions, which have been found in the widely distributed bryophyte Calohypnum plumiforme. Clustered genes containing CpDTC1/HpDTC1, CpCYP970A14, CpCYP964A1, and CpMAS are involved in momilactone biosynthesis. Besides, momilactone concentration in C. plumiforme is affected by heavy metal treatment such as CuCl2. However, transcription factors which might regulate momilactone biosynthesis are unclear. WRKY transcription factors (TFs) regulate phytoalexin biosynthesis in many plant species. In this study, a systematic analysis of the WRKY TFs was performed according to the C. plumiforme genome. A total of 19 CpWRKY genes were identified and categorized into five subgroups based on their phylogenetic relationship. Conserved domain and motif analysis suggested that the WRKY domain was highly conserved, but there were some variations. Cis-acting elements and binding sites analysis implied that CpWRKY genes might be induced by stress and further regulate the biosynthesis of momilactones. Our study lays a foundation for further functional characterization of the candidate CpWRKY genes involved in the regulation of momilactone biosynthesis, and provides new strategies for increasing momilactone production.


INTRODUCTION
Calohypnum plumiforme (formerly Hypnum plumaeforme) is a widely distributed bryophyte in East Asia, which has been broadly used as a bio monitor to trace metals in the atmosphere (Okada et al., 2016;Li et al., 2020). Besides, C. plumiforme was used in traditional herb medicine to treat burns and epistaxis . Recently, it was reported that the moss C. plumiforme owns allelopathic activity by suppressing the growth of plants around, mainly due to the production of chemical defense compound momilactones (Kato-Noguchi and Kobayashi, 2009;Okada et al., 2016;Mao et al., 2020). Momilactone A and B were firstly found in the cultivated rice Oryza sativa, exhibiting remarkable antimicrobial and allelopathic functions (Kato-Noguchi et al., 2010;Toyomasu et al., 2014;Mao et al., 2020), which are biosynthesized through the biosynthetic gene clusters (BGCs) (Kato et al., 1973;Nützmann et al., 2018). Recently, the C. plumiforme genome was sequenced and annotated, with a genome size of 335 Mb (Mao et al., 2020). Notably, genome sequencing revealed a functional BGC responsible for momilactone biosynthesis in C. plumiforme (Mao et al., 2020).
Stress treatment is an efficient strategy to promote the production of momilactones A and B in C. plumiforme. For example, concentration of momilactone A and B in C. plumiforme in UV-irradiated condition was 14-and 15fold greater than that of non-UV irradiated (Kato-Noguchi and Kobayashi, 2009). Jasmonic acid-treated C. plumiforme produced higher content of momilactone A and B than in non-treated group (Kato-Noguchi and Kobayashi, 2009). CuCl 2treated C. plumiforme produced more than 6-folds momilactone A and B than the control (Kato-Noguchi and Kobayashi, 2009), along with expression level of the four clustered genes much higher than the untreated samples (Mao et al., 2020). Previous studies have shown that different transcription factors (TFs) were induced under CuCl 2 treatment including WRKY TFs (Okada et al., 2016). Meanwhile, we found that variable W-box existed in the promoters of clustered genes.
WRKY is the largest transcription factor family in plants, which are involved in responses to biotic and abiotic stress (Eulgem et al., 2000;Rushton et al., 2010;Phukan et al., 2016). WRKY TFs is named because of its highly conserved WRKY domain consist of N-terminal WRKYGQK domain and a C-terminal zinc finger structure (Eulgem et al., 2000;Eulgem and Somssich, 2007). According to the number of WRKY domains and the type of zinc finger structure, the WRKY family can be divided into three groups: Group I contains two WRKY domains and a C 2 H 2 zinc finger structure; Group II contains a WRKY domain and a C 2 H 2 type zinc finger structure; Group III harbors a WRKY domain and a C 2 HC zinc finger structure (Zhang and Wang, 2005). Most WRKY TFs bind to W-box (TTGACT/C), which located in the promoters of many biosynthetic and defense-associated genes (Eulgem et al., 2000;Rushton et al., 2010). WRKY genes are involved in secondary metabolism and disease resistance (Ryu et al., 2006;Yokotani et al., 2013;. For example, ZmWRKY79 promoted the accumulation of maize terpenoid phytoalexins through binding to biosynthetic gene promoters (Fu et al., 2018). OsWRKY53 was phosphorylated by the OsMKK4-OsMPK 3/OsMPK6 cascade and positively regulated the diterpenoid phytoalexins (DP, momilactones and phytocassane, et al.) biosynthetic genes (Chujo et al., 2014). Overexpression of phosphomimic mutated OsWRKY53 increased the resistance of rice to blast fungus (Magnaporthe grisea) (Chujo et al., 2014). Nevertheless, OsWRKY76 is a negative transcription factor of DP biosynthesis, which playing opposite roles in blast disease resistance (Bagnaresi et al., 2012;Yokotani et al., 2013). Overexpression of OsWRKY45 in rice plants showed extremely strong resistance to Magnaporthe oryzae, and further study revealed that the overexpression of OsWRKY45 upregulated the DP biosynthetic genes such as KSL4 and CYP99A2 (Akagi et al., 2014).
Although WRKY transcription factors have been identified in many plant species, information on WRKY TFs in C. plumiforme has not been reported. In this study, WRKY proteins in bryophytes C. plumiforme were identified using bioinformatic methods based on the genomic information, and classification and evolutionary relationships, conserved motif and gene structure, cis-acting elements and binding sites were systematically analyzed. This study provides available data to reveal the biological function of CpWRKY proteins, and lays a theoretical foundation for the discovery of potential CpWRKY proteins that regulates the biosynthesis of momilactones.

Gene Identification and Characteristics
The WRKY hidden Markov model was obtained from Pfam database (PF03106), 1 which was used to search all putative WRKY genes from the C. plumiforme genome database (GSA accession: PRJCA001833) using Hmmsearch tools (Finn et al., 2011). Pfam (see text footnote 1), NCBI CDD, 2 and SMART 3 were used to verify the presence of WRKY domain (Marchler-Bauer et al., 2015;Finn et al., 2016). Proteins without WRKY conserved domains were removed and 19 WRKY proteins were retained for further analysis. Additionally, the theoretical isoelectric point (pI) and molecular weight (MW) of the CpWRKY proteins were calculated using the ExPASy 4 (Gasteiger et al., 2003). WolF PSORT II 5 was used to predict the subcellular localization of CpWRKY protein .

Multiple Sequence Alignment and Phylogenetic Analyses
WRKY protein sequences in Arabidopsis thaliana were downloaded from UniProt protein database, 6 named as AtWRKY1-AtWRKY 70 (Ding et al., 2020). Subsequently, the Clustal W multiple sequences alignment of CpWRKY (19) and AtWRKY (70) proteins was performed using MEGA6 software. The phylogenetic tree was constructed by using Neighbor-Joining method with bootstrap value as 1,000 (default for other parameters) (Tamura et al., 2013). The display of the phylogenetic tree was performed by Interactive Tree of Life (iTOL) 7 (Letunic and Bork, 2007). A multiple sequence alignment of AtWRKY and CpWRKY proteins was proformed with DNAMAN software.

Gene Structure and Motif Composition Analysis
Conserved motifs of CpWRKY proteins were analyzed through the MEME online tools, 8 and the number of motifs was set to 10 (the rest of the reference default) (Bailey et al., 2006). Exon-intron structure of CpWRKY genes was determined by comparing the genome sequence with the corresponding coding sequence (CDS), and visualized by software TBtools .

Analysis of Promoter cis-Acting Elements
The upstream promoter region sequences of CpWRKY genes corresponding to 3,000 bp were extracted from the genomic database of C. plumiforme, and then submitted to PlantCARE 9 for cis-acting elements annotation (Lescot et al., 2002). TBtools software was used to draw the location distribution map of related elements .

Analysis of Expression Pattern of CpWRKY Genes and Binding Sites in the Promoters of Momilactone-Biosynthesis Pathway Genes
Transcriptomic data (accession number PRJDB9779) of CuCl 2treated C. plumiforme (which was treated with 500 µM CuCl 2 for 8 h) were downloaded from the public database of NCBI for further analysis. Heatmap of CpWRKYs and momilactone biosynthetic gene cluster including CpDTC1, CpCYP964A1, CpCYP970A14 and CpMAS gene expression was constructed

Identification of CpWRKY Genes in the Calohypnum plumiforme Genome
In this study, 19 CpWRKY genes were screened out in the complete C. plumiforme genome. The physical and chemical properties of CpWRKY genes were analyzed by ExPASy, and the results showed that the number of amino acids ranged from 385 aa (CpWRKY15) to 1043 aa (CpWRKY9). The molecular weight ranged from 42.6 kDa (CpWRKY15) to 111.8 kDa (CpWRKY9). The isoelectric point results showed that 12 CpWRKY members were less than 7, which were acidic protein, and 7 CpWRKY members were greater than 7, which were alkaline protein.
The hydrophilic and hydrophobic results were all less than 0, indicating that CpWRKY members were hydrophilic proteins. The subcellular localization prediction results showed that 16 CpWRKY proteins were located in the nucleus (except CpWRKY2, CpWRKY4 and CpWRKY9) ( Table 1).

Phylogenetic Analysis and Classification of CpWRKY Genes
In order to investigate the phylogenetic and evolutionary relationship of WRKY genes in C. plumiforme, an unrooted phylogenetic tree was constructed (Figure 1). CpWRKY proteins were divided into five groups: IIa, IIb, IIc, IId, III. Notably, no WRKY proteins were found in group I and IIe. Group IIc has the most CpWRKY members, with high homology to AtWRKY8, AtWRKY28, AtWRKY71, AtWRKY57, AtWRKY23, AtWRKY48 and AtWRKY68. Multiple sequence alignment revealed that all the CpWRKY proteins contained the conserved "WRKYGQK" domain and C 2 H 2 or C 2 HC Zinc finger structure. Additionally, the WRKY domain of subgroup III was in WKKYGNK form (Figure 2). The zinc finger structure at the C terminal was slightly different among the five groups, such as subgroup IIa contained CX 4 CX 23 HXH, while the zinc finger structure of subgroup III is CX 4 CX 22 HXC (Figure 2).

Conserved Motif Compositions and
Gene Structure of CpWRKY Genes MEME analysis revealed 10 conserved motifs in CpWRKY proteins (Figures 3A,B,D). Motif 1 and motif 2 are two relatively conserved motifs, which exist in all five subgroups. Nevertheless, CpWRKY proteins in some subgroups have their own unique conserved motifs. For example, motif 8 only appears in IIa CpWRKY, motif 9 is a conservative motif unique to IIc; motif 5 and motif 6 occur only in the IId (Figure 3B).
In addition, all 19 CpWRKY family members had exon-intron structure, and the number of exons was variable, which ranged from 3 to 13. Subgroup IIc and IId members have 3-4 exons (except CpWRKY 9, which has 7 exons), while the CpWRKY family members in groups IIa, IIb, and III have 8 or 13 exons at  most ( Figure 3C). These results showed that the structure of the CpWRKY gene family was significantly changed, indicating that the genome of C. plumiforme underwent large variation selection in the long-term evolution process.

Cis-Elements Analyses of CpWRKY Genes
In order to further investigate the function and regulatory mechanism of CpWRKY genes, we analyzed the cis-elements in promoters of CpWRKY genes (Figure 4). The results showed that CpWRKY gene promoters contained cis-elements related to plant growth and development, hormones and stress, among which hormones-related cis-elements were the most (236, accounting for 61.5% of the total). Among CpWRKY genes promoter sequences, MeJA-responsive element (CGTCA motif and TGACG motif) accounted for the largest proportion (120, 31.3%). It can be speculated that CpWRKYs play important and complex roles in plant response to hormone regulation (especially MeJA). CpWRKY7 had the most cis-elements (32) among all the CpWRKY genes, suggesting that it was relatively active and had more biological functions. Eighteen genes contained one or more ABRE (ABA-responsive element), TGACG-motif and CGTVA-motif (MeJA-responsive element), suggesting that these genes may be involved in the MeJA and ABA signal regulation pathway. Twelve genes contained one or more LTR (low temperature response element) and MBS (drought induced response element), indicating that these genes may be involved in the response to temperature stress. CpWRKY9 has a specific HD-ZIP1 (palisade mesophyll cells), which may play a specific role in leaf development (Figure 4).

Expression Pattern of CpWRKY Genes and Binding Sites Analyses of Biosynthetic Gene Cluster
The expression patterns of BGCs and WRKY genes in the gametophytes of C. plumiforme treated with or without 500 µM CuCl 2 for 8 h were analyzed. After 8 h treatment with CuCl 2 , the expression levels of CpWRKY genes showed two patterns: CpWRKY1, CpWRKY3, CpWRKY4, CpWRKY5, CpWRKY6, CpWRKY7, CpWRKY9, CpWRKY10, CpWRKY11, CpWRKY12, CpWRKY13, CpWRKY17, CpWRKY18 and CpWRKY19 showed an upward trend, while CpWRKY2, CpWRKY8, CpWRKY14, CpWRKY15, and CpWRKY16 showed decreased expression (Supplementary Figure 1). Expression level of the clustered genes in momilactone biosynthesis was also elevated after CuCl 2 treatment (Supplementary Figure 2). Meanwhile, WRKY binding sites (W-box) were identified in the promoters of the BGCs. Two W-box elements existed in each of CpDTC1 and CpCYP970A14, four in CpCYP964A1 promoter, and five elements in CpMAS (Figure 5), suggesting that CpWRKY may play a regulatory role in momilactone biosynthesis.

Identification of CpWRKY Transcription Factors
WRKY TFs are a class of transcription factors that participate in many physiological and biochemical processes and exist widely in eukaryotes. With the development of high-throughput sequencing techniques, WRKY TFs have been identified in dozens of species (Di et al., 2021;Hao et al., 2021;Liu et al., 2021;Qu et al., 2021). Recent studies have shown that WRKY transcription factors are involved in the regulation of natural product biosynthesis (Yamada et al., 2021;. In this study, 19 WRKY TFs were excavated by analyzing the C. plumiforme genome. The number of WRKY transcription factors in C. plumiforme is much less than that of other higher plants, and the huge difference in the number is not only related to the genome of the species, but also related to the environmental influence of plants in the long-term evolution process (Rinerson et al., 2015).

Characterization of CpWRKY Transcription Factors
Phylogenetic analysis showed that CpWRKY proteins were divided into five groups: IIa, IIb, IIc, IId, III. Among the five groups, the IIc harbored the most CpWRKY proteins, which was consistent with the most WRKY members of IIc subgroup in A. thaliana. However, there was no WRKY protein in group I in C. plumiforme genome, which was different from higher plants. In some higher plants, such as Morus alba, Hordeum vulgare, and Sorghum bicolor, the C-terminal domain of WRKY proteins classified into the group I may be the source of all WRKY proteins (Rinerson et al., 2015;Liu et al., 2017;Mao et al., 2019;Zhao et al., 2020). The WRKY proteins in the C. plumiforme do not seem to originate from this source. The results of multiple sequence alignment showed that the WRKY domain of the two proteins in group III was mutated ("R" mutated to "K, " and "Q" mutated to "N"), which may affect the normal interaction between WRKY gene and W-box in downstream target genes, but it is also possible to gain the ability to identify other new motif, thereby obtaining new function (Zhou et al., 2008). Besides, MEME analysis showed that motif 1 and motif 2 was highly conserved, which may be part of the WRKY conserved domain. Motif 8 and motif 9 were two relatively specific motifs, which may enable some WRKY proteins to have specific functions. Gene structure analysis showed that 3-13 exons were found in the genetic structure of CpWRKY genes, which provides available information for the study of the evolutionary relationship of WRKYs between C. plumiforme and other plant species.

Possible Regulatory Function and Mechanism of CpWRKY Transcription Factors in Momilactone Biosynthesis
Previous studies have shown that the expression of momilactones biosynthesis genes in O. sativa and C. plumiforme can be induced by biotic and abiotic stress (Tamogami and Kodama, 2000;Rodrigues et al., 2004;Bi et al., 2007;Kato-Noguchi et al., 2007;Daw et al., 2008;Kato-Noguchi and Kobayashi, 2009;Kanno et al., 2012;Okada et al., 2016). CuCl 2 treatment increased the contents of momilactones and the expression level of momilactone biosynthesis genes. It is noteworthy that expression level of some CpWRKY genes was increased under CuCl 2 treatment (Okada et al., 2016). The analysis of cis-acting elements can provide data basis to analyze the relationship between stress and WRKY proteins (Li and Chen, 2015). CpWRKY family members contained cis-acting elements related to hormones and stresses, among which MeJA-responsive element accounted for the largest proportion, MeJA elicitation was efficient to enhance the momilactone A and B, and the exact function of CpWRKY on these clustered genes needs further study.
WRKY proteins specifically bind to the smallest motif TTGAC (C/T) in DNA called W-box (Eulgem et al., 2000). Most of the target gene promoters of WRKY TFs contain variable numbers of W-box, which are arranged in the same direction or form palindromic structures. WRKY TFs bind to them and regulate the expression of downstream functional biosynthetic genes or other transcription factors (Eulgem et al., 2000). Different amount of W-box was present in the BGCs genes (CpDTC1, CpCYP964A1, CpCYP970A14, and CpMAS). Given all that information, it is likely that CpWRKY TFs actively respond to CuCl 2 stress and stimulates the expression of momilactones biosynthetic genes by binding to W-box in the promoter regions of target biosynthetic genes, thereby increasing the content of momilactones.

CONCLUSION
In this study, 19 WRKY family members were identified from the genome of the C. plumiforme, and their phylogenetic evolution, conserved domains, gene structure and conserved motif, cisacting elements and binding sites were analyzed. These results provide valuable information for further elucidating the role of WRKY transcription factors in momilactones biosynthesis in C. plumiforme.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
GK and SG conceived and designed the project. YW, RZ, QH, and SZ analyzed the data. YW, RZ, and MS wrote the manuscript. YW, MS, GK, and SG revised the manuscript. All authors approved the final manuscript.